...
- KS - talked yesterday about 3 options for member enterprise (see diagram), tried to simplify it down so we don't have "fixes" in the first cut; need to talk about the different pieces and figure out the phasing. Basic contention was no ability to fix at all and it should be on the back end, but have heard there are folks who will NEED some way to fix data just before reporting - while we don't support it in the first pass, find a place to do it. 3 paths show where this can happen: (1) fix afterward in HDS, (2) fix while in the staging area/pipeline, or (3) fix beforehand in a frontloaded process - good idea of what's going on in this box EXCEPT the Adapter
- KS - this is what we plan to install in the member's enterprise; one of the tenets is we provide data privacy - could be hosted, with privacy maintained through the hosted node having proper controls; security is well defined and agreed to by carriers, and carriers are not moving data out of their world - but when hosted it is obvious data is outside their walls. If we could do hosted it would be simple: stand up the whole stack inside a node, and partners (Senofi, Chainyard, CapGemini) could host and maintain nodes. Feedback from ND: some carriers really want to host the data. What we're striving for with this blue box is a tech stack amenable to as many people as possible - TRV, Hartford, State Farm, etc. - want them to be able to stand up this stack in their world. Real experience with a carrier who ran up against roadblocks in tech stack, policies, etc. - BIG disagreements. It's a complicated tech stack (4 major sets of tech); the lighter the stack to install inside a carrier's world, the more likely to be accepted, with as little variation as possible. Blue box not intended to be perfect; it has to take into account factors like company policies - try to minimize, and maintain integrity of transactions
- KS - moving forward, call back to what's been done before, always open to best ways to document architectures
- KS - phase - get into each box, figure out rough block diagrams of what's going on, starting at the adapter
- Blue box - member enterprise; green box - AAIS setup; red - hosted node. Still believe a hosted aspect to this makes sense: complicated tech, young tech, challenging to set up and configure; with a targeted set of people with the skills controlling it we will be better off. Some day it will mature (1-button install); Fabric, Kubernetes, and AWS have a certain complexity we want to encapsulate
- No issues running node at AAIS at this point, will need to interact with analytics node (currently at AAIS, doesn't have to be), any carrier nodes need to interact with carrier enterprise stack where data lives
- Adapter will run extraction, defined in a way it can execute in member's world
- another tenet not challenged, the RAW data is in HDS, the input to the extraction engine, where member is agreeable the processed data can leave their world, into analytics, turned into reports
- JB - improvements on the horizon for Fabric/Kubernetes (Fabric Operator), move in that direction; carriers are using AWS, a lot of it using TCP/IP to make connections to resources; one set of permissions internally, or for the adapter or apps to talk to Fabric - there will be interactions that require auth for use. The thing that runs the adapter and interprets the request to get data would be a binding approach: a request comes in - not necessarily executable code, but standards for implementation in HDS. The concept of the adapter is a good one; there needs to be something that happens when a request is received. Hope there will be things that simplify the config
- KS - moving target; as much as possible, encapsulate that movement so other stuff doesn't break - a buffer of implementation, and all of that is a trade-off; what's in the adapter?
- JB - a lot depends on what the DB of the HDS is and what it will take; that will determine HOW you get data from it
- KS - walk through a couple of different DB types, see how they might make things harder or easier, weigh options, pick short/long-term options
- KS - we have HDS
- noSQL (like Mongo, which 1 carrier said no to)
- Relational DB
- format of the data, allow scripting or programming
- JB - common denominator, what is supportable in carrier's domain, some form of standard sql, ought to support something common
- KS - current assumptions
- Adapter
- JB - interact with requests
- carriers need to see and consent to requests, number of API calls
- if hosted node wraps interface thru network, api call?
- ad hoc data call - do they know they have that data? (stat plans they know - repeatable), but ad hoc: "what is the nature of the request?" - some knowledge of what's in the HDS? subsequent phase?
- KS - function is to execute an extraction against HDS and return results (to analytics node? etc.)
- if adapter makes as few assumptions as possible - needs to know format of db, cannot be just an api call
- management among carriers will be handled via Fabric and hosted node, all the adapter does is execute the extraction upon request
- carriers interacting with network thru hosted node, minimize dependencies, all the adapter needs to know "asked to extract data from here to there"
- able to test the extraction pattern to see what it would result in
- extraction should be human readable/understandable
- some test harness "I have a new EP I want to run, test it"
- needs to execute when data call happens but also eval before consent (can we do this?), flow says "I will consent but what does it do?"
- JB - interact with requests
- db could be nosql or relational
- sql? nosql? allow scripting?
- extraction must be stored on-ledger
- meta data, architecture of the request
- sql? elastic search form?
- extraction options
- pure SQL
- scripting
- HLL (high-level language)
- DSL (domain-specific language)
- GraphQL
- data model is (semantically) starting from the notion that it is the stat plans (transactionally based)
- PA - premium transaction and loss transaction
- JB - currently stat plans, may evolve over time, starting point
- PA - what are we adapting to?
- Adapter
- KS - Extraction Processor (see sketch after this list)
- receive requests
- interpret / translate extraction
- execute transaction
- gather results
- return results
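A minimal sketch of those five responsibilities as one loop, assuming Node JS with the node-postgres (pg) client and a hypothetical request shape of `{ id, statements: [sql, ...] }` - illustrative only, not an agreed interface:

```js
// Hypothetical Extraction Processor skeleton: receive -> interpret ->
// execute -> gather -> return. Request shape and names are assumptions.
const { Client } = require('pg');

async function processExtraction(request, connectionString) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    let last = null;
    // "interpret/translate" is trivial here because the extraction is
    // assumed to already be plain SQL; run each statement in order
    for (const sql of request.statements) {
      last = await client.query(sql); // execute
    }
    // gather + return: the final statement's rows are the result set
    return { requestId: request.id, rows: last ? last.rows : [] };
  } finally {
    await client.end();
  }
}
```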
- KS - big concern about security in this model
- not sure if carriers have pushed back; passing around code is generally the thing security folks say is a no-no
- don't pass code that will run on someone else's machine
- JB - push or pull model; automatic = exposure, but if it is a request (pull), the API pulls and evaluates it - not executing code, it is interpreting it; a more secure approach is polling for things to do and launching via human control
- KS - where the ledger provides some comfort: an immutable ledger is good, having seen the code coming across, good with executing that; everyone has the option to consent; some sort of technical way to test the Extract process
- JM - agree with the concern, introduce a construct to reject queries
- if we had a table that said "if these keywords appear we will reject the query" - tricky to engineer but something that can screen filters via a config table; the system will reject even if a person approves; will submit this as a request to improve security
- KS - might not need but good to have, would run the initial request thru same scanner
- JB - something validation of request would do
- JM - humans make mistakes, double up on safety, hard to sell idea of arbitrary code execution
- JB - late binding, execute as a way to test against what is in data store
- KS - scanner validator, called as part of extraction processor, will happen via creator of EP, upload, system will run it, standard set of checks
- JB - good add
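A minimal sketch of the keyword screen JM proposes - the deny-list below is illustrative, not a vetted rule set:

```js
// Reject an extraction whose SQL contains side-effect keywords before it
// ever reaches the database; config-table driven in a real system.
const DENY_LIST = ['insert', 'update', 'delete', 'drop', 'alter', 'truncate', 'grant', 'copy'];

function scanExtraction(sqlText) {
  const findings = DENY_LIST.filter((kw) =>
    // word-boundary match so a column like "updated_at" is not flagged
    new RegExp(`\\b${kw}\\b`, 'i').test(sqlText)
  );
  return { ok: findings.length === 0, findings };
}
```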
- KS - elephant in the room - still DB decision, data format really tells us a lot of what EP is capable of, depending on reqs some may not be up to the task, take phased approach not starting with simple problem
- PA - processing with loops and stuff, stored procedures
- JB - stages or phases with Mongo; don't have to do it all in one SQL statement
- KS - Mongo stages of the aggregation pipeline? Mongo proprietary - steer away from proprietary implementations (like map-reduce)
- JB - a reason not to communicate the request in executable terms; implement in the data store. One thing not in the current transaction data of stat reporting: we don't have the actual policy record itself; a relational representation in the future would support that
- KS - we do have a policy identifier in the datastore, could be a business key to a logical policy record
- JB - dont have attributes of policy in terms or coverage
- KS - imperfect and difficult to put together, not all reports are policy anyway, agg of coverage across all policies
- JB - talked about extracting what fields avail in stat record
- KS - assume a relational DB and our EP is a pipeline of SQLs, one after the next - a definition of statements (1, 2, 3), and the EP executes the pipeline of SQLs
- JB - all specification of how complex a request
- PA - likes
- KS - (1) not making up a language, not depending on someone looking at a language pre-processor and understanding what it is doing; (2) it is easy to execute, just clear text
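A minimal sketch of what such a clear-text extraction could look like as a document - the field, table, and column names are made up for illustration:

```js
// Hypothetical extraction request: an ordered pipeline of plain SQL strings.
const extractionRequest = {
  id: 'personal-auto-2022', // hypothetical identifier
  statements: [
    // stage 1 stages an aggregate; the last statement's result is the output
    `CREATE TEMP TABLE premium_by_state AS
       SELECT state_code, SUM(premium_amount) AS written_premium
       FROM hds.personal_auto_premium
       GROUP BY state_code`,
    `SELECT * FROM premium_by_state ORDER BY state_code`,
  ],
};
```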
- JM -
- 1 can we create views as part of execution
- fundamentally, is our problem a document store prob OR an entity relationship problem
- Mongo is for doc stores; this feels like a relational problem - insurance entities are well defined and consistent
- doc store not nature of what we are doing
- dealing with well defined entities that relate to each other constantly, at 100k feet a relational problem
- KS - ready to commit to relational, fully supports what we need to do, aligns well with what we are after, transactional records
- JM - supportability - feedback from Friday call: the whole area is comfortable with SQL; step away from that and it's a scarcer skillset
- KS - a document DB supports the simple model but we'd outstrip it quickly; the simple way is 2 well-known tools - a relational DB and SQL
- JM - can we define arbitrary views
- JB - views used in staging; define and access, don't have to run the first stage of the pipeline
- KS - can we create views or use multi stage?
- JM - WITH clause in SQL, how complex do you want to get?
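For example (illustrative names only), a WITH clause can chain stages inside a single statement instead of defining views:

```js
// Multi-stage logic as one statement via CTEs - an alternative to views.
const multiStageSql = `
  WITH by_state AS (
    SELECT state_code, SUM(premium_amount) AS written_premium
    FROM hds.personal_auto_premium
    GROUP BY state_code
  )
  SELECT state_code, written_premium
  FROM by_state
  ORDER BY written_premium DESC
  LIMIT 10`;
```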
- KS - where the work is - the battle between devs and DB people; if I have to call you every time I want a new field, it won't take off; if we have to deploy a new DB model every time we want a new view?
- JM - schema evolution question, need to keep schema updated,
- KS - issue with old queries working
- JM - unless you validate JSON struct AND schema, no magic to schema management
- JB - might not change that often, 1-2 times a year at most
Monday 9/26/2022
- Member's Enterprise Options 1-3 (review of drawing)
- JB - push or pull?
- KS - pull is in the openIDL as a Service (orange box)
- JB - polling for requests?
- KS - Adapter Tab in drawing
- API external facing (listening for requests), gets requests w/ Extract logic (doc passed in); Extraction by Extract Processor into the DB we choose; EP is testable - before someone consents, evaluate valid requests and what's returned; some kind of Scanner/Validator
- Area - execution of code (arbitrary?); well-known pattern that you're not supposed to send code across and run it - make sure it will be OK; something needs to happen to that code before it can run
- right now - map-reduce (moving away from it) because it's proprietary to Mongo and limited; if relational, a pipeline of queries; GraphQL; get to a point of assumption
- KS Member Node
- complete processing of particular node
- KS - Responsible for:
- managing the network (Fabric) for a carrier
- all of the interaction with the network
- running chaincode
- managing the ledger
- DR - the thing that takes extracted data and joins it with other data - a function - according to the functional requirements
- JB - other data, some might be avail on carrier side but not shared, could have universal data, but carrier info, granular and rolled up before sharing
- DR - defined by EP
- KS - extraction can request external data
- DR - whatever comes back comes back
- JB - ancillary data, in the node or carrier; sensitivity of connection - if you have data related to customer accounts you didn't want to share (addresses, etc.), it gets rolled up into info that isn't shared (is it in a flood zone)
- KS - part that makes it possible
- DR - pink node: takes some data from the carrier, gets it to the point where it can be aggregated and anonymized - most likely already aggregated at this point; takes data, does the intended JOINs, to the point where it's combined with other carriers' data
- JB - thought the pink one was one of the nodes sending data privately to the agg node
- DR - exactly - the bridge, takes data from carrier and sends to other place
- KS - analytics node is the "other place"
- DR - go between
- JB - place where you manage requests, see them in UI
- DR - yes, but housekeeping to the biz function - this gets it there in a permissioned way
- JB - where is config (setting up channels, so forth)? contained in the hosted node AAS
- DR - implementation specific, not business function
- KS - configs the path to the network
- DR - could be stateless too in theory - not a network
- JB - not just data extract; whether it comes over API or logged into the hosted node remains to be seen
- KS - if we target the level of "gets data, moves it on", need to make decisions at some point whether in first pass mention Fabric or stay above,
- JB - fabric for now, point is theres a control flow and data flow
- JB - control interactions and data interactions
- KS - node knows when to request results, decides when/initiates request for carrier Extract results
- JB - after request consent - "Can you do this?" precedes (what I call control flow)
- KS - something else - this part of the arch is completely data agnostic; it is control, understands the workflow, does not care about what data is moved around; it facilitates the data - a serialized set of data, pass it along
- JB - data format agnostic because data is serialized; won't care because it will be an opaque object
- KS - would we want data encrypted as it moves thru?
- JB - could be, doesn't have to be; the need is another policy decision - if you feel the connection is secure and won't be looked at by some router, then no need, but it could be
- KS - could be per data call decision
- JB - if encrypted need key management of data
- KS - similar discussion with carrier from ND
- JB - fine to do it; information as well as comms security - establish end-to-end private and public keys
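A minimal sketch of one way the end-to-end keys JB mentions could work - envelope encryption per data call; an assumption, not a decided design:

```js
// Encrypt the serialized payload with a fresh AES key, then wrap that key
// with the receiver's public key so only they can unwrap it.
const crypto = require('crypto');

function encryptForReceiver(payload, receiverPublicKeyPem) {
  const dataKey = crypto.randomBytes(32); // per-call AES-256 key
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(payload), cipher.final()]);
  return {
    ciphertext,
    iv,
    authTag: cipher.getAuthTag(),
    // only the receiver's private key can unwrap the data key
    wrappedKey: crypto.publicEncrypt(receiverPublicKeyPem, dataKey),
  };
}
```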
- KS - what else? high level funct resp
- JB - UI in that box would look at world state, see what requests are out there; do it not necessarily automated - requests that come in are managed (human); some type of login to the hosted node to access the UI. Should the UI exist w/in the carrier? No - keep it in the hosted node; use the UI there
- KS - UI hosted here in AAS,
- JB - need to log in, need access controls,
- KS - only allowing 1 org here?
- JB - make the most of the privacy aspects of this, hosted node assoc with carrier
- KS - as far as Fabric goes, one organization
- JB - still assoc w/ carrier, modular funct, security and privacy of the channel
- KS - private only to the carrier; permissions could be granted to other orgs if the carrier decided; access to the node is controlled by the carrier - hosted node not plugged into cloud
- JB - IP addresses and ports - might be worth considering, if the carrier had people signing on and logging in, that the "who" could be known to openIDL so we can monitor hosted nodes (access, not content) - some monitoring of logins; simplify access capabilities, use as proxies for permissions to do work in the UI - log in and have a credential in the UI as well
- KS - 2 layers of permission: manage node and UI
- JB - different individuals will use the UI; access to the hosted node uses identities for access to the UI inside the node - no need for multiple IDs and creds - to be investigated
- KS - current tech stack we could start from, all the tech running would be in this node right now, lay those out as a starting point
- JB - great, preaching to the choir for use in the testnet
- KS: Tech Stack (current)
- Fabric
- Kubernetes
- Node JS / Angular
- AWS (multiple services)
- KS - it being hosted solution, how much opinion will a carrier want to have about that stack? how would you think you would make your opinions known/how should we govern that stack? data goes thru, do you want to know all? need to know all? any idea HOW it should be governed?
- PA - Carrier X wants to know all
- JB - should be common across all nodes
- KS - what led us to this separation of concerns; does this tech stack have to follow the policies of each carrier? knew it would run into problems - this is a tech stack decided on by the openIDL org, not any one carrier (contributed)
- JB - doesn't have to be part of the CTO standards as it is a hosted node, external interface, never get a stack that agrees with every orgs internal reqs - get agreement
- KS - goes thru the usual gates
- DR - the security part they care deeply about; how it functions, less so. Security is big - had to come open on the stack; a long way to build security trust, needs a lot of love for them to ever be comfortable; starting from a position of "not confident" instead of agnostic
- PA - Carrier X - very into being involved; want to understand everything they are running and connected with; auditability - they are interested in knowing. Believe a carrier asked "how do you delete, how do you prove it? where does data live at rest? in transit? positive confirmation all data deleted at the end of each cycle" - some are connected with all aspects
- DH - biz perspective, anything in the box abides by reqs established
- KS - 3 things possible - main concerns: the tech stack is a big concern INSIDE (if hosted it is slightly less); very concerned about Security and the Privacy of the data
- DR - COST and long term sustainability, important
- KS - while they might not have the desire to control everything: secure, privacy maintained, reasonable cost, governance process
- JB - DH's point about lifecycle of the data, state of the data - keep in mind when data transmits from this node to analytics it uses a private channel; those things are written to the chain itself - lifecycle management of the data
- DR - the chain is only a piece of it; nothing stopping it from being written from the PDC to elsewhere
- JB - PDC is only visible to partner at the other end
- DR - the second it leaves their node it could be written other places; not a perfect tech solution - good tech solutions, but a lot of this will be process based: legal agreements, auditability; enhance the tech
- KS - carrier in ND very concerned with exactly this - being sure where the data will go, that what you said you would do actually happened - visible, auditable, enforced
- JB - benefits of deploying in open source: the code can be audited as well as the logs; ways to achieve agreement with those covenants
- DR - OSS is the classic legal doc dump; pretty hard to systematically always check everything - still needs other pieces, recourse, and audit mechanisms
- JB - which code is deployed - procedures can help see what's being used
- DR - wouldn't over-index on that; 10-year-old code has flaws no one knew about - definitely not going to be the primary mechanism
- KS - combo of all those factors, agreements between entities
- PA - stuff we needed, tenets?
- KS
- must pass sec standards of all carriers
- must meet data privacy expectations of all carriers
- cost of running the code is sustainable
- DR - costs should be similar to API costs, not a $1, but commensurate with other services, reasonable
- PA - API is little different, this is data exchange plus cert of data w/ network, central auth validation
- DR - if you consider the cost normally paid to a stat agent, the tech itself should not be near that cost - the totality, the holistic cost, needs to be better than the current state
- KS - stay out of value statement here, overall value may cost more but data priv, might be more important
- JB - bigger picture: efficiencies cost less, fewer people doing bespoke reports
- DR - still in line with what we expect the system to be; maybe problematic at scale - be cognizant we are building lean infra that is appealing to people
- JB - def keep that analysis in mind, overhead to monitor network, hosted nodes
- DR - watching to keep costs down, managed network is always more costly
- JB - small management staff, low overhead, keep it that way, point in the same direction
- KS - other types of nodes and their responsibilities: analytics and multi-tenant
- DR - agreement the UI here is extracted data, where carrier resp ends, sounds like confident in tech stack, build this piece, mock up a dataset that fits what we think the EP would be, put it thru the pipes here, next step
- PA - working with Dale, can produce 3 diff sets of test data modifying what he has now,
- DR - stick it thru the plumbing, check the security; hands on keyboard and progress while we look at the Extract Engine
- KS - analytics node - still part of this side of the API call, stack for this network; if you have confidence in how that would work, attack that part of this - go offline and illustrate, or do it tomorrow morning; same level, all comfortable; defining the projects to build out the carrier and adapter
- JB - along the lines of making stuff work, area needs thinking, what is the nature of that EP request, in the case of stat data "gimme"
- DR - start with what Peter has, see how it works, idea of what it looks like when done,
- PA - can do 3 json files as 3 mock carriers
- DR - if that confident with NaaS, can point to progress, the EP is the hard part and work on this in parallel
- JB - decision there, when transmitting data across interface, what are the handshakes? serialized, encrypted,
- DR - no persistent listeners, async responses to requests
- DR bigger issues are OPS and SLA maintenance, easier to be responsive than actively polling
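A minimal sketch of that pull model - `fetchPendingRequests` and `runExtraction` are hypothetical placeholders, not agreed interfaces:

```js
// Poll for pending extraction requests instead of exposing a persistent
// listener; only consented requests ever reach runExtraction.
async function pollLoop(fetchPendingRequests, runExtraction, intervalMs = 60000) {
  for (;;) {
    const pending = await fetchPendingRequests(); // e.g. ask the hosted node
    for (const request of pending) {
      await runExtraction(request);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```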
- KS: Recap of 9/26 discussion
- PA - stat plan data comes in, 97 characters, using the stat plan handbook, converting the coded message (Peter shows data model)
- KS - data models: intake model (stat plan - don't want to alter what carriers HAVE to send; flat text-based stat plan when it comes in), stuff to the left
- DR - do it the right way; all pieces could change over time
- PA - not running EP against the coded message; keep codes so if you need to do a join you can
- DR - trust PA to set it up; if he has an idea, assume it's the right answer and move onto pieces we don't have a clear idea on
- PA - ingestion format is stat plan, HDS format is Decoded
- for relational
- 1 record per loss and premium
- JM - relational db?
- PA - totally put it in a Relational DB; suggest 1 table per stat plan - don't need to do weird stuff with multiple values per column
- KS - claims and premium as well?
- PA - 2 tables per line; each stat plan has a premium format and a claim format. Main thought: utilizing SQL lets biz users contribute more (Mongo is tricky)
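A minimal sketch of that ingestion step - a fixed-width slice of a stat record loaded into a per-line relational table. Offsets, field names, and table names are invented for illustration; the real layout comes from the stat plan handbook:

```js
const { Client } = require('pg');

// Hypothetical decode of one fixed-width stat plan record.
function decodeStatRecord(line) {
  return {
    transactionCode: line.slice(0, 2).trim(), // made-up offsets
    stateCode: line.slice(2, 4).trim(),
    premiumAmount: Number(line.slice(4, 13)) / 100, // implied decimal
  };
}

// Keep the original codes alongside decoded values so joins still work.
async function loadPremiumRecord(client, line) {
  const r = decodeStatRecord(line);
  await client.query(
    `INSERT INTO hds.personal_auto_premium
       (transaction_code, state_code, premium_amount)
     VALUES ($1, $2, $3)`,
    [r.transactionCode, r.stateCode, r.premiumAmount]
  );
}
```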
- KS - make decision it is relational
- PA - internal project; switching to relational has significant implications - when do we want to put that to a formal vote and unpack the implications? right now the plan is to debug the first part of the auto report, then pivot to EPs working with PostgreSQL; his team can pivot
- DR - no strong opinion on which format but doesn't want to throw out work
- PA - most of the work has been understanding the math behind the report and connecting w/ business; has spent time learning JS and Mongo - finish up the test with Mongo, can design tables and write the EP after
- KS - internal planning discussion, JZ wants progress, prove stuff, healthier method
- DR - if saying "lets alter" - wants to make sure reasoning is strong, fine with Mongo
- JM - showstopper if we don't - legions of people using SQL need to read EPs; the team that manages and processes has deep SQL knowledge and coverage but no JSON; the biz unit is comfortable seeing a relational form of the data; the ops team who will support it forever is very comfortable w/ SQL - can't do Mongo
- DR - use Mongo enough
- PA - into relational the whole time, wanted to see it thru with mongo
- DR - TRV can work with SQL
- JB - decide internally, move to relational
- DR - want peter to get resources
- PA - 2 fridays from now, wants to present THEN rewrite stat plans in Postgres
- DR - like postgres (can do JSON blobs directly) - what version?
- PA - still considering
- DR - which version has the JSON blob functionality? find out
- JM - postgres favored flavor too
- DR - if we do SQL postgres is worth it
- KS - vetting internally (AAIS), impacts workstreams
- DR - ACTION - get Hanover opinion on this stage
- JM - denormalized? keep it flat?
- DR - like that too
- PA - def for the raw tables
- JM - challenge of flat: redundancy of data
- DR - performance in terms of speed or data minimization not a big issue here
- DR - node as a service, Peter free range
- JM - EP did nothing but SQL in postgres, highly normalized model, minimal connection, plumbing working
- PA - easiest over all, need stored procedures
- KS - why?
- PA - StoredProcs, do EPs in one lang
- KS - pass in on API call, cant do with stored Procs
- DR - could if you sent in API call and EP and stored as SP and...
- PA - Stored Proc - create and destroy temp views, looping, cursors,
- KS - stored procs are utility functs not part of EP itself?
- JM - zoom out - specifics - what auth do we assume we have over DB? Data management people strict
- DR - asking for version control over the DB; what permissions are needed? manual requests? 3rd party agent?
- KS - if we assume only SQL, someone screams we address it
- JM - artifact now - a grid: can I create/replace (not exactly CRUD) a limited # of object types - tables, views, stored procs, etc.? then assume we have the right to create these? can EPs create them on the fly? only a grid of 7-8 objects - make the grid now: tables we make in advance; stored procs on the fly is a bigger ask, but have it in this grid
- Objects as rows: Tables, Views, Stored Procedures, Functions, Triggers
- What you can do with them: create in advance, alter on the fly
- DR - look at them as optimizations, utility helper functions? get it working with raw queries first. I like stored procs - sanitize the query, known responses, good things about them - but we don't need them RIGHT NOW
- PA - better at SQL than Mongo; depends on how we use the analytics node to combine the data - a fair bit of process
- DR - do I think we will be allowed to write Stored Procs? No - don't assume that initially
- PA - mid-level processing layer, or the whole thing in SQL
- DR - will need mid-level processing long term
- JM - EP needs arb code, third party datasets
- DR - another thing that helps: it's simpler to maintain version control of mid-level middleware (some functions/procs) than of the DB itself - this way the DB is just a datastore, you need to keep it fed; moves version control out a little bit, a logical separation
- JM - as little code in the DB as possible; part of the challenge is writing code in the DB - put it in the ETL layer - ETL at Query Time - if you find yourself writing code (stored procs) to get data out..
- KS - reality: people put in an EP, that's what they can do in the next year; any updates to StoredProcs will take forever to get approved by everyone
- DR - extensibility of Stored Procs is weaker
- JM - once the DB knows you are selecting, you can't drop a table in the middle, whereas you can with Stored Procs - you can't screw up a DB in a select clause but a stored proc could blow up the DB
- PA - use middle layer, rest in JS, middle layer in JS and Postgres?
- DR - no pref
- KS - can we or can't we do this with layers of SQL?
- JM - views are a logic layer; opportunity to put a view-type layer in. Don't want to overnormalize - can get rid of redundancy w/ aggregate functions; do you restructure your model? is it the query layer or the model in need of adjustment?
- KS - governance model and the ability to run code that has to be reviewed by people with not just SQL knowledge; optimize for something to be run by people OK'ing something coming across the wire - will run into a lot of ugly SQL
- DR - security architects will look at "where is my trust plane"; architect the whole environment to be secure and only expose the DB, only run select statements against the DB - more likely to approve something (if something funky happens outside the trust plane, you can still protect the data easily, and that will help); if the trust plane is behind the DB and you expose data to some procedural code, they will raise the bar
- PA - instead of everything in one lang and sql,
- DR - unfortunate process-gated thing, not the ideal engineering solution but where we are stuck
- PA - great discussion, how they look at sec, finish in mongo and do it in postgres, no stored procs, JS to make multiple select queries
- JM - don't mind scary-looking SQL; run multiple subqueries and stitch them together at the bottom
- PA - depends on who the person reading it is; simple objects - the WITH clause gets harder
- JM - do we put a VIEW layer on here? is my model correct? team is adamant about keeping it flat but can read long SQL
- DR - plan there: may or may not need to be a middle layer - assume we don't need it and only add it if PA says he needs it? or assume we need it
- PA - middle layer to enhance EP?
- DR - third party data enrichment
- PA - the entire report it puts together
- KS - assume we have reference data nearby, or is that not true? all transactional data?
- PA - state names replicate, state codes replicate
- JB - consolidation side
- JM - reporting logic shouldn't occur here, not in the adapter
- DR - only getting data over the wire to the service
- KS - no PII raw data over the wire, some agg happening here
- JB - certainly have codes, and dereference with labels later
- JM - where does reference data get resolved?
- PA - sourcing ref data from the stat handbook, on load; human readable when going in. ETL: Dale submits data, stat records go thru ETL, loaded into the relational data store; take the code and add a column for RI
- JB - in order to be human readable by the carrier?
- KS - less "human readable" and more getting out of AAIS codes and into...
- JB - standard ISO codes better than labels
- KS - bring it into something more standard
- PA - need to look, believe state codes are NAIC codes
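A minimal sketch of resolving reference data on load, as PA describes - the lookup values below are placeholders, not the actual code table:

```js
// Keep the code column and add a decoded, human-readable column beside it.
const STATE_CODES = { '01': 'Alabama', '02': 'Alaska' /* ...from handbook */ };

function enrichWithStateName(record) {
  return { ...record, stateName: STATE_CODES[record.stateCode] ?? 'UNKNOWN' };
}
```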
- DR - we now have a plan for DB, plan for what happens after adapter done, unless we need it, role of adapter = accept EP,
- JM - long term design issue - if we do enrichment stuff at the grain of native data?
- KS - a lot of times it will be, have to be able to support it
- JM - worried: rule/validation logic runs on fine-grained data, but arbitrary logic from other data sets needs access to the fine-grained data; you therefore extract private data from the relational DB and then filter it down
- DR - 2 trust planes
- SQL first
- Node - as long as it happens on the node, technically palatable, depends on how implementation looks, do we need quarantine zone, middle layer
- JM - opp to review data set NOT JUST EP
- DR - biggest design challenge:
- JM - human interface - there needs to be a gate somewhere that gives a human the ability to review data before it goes out the door
- DR - still one environment; if passing non-aggregated data over the first trust plane, need a secondary stop somewhere
- JM - "no humans" is not good, needs review
- KS - intention - testable dry run in final product
- JM - MVP 1 - don't put it in, but do in the final product
- DR - same page with James on how to make it secure; queries are one thing, but this will require executing foreign code in a runtime - not written by us, more powerful than a SQL query
- JM - two trust planes, will be post-db
- JB - run some carrier side? access to the data
- DR - all on the carrier side, won't be in the node; but the thing that is the hard line in the sand for Security - no code execution behind the trust plane (SQL queries fine)
- JM - core deliverables - deck for the DB teams of large companies - selling to security people, will be work
- DR - the most valuable artifact that comes out of this; JM/DR tell us what to say
- KS - can't accept a solution without this box; they can run it or see results; arbitrary code - only SQL (read only)
- JB - worry - some pattern that requires raw level, in the output
- JM - do high risk stuff all the time, but work at it (sharding, encrypting, jump thru hoops)
- DR - homomorphic encryption for all? (laughs)
- KS - concern, are we back in "TRV does this, HRT does that?"
- JM - need to prove no way to do it simpler, hard fight but make the business case
- DR - always an exception process for sec finding, make as simple as possible b/c variances between teams, more and less strict, avoid needing to ask
- KS - not all will sign on to the risk or have the will to review it
- DR - inconsistency: one carrier agrees to something another thinks is risky - we will run into this; w/in a carrier it depends on who reviews the case and which CIO (who makes the risk call); w/in a carrier you don't have one standard - decision makers still use judgement (there are still no-gos)
10/3/2022
- JM - Wed Hartford has 2 new people joining effort, leadership allocated, JM still involved, new person on Arch WG, asked for build team for Q1, Q4 is ramp up time
- KS - talked to AAIS about redirecting PA from Mongo to SQL, AAIS is on board, whatever PA is working on should move towards stated direction
- KS - relational DB, run queries that are coming from community, talked about scanner
- KS - across the API layer, all running in the carrier; no side-effect code - just SQL against a relational DB, allowed to execute against it to return a response; deferring scanner and enrichment - did we say defer the test facility?
- JM - out the door with the smallest MVP we can; solid line is core, dotted is phase 2
- KS - ETL: submit stat plan, ETL turns it into the relational structure, EP Processor executes SQL across the API, and the API returns the results
- JM - is this good enough to turn into a diagram to show that? MVP: hand nothing but a SQL string over the interface to the Extract Processor, run it, get it back, hand it back to the interface - end-to-end plumbing - this is the core
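A minimal sketch of that solid-line core - an endpoint that accepts nothing but a SQL string and returns rows. The path, payload shape, and env var are assumptions, not an agreed contract:

```js
const express = require('express');
const { Client } = require('pg');

const app = express();
app.use(express.json());

// MVP plumbing: SQL string in, rows out - no scanner, no enrichment yet.
app.post('/extract', async (req, res) => {
  const client = new Client({ connectionString: process.env.HDS_URL });
  await client.connect();
  try {
    const result = await client.query(req.body.sql);
    res.json({ rows: result.rows });
  } catch (err) {
    res.status(400).json({ error: err.message });
  } finally {
    await client.end();
  }
});

app.listen(3000);
```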
- KS - nuance to the execution of SQL - might be more than one SQL? pipeline? worth discussing now? or hope it can all be done in one SQL
- JB - script? series of sql statements
- JM - agree it needs to support multiple queries; how do they communicate? pipelines - how do you get them to talk to each other?
- JB - temporary tables, not modifying (create/destroy) is safer, in addition need to wrap initial investigation of request, step before
- JM - is validator on this side or the other side?
- KS - put it on both: create SQL, validate on the way in; if it's SQL, don't know how it could...
- JB - do need to investigate SQL - stat plan can say "gimme report", but other things will require you look at the SQL
- KS - going back to MVP, scanner/validator is out of scope,
- JB - if all we're doing now is agreeing on getting data from stat data, all of which SQL will address, then scanner/validator do something simple - no need for a consent dialog until we do the basic stuff
- JM - MVP in loosest sense of word or product out the door - can't do product until scanner/validator is done
- KS - MVP or POC?
- JM - sequence the lines: 1-4 proves it works; 5, 6, 7 go to the industry - if the solid line is #1, what should the second be? more about proving our own assumptions or proving value to the industry?
- DH - business perspective, security from going from TRV node thru adapter to analytics node, showing data privacy
- KS - lot of reqs; go and pick the ones that drive the second level of POC - things required and implemented before we go to production
- JM - enrichment 4th on my list, of business-y things, scanner-validator says "ok?", test validation says "run 100s of rules to prove it is what you think" or Enrichment?
- DH - prove my data is protected
- KS - the scanner - #s 1 and 2 show security; DH wants to show the rest of the flow, and across nodes we make sure the stuff is going the right way over the analytics node
- DH - basic plumbing
- JM - end to end plumbing? least I need to do to prove it
- KS - bringing an arch perspective here, 1st thing: does the plumbing work; 2nd: can we install it in a carrier - important that it will be acceptable in a consistent way across carriers - what we work on here can abstract or work concretely so everyone can run it
- JM - Enrichment is scariest
- KS - needs robust plugin capability or external data model - trust and maintenance are hard, diff timeline than EPs
- JM - stop at dotted line and focus on solid lines, all we are gonna do
- KS - step 1 solid lines, step 2 end-to-end plumbing, step 3 up for discussion - but for KS, "does this work in your carrier node"
- JM - focus on solid line
- JM - to the multiple-SQL problem - need multiple SQLs; a couple of ways to solve this. Argue this executes out of a schema with no data in it, next to the schema that carries the data; gives a level of grants/rights to the DBA team - ask what is the set of rights we're asking for in the sub-schema
- KS - can we establish a sandbox DB where the EP works, updates, creates tables as necessary
- JM - can say that is a design principle and allow implementor to do either - golden -
- KS - sep schema
- JM - pros and cons of each, takes fight out of DBAs
- HDS Schema, EP Schema
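A minimal sketch of what that two-schema ask to the DBAs could look like - role and schema names are placeholders:

```js
// Read-only on the HDS schema; full rights in a separate working schema.
const grantStatements = [
  'CREATE SCHEMA IF NOT EXISTS ep_work',
  'GRANT USAGE ON SCHEMA hds TO extract_processor',
  'GRANT SELECT ON ALL TABLES IN SCHEMA hds TO extract_processor',
  'GRANT ALL ON SCHEMA ep_work TO extract_processor',
];
```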
- JM - separating makes it easier to ask for things - default: want to create views on the fly; the idea of multiple SQLs, parsing behaviors, smart enough - put it in anyway
- KS - could be a collection of strings
- JM - how interacting with each other
- KS - the first can return, the second consume; you test it - a SQL string updates an intermediate table with data; all you have is a choreographer (run the first, point the second at the results)
- JM - intermediate table - SQLs communicate with each other via creation of complementary tables; DBAs protect data - ask for "Create Views" auth OR temporary tables (sometimes don't have the robustness you want); only talking about views, temp tables, or physical tables
- JB - creation of indices
- JM - assumed, agree not to use Stored Procs, comfortable saying "want schema, in schema have these rights, agree to flush data (flushing policy)
- KS - drop the whole DB
- JB - temp tables when connection closed by default, but drop statements good to do
- JM - might be worth asking for materialized views
- JB - temp tables for intermediate results
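A minimal sketch of that choreography - one connection, statements run in order, temp tables carrying intermediate results, and explicit drops even though session close would clean up; names are illustrative:

```js
const { Client } = require('pg');

async function runPipeline(statements, tempTables, connectionString) {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    let last = null;
    for (const sql of statements) {
      last = await client.query(sql); // earlier stages fill temp tables
    }
    return last ? last.rows : []; // the final stage is the output
  } finally {
    // JB's point: temp tables vanish on session close, but explicit
    // drops are still good hygiene
    for (const t of tempTables) {
      await client.query(`DROP TABLE IF EXISTS ${t}`).catch(() => {});
    }
    await client.end();
  }
}
```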
- KS - keep saying not time bound, if we have to do more X not a prob for performance
- JM - if this is a long list we have to ask DBAs for
- KS - assume "yeah", go ahead and create schema, inside POC, extract processor remove/create tables
- JM - grant grant grant - pretty minimal,
- KS - table creates and stuff
- JM - phys table question on there, more pushback, flushed data is question mark
- JM might find optimization opportunities
- JB - consistency issues, better to treat like workspace and flush it
- JM - no phys tables for now, materialize views?
- KS - seems like the EP has a more complicated interaction diagram; interaction diagram between the 4 components?
- <Live diagramming>
- JM - a view is a mechanism to make one SQL dependent on another SQL; materialized view: "love the view but need performance"; physical tables fix the perf issue but are needed in advance - want to go as far as we can before resorting to those
- KS - running first sql, get results, where kept?
- JM - temp table, self-destruct at end of session
- KS - DDL?
- JM - easier ask
- KS - going to have to describe the structure of the results so a temp table can hold them; has to be done before running the first query
- JM - draw out the one-query problem - EP should have an arrow to Postgres that says "SQL"; assume it does; it will then (logically, not physically) largely read from HDS, retrieve data, Postgres to EP schema
- KS - something in SQL says "select x from this table", returns the result, and persists the temp table?
- JM - retrieve from HDS should be enough; in any non-trivial case results go in a temp table; if we assume we wrote the temp table, the Extract Processor runs the retrieve