...
Deliverables:
Scenarios
Stat Report
Subscribe to Report (automate initiation & consent - assumes stat report)
Define jurisdictional context/req (single or multi versions of same report)
How often it runs (report generation frequency)
- PA - avoid 13 lines and 13 reports per state per carrier - come up with a way to simplify distribution of reports
- KS - make sure subsections are covered and sections expanded - don't see anywhere we discuss data access
Extraction Details / Metadata
- KS - want an understanding of what data will be accessed by the EP: when a report runs, what data will be accessed, field by field. The EP code says all that, but we discussed whether the EP being code is enough; we want something more than just code to tell us what's being accessed
- PA - work with the Auto Coverage Report is complex; want the fields going across rows and to know what the filtering criteria are: take these fields, on this line, over these dates
- KS - declarative approach to the EP: less writing JS code and more saying "fields accessed in EP" plus a section for "Aggregations"; early on a "pre-processor" approach was suggested (JB) - "this is an EP, we know the things in the EP, requires a way to generate code from it"
- PA - great idea; seeing a 3-tiered thing: (1) explicit business level (top of the data call - earned premium, most recent year), (2) more like language-agnostic metadata, (3) full implementation
- DH - really talking about what the "smart contract" will be - really the REQUEST details - agree we want what fields, how info is aggregated, how it's filtered, what date parameters are used
- KS - aggregation rules are part of the implementation detail; want sections for data access, dates, parameters (a section on parameters for the subscription); look at parameters as higher-level stuff that suggests other things, and a place to see them whether they drive implementation or not - aggregation, access, filtering - sections of pseudocode can get out of sync with the actual implementation; who puts that in there (the REG or the implementor of the EP?)
- DH - could be both
- PA - having success with the Auto Coverage report because DH did all the mid-level data work and we were able to reproduce it; whoever is doing the EP has to have a clear plan
- KS - basic model we have right now is for the REG to put it in prose and the implementor to create the map-reduce from it; suggest going 1 level deeper with more declarative sections like data access - maybe not field by field, but data and filtering
- DH - want to have the specific fields you are using; worry about a "fishing expedition"
- KS - describe the sections that need to be filled out as part of the data call, and be specific (field names, actual aggregations, pseudocode) - data access is missing today (see the sketch below)
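A minimal sketch (illustrative only, not agreed) of what the declarative "data access / aggregations" metadata tier discussed above could look like; all field names and values below are assumptions, not actual HDS or stat-plan fields:

```typescript
// Hypothetical shape for the "middle tier" (language-agnostic metadata) of an
// extraction pattern: data access, filtering, aggregations, date parameters.
interface ExtractionMetadata {
  businessDescription: string;          // tier 1: plain-language ask
  fieldsAccessed: string[];             // field-by-field data access
  filters: { field: string; operator: "=" | "in" | "between"; value: unknown }[];
  aggregations: { name: string; operation: "sum" | "count" | "avg"; field: string; groupBy: string[] }[];
  dateParameters: { start: string; end: string };
}

// Illustrative example, loosely based on the "earned premium, most recent year" ask.
const autoCoverageExample: ExtractionMetadata = {
  businessDescription: "Earned premium by zip code, most recent year",
  fieldsAccessed: ["zipCode", "earnedPremium", "transactionDate"],
  filters: [{ field: "transactionDate", operator: "between", value: ["2021-01-01", "2021-12-31"] }],
  aggregations: [{ name: "earnedPremiumByZip", operation: "sum", field: "earnedPremium", groupBy: ["zipCode"] }],
  dateParameters: { start: "2021-01-01", end: "2021-12-31" },
};
```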
Outputs / Aggregation Rules
- DH - from the HDS? separate from the final report?
- KS - should be the same logically
- DH - output from the HDS is aggregated data; will it be anonymized?
- KS - not anonymized until combined in the analytics node - what are outputs? Data points - how are they different from aggregations? Aggregation: sum of the coverages by zip code; what about outputs is different?
- DH - shouldn't be different; outputs are aggregations
- KS - other point: when we combine things, is there an expectation of functionality beyond that point? Example: the ND case does something on the Analytics Node - compares against registered VINs - so it's not just aggregates from carriers; some possibility the report itself does work (compares to other data, etc.) - find a way to describe that as well
- DH - what are you doing with the data when in the Analytics Node (AN)?
- KS - outputs on the EP are the aggregations, the reduce - what do we want to know about the aggregation itself? It's in code; do we need to come up with a standard language or just prose?
- DH - should not be computer code (non-technical people need to be able to read and understand it)
- KS - start with prose (human readable) with an aspirational goal of some structure (see the sketch below)
- JB - what's requested to begin with is a subset of the EP
- KS - question is how accurately it can be expressed
- DH - then the final report - or a separate section
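To illustrate the gap between the prose/metadata description and the implementation tier, a minimal sketch of KS's "sum of the coverages by zip code" example as a reduce step; the record shape and names are assumptions for illustration only:

```typescript
// Assumed minimal record shape for illustration; real HDS records differ.
interface StatRecord {
  zipCode: string;
  coverageAmount: number;
}

// "Sum of the coverages by zip code" as a reduce step - the implementation
// tier that the prose/metadata description would correspond to.
function sumCoverageByZip(records: StatRecord[]): Map<string, number> {
  return records.reduce((totals, r) => {
    totals.set(r.zipCode, (totals.get(r.zipCode) ?? 0) + r.coverageAmount);
    return totals;
  }, new Map<string, number>());
}
```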
Analytics Node Function (what are you gonna do with the data after combination?)
- KS - formatting final report
- DH - anonymizing data, deleting data once report created
- KS - requirement from the customer to verify or keep evidence of that (deletion) - prove data was removed; any core report logic (other than formatting), anything like correlation with collaborative data, has to be described here; ND example: comparison against registered VINs
- DH - running against 3rd party data - would want to know "what specifically are you matching on and what are the expected results - what do you hope to get out of that 3rd party data"
- KS - part of what we showed where we correlate with X data; some of it you want to do on the carrier node before you aggregate because it uses data you don't want to share; might want this section about correlated data both inside the carrier node and outside the node
- DH - on Extraction
- KS - talked agg and outputs, didn't talk about correlated data or third party data
- DH - think it would be more efficient for one party doing it rather than each doing their own
- KS - example: address - don't want to share it; if we are all OK as long as it isn't exposed within the carrier node, every carrier would do it individually BEFORE aggregation
- DH - what we would want is within the EP: it's already set up to go against the 3rd party data
- KS - hinting that this is not just code - some sort of API call has to be available and all carriers have to agree the API is callable; hornet's nest, could be complicated, requires another component available to the EP
- JB - specify it logically; some may have an API - logical, not physical, requirements
- KS - don't need to do it for stat reporting, not a near-term goal, but good to know it's available
- KS - has to make report available, put it somewhere to be accessed
- DH - reconciliation / qual check at this point before avail to be accessed (reasonability test)
- KS - manual thing: part of the process says someone gets eyeballs on it before it is automatically released - a chance for the REG to look at the report and say "ok"; some Release step by the requestor?
- PA - seems like we want the stat agent to review before the REG sees it (especially with stat reports) - the stat reporter has to be able to access the analytics node; thoughts?
- KS - logically it's making the report available in some status based on permissions; the requestor or stat agent can see it pre-public-release; these are request-specific: the same person doesn't have access across the board, only for that specific request - permissions based on that particular data call/stat report
- DH - before AN, part of EP, part of ask, do we want a mock up of what report will look like? so REG can say "this is the level of info I am looking for"
- KS - dry run of report, tells you what would leave your node but doesn't format
- DH - thinking of the ask rather than the EP: "what is this report going to look like" - not sure if the REG will mock it up or an intermediary; don't want them to receive it and say "not what I wanted"; format detail (what is in which columns vs rows) - what info are they actually looking for, kept simple: "written premium, by zip code, cars where color = red" - wants to know, as a carrier, what is being asked for (format-wise), so the EP gets into "how am I going to do that"
- KS - mock-up of the report or a description (Sean Bohan - ask George and Eric to weigh in, join next Mon-Tues calls)
- DH - description is fine; if complicated, parts a, b, c - Peter's stat reports are more complicated than "give me written premium by zip"; depends on the complexity of the request and the output expected
- KS - what report am I getting?
- PA - sent invites to Andy and Padma for next Monday
- SB - maybe put the mock-up on the REG to put into the data call (not just why they want the data but what they expect from the report)
- KS - we talked about the carrier being able to see the results before they're shared, as a dry run; different section, but keep it in mind - does the REG or requestor want similar functionality at some point, some sort of dry-run capability for the requestor?
- JB - "here's what I want" then get data is problematic (see Peter's efforts): the data I would like vs what I get - the role of the intermediary (who is actually creating the data) is to translate the business request
- KS - is there a need for the REG to get more than consent/likes - some sense the report will give them what they need?
- DH - example: want it by zip; we don't write a lot of "red cars" - do they want zeros or nothing, or only the zips where we have red cars (crude example)
- KS - is it sufficient to say we have made data calls quick enough that if you don't get what you want you just do it again?
- JB - the purpose of a data call is to see what's there (sorry, we don't have red car + zip)
- KS - a well-formulated question: you either don't get, or get, what you want
- JB - clarity of the ask; goes back to how to define what you are asking for
- KS - without a debugger you have to figure out all the code in advance before running; some support for what is available to the REQUESTOR as opposed to a wide-open "do whatever you want" report
- JB - an elasticsearch-style query form of what CAN be executed (see the sketch below); a focused topic we need to spend time on anyway
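One possible (assumed, not agreed) way to bound what a requestor can express - a closed vocabulary of fields and operations rather than open-ended code; the field names are illustrative only, and the "red cars by zip" example comes from the discussion above:

```typescript
// A constrained "ask" form: only known fields and aggregations can be requested.
type AllowedField = "writtenPremium" | "earnedPremium" | "zipCode" | "vehicleColor";
type AllowedAggregation = "sum" | "count";

interface ConstrainedAsk {
  measure: { field: AllowedField; aggregation: AllowedAggregation };
  groupBy: AllowedField[];
  filters: { field: AllowedField; equals: string }[];
}

// "Written premium, by zip code, cars where color = red"
const redCarsAsk: ConstrainedAsk = {
  measure: { field: "writtenPremium", aggregation: "sum" },
  groupBy: ["zipCode"],
  filters: [{ field: "vehicleColor", equals: "red" }],
};
```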
Roles and Permissions
- KS - identified the REG and/or stat agent; a report reviewer role can review the report before it is published (different than a report approver) - may merge into others - describing can be done by the REG but might also be done by the implementor (exact fields being returned) - creating the thing called a data call or stat report is more detailed; when the implementors get in there they will find they need more and can update the EP and configuration
- DH - collab between implementor and REG
UI/Interface
Extraction Pattern
Aggregation Rules
Messaging
Participation Criteria
Two Phase Consent
Data Path (from TRV to X to Y - where is the data going and for what purpose)
Development Process (extraction/code)
Testing
Auditability of data
Identify Report
- Who?
- originally everyone participating in call from Regulators to Carriers to Intermediary (Participants)
- What is it (metadata)
- Naming it
- identifier
- Requestor
- type of input
- generation source
- line of business
- what output should look like
- explicit math for aggregation
- Purpose of data (what being used for)
- similar to what is captured on a data call
- DR - take a stab at making a version of this, an idea of what it should be (reference the reqs), see how it looks, what's missing, etc. - find gaps as opposed to trying to be complete here - for today's purpose, some metadata along the lines of the reqs; would we do a first draft of what it would look like, anything missing? (feels like reqs-lite)
- KS - info request section in the reqs table; a first iteration/solution will highlight gaps
- SK - any existing samples of data calls/requests? Does the metadata associated with a request match up - is it covered in the list?
- PA and KS to discuss what will be shared; integrating with other departments, a large list of data calls from other systems, working with ops teams to bring it together; high level, looking to make big improvements on metadata and reqs
- SK - thinking of a couple of dates (date of request, deadline date, expiration date)
- KS - for a report these are the fields we fill in (a la data dictionary definitions): what the data call was intended to capture, but including all of the details Dale pointed out; there is bridging vs pointing back to the reqs, plus the layout for the report - THIS IS WHAT WE ARE TRYING TO DO / WHAT THIS REPORT IS (see the sketch below)
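A sketch of the "Identify Report" metadata assembled from the items listed above plus SK's date fields; the field names are illustrative assumptions, not a settled schema:

```typescript
// Draft "Identify Report" metadata - reqs-lite, to be refined as gaps are found.
interface ReportIdentification {
  id: string;                 // identifier
  name: string;               // naming it
  requestor: string;
  inputType: string;          // type of input
  generationSource: string;
  lineOfBusiness: string;
  jurisdictions: string[];    // single or multi versions of the same report
  outputDescription: string;  // what the output should look like
  aggregationMath: string;    // explicit math for aggregation (prose for now)
  purpose: string;            // what the data is being used for
  requestDate: string;
  deadlineDate: string;
  expirationDate: string;
}
```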
- Identify Stat Reporter
...
Create Report Request (Configuration)
Looks a lot like "Identify Report"
Define jurisdictional context/req (single or multi versions of same report)
How often it runs
Data Accessed
Outputs
Roles and Permissions
UI/Interface
Extraction Pattern
Aggregation Rules
Messaging
Participation Criteria
Two Phase Consent
Data Path (from TRV to X to Y - where is the data going and for what purpose)
Development Process (extraction/code)
Testing
Auditability of data
Generate Report
Rule Base for each report
...
Validate against participation criteria (vs report config)
Exception Processing
Messaging
Generate Report
Auditability/Traceability
...
- Everything needs to be edit-able
- Fixes don't happen in current month (monthly correcting and then moving on)
- Latency of error correction could be a year
- need to make sure we have facility to capture corrections made while NOT bastardizing HDS
- internal or architectural? DR is aware
- SC - Errors:
- missing information (on record provided)
- current environment vs future
- today - flat file from upstream; flat file submitted with a missing limit; info passed to AAIS, flagged by AAIS, returned to carrier (can see instantly by state); these 2 states need the fix made; go into SDMA to make the fix then submit; AAIS approves; loaded by AAIS
- it had already gone through the edit
- DH - load into SDMA, not approved yet; Susan makes corrections; goes through the edit again once Susan has made corrections (see right away if the fix worked); if within tolerance it is "approved" by AAIS
- PA - doing the upload to SDMA is a staging area; AAIS does not run the load until it is approved (edit package engaged)
- SC - loading it to the AAIS system, told to fix errors, fixes them, then "officially submits" and AAIS "approves"
- PA - can't go to HDS until "approved"
- DH - where within the process is the edit package? Where is the facility to correct the errors? If HDS is supposed to match the source systems, then we shouldn't be making changes to HDS for purposes beyond stat reporting - decision for the ArchWG: how to handle error corrections and the fidelity of HDS
- PA - two directions: making an update vs making corrections to data already inside HDS; the first example is data before HDS - a different error type
- JB - in the case Dale mentioned, HDS is out of sync with the source system; the source system has the error and needs time to fix it; rather than copies of the DB with errors to be corrected, would suggest errors get corrected in HDS with a log to inform the source system of corrections as they are made - instead of lots of copies of collected data
- JM - crossing a boundary - doesn't care what carriers do - where do we stop caring? Only thing: HDS has to be right; it's up to the carrier how they get it right
- JB - yes, but instead of making the fix in a copy of the DB it seems it should be fixed in HDS
- SC - internal issue: AAIS needs to edit the data, that's their job; if they say "2 errors" and they get fixed, she says "done" and pushes to HDS - any conflict with the source system is something SHE deals with
- JB - transferred to AAIS for edit checks
- PA - held before the data lake until after it's corrected
- JM - can't occur until the content is in it
- PA - edit pre-ETL
- JB - do it 2x; if you correct HDS you need to run the edit in that environment
- PA - then we have a chicken-and-egg issue
- JM - policy vs implementation? HDS is a great cutoff point: everything inside is up to the carrier to get right in HDS - BUT the edits tell you what's right; the carrier is accountable up to HDS; if they're accountable on the carrier side and can verify before HDS, do the edits, then send to HDS - but what if the edit stuff can't run until the other side, i.e. already loaded into HDS - now what do I do? Accountability? Where the edits run is the key question
- PA - edit package run today: run on ETL on load, no knowledge before load - 2nd part, AAIS does reconciliation after; sometimes errors arise
- error type 1 - pre-HDS, edit package fails on load - but what if it's loaded in HDS: what is the reconciliation process and what is the process for that?
- JB - financial types of reconciliation
- PA - yellowbook #s, compare #s submitted vs financial #s and due to granularity things come out wrong, financial reconciliation before stat reporting
- JB - 1x year vs monthly
- JM - reconciling financials? where?
- SC - public info
- PA - reach out to the team with a gap analysis; grey areas in coding vs what they have; validate where/why numbers are off
- SC - those aren't errors; we do reconcile, but what's out of process doesn't become errors - differences and reasons why page 14 doesn't match - but NOT errors
- PA - the validity AAIS gets turning in reports on carriers: not only did it pass the edit package but the business data matches the financial data, with a reason if it doesn't - that's why states listen to AAIS; how are we ensuring we are doing stuff correctly?
- JB - diff record exception
- JM - annual value add - edits? HDS needs two stage?
- think it's right, but flag it, then run edits and get "ok/not ok" - question: who runs the edits? In principle the edits run on any centralized db
- JB - a copy of the edits made available to all
- DH - one body responsible for edits, not every single carrier
- JM - you put data in HDS, centralized code runs on all dbs; put it into HDS in some manner as "this is not fully approved/edited" and decide: edit in place or is it a 2-stage thing?
- SC - even if every carrier ran the edit package themselves, ultimately AAIS HAS TO RUN the edit package - responsibility lies with the statistical reporting partner
- PA - extraction patterns could return true/false that a package was run - do you test on clean or dirty data?
- JM - edits as a form of extraction pattern; is it sufficient if it checks all the data?
- PA - regulator!
- JM - need feedback - run edits, if answer wrong, accountability to get it right
- phys load or set flags
- PA - should be running edits before load,
- JM - WHERE? edits have to be consistent lang, thing needs to be well-defined structure
- PA - rules engine, java, repackage rules engine as step in process going thru load (pass/no pass)
- JM - the engine has to run against a well-defined structure - because our data runs against a well-defined structure, are you now in HDS? You put it into a well-defined structure to run the rules, and that is the post-edit version of that structure
- PA - messaging format of HDS - stat plan, objects, run edit package against that
- JM - stat loading and knowledge: if you run edits against that, once it passes you either put it somewhere else or flag it - 2 concepts, pre and post - saying to all carriers it needs to be PRE data, but it has to have a shape - is that HDS? JM perceives that when you demand "structure it this way" you are describing HDS
- PA - a different pipeline, but sees why it is outside of HDS
- JB - a data standard for saying how data will be considered; keep in mind the distributed architecture - AAIS can't run anything on a db at the carrier; raw data won't be sent to AAIS
- PA - collections of stat records, running rules against them; if HDS is the stat plan JSONified, run EPs, passed validation and a legitimate extract
- JM - HDS is the JSONified stat stuff; the edits, the things all can see, are ALL HDS in his mind - if you are prescribing a shape because the edits won't work otherwise, that's the first place carriers have to do that
- PA - pipeline A before HDS, where the prescribed data hits first
- JM - widget shape here, then EP - prescribing a shape, a set of edits, then HDS - pipeline A is a prescribed shape; do whatever it takes to get it right; once the edits pass, drop it into HDS
- DH - wants to have DavidR weigh in
- PA - Pipeline A (infra before HDS): need to pull in the rules engine; how much do we want to control its creation? JM is talking about HDS being a larger thing - where does the balloon around openIDL begin? Pipeline A is infra; does the carrier do everything before it? Will still design the load up to the plugin
- JB - think of pipeline A as data format
- PA - wont process and give feedback
- JB - need data format to be standard to run rules against, gives flexibility to reconstruct design with same format (transit from flat file to whatever).
- PA - docker image with initial process? where is the official inbound point of openIDL community vs carrier
- JM - one step at a time - HDS on the far right, that we run extraction patterns on - before HDS it has to pass edits - edits need to be centrally maintained, and the DRules expect something - pipeline A is already in that shape - saying to carriers: we prescribe the format of HDS, and to be right we prescribe the edits, so it has to hit a prescribed shape here - the carrier can do whatever it takes to get into that form, but that form is prescribed: the java thing, the json, all prescriptive, no flexibility
- PA - HDS is what you can write queries against; layering other things on top is not HDS
- JM - centralized group do edits, carriers get it into that shape, must be part of standard of stuff to be prescribed
- PA - meat of Drules, lot of it is testing stat plan, start ingesting as json, checking positionality
- JM - thou shalt not load HDS until edits passed, edits managed, approved format, carrier must get data into shape - reload until passed and THEN move to HDS
- PA - can we have a bucket, fire lambdas against it, won't move to secondary bucket until passes
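A minimal sketch of PA's staging idea above - run the edit package against a staged batch and only promote it to HDS when it passes; the function names, batch model, and EditResult shape are assumptions, not an agreed design:

```typescript
// Staging gate: a batch stays out of HDS until the edit package passes.
interface EditResult { errorCount: number; recordCount: number }

const TOLERANCE = 0.05; // 5% (by state by line, per the discussion)

async function promoteIfClean(
  batchId: string,
  runEditPackage: (batchId: string) => Promise<EditResult>,
  moveToHds: (batchId: string) => Promise<void>,
): Promise<boolean> {
  const result = await runEditPackage(batchId);     // edits run pre-HDS, on the prescribed shape
  const errorRate = result.errorCount / result.recordCount;
  if (errorRate > TOLERANCE) return false;          // stays in staging until corrected and re-run
  await moveToHds(batchId);                         // "then move to HDS"
  return true;
}
```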
- DH - suppose use HDS for other things, communicating with reinsurers, something outside of stat reporting, now that HDS not necessarily reflects source systems
- JB - the source stays consistent and takes time to get corrected; logically, HDS is more correct than the source for a while
- PA - HDS more right than source system
- JB - fixed at HDS but not at source
- JM - policy, carrier accountability, edit finds something wrong, iterates on changes, if it takes 6 months to get back to source, for next 6 months other reports don't reconcile - accountability in governance statement "if you find an error you are accountable to reconcile"
- JB - consolidated data in HDS for other purposes, if corrections were in HDS the right place to do it
- JM - data that is "better" but doesn't line up is wrong
- JB - a log for where/when changes were done
- JM - carrier accountability - more-right data - where is the accountability to the carrier? Do whatever it takes upstream - "tell us the changes you made" is the requirement - a log that says "to get this loaded, here are the 7 edits" - accountability to make it transparent
- PA - meta on each row with last update date and what changed
- BH - if systems don't reconcile - BAD - what else are we doing with it? problem to be solved, may be a log, sounds painful
- SC - reality: keeps a log today (of every change made) - in most cases it is data SC didn't get on her file (the stat file) - is it really different from the source system? She didn't get it on her file due to mapping upstream; if she knows a zip code is wrong or a VIN is wrong she doesn't change things in her file or tell the source system there are too many (agents inputting) - OK if under 5%
- JM - practical question - edits are syntactic and semantic: you can find an alpha character, but you don't know if someone mistyped a VIN - no idea whether it's true/false in the real world - HOW RIGOROUS DO EDITS NEED TO BE? Even if edits flag an error, can we accept it?
- SC - happens all the time; might get an edit like "limit on policy is $1MM and you got something else" - not an error
- JM - 2 levels of edits? Showstopper (dead) and one we accept
- SC - won't ignore the fact an error was received; will go and look "did I have the right limit" - edits help understand if there is a problem - is it internal edits?
- JM - what is the purpose of an edit? Don't edit more than you have to - what is the purpose in this context? There are all sorts of mechanisms for internal correction - don't edit more than you need to without purpose - some things you have to fix; principle: only put in edits when there is a hardcore reason to do it (not just "clean data")
- JB - work to be done - application and analysis and insight, not policy-level corrections
- JM - do edits have levels? severity of error (which means will it be addressed)
- JB - sanity check errors vs record format errors - can and will catch but WHERE in process
- DH - gut check for AAIS as stat agent on how rigorous they need to be
- JM - levels: showstopping, scary, and "oughta check"
- JB - accuracy in general (THRESHOLD)
- JM - confidence scores from address cleansers -
- showstoppers (break system)
- competency score (".7 good enough? yaaay")
- JB - data quality scores, pick battles
- SC - basic: does every field get a value - current and future, if not ABCD - is that field filled? If so, what's in there? Nebulous - stat agents bear the responsibility of "data is reasonable", knowing it is not garbage; how much has to be "good" - what does "good" mean (every field filled with a reasonable value)?
- JM - a table that does this - argument: for every field, "type, table, range = score"
- SC - if you come across something that didn't meet the threshold, kick it back?
- JB - the levels determine the response
- JM - governance? Value, string, etc. - don't measure it if you aren't going to govern it - if you are going to put a rule in there, you must have a governance policy - the architecture has to provide for an edit layer and a series of thresholds to get a score, and governance policies by score
- JM - pass/fail and scoring
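A sketch of JM's "for every field: type, table, range = score" idea combined with the severity levels discussed (showstopper / scary / ought-to-check); the specific fields, checks, and weights are illustrative assumptions:

```typescript
// Field-level edit rules with severity and a simple additive quality score.
type Severity = "showstopper" | "scary" | "ought-to-check";

interface FieldEdit {
  field: string;
  check: (value: unknown) => boolean;
  severity: Severity;
  weight: number; // contribution to an overall quality/confidence score
}

const edits: FieldEdit[] = [
  { field: "state",   check: v => typeof v === "string" && /^[A-Z]{2}$/.test(v), severity: "showstopper",    weight: 1.0 },
  { field: "zipCode", check: v => typeof v === "string" && /^\d{5}$/.test(v),    severity: "ought-to-check", weight: 0.2 },
];

// Returns which edits failed plus a 0..1 score; governance policy by score is TBD.
function scoreRecord(record: Record<string, unknown>): { failed: FieldEdit[]; score: number } {
  const failed = edits.filter(e => !e.check(record[e.field]));
  const penalty = failed.reduce((p, e) => p + e.weight, 0);
  const maxPenalty = edits.reduce((p, e) => p + e.weight, 0);
  return { failed, score: maxPenalty === 0 ? 1 : 1 - penalty / maxPenalty };
}
```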
- PA - extra metadata for user queries
- KS - "close out the quarter", might go back and add to it - close out means can't change later, cant put in records that apply in that 1/4 later if you "close out" - do we need some way of sensing we are opening up a 1/4 again and need to re-assert it is ok?
- SC - have had situations where we discovered issue and "need to fix year" for a line or situation, b/c today timeline is so stretched out - takes too long - go back and adjust the year b/c the reports hadn't been issued - recent sit in MASS where they wanted to change format of something, had to refile and had to insure when refiled the dollar hadn't changed at all b/c they closed out the quarter already - nice to say "over/done", this case money wasnt part of the problem, but if discovered issue with $, must be some threshold, why would you go ahead with an annual report KNOWING there is missing $ - can update #s quarterly for up to 2 years, as necessary, REGS want quick/soon data, how long keep something open - "close it out" - can't just say "ill make sure under 5% at end of year" at the very least 1/4 has to be finished as best of your knowledge
- DH - does AAIS have to close out quarters
- PA - getting better for the future, like what TRV doing smaller slices, update by quarters for all
- DH - do we really need to close a quarter?
- SC - maybe not "close" but maintain data integrity
- DH - is there metadata that needs to be est that says "ok, data thru Jan is within Tolerance", accumulates over course of time
- KS - across all data, individual state?
- SC 5% has to be BY STATE BY LINE
- KS - if you don't load anything over 5%, don't allow it, it can't be over - closing out is interesting; discussing "attestation" - attest as of a date: the date range is good, the edit package says "quality data"; attestation over a range of data ("as of X date, data for Qx is good, use for reporting now")
- DH - some time in the future, find something wrong with Qx; "as of today I can say last month's data is correct"
- JB - change the month or period, re-run that check
- KS - update data to be in sync with source systems (HDS is NOT the source of truth) - any time there's a problem with upstream or the ETL; requirement: closing out a quarter by attesting "as of x date, quarter 3 has been loaded and any change to that must be re-attested" - as simple as "up to this date"
- DH - other than the ETL issues DR described before (something funky happened between source and HDS), that's different than what Susan is describing with "true errors" - when fixing those errors there will be a new set of transactions, a new load with the corrected info; as that's done it will be run through the edit package and maintain the 5% tolerance
- KS - if a transaction is changing data, a report that ran earlier would have gotten different data - need to re-attest
- PA - Regs want us to strive to make the data better; not a requirement to reproduce a report as of when it was generated
- KS - this requirement: I changed data and I'm re-attesting that it is ok - just attesting to the change, not saying reproduce - CLOSE OUT: loaded data, ready for reports to run; now the data changed; it needs to be auditable - data that was there, attested to, and closed out has changed
- DH - go back as a req - do we need to close anything out? dont see purpose to having it "close", policy this year will have claims for next 10 years. I can't close 2021, can close data for 2021, not sure what "closing" means
- JB - not the same as closing a financial report - this is a data qual check to make sure threshold still valid for a time period, re-attesting - can still add data
- DH - making a glitch correction vs fixing data; SC's example: not changing data but adding new records to fix what's out there; the transactions go through the edit, within tolerance
- KS - data that's read by a report in that time period will get different data - does it matter - close or re-attest?
- JB - get rid of the idea of "closing"
- SC - be careful, "closing" is semantics; at some point, to produce timely reports there needs to be a deadline; today we report monthly to AAIS, and report monthly to other stat agents that require monthly; it needs to be there 45 days after the month ends OR 45 days after the end of the quarter (AAIS); regardless of when it's sent it needs to be in and under 5% by May 15 to produce reports; there have to be timelines - don't wait until the end of the year
- DH - small carriers who only load 1x a year
- PA - due to old contracts, moving them to openIDL on a diff cadence
- SC - good example: if you only report annually and now report Feb 15 for the prior year, it runs through the edit package and finds errors - is it in and under tolerance by Feb 15? Clean by when? The longer you go, the longer you push out when you do reporting
- PA - diff in the future, spring 2021 lots were turning in stuff late, no repeat of that
- KS - assumption: nothing in HDS above 5%
- DH - needs to be architected, is there a precursor to HDS where info loaded, read into edit package, correction then HDS or is HDS a landing point and a secondary DB for stat reporting that has the correct info - how do we put that plumbing together
- PA DIAGRAM
- DH - many errors we're dealing with are omissions, coming from the plumbing from the ultimate source into the data files used to create stat files, where info has not been provided that should be; while the stat file may not represent "truth", the corrections should represent TRUTH
- KS - attesting the data loaded in HDS is TRV's best ability to tell the truth; it won't match the source systems for various reasons, but we're attesting it is the data you can put into HDS for stat reporting
- DH - attesting it is "good for stat reporting"
- Everything in HDS is usable for stat reporting
- DH - outside of HDS, do we need metadata that says "as of Aug 20, the info in the HDS - the last load and the sequence of loads into HDS - is within tolerance"? Do we need to include control mechanisms (policies, premiums and losses)?
- KS - opinion regarding claims vs policies, cant use for loss data up to this date, certain years old before used for loss reports
- DH - "accident year" wont close for sev years, have info, "incurred losses" is what they THINK it will be may change over time
- KS - attesting that data in HDS is good up to this date
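A sketch of the attestation metadata discussed above ("as of X date, data through Y is good for stat reporting", re-attested when data changes); field names are assumptions:

```typescript
// Attestation record: "as of this date, data through this period is good for stat reporting."
interface Attestation {
  carrier: string;
  attestedAt: string;       // the "as of" date of the attestation
  dataThrough: string;      // the quarter/period the attestation covers
  state?: string;           // tolerance is tracked by state by line
  lineOfBusiness?: string;
  withinTolerance: boolean; // edit package result at attestation time
  supersedes?: string;      // prior attestation id, set when data changed and was re-attested
}
```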
- missing information (on record provided)
- 08222022
- DR - Can't start making changes to HDS directly; it gets out of sync with the source system, can end up not matching source systems, and then you have a state management problem - hairy: load new data, what edits were already made? (not better than what we used to do) - doesn't think you can edit directly in place; HDS in his mind is still design tenet one: a faithful representation of the back-end systems... Dale made clear we need a facility to make changes; it can't be done on the fly, takes time, and needs to be done - the solution is something with a foreign key to a CORRECTED table or a federated other store or view, updated or changed as needed; as processes improve the goal would be that this thing is short-lived, alive only for corrections and the next extract - HDS can't be anything but a representation of the systems of record
- KS - Edit package not based on completed report
- JB - if there were errors that came from source systems, had exceptions (fatal nature) and couldn't accept data and had to edit source system, takes time to correct something in source systems, easier to extract
- DH - clarify - the errors they had running through SDMA: in most instances (not a lot) there were 486 instances to correct, 169 of those were "liability limits missing" - the feed was not providing the appropriate liability limit; doesn't mean the source didn't have it correct, it was just NOT being fed to them
- KS - either the ETL is wrong or the source system is wrong; have to keep what was fed from the source system, and when making a change it has to be in a separate place where it's understood that this was changed, so you can go back, find the changes made, and fix them
- DR - situation where the limit wasn't there in the source system BUT is in HDS: for whatever reason a new record comes in and is fixed (the ETL is fixed) and something is there - how to handle the mismatch? Which to trust? The one in HDS is probably right? Forces making decisions about what to do when you reload HDS - code a lot of judgement in, or pre-code decisions on how to update
- KS - keep it simple, see if there are patterns, automate where we can, track what it was changed to (don't lose the previous value) and deal with it when refreshed
- DR - the obvious problem is bloat: shadow versions of everything
- KS - 480/10MM not bloat
- DR - 2 assumptions: A. not a lot of changes; B. architected to take advantage of the fact there are not a lot of changes - make it in a way that doesn't hurt you and that you can automate, so the bloat becomes ephemeral
- JB - do something like that: HDS has the correct info so queries are correct and there's an audited record; keep track of what did change
- DR - HDS can never be edited in place; it can't be something that keeps track of data that diverges from downstream systems - the only SoR is the downstream SoR; can't maintain the business logic of having to decide what to update; preferable: the edits are referenced
- JB - complicated Extract Pattern, looking for exceptions
- KS - could do views to accomplish that; want to make the exception the hard path, not the easy path; whatever it is, keep both in mind - when you have few corrections, a satellite table rather than the core table, and then deal with the view idea; as long as you keep a consistent pattern it's not bad
- JB - run reports against HDS,
- KS - extraction has to see corrected data otherwise why make corrections at all,
- DR - too challenging to write "HDS is a faithful rep of the core system" when an edit needs to be made and you pull that data; easier if someone writes an EP that does nothing but produce that table with edits applied (convenience function: first thing, build the corrected table, build the EP, run the extraction)
- DR - Ephemeral Bloat
- PA - 2 weeks ago on with JamesM: ETL pipelines, edit packages, scoring errors - talking about two metadata columns on load, simple pass/fail - e.g. 5% by line by state, wrong zip in the wrong state - maybe have metadata columns, 2 flags
- DR - likes a "flag" if wrong: didn't pass, flag it. A confidence score is imprecise
- KS - use-case specific
- JB - on the collection, not a single record: accumulate across records, additive processing of scores, a confidence score for the total
- DR - 400 out of 10MM - make any changes? They did; more errors but within tolerance - in some locations, some lines, all adding up to 400
- DH - not to zip code, not vin number
- DR - leans to "dont fix on load fix on extraction", spend time pushing downstream to sys of Record as they become problems
- PA - first flag - dead in the water, real vs nonreal error, rate zip codes diff, some pass/fail
- DH - pass/fail depends on the case: the zip code may be bad but the state ok; if doing an extract for the state of AL you're not looking at the zip code - a bad zip code doesn't affect the ability to pull AL data
- DR - a confidence score is too tough, too specific; "here's an edited row IF YOU NEED IT" - the logic of the EP could make the decision to pull it or not
- PA - doing a load, the zip code is wrong but the state code is correct - will I generate an edited row that omits the zip code?
- DR - the edited row is based on why you edit; it's the fix - omission, correction - leave it up to the extraction pattern and say "something is wrong with this row, here is the correction"
- DH - may not fix it - Error / Fix Error / Not Fixed
- KS - can't have, with fidelity, the specific knowledge of exactly what was wrong without going crazy; need to see the biggest/scariest
- DR will happen as we build, flag up, fix = omit zip, if under the state/line then great - we will be using to improve downstream processes
- KS - if we can track what fixes need to be made all the time, across carriers, can work on fixes
- DR - up to each company to decide -
- KS - want to know what the activity is; what you do with it is your choice - good requirement: track changes and report them
- JB - exception for data quality checking, more than "there is an error here" - want to share what is wrong
- KS - is it true, fixed record or deltas
- PA - need an error log table, what keys failed
- DR - wouldn't that come out of it anyway? Right now it tells you what to fix
- KS - are we keeping it in the right place?
- PA - 2 separate systems: data lake vs SDMA - 2 different systems, not carrying the error in the record
- DH - dead simple: an error and here's the fix; what was fixed is inferred from what changed; if an error is flagged and there's a new row with the correction, great; if not, the assumption is "good enough as is"; see if that simple approach works - if we see we're spending too much time fixing rows, someone will fix it downstream; a solution that does too much gets way too complicated
- PA - can we work through what it will be like to work through these errors, more in terms of SQL than Mongo - select all rows with a Y in the error column, re-run the edit package to get those? What columns or errors?
- DR - judgement is still on the carrier as to whether it needs to be corrected; if there is a change that needs to be made, it will generate a new linkage - 400 transactions point to 400 corrected transactions; if there's no pointer, ignore; if there is one, use the corrected record; judgement sits with the team within the carrier so it meets requirements - in theory you could spin up a new table with corrected columns, make a materialized table of corrected rows, and do the extract against that - super simple, don't worry about what's wrong
- PA - wondering if we would be able to say "grab all records with no errors"
- DR - could grab all rows; if any has "Y" for error, go retrieve the alternate row with that foreign key and overwrite - a convenience function: make a new table on day one, do the extraction from that table, and drop it
- PA - corrected table, updates daily
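A sketch of DR's convenience function above - resolve each HDS row against its corrected counterpart (if one exists) before extraction, never editing HDS in place; the row and correction shapes are assumptions:

```typescript
// Overlay corrected rows onto HDS rows for extraction; originals stay untouched.
interface HdsRow {
  id: string;
  hasError: boolean;            // flag set on load
  correctionId?: string;        // foreign key into the corrections table, if a fix exists
  [field: string]: unknown;
}

function correctedView(
  hdsRows: HdsRow[],
  corrections: Map<string, Record<string, unknown>>,
): Record<string, unknown>[] {
  return hdsRows.map(row => {
    if (row.hasError && row.correctionId && corrections.has(row.correctionId)) {
      // overlay the corrected fields onto the original row
      return { ...row, ...corrections.get(row.correctionId)! };
    }
    return row; // no pointer => use the row as-is ("good enough as is")
  });
}
```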
- JB - source isn't single table but a set, have complications with replica of tables
- DR - convenience funct, save complication on extract writers part
- PA - single table design
- JB - not forever, will want simple relational scheme in the future
- PA - hesitant: a single table with stat records
- DR - see where this takes us and where we need to go; feels like the bar for Regs isn't high - they just want to get data more frequently; 80% gets us there easily - try something, a simple approach, with Dale's ability to edit as needed to implement requirements; simplicity, no state management, and it hits the business reqs
- PA - there are benefits to getting more normalized
- JB - fine if you're just doing a POC for the stat report, but if you are doing other lines of business...
- PA - other lines have key, stable design
- JB - at some point single table looks like long cobol row, need to keep track of errors, more than just stat report
- PA - haven't heard a question that couldn't be solved with a single-table design from the stat record
- KS - think there will be use cases that challenge the single-table approach; be ready to make a change, not stuck with one model, can do it when we need to
- JB - from an architecture POV, we're describing how to do exceptions and error handling; this approach isn't a general solution
- DR - it's the only thing on the horizon; a dirt-simple approach to solve the problem; reticent to complicate it yet
- KS - deal with it when we get there; overthinking now is a distraction; KISS, flat model, eyes and ears open
- DR - have to make it simple for them AND reliable; holding that HDS must be a reflection of the SoR makes it easier for carriers; flat makes it easy too (single table) - an error flag points to the corrected entry
- KS - some way for EP to see errors, materialized view or something else, to make fixed val available
- DR - edits made via dashboard (TBD design)
- PA - a java app running SDMA; question as to how to implement a dashboard onto HDS
- DR - assumes some, because you're taking HDS plus an outside dependency of the edit table, not auto-generated; get some manual assent to the data call or stat report
- PA - like specifying functionality on the dashboard for which records to use?
- DR - since someone could be manually editing, has to be a
- PA - if a load is not compliant - too many exceptions to be used for a particular dataset - it SHOULDN'T be moved into HDS
- DR - it would be moved to HDS, as HDS will show whatever is there
- PA - get under tolerance pre-load
- DH - will HDS represent the source of record or the corrected file?
- KS - the extractable part of HDS; need to say "this record or set of records is the logical HDS for extraction after the edit package", the initial stuff
- JB - simplify: the initial HDS that agrees with the source system is STAGING HDS; make corrections and push to Production HDS
- DH - you have 50MM records and correct 5000
- JB - source vs corrected, staging and queryable
- DR - copies imply you have a staging db and a fixed db, and the fixed one is fit for purpose for certain things; you might not correct everything (resource intensive) - it is a GOOD ENOUGH db, representing the best of the carrier's ability compared to the core system; a correction can be referenced in another row/table somewhere - if you make a copy and make logical divisions, you're right back where we started - "correct enough"
- KS - EXTRACTABLE - implement is tbd, original stuff, edits made, extraction pattern
- DH when do the battles begin?
- PA - initial load of data today, needs to be some way for loader to get stats like SDMA does now, - UI, do load, get stats back
- KS - get data from the carrier, fix data; when the EP happens the fixes are visible
- PA - doing that now: the carrier is fixing stuff; give the carrier a tool to fix before the extraction happens
- DR - 2 types of fixes: ones you might be able to affect in the source system or ETL, and ones in the data store - the SDMA error-checking tool highlights those; it's still the same DB, still there, not loaded into staging
- PA - as a data engineer, definitely think before you do the first submission you want to run checks, rather than putting data in and THEN running data checks
- DR - the only place to make changes is in this db, nowhere else to do it
- PA - need an SDMA person: when doing a load and stuff goes wrong are you correcting in the UI or in the dataset? Doing corrections in the UI
- DR - when they fix upstream they're fixing the flat file, sometimes impacting source systems; in this case load into the <not HDS> and then say "oh, errors" and then decide to fix
- JB - if existing db with records, loading batches, other criteria to check (all, just what loaded, timeframe), seems natural for simplicity
- DR - worried, other DB/datastore that is corrected that gets loaded in
- PA - not accepting flat file until it is good enough
- DR - doesn't want to work on flat file, would rather take flat file via API at that point, if fixing all in flat file,
- PA - scenario - state column doesn't populate, load 1MM records, fix flat file rather than fix errors - some preprocessing will keep error count lower
- DH - can build an edit package they'd have anyway; quality checks to prevent egregious errors
- DR - that's building a shadow system to build diagnostics, an additional flow - a test environment (pre-load); a lot of the time the changes needed in this case are going to occur in this db; if he can make them downstream he will, but many times he can't; once you can run diagnostics it starts feeling like an in-between system
- PA - without preprocessing to check the keys are there, you end up with an extraordinary number of errors
- DR - initial problems, yes, but closer to steady-state operations you probably won't have a lot of problems cropping up - if you do have problems on load, decide where to fix them; otherwise you're maintaining a dev environment in parallel
- DH - maybe a solution is, as we run the edit package we have an ability to delete the records that were part of this batch load, to fix them outside of this load, if you had to
- DR - doesn't want it to be stateful; reload when you fix problems, keep it simple; could have a version in parallel to fix problems, load into test again, then load into the proper one - geared towards ETL errors
- PA - weird not to do pre-processing check
- DR - the issue is scale; obviously build checks before anything is pushed in the QA process for operational changes, but in a production load...
- PA - data not code, saying "before we run load we have x checks run, are valid keys here
- DR - not loading entire data set, could be some checks occur before load
- DH - high level QA process, critical errors vs
- DR - not a realtime system, doesn't need to be perfect when it lands; doesn't see the need for perfect when it hits HDS
- KS - removes some of the bloat of records - NEEDS TO BE A WAY TO AVOID UNNECESSARY ERRORS (pre-process, whatever it takes)
- DR - capture in QA vs operational load, not thinking of operational process,
- PA - every load as from a 3rd party, seems like ingesting 3rd party data will check
- DR - not internal for
- JB - not just loading anything w/ stuff that is incorrect, some kind of prechecking
- DR - can't commit to prechecking - needs to be addressed, can't agree to it yet
Reconciliation (to do Mon 9/12 w/ SusanC)
Reconciliation (make sure report is correct based on request - reasonability check on the report - NOT financial reconciliation)
Financial Reconciliation (Oracle? Source of truth to tie against those #s?)
Statistical Reconciliation
Auditability/Traceability
Deliver Report
- PA - operator/operations - walk through ELowe's perspective and discuss others - Eric reps VA, will need one report per season per line, some lines more than one report, just talking annual now - get the report to Eric, get it from every carrier writing in VA; make it both as seamless as possible for EL to get carriers and as seamless as possible for carriers to accept all this - today you have to have one EP per report (50 EPs/50 reports, each EP slightly different) - how do we group things together to minimize the number of requests/consents (auto coverage report)?
- DH - Stat Reporting? Likely to say yes to all 48 states; for TRV it's all-in, doesn't matter, probably isn't going to be a challenge; could do by state by line with an option to "select all"
- PA - the stat agent (in this case AAIS) would keep track of all reports, then have Regs from each state subscribe to which reports they want done, and then carriers can subscribe to the ones they want to fill
- DH - will choose a stat agent: not only YES to all states all lines, but also which stat agent is doing it (no state with 2 stat agents) - a stat agent per state/line; the only ones with something different are TX (only recognizes ISO/Verisk) and MASS (MASS CAR)
- PA - Stat agent needs to keep track of where reports are, keep list of subscribable events
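A sketch of the "subscribable report" bookkeeping PA describes above - the stat agent keeps the catalog, regulators and carriers subscribe by state/line; the record shape and the select-all helper are illustrative assumptions:

```typescript
// Catalog entry maintained by the stat agent for each subscribable report.
interface ReportSubscription {
  reportId: string;           // e.g. the annual auto coverage report for a state/line
  statAgent: string;          // the one stat agent handling that state/line
  state: string;
  lineOfBusiness: string;
  regulatorSubscribed: boolean;
  carrierConsents: string[];  // carriers that have consented/subscribed to fill it
}

// DH's "select all" case: one carrier consenting to every state/line at once.
function selectAll(subscriptions: ReportSubscription[], carrier: string): ReportSubscription[] {
  return subscriptions.map(s =>
    s.carrierConsents.includes(carrier) ? s : { ...s, carrierConsents: [...s.carrierConsents, carrier] },
  );
}
```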
...
- PA - what kinds of exceptions
- report generation fail
- Combiner logic failed
- the way we combine stuff: we've got logic, the EP, put it all together - the combining logic for each report will be unique; the EP is implemented and attached to the data call, but we haven't talked about combiner logic and how it's attached to the EP
- Potential where each report has slightly different combiner logic
- define EP, not all the code in a report, then there is formatting report after combined, diff from report to report
- potential there is bespoke report logic for a particular data call or stat report - could vary and could be same
- how much do you put into the data call - will have learnings
- take a look at combiner logic
- report generation failed
- Diff then combiner logic
- retry report gen
- Combiner logic failed
- didnt meet % threshold (company wont participate)
- data in PDC does not match expected format (something went wrong with EP)
- data in PDC does not pass edits
- possible
- tolerances - specific record may not pass edit but w/in tolerance, how to handle?
- ex - NC state w/ SC zipcode, under 5%, include/not include?
- "for a record they do not have a limit", on a couple of records,
- if doing report, missing stuff, then omit record (acceptable solution?)
- visible in the EP: when you do the EP, "aggregate and ignore records" would be visible in that code, but hard to see since it's not in text (see the sketch after this list)
- report generation fail
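A sketch of the "aggregate and ignore" behavior discussed in the tolerance items above: the EP skips records it cannot use (e.g. a missing limit) and reports how many it skipped, so the omission is visible rather than buried in code; shapes and names are assumptions:

```typescript
// EP-side tolerance handling: omit unusable records, surface the omission count.
interface EpRecord { state: string; zipCode: string; limit?: number; premium: number }

function aggregateWithOmissions(records: EpRecord[]) {
  const usable = records.filter(r => r.limit !== undefined); // omit records missing a limit
  const omitted = records.length - usable.length;
  const totalPremium = usable.reduce((sum, r) => sum + r.premium, 0);
  // surfacing the omission rate lets the combiner check it stays within tolerance (e.g. 5%)
  return { totalPremium, omitted, omittedRate: records.length ? omitted / records.length : 0 };
}
```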
Auditability/Traceability
- DH - entire comms module needs to be auditable
- JB - requests and transactions on-chain, how do we inc email notifications
- KS - audit a report was received, what do we do with the system that isn't auditable
- JB - benefit of common channel, audit trail for interactions there
- KS - utilize it and put things like receipts next to the data call (JSON object in the ledger) - receipts, contributors, etc.
- PA - can't inc raw data (no raw data)
- DH - timestamps of when pulled w/in each carriers node
- KS - consent timestamp
- JB - when you deliver the data itself, the private data collection is hashed and that's a record (on chain) - incorporate it into a consistent scheme - some application that uses it; the info is on chain
- KS - all updates to the data call itself are auditable; being on the blockchain is a given - who has access to what's in the audit trail/traceability? It could unearth things not being shared - Hanover could see who consented even if they didn't
- PA - hash the stuff on chain and only give keys to those who consented
- KS - if a company needs some audit report it's managed by the admin of the network; should not make audit info available by default
- KS - attributing the things orgs touch to them is part of the audit trail; consent: who, when, destination; history of the data call; who updated the data call; when the EP was run, by consent (see the sketch below)
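A sketch of a ledger "receipt" along the lines KS describes - consent and EP-execution events recorded next to the data call, with only a hash of the delivered payload on-chain; field names are assumptions:

```typescript
// Audit receipt stored next to the data call; no raw data, only a payload hash.
interface AuditReceipt {
  dataCallId: string;
  event: "consent" | "extraction-run" | "report-delivered" | "data-call-updated";
  actor: string;             // which org touched it
  timestamp: string;
  destination?: string;      // where the data went, for consent/delivery events
  payloadHash?: string;      // hash of the private data collection; raw data stays off chain
}
```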
...