openIDL - Architecture Definition Workspace
- 1 Contributors
- 2 Process
- 3 Deliverables:
- 4 Scenarios
- 4.1 Stat Report
- 4.2 Auditability/Traceability
- 4.3 Notification
- 4.4 Data Call
- 4.4.1 Communications for resolving conflicts, etc.
- 4.4.2 Load Data
- 4.4.3 Create Data Call
- 4.4.4 Like Data Call
- 4.4.5 Issue Data Call
- 4.4.6 Subscription to Data Call
- 4.4.7 Consent to Data Call
- 4.4.8 Mature Data Call
- 4.4.9 Abandon Data Call
- 4.4.10 Clone Data Call
- 4.4.11 Deliver Report
- 5 Access and Roles
- 6 Application Components
- 7 Data Sources, Sinks and Flows
- 8 Decisions
- 9 Tenets
Contributors
| Initials | Contributor |
|---|---|
| DH | Dale Harris - Travelers |
| DR | David Reale - Travelers |
| SC | Susan Chudwick - Travelers |
| JM | James Madison - The Hartford |
| SK | Satish Kasala - The Hartford |
| KS | Ken Sayers - AAIS |
| PA | Peter Antley - AAIS |
| SB | Sean Bohan - openIDL / Linux Foundation |
| JB | Jeff Braswell - openIDL / Linux Foundation |
Process
The Architecture Definition Workspace is where we as a community come together to work through the architecture for openIDL going forward. We take our experiences, combine them with inputs from the community, and apply them against the scenarios of usage we have for openIDL. Below is a table of the phases and the expected outcomes of each.
| Phase | Description | Outcome |
|---|---|---|
| Requirements | Define the requirements for one or more possible scenarios for openIDL. In this case, we are focused on the stat reporting use case. | A set of requirements. openIDL - System Requirements Table (DaleH @ Travelers) |
| Define Scenarios | Define the scenarios sufficiently to gather ideas about the different steps. The scenarios will change over time as we dig into the details. | A few scenarios broken down into steps. |
| Brainstorming | Gather ideas from all participants for all the different steps in the scenarios. | Detailed notes for each of the steps in the scenario(s). |
| Architecture Elaboration and Illustration | Consolidate notes and start defining architecture details. Network Architecture - different kinds of nodes and how they participate. Application Architecture - structure of the functional components and their responsibilities. Data Architecture - data flows and formats. Technical Architecture - use of technologies to support the application. | Diagrams for the different architectures. Tenets. |
| Identify Spikes | From the elaboration phase will come questions that require answers. Sometimes, answers come through research. Often, answers must come from spikes. Spikes are short, focused, deep-dive implementation activities that help identify the right solution for aspects of the system. The TSC must approve the spikes. | |
| Execute Spikes | Execute approved work to answer the question that required the spike. | Spike results documented. |
| Plan Implementation | With spikes completed, the team can finalize the design of the architecture and plan the implementation. | Implementation Plan |
| Implement | Implement the architecture per the plan. | Running network in approved architecture |
Deliverables:
Scenarios
Stat Report
Subscribe to Report (automate initiation & consent - assumes stat report)
Define jurisdictional context/req (single or multi versions of same report)
How often it runs (report generation frequency)
PA - avoid 13 lines and 13 reports per state per carrier - come up with a way to simplify distro of reports
KS - make sure subsections covered, sections expanded - dont see anywhere we discuss data access
Extraction Details / Metadata
KS - want an understanding of what data will be accessed by the EP; when a report runs, what data will be accessed, field by field. The EP says all that. Discussed whether or not the EP being code is enough; said we want something more than just code to tell us what's being accessed
PA - work with Auto Coverage Report is complex, want fields going across rows, and know what filtering criteria is, take these fields on this line over these dates
KS - declarative approach to EP, declarative less writing JS code and more saying "fields accessed in EP" and a section "Aggregations", early on suggested approach (DR?) some sort of a "pre-processor" (was JB) - "this is an EP, know things in the EP, requires way to gen code from it
PA - great idea, seeing 3 tiered thing: 1 explicitly biz level (top of data call - earner prem, most recent year), 2 more like lang agnostic metadata, 3 full implementation
DH - really talking about what the "smart contract" will be - really the REQUEST, details - agree we want what fields, how info is aggregated, how filtered, what date params are used
KS - agg rules, parts of the implementation detail, agg rules for data access, date, param, section on param for the subscription, look at parameters, considered higher level stuff, suggests other things, place to see those things, whether they drive implementation or not - agg, access, filtering - sections of pseudocode, can get out of sync with actual implementation, who puts that in there (REG or implementor of the EP?)
DH - could be both
PA - having success w/ Auto Coverage report due to DH doing all the mid-level data stuff, able to reproduce it, whoever is doing EP has to have a clear plan
KS - basic model we have right now, for REG to put in prose and implementor to create map-reduce to make it, suggest 1 level deeper, more declarative sections like data access, maybe not field by field - data, filtering
DH - want to have specific fields you are using, worry about "fishing expedition"
KS - describe sections that need to be filled out as part of the data call, specific (field names, actual aggs, pseudocode) - data access is missing
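The declarative approach KS describes could look something like this sketch: a three-tiered EP descriptor with named sections for data access, filtering, and aggregations sitting above the executable code. Every section and field name here is an assumption for discussion, not an agreed openIDL schema.

```javascript
// Hypothetical declarative extraction-pattern (EP) descriptor.
// Tier 1: business-level description; tier 2: language-agnostic metadata;
// tier 3: the full implementation. All names are illustrative assumptions.
const extractionPattern = {
  description: "Earned premium, most recent year", // tier 1, top of data call
  dataAccess: ["premiumAmount", "zipCode", "transactionDate", "lineOfBusiness"], // tier 2
  filtering: {
    lineOfBusiness: "Auto",
    transactionDate: { from: "2021-01-01", to: "2021-12-31" },
  },
  aggregations: [{ op: "sum", field: "premiumAmount", groupBy: "zipCode" }],
  implementation: null, // tier 3: the map-reduce code would live here
};

// A pre-processor (per the earlier suggestion) could check that the code only
// touches declared fields; here we just expose the declared access list.
function declaredFields(ep) {
  return [...ep.dataAccess].sort();
}
```

A generator could also go the other way, emitting skeleton map-reduce code from the tier-2 sections, which is one reading of the "pre-processor" idea above.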
Outputs / Aggregation Rules
DH - from HDS? sep from final report?
KS should be same logically
DH - output from HDS is agg data, will be anonymized?
KS - not anon until combined in analytics node - what outputs? Data Points, how different from Aggregations - Agg: sum of the coverages by zipcode, what about outputs is different?
DH - shouldn't be, outputs are aggregations
KS - other point, consider when we combine things is there an expectation there is a functionality beyond that point, ex: the ND thing, does something on Analytics Node, compares against registered VINs, not just aggs from carriers, some possibility the report itself does work itself (compares to other data, etc.), find a way to describe that as well
DH - what are you doing with the data when in the Analytics Node (AN)?
KS - outputs on the EP - aggregations, the reduce - what do we want to know about the AGG itself? in code, do we need to come up with standard lang or just prose?
DH - should not be computer code, (not tech people need to be able to read and und)
KS - start with prose (human readable) and aspirational goal for some structure
JB - whats requested to begin with, subset of EP
KS - how accurately it can be expressed
DH - then final report - or sep section,
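As a concrete illustration of the aggregation discussed above (KS's "sum of the coverages by zipcode"), a minimal map/reduce pair an EP implementor might write could look like this; the record shape is assumed for illustration.

```javascript
// Map step: project each HDS record to a (key, value) pair.
function mapRecord(record) {
  return { key: record.zipCode, value: record.coverageAmount };
}

// Reduce step: sum values per key, producing the per-carrier aggregate
// that would leave the node (still not anonymized until combined in the
// analytics node, per the discussion above).
function reduceByKey(pairs) {
  const totals = {};
  for (const { key, value } of pairs) {
    totals[key] = (totals[key] || 0) + value;
  }
  return totals;
}

const records = [
  { zipCode: "06183", coverageAmount: 100 },
  { zipCode: "06183", coverageAmount: 50 },
  { zipCode: "06103", coverageAmount: 75 },
];
const aggregated = reduceByKey(records.map(mapRecord));
```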
Analytics Node Function (what are you gonna do with the data after combination?)
KS - formatting final report
DH - anonymizing data, deleting data once report created
KS - req from customer to verify or keep evidence of that (deletion), prove data was removed, any core report logic (other than formatting) - anything like collab correlation data, has to be described here, ND example: comparison against registered VINs
DH - running against 3rd party data, would want to know "what specifically are you matching on and expected results - what do you hope to get out of that 3rd party data" -
KS - part of what we showed where we correlate with X data, some you want to do on the carrier node before you agg, uses data you don't want to share, might want this section about correlated data on both inside carrier node and outside node
DH - on Extraction
KS - talked agg and outputs, didn't talk about correlated data or third party data
DH - think it would be more efficient for one party doing it rather than each doing their own
KS - ex: address, dont want to share it, if we are all ok if it isn't exposed w/in Carrier node, every carrier would do it individually BEFORE aggregation
DH - would want is within EP, already set up will go against 3rd party data, already set
KS - hinting, this is not just code, some sort of API call avail, all carriers agree API is call-able, hornets nest, could be complicated, requires another component avail to EP
JB - specify logically, some may have an API, logical not phys requirements
KS - dont need to do it for stat reporting, not a near-term goal, good to know its avail -
KS - has to make report available, put it somewhere to be accessed
DH - reconciliation / qual check at this point before avail to be accessed (reasonability test)
KS - manual thing, part of process that says someone gets eyeballs on before automatically released, chance for REG to look at report and say "ok", some Release by the requestor?
PA - seems like want to have stat agent review before Reg sees it (esp with Stat Reports) - have to be able to have stat reporter access analytics node, thoughts?
KS - logically making the report in some status avail based on permissions, requestor or stat agent can see pre-public release, these are request specific: same person doesn't have access across board only for that specific request, permissions based on that particular data call/stat report
DH - before AN, part of EP, part of ask, do we want a mock up of what report will look like? so REG can say "this is the level of info I am looking for"
KS - dry run of report, tells you what would leave your node but doesn't format
DH - thinking ask rather than EP, "what is this report going to look like", not sure if REG will mock it up or intermediary, dont want them to receive and say "not what I wanted", format detail (what is in which columns vs rows) - what info are they actually looking for, simple, "written prem, by zipcode, cars color=red" - wants to know, as carrier, what is being asked for (format-wise), so EP gets into "how am I going to do that"
KS - mock up of report or description (@Sean Bohan - ask George and Eric to weigh in, join next Mon-Tues calls)
DH - descrip fine, complicated= parts a, b, c, Peter's stat reports more complicated than "give me written prem by zip", complexity of request and output expected
KS - what report am I getting?
PA - sent Andy and Padma for next Monday
SB - maybe put mock up on REG to put into the data call (not just why they want the data but what they expect from the report)
KS - talked about carrier being able to see the results before shared, as a dry run, diff section, rolling in mind, does reg or requestor want similar funct at some point, some sort of dry-run capability for the requestor?
JB - "heres what I want" and get data is problematic, peter's efforts, data I would like and what - role of the intermediary (who is actually creating data) - translates biz request
KS - is there a need for the REG to get more than consent/like, but some sense the report will give them what they need
DH - example: want by zip, don't write a lot of "red cars", do they want zeros or noting, or zips where they have red cars (gross ex)
KS - is it sufficient to say we have made data calls quick enough, you dont get what you want you do it again
JB - purpose of data call is to see whats there (sorry, we don't have red car + zip)
KS - formulated question either dont get or get what you want
JB - clarity of ask, goes back to how to define what you are asking for,
KS - w/o debugger, have to figure out all code in advance before running, some support of what is avail to REQUESTOR as opposed to wide open report, do whatever you want
JB - elastic search query form, what CAN be executed, focused topic we need to spend time on anyways
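The dry-run idea above (show what would leave the node, without formatting the final report) could be sketched as follows; the function names and payload shape are hypothetical.

```javascript
// Run the extraction against local data and return the exact payload the EP
// would emit, marked unreleased so the carrier (or requestor) can review it
// before anything is shared.
function dryRun(extract, records) {
  return { payload: extract(records), released: false };
}

const sample = [{ zipCode: "06183", writtenPremium: 10 }];
const preview = dryRun(
  rows => rows.reduce((sum, r) => sum + r.writtenPremium, 0),
  sample
);
```

The same mechanism could back a requestor-side dry run, returning a shape-only preview rather than real values, which is one way to address KS's question about requestor dry-run capability.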
Roles and Permissions
KS - identified the REG and/or stat agent; role of report reviewer can review report before published (diff than report approver) - merge into others - describing can be done by the REG but might also be done by the implementor, exact fields being returned - creating thing called data call or stat report, more detailed; when implementors get in there, they will find they need more, can update EP and configuration
DH - collab between implementor and REG
UI/Interface
Extraction Pattern
Aggregation Rules
Messaging
Participation Criteria
Two Phase Consent
Data Path (from TRV to X to Y - where is the data going and for what purpose)
Development Process (extraction/code)
Testing
Auditability of data
Identify Report
Who?
originally everyone participating in call from Regulators to Carriers to Intermediary (Participants)
What is it (metadata)
Naming it
identifier
Requestor
type of input
generation source
line of business
what output should look like
explicit math for aggregation
Purpose of data (what being used for)
similar to what is captured on a data call
DR - stab at making a version of this, idea of what it should be (ref reqs), see how it looks, what's missing, etc. - find gaps as opposed to trying to be complete here - for today's purpose some metadata along the lines of reqs; would we do a first req/draft of what it would look like, anything missing? (feels like reqs lite)
KS - info req section in reqs table, first iteration/sol will highlight gaps
SK - any existing samples of data calls/reqs? metadata assoc w/ request, match up, covered in the list?
PA and KS to discuss what will be shared, integrating w/ other depts, large list of data calls from other systems, working with ops teams to bring it together, high level looking to make big improvements on metadata and reqs
SK - date thinking couple (date of req, deadline data, expiration date)
KS - for a report these are the fields we fill in: (a la data dictionary definitions), what data call was intended to capture but inc all of details Dale pointed out, there is bridging vs pointing back to reqs, layout for report - THIS IS WHAT WE ARE TRYING TO DO/WHAT THIS REPORT IS
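A first draft of the report metadata record (DR's suggestion: take a stab at a version and find gaps) might look like this; every field name is a working assumption pulled from the list above, not a settled schema.

```javascript
// Hypothetical metadata record for one data call / stat report, covering
// the items listed in the notes: name, identifier, requestor, input type,
// generation source, line of business, output shape, aggregation math,
// purpose, and SK's date fields.
const reportMetadata = {
  name: "Auto Coverage Report",
  identifier: "dc-2022-0001",           // unique ID (PA's point)
  requestor: "DOI",
  typeOfInput: "stat plan records",
  generationSource: "HDS",
  lineOfBusiness: "Personal Auto",
  outputDescription: "Written premium by zip code",
  aggregationMath: "sum(writtenPremium) group by zipCode", // explicit math
  purpose: "Market analysis",           // what the data is being used for
  dates: {
    requested: "2022-08-01",
    deadline: "2022-09-01",
    expiration: "2023-08-01",
  },
};
```

Running a real data call through this shape should highlight the gaps, which is the stated goal of the first iteration.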
Identify Stat Reporter
Identify who is subscribing
Defining participants and role
Data Providers (Carriers)
Report Requestors (DOI)
Implementors (AAIS & etc. )
Stat Reporter (not necessarily the same as the implementor; generally an approved or certified stat reporter)
producer of the data and the receiver of the data (source and sink/target)
Carriers providing data, DOI creates request
DH - Who are the participants? Carrier, Requestor, Intermediary (AAIS? other stat agents? those building extraction patterns and formatting report), implementor of report
Connecting Subscriber and Report
Carriers and DOIs, want to capture that Carrier is data provider for a specific report and DOI is specific receiver for a report
not data itself, more metadata about report, who getting specifically
who get from / give to
Notion of give-take between implementors and carriers and DOI about the intent
Section about ability to communicate and improve to come to consensus it is the report we want
Communicate about = user interface, carrier gets a chance to say "this one" and the ability to comment on the report before implemented, and then implementation and then feedback to agree to
Stat Reporting or data calls too? apply to both but focused on Stat Reporting and can bridge later date
Reqs for stat reporting in handbook
Parameters of Subscription
Specific to each report (loss dates, premium dates - other variables?)
Some general to all reports
Line of Business, Dates, Jurisdictions,
Differences in report by state? Something Stat Reporting folks can answer
Territory, Coverage? Diff reports same time period, grouping not a filter
Editing Subscription
create/read/update/delete subscriptions
self-service or governed thing?
right now, sign up thru stat reporter for reports a Carrier wants run
AAIS does it for them or on their own - something to be done
Part of governance of openIDL (members, credentialing, )
Audit log - auditability of subscriptions - managing subscriptions as part of openIDL - AAIS thing, funct of openIDL
openIDL not a stat reporter - is there a specific designation? AAIS is a stat reporter working thru openIDL; if others join, they could be doing stat reporting on openIDL; there will be a "Stat Reporter" as intermediary
defines a seat in openIDL network (how to say "AAIS is doing X")
DH - Trv joins openIDL, selects which stat agent they would do stat reporting through - could be report by report but guess all-or-nothing
PA - not all-or-nothing as AAIS doesn't do all lines (work with AAIS, then Verisk, ISO - can't be complete)
DH - don't do MassCar and Texas w/ AAIS
KS - identifying report, id stat reporter, per report detail (each report stat reported via AAIS), stat reporter per report or by line of business - per report connection covers all cases
Ending Subscription
Delete
Give subscription an end date (effective expiration on the subscription itself)
lead time where AAIS or Carriers want to know if they are continuing or moving to new stat agent in openIDL
Autorenewal
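The subscription lifecycle above (end date on the subscription itself, a lead-time window, auto-renewal) could be sketched as a record plus a renewal-decision check; all names are illustrative assumptions.

```javascript
// Hypothetical subscription record connecting a carrier, a stat reporter,
// and a specific report (per-report connection covers all cases, per KS).
const subscription = {
  carrier: "Travelers",
  statReporter: "AAIS",
  report: "Auto Coverage Report",
  endDate: "2023-12-31",  // effective expiration on the subscription itself
  leadTimeDays: 90,       // window to decide on continuing or moving stat agents
  autoRenew: true,
};

// True when the carrier is inside the lead-time window and has not opted
// into auto-renewal, i.e. a decision is needed.
function needsRenewalDecision(sub, today) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const daysLeft = (new Date(sub.endDate) - new Date(today)) / msPerDay;
  return !sub.autoRenew && daysLeft <= sub.leadTimeDays;
}
```

Create/read/update/delete on records like this, plus an audit log of changes, would cover the subscription-management points above.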
Load Data / Assert Ready for Report
080122
?? Facilitate semi-auto inquiries, metadata management scheme
?? Day 1 - PDF uploaded somewhere
080222
KS - Homework, turn the above into arch statements or drawings/tenets, not in the requirements, feel little like requirements still, how do we add progress outside meetings?
PA - like about reqs - key of what genre a req comes from and a unique ID - can we get a unique ID for these elements and a table, what refs what reqs, do homework
KS - components or arch elements as opposed to reqs - talking solutioning, trying to take reqs and apply to scenarios, break out into a set of arch statements for each component (LD1 assert up to a date on the data, LD2), then consolidate - AAIS team to org this doc into that format (due next Mon 8/8)
SK - is the reqs based on discussions, done, next step to jump into solution design and arch?
PA - jumping in makes sense, interested in 2 things: interactions of network and HDS, hard to think of how data load happens w/o knowing target
SK - deliberated reqs, organized; next step not to re-deliberate reqs but to solidify the arch or at least start on it, NOT reclassifying this into another set of reqs
KS - avoid that, these are functional areas sys needs to support, not get to details of tech for a while, all the ideas that need to hold true, made progress in open ended way
JB - top down/bottom up - some sense going back to phases of the sys we started with, keep in mind arch we are dealing with network, not centralized data center, keep in mind org funct around aspects of that network, reflect some of the initial thinking arch needs to be supported, what are the elements for producers, processors, receivers of data
KS - need to be tolerant of chaos, in between meetings remove chaos and refine, brainstormer, raw material
PA - outlined our big boxes?
KS - Data formats? Stat plan
Define Format
What is the data? Glossary or definition? What is being loaded (stat report well-defined)
Assumption - stat plan transactional data, metadata is handled by spec docs as yet to be written
Data existing in HDS, what schema says, there to fulfill stat report, this is just data thats there, period and quant/qual of data designed to do stat report, for this purpose just a database
Minimal data catalog - whats the latest, define whats there (not stat report per se), whats in there is determined thru the funct described (time period, #, etc.) - diff between schema for a db and querying it, format for what could be in there
Minimal form of data catalog - info about whats in the data
Schema is set but might evolve - "type of data loaded" - could say "not making assertions this data is good for a specific data call but to the best of our ability it is good to X date"
KS - must be able to develop report from extracted data
Load Function
Deeper in process of data you have getting into openIDL, details of managing
Process, raw data in carrier DB, turned into some "load candidate", proposed to be loaded into system, needs to go thru edit package
DH - before HDS?
KS - from your raw data to accepted HDS data (load function) and will inc other pieces like edit package
DH - internal loading to the carrier
KS - carrier resp for turning data into intake format (stat plan)
DR - req for "heres what data should look like to be ingested" -
data model - stat plan day 1, day 2... data model
KS - process of taking it in, do work to make more workable in the middle, dont commit to saying "what you put in front end is exactly what ends up in HDS" - right now not putting it exactly, turning it into at least a diff syntax and never will be 1:1, semantically close,
DH - more sense for decoding
KS - load funct part of openIDL, carrier entry point, what carrier putting into load func is stat plan, THEN run thru edit package, review/edit (a la SDMA), "go" and then pushed thru HDS - carrier not doing transform, carrier loading thru UI (SDMA), may even be SDMA (repurposed) to load HDS at end of day
DH - HDS w/in carrier node?
KS - adapter package - need to support keeping data in the carrier world, and don't want everyone to write their own edit package and load process; agree on something that runs in your world that is a lightweight edit package
DR - simplify, essentially a data model, how does it lie in HDS, may or may not be a different input data model that is whats loaded, once in HDS and "loaded" should conform and have any edit packages already run on it, all running on carrier side, dont want it going out and back - caveat, edit packages are shallow tests, not looking at rollup or reconciliations, "is it in the format intended?"
KS - row by row edits, not across rows, had to have x w/o errors, etc. - syntactical and internal, "if you pick this loss record cant have a premium"
DR - sanity checks and housekeeping
after edit, push to HDS (tbd format, close to stat plan day 1)
PA - extensibility, adding more to end of stat plan in the future
Transform
whatever we need, might do some small decoding, def turn in from flat text to TBD (database model in HDS)
normalization? some light transformation in the beginning
assumes not collapsing records, like stat plan same level of granularity every record input is record in HDS (time being)? 1:1
decoding has reference data to lookup
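The light transform described above (flat stat-plan text in, one JSON record out per input record, 1:1 granularity, with decoding via reference data) might look like this sketch; the field positions and coverage codes are invented for illustration.

```javascript
// Hypothetical reference data for decoding coded fields.
const coverageCodes = { "01": "Bodily Injury", "02": "Collision" };

// One fixed-position stat-plan line in, one HDS record out (1:1, same
// granularity, semantically close but a different syntax). The slice
// positions below are made up for this example.
function transformRecord(line) {
  return {
    state: line.slice(0, 2),
    zipCode: line.slice(2, 7),
    coverage: coverageCodes[line.slice(7, 9)] || line.slice(7, 9),
    premium: Number(line.slice(9, 16)),
  };
}

const record = transformRecord("CT0618301" + "0000100");
```

Extensibility (PA's point about adding fields to the end of the stat plan) falls out naturally here: longer lines just get more slices, and old records stay valid.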
Edit Package
Big (all of SDMA)
when we discuss loading data, is it already edited and run thru the SDMA rulebase and good to go, or raw untested data?
ASSUMING thru the edit
Can tell how goods the data and through when
pointer to SDMA functionality:
PA - SDMA - business level rules, large manual process for reconciliation BEFORE turning in reports (today), business and schema testing (does data match rules and schema? cross field edits)
KS - cross field edits - loss records, diff coverages, do have a publishable set of 1000s of rules if used SDMA will just work, just plug SDMA in - can and has been pulled out, proved it could be done, rules could be run as an ETL process - havent done, back and forth and fixing of records not part of it, run the rules as ETL process
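The row-by-row edits described here could run as a simple ETL step. The cross-field rule below ("a loss record can't have a premium") comes from the notes; the record shape and error codes are assumptions for illustration.

```javascript
// Hypothetical edit-package rules: syntactic and cross-field within one
// record, not across rows, each with a documented error code.
const rules = [
  {
    code: "E001",
    message: "loss record must not carry a premium amount",
    failed: r => r.recordType === "loss" && r.premiumAmount > 0,
  },
  {
    code: "E002",
    message: "zip code must be 5 digits",
    failed: r => !/^\d{5}$/.test(r.zipCode),
  },
];

// Run every rule against one record; return the codes of the rules it
// violates (empty array means the record passes the edit package).
function runEdits(record) {
  return rules.filter(rule => rule.failed(record)).map(rule => rule.code);
}
```

Running a rulebase like this as ETL matches KS's point that the SDMA rules can be pulled out and run without the interactive back-and-forth fixing of records.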
Data Attestation
do we have an automated way to attest to data?
cannot attest completeness
Provide data attestation function. Carrier attests to data for a particular date. Attestation parameters? Data attested, time frame (last data of complete transactional data), level of data (must define for attestation: like stat reporting day 0, 1, 2)
different attestation for claims and premium data
Must have data formats / levels defined for attestation
on extraction - check last attested date. If last attested date meets requirement of data call.
attesting to the quality of the data (meets 5% error constraint for data from x to y dates)
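The extraction-time check described above ("check last attested date; if last attested date meets requirement of data call") could be sketched like this; the field names are assumed.

```javascript
// Hypothetical attestation record: the carrier attests to data through a
// date, at a defined data level, within the error constraint.
const attestation = {
  carrier: "Travelers",
  attestedThrough: "2022-06-30", // last date of complete transactional data
  level: "day 1",                // defined data level for the attestation
  errorRateMax: 0.05,            // meets the 5% error constraint
};

// On extraction: does the last attested date cover what the data call needs?
function meetsDataCall(att, requiredThrough) {
  return new Date(att.attestedThrough) >= new Date(requiredThrough);
}
```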
Raw Notes
Have it or don't by time period
Assumption - run report, everyone is always up to date with data, loading thru stat plan, data has been fixed in edit process, ask for 2021 data its there
Automated query cant tell if data is there, may have transax that haven't processed, dont know complete until someone says complete
Never in position to say "complete" due to late transax
If someone queries data on Dec 31, midday. not complete - transax occur that day but get loaded Jan 3 - never a time where it is "COMPLETE"
Time complete = when requested - 2 ways - 1 whenever Trav writes data, "data is good as of X date" metadata attached, Trav writes business rules for that date, OR business logic on extract "as long as date is one day earlier" = data valid as of transax written
Manual insertion - might not put more data in there, assume complete as of this date
Making req on Dec 31, may not have Dec data in there (might be Nov as of Dec 31)
Request itself - I have to have data up to this date - every query will have diff param, data it wants, cant say "I have data for all purposes as of this date"
2 dates: 12/31 load date and the effective date of information (thru Nov 30)
Point - could use metadata about insertion OR the actual data, could use one, both or either
Data bi-temporal, need both dates, could do both or either, could say if Trv wrote data on Jan 3, assumption all thru 12/31 is good
May not be valid, mistake in a load, errors back and fixing it - need to assert MANUALLY the data is complete as of a certain time
3-4 days to load a months data, at the end of the job, some assertion as to when data is complete
most likely as this gets implemented it will be a job that does the loading, not someone attesting to data as of this date - where manual attestation becomes less valuable over time
as loads written (biz rule, etc.): if we load on X date it is valid - X weeks, business rule, not manual attestation - maybe using last transax date is just as good - if Dec 31 is last transax date, not valid yet - if Dec 31 is last transax date then Jan 1
Data for last year - build into system you cant have that for a month
Start with MANUAL attestation and move towards automated
Data thru edit and used for SR, data trailing by 2 years
doesn't need to be trailing
submission deadline to get data in within 2 years, then reconciliation; these reports are trailing - uncomfortable with this constraint
our ? is the data good, are we running up to this end date, not so much about initial transax than claims process
May have report that wants 2021 data in 2023, but 2021 data updated in 2022
Attestation is rolling, constantly changing; edit package and SDMA is not reconciliation, it is business logic - doesn't have to be trailing
As loading data, whats the last date loaded, attestation date
sticky - go back x years a report might want, not sure you can attest to
decoupling attestation from a given report (data current as of x date),
everything up to the date my attestation is up to date in the system
"Data is good through x date" not attesting to period
Monkey Wrench: Policy data, our data is good as of Mar 2022 all 2021 data is up to date BUT Loss (incurred and paid) could go 10 years into future
some should be Biz Logic built into the extract pattern - saying in HDS, good to what we know as of this date, not saying complete but "good to what we know" - if we want to do something with the EP, "I will only use data greater than X months old" as policy evolves
Loss exposure - all losses resolved, 10 years ahead of date of assertion, as of this date go back 10 years
decouple this from any specific data call or stat report - on the report writer
2 assertion dates - one for policy vs one for claim
not saying good complete data, saying accurate to best of knowledge at date x
only thing changing is loss side
saying data is accurate to this point in time, as of this date we dont have any claim transax on this policy as of this date
adding "comfort level" to extraction? - when you req data you will not req for policies in last 5 years - but if I am Eric, wants to understand market, cares about attestation I can give in March
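The "two assertion dates" point (policy vs. claim, since losses keep developing for years) could be expressed as a bi-temporal attestation record checked per data type; the names are illustrative.

```javascript
// Hypothetical bi-temporal attestation: one date for policy/premium data,
// a separate one for loss/claim data, "accurate to the best of our
// knowledge as of date x" rather than "complete".
const biTemporalAttestation = {
  policyGoodThrough: "2022-03-31",
  lossGoodThrough: "2021-12-31",
};

// A data call states what it needs per data type; the check is decoupled
// from any specific report, per the discussion above.
function dataUsable(att, call) {
  return (
    new Date(att.policyGoodThrough) >= new Date(call.policyThrough) &&
    new Date(att.lossGoodThrough) >= new Date(call.lossThrough)
  );
}
```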
Exception Handling in LOADING
Account for exception processing
What is an exception?
PA - loss & premium records, putting stat plan in JSON, older data didn't ask for VIN, some data fields optional
KS - exceptions can be expected, capturing & managing situations to be dealt with, not "happy path", need to have error codes and remediation steps, documentation for what they all mean and what to do about them (SDMA has internal to edit package) - things like "cant get it in edit package b/c file not correct", etc. - standard way of notifying exceptions throughout system, consistent, exception received and what to do about it
PA - ETL stuff, exceptions based on SNS topics, what's the generalized way to handle? or specific exception cases?
KS - arch needs way to report and document and address/remediate exceptions (consistent, notifying, dealing)
PA - options:
messaging format,
db keeping log of all messages
hybrid approach of both
KS - immediate feedback and non-sequential (messaging or notification feedback)
JB - data loading transfer of data or into HDS?
KS - data loading starts with intake file in current statplan format, ends when data in HDS
JB - lot of exceptions local to this process loading data, reported to anyone or resolved or level of implementation of who is reporting data,
KS - some user interface, allows you to load a file and provide feedback, but a lot is asynchronous, no feedback from UI
JB - gen approach to be shared across
KS - consistent way to handle across system (sync/asynch, UI vs notification)
PA - 2 lambda functions loaded in, 2 SNS topics (1 topic per lambda), seems like nice granular feedback; as we get more lambdas throughout the node it would be unwieldy; master topic to subscribe to resources
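KS's call for one consistent way to report, document, and remediate exceptions, combined with PA's master-topic option, could be sketched as a standard exception message; the shape and topic name are assumptions for discussion, not an agreed format.

```javascript
// Hypothetical standardized exception message. Every stage of loading
// (intake, edit package, transform, HDS write) would emit this same shape,
// published to a single master topic subscribers can filter by source.
function makeException(source, code, detail) {
  return {
    source,                      // which lambda/stage raised it
    code,                        // documented error code with remediation steps
    detail,
    timestamp: new Date().toISOString(),
    topic: "exceptions",         // single master topic, per PA's alternative
  };
}

const exc = makeException("load-function", "LOAD-001", "file not in stat plan format");
```

A UI could surface these synchronously where possible, while asynchronous stages publish the same shape to the topic, giving the consistent sync/async handling KS describes.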