openIDL - Architecture Definition Workspace

Contributors

Initials - Contributor

DH - Dale Harris - Travelers
DR - David Reale - Travelers
SC - Susan Chudwick - Travelers
JM - James Madison - The Hartford
SK - Satish Kasala - The Hartford
KS - Ken Sayers - AAIS
PA - Peter Antley - AAIS
SB - Sean Bohan - openIDL / Linux Foundation
JB - Jeff Braswell - openIDL / Linux Foundation

Process

The Architecture Definition Workspace is where we as a community come together to work through the architecture for openIDL going forward.  We take our experiences, combine them with inputs from the community, and apply them against the usage scenarios we have for openIDL.  Below is a table of the phases and the expected outcomes of each.

Phase

Description

Outcome

Requirements

Define the requirements for one or more possible scenarios for openIDL.  In this case, we are focused on the stat reporting use case.

A set of requirements.  openIDL - System Requirements Table (DaleH @ Travelers)

Define Scenarios

Define the scenarios sufficiently to gather ideas about the different steps.  The scenarios will change over time as we dig into the details.

A few scenarios broken down into steps.

Brainstorming

Gather ideas from all participants for all the different steps in the scenarios

Detailed notes for each of the steps in the scenario(s)

Architecture Elaboration and Illustration

Consolidate notes and start defining architecture details.

Network Architecture - different kinds of nodes and how they participate

Application Architecture - structure of the functional components and their responsibilities

Data Architecture - data flows and formats

Technical Architecture - use of technologies to support the application

Diagrams for the different architectures

  • block diagrams

  • interaction diagrams

Tenets

  • strongly held beliefs / constraints on the implementation

Identify Spikes

From the elaboration phase, will come questions that require answers.  Sometimes, answers come through research.  Often, answers must come from spikes.  Spikes are short, focused deep dive implementation activities that help identify the right solution for aspects of the system.  The TSC must approve the spikes.

  • spikes defined

  • spikes approved

Execute Spikes

Execute approved work to answer the question that required the spike.  

Spike results documented.

Plan Implementation

With spikes completed, the team can finalize the design of the architecture and plan the implementation.

Implementation Plan

Implement

Implement the architecture per the plan.

Running network in approved architecture

Deliverables:

Scenarios

Stat Report 



Subscribe to Report (automate initiation & consent - assumes stat report)

Define jurisdictional context/req (single or multi versions of same report)

How often it runs (report generation frequency)

  • PA - avoid 13 lines and 13 reports per state per carrier - come up with a way to simplify distro of reports

  • KS - make sure subsections covered, sections expanded - dont see anywhere we discuss data access 

Extraction Details / Metadata

  •  KS - want an understanding of what data will be accessed by the EP; when a report runs, what data will be accessed, field by field. The EP says all that. Discussed whether or not the EP being code is enough; we want something more than just code to tell us what's being accessed

  • PA - work with Auto Coverage Report is complex, want fields going across rows, and know what filtering criteria is, take these fields on this line over these dates

  • KS - declarative approach to EP, declarative less writing JS code and more saying "fields accessed in EP" and a section "Aggregations", early on suggested approach (DR?) some sort of a "pre-processor" (was JB) - "this is an EP, know things in the EP, requires way to gen code from it

  • PA - great idea, seeing 3 tiered thing: 1 explicitly biz level (top of data call - earner prem, most recent year), 2 more like lang agnostic metadata, 3 full implementation

  • DH - really talking about what the "smart contract" will be - really the REQUEST details - agree we want what fields, how info is aggregated, how filtered, what date params are used

  • KS - agg rules, parts of the implementation detail, agg rules for data access, date, param, section on param for the subscription, look at parameters, considered higher level stuff, suggests other things, place to see those things, whether they drive implementation or not - agg, access, filtering - sections of pseudocode, can get out of sync with actual implementation, who puts that in there (REG or implementor of the EP?)

  • DH - could be both

  • PA - having success w/ Auto Coverage report due to DH doing all the mid-level data stuff, able to reproduce it, whoever is doing EP has to have a clear plan

  • KS - basic model we have right now, for REG to put in prose and implementor to create map-reduce to make it, suggest 1 level deeper, more declarative sections like data access, maybe not field by field - data, filtering

  • DH - want to have specific fields you are using, worry about "fishing expedition"

  • KS - describe sections that need to be filled out as part of data call, specific (field names, actual aggs, pseudocode) - data access is mis
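The declarative EP idea above (fields accessed, filters, aggregations captured as data rather than buried in JS code, with a "pre-processor" able to generate code from it) can be sketched roughly as follows. The descriptor shape and field names are assumptions for discussion, not an agreed openIDL schema:

```javascript
// Hypothetical declarative extraction-pattern (EP) descriptor.
// Section and field names are illustrative, not a settled openIDL format.
const extractionPattern = {
  id: "auto-coverage-2021",
  description: "Earned premium by zipcode, most recent year",
  dataAccess: {
    // every field the EP will touch, declared up front
    fields: ["zipCode", "earnedPremium", "lineOfBusiness", "transactionDate"]
  },
  filters: {
    lineOfBusiness: "AUTO",
    transactionDate: { from: "2021-01-01", to: "2021-12-31" }
  },
  aggregations: [
    { op: "sum", field: "earnedPremium", groupBy: "zipCode" }
  ]
};

// A pre-processor could read the declarative sections and answer the
// regulator's question directly: what data will this EP access?
function fieldsAccessed(ep) {
  return ep.dataAccess.fields.slice().sort();
}
```

Because the sections are data, they cannot silently drift from prose the way pseudocode can; whether they also drive the generated implementation is the open question above.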

Outputs / Aggregation Rules

  • DH - from HDS? sep from final report?

  • KS - should be the same logically 

  • DH - output from HDS is agg data, will be anonymized? 

  • KS - not anon until combined in analytics node - what outputs? Data Points, how different from Aggregations - Agg: sum of the coverages by zipcode, what about outputs is different?

  • DH - shouldn't be, outputs are aggregations

  • KS - other point, consider when we combine things is there an expectation there is a functionality beyond that point, ex: the ND thing, does something on Analytics Node, compares against registered VINs, not just aggs from carriers, some possibility the report itself does work itself (compares to other data, etc.), find a way to describe that as well 

  • DH - what are you doing with the data when in the Analytics Node (AN)?

  • KS - outputs on the EP - aggregations, the reduce - what do we want to know about the AGG itself? in code, do we need to come up with standard lang or just prose?

  • DH - should not be computer code (non-tech people need to be able to read and understand)

  • KS - start with prose (human readable) and aspirational goal for some structure

  • JB - whats requested to begin with, subset of EP

  • KS - how accurately it can be expressed

  • DH - then final report - or sep section, 
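A minimal sketch of the kind of aggregation discussed above (KS's example: sum of the coverages by zipcode, i.e. the "reduce" side of the EP). The record shape is an assumption:

```javascript
// Sketch of the aggregation/output step: sum earned premium by zipcode.
// The input record shape is assumed for illustration.
function aggregateByZip(records) {
  const totals = new Map();
  for (const r of records) {
    totals.set(r.zipCode, (totals.get(r.zipCode) || 0) + r.earnedPremium);
  }
  // emit rows a non-technical reviewer can read alongside the prose description
  return [...totals.entries()].map(
    ([zipCode, earnedPremium]) => ({ zipCode, earnedPremium }));
}

const sample = [
  { zipCode: "30303", earnedPremium: 100 },
  { zipCode: "30303", earnedPremium: 50 },
  { zipCode: "10001", earnedPremium: 75 }
];
```

The prose description of the data call ("sum of earned premium by zipcode") maps one-to-one onto this reduce, which is what makes a human-readable-first, structure-later approach plausible.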

Analytics Node Function (what are you gonna do with the data after combination?)

  • KS - formatting final report

  • DH - anonymizing data, deleting data once report created

  • KS - req from customer to verify or keep evidence of that (deletion), prove data was removed, any core report logic (other than formatting) - anything like collab correlation data, has to be described here, ND example: comparison against registered VINs

  • DH - running against 3rd party data, would want to know "what specifically are you matching on and expected results - what do you hope to get out of that 3rd party data" - 

  • KS - part of what we showed where we correlate with X data, some you want to do on the carrier node before you agg, uses data you don't want to share, might want this section about correlated data on both inside carrier node and outside node

  • DH - on Extraction

  • KS - talked agg and outputs, didn't talk about correlated data or third party data

  • DH - think it would be more efficient for one party doing it rather than each doing their own

  • KS - ex: address, dont want to share it, if we are all ok if it isn't exposed w/in Carrier node, every carrier would do it individually BEFORE aggregation

  • DH - would want is within EP, already set up will go against 3rd party data, already set

  • KS - hinting, this is not just code, some sort of API call avail, all carriers agree API is call-able, hornets nest, could be complicated, requires another component avail to EP

  • JB - specify logically, some may have an API, logical not phys requirements

  • KS - dont need to do it for stat reporting, not a near-term goal, good to know its avail -

  • KS - has to make report available, put it somewhere to be accessed

  • DH - reconciliation / qual check at this point before avail to be accessed (reasonability test)

  • KS - manual thing, part of process that says someone gets eyeballs on before automatically released, chance for REG to look at report and say "ok", some Release by the requestor?

  • PA - seems like want to have stat agent review before Reg sees it (esp with Stat Reports) - have to be able to have stat reporter access analytics node, thoughts?

  • KS - logically making the report in some status avail based on permissions, requestor or stat agent can see pre-public release, these are request specific: same person doesn't have access across board only for that specific request, permissions based on that particular data call/stat report

  • DH - before AN, part of EP, part of ask, do we want a mock up of what report will look like? so REG can say "this is the level of info I am looking for"

  • KS - dry run of report, tells you what would leave your node but doesn't format

  • DH - thinking ask rather than EP, "what is this report going to look like", not sure if REG will mock it up or intermediary, dont want them to receive and say "not what I wanted", format detail (what is in which columns vs rows) - what info are they actually looking for, simple, "written prem, by zipcode, cars color=red" - wants to know, as carrier, what is being asked for (format-wise), so EP gets into "how am I going to do that"

  • KS - mock up of report or description (@Sean Bohan - ask George and Eric to weigh in, join next Mon-Tues calls)

  • DH - descrip fine, complicated= parts a, b, c, Peter's stat reports more complicated than "give me written prem by zip", complexity of request and output expected

  • KS - what report am I getting?

  • PA - sent Andy and Padma for next Monday

  • SB - maybe put mock up on REG to put into the data call (not just why they want the data but what they expect from the report

  • KS - talked about carrier being able to see the results before shared, as a dry run, diff section, rolling in mind, does reg or requestor want similar funct at some point, some sort of dry-run capability for the requestor?

  • JB - "heres what I want" and get data is problematic, peter's efforts, data I would like and what - role of the intermediary (who is actually creating data) - translates biz request

  • KS - is there a need for REG to get more than consent/like but some sense the report will give them what they need, 

  • DH - example: want by zip, don't write a lot of "red cars", do they want zeros or noting, or zips where they have red cars (gross ex)

  • KS - is it sufficient to say we have made data calls quick enough, you dont get what you want you do it again

  • JB - purpose of data call is to see whats there (sorry, we don't have red car + zip)

  • KS - formulated question either dont get or get what you want

  • JB - clarity of ask, goes back to how to define what you are asking for, 

  • KS - w/o debugger, have to figure out all code in advance before running, some support of what is avail to REQUESTOR as opposed to wide open report, do whatever you want

  • JB - elastic search query form, what CAN be executed, focused topic we need to spend time on anyways
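The anonymization-by-combination point above (data is not anonymous until combined in the Analytics Node) can be sketched as follows: the node sums per-carrier aggregates and drops carrier identity. Shapes are illustrative:

```javascript
// Sketch: the Analytics Node combines per-carrier aggregates so no single
// carrier's figures appear in the published output. Shapes are assumptions.
function combineCarrierResults(perCarrier) {
  const combined = new Map();
  for (const { rows } of perCarrier) {   // carrier identity is dropped here
    for (const { zipCode, earnedPremium } of rows) {
      combined.set(zipCode, (combined.get(zipCode) || 0) + earnedPremium);
    }
  }
  return [...combined.entries()].map(
    ([zipCode, earnedPremium]) => ({ zipCode, earnedPremium }));
}

const perCarrier = [
  { carrier: "A", rows: [{ zipCode: "30303", earnedPremium: 150 }] },
  { carrier: "B", rows: [{ zipCode: "30303", earnedPremium: 250 }] }
];
```

Anything beyond this pure combination (the ND example of comparing against registered VINs, third-party correlation, formatting) would have to be described as separate, declared Analytics Node logic per the discussion above.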

Roles and Permissions

  • KS - identified the REG and/or stat agent, role of report reviewer who can review report before published (diff than report approver) - merge into others - describing can be done by the REG but might also be done by the implementor, exact fields being returned - creating thing called data call or stat report, more detailed than when implementors get in there, will find they need more, can update EP and configuration

  • DH - collab between implementor and REG
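A sketch of the request-scoped permissions described above: a reviewer may see a pre-release report only for the specific data call they are named on, never across the board. Names and shapes here are hypothetical:

```javascript
// Sketch: permissions are granted per data call / stat report, not globally.
// Grant records and role names are invented for illustration.
const grants = [
  { user: "reg-ga", dataCallId: "dc-2021-auto", role: "report-reviewer" }
];

function canReviewPreRelease(user, dataCallId) {
  return grants.some(g =>
    g.user === user &&
    g.dataCallId === dataCallId &&
    g.role === "report-reviewer");
}
```

The same per-request scoping would apply to the requestor's "release" step and the stat agent's review before the regulator sees the report.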

UI/Interface

Extraction Pattern

Aggregation Rules

Messaging

Participation Criteria

Two Phase Consent

Data Path (from TRV to X to Y - where is the data going and for what purpose)

Development Process (extraction/code)

Testing

Auditability of data

Identify Report

  • Who?

    • originally everyone participating in call from Regulators to Carriers to Intermediary (Participants)

  • What is it (metadata)

    • Naming it 

    • identifier

    • Requestor

    • type of input

    • generation source

    • line of business

    • what output should look like

    • explicit math for aggregation

    • Purpose of data (what being used for)

    • similar to what is captured on a data call

    • DR - stab at making a vers of this, idea of what it should be (ref reqs), see how it looks, what's missing, etc. - find gaps as opposed to trying to be complete here - for today's purpose some metadata along the lines of reqs, would we do first req/draft of what it would look like, anything missing? (feels like reqs lite)

    • KS - info req section in reqs table, first iteration/sol will highlight gaps

    • SK - any existing samples of data calls/reqs? metadata assoc w/ request, match up, covered in the list?

    • PA and KS to discuss what will be shared, integrating w/ other depts, large list of data calls from other systems, working with ops teams to bring it together, high level looking to make big improvements on metadata and reqs

    • SK - date thinking couple (date of req, deadline data, expiration date)

    • KS - for a report these are the fields we fill in: (a la data dictionary definitions), what data call was intended to capture but inc all of details Dale pointed out, there is bridging vs pointing back to reqs, layout for report - THIS IS WHAT WE ARE TRYING TO DO/WHAT THIS REPORT IS

  • Identify Stat Reporter
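Per DR's suggestion above, a first draft of the report-identity metadata can be sketched to find gaps rather than trying to be complete. All field names here are illustrative assumptions:

```javascript
// Draft report-identity metadata along the lines listed above.
// Every field name is an assumption for discussion, not a settled schema.
const reportMetadata = {
  name: "Auto Coverage Report",
  identifier: "acr-2021",
  requestor: "GA DOI",
  typeOfInput: "stat plan transactions",
  generationSource: "HDS extraction",
  lineOfBusiness: "AUTO",
  outputDescription: "Written premium by zipcode, calendar year 2021",
  aggregationMath: "sum(writtenPremium) group by zipCode",
  purpose: "Market analysis of written premium by territory"
};

// find gaps as opposed to being complete: list required fields not yet filled in
const REQUIRED = ["name", "identifier", "requestor", "lineOfBusiness", "purpose"];
function missingFields(meta) {
  return REQUIRED.filter(f => !meta[f]);
}
```

The gap-check mirrors the process KS described: a first iteration that highlights what the reqs table is missing.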

Identify who is subscribing

  • Defining participants and role

    • Data Providers (Carriers)

    • Report Requestors (DOI)

    • Implementors (AAIS & etc. )

    • Stat Reporter (not necessarily same as implementor, generally an approved or certified stat reporter)

  • producer of the data and the receiver of the data (source and sync/target)

  • Carriers providing data, DOI creates request

  • DH - Who are the participants? Carrier, Requestor, Intermediary (AAIS? other stat agents? those building extraction patterns and formatting report), implementor of report

Connecting Subscriber and Report

  • Carriers and DOIs, want to capture that Carrier is data provider for a specific report and DOI is specific receiver for a report

  • not data itself, more metadata about report, who getting specifically

  • who get from / give to

  • Notion of give-take between implementors and carriers and DOI about the intent

  • Section about ability to communicate and improve to come to consensus it is the report we want 

  • Communicate about = user interface, carrier gets a chance to say "this one" and the abiltiy to comment on report before implemented, and then implementation and then feedback to agree to

  • Stat Reporting or data calls too? apply to both but focused on Stat Reporting and can bridge later date 

  • Reqs for stat reporting in handbook

Parameters of Subscription

  • Specific to each report (loss dates, premium dates - other variables?)

  • Some general to all reports 

  • Line of Business, Dates, Jurisdictions, 

  • Differences in report by state? Something Stat Reporting folks can answer

  • Territory, Coverage? Diff reports same time period, grouping not a filter
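The split between parameters general to all reports and those specific to each report might look like this sketch (parameter names invented for illustration):

```javascript
// Sketch: common subscription parameters vs report-specific ones.
// Names are illustrative, not a settled parameter set.
const commonParams = {
  lineOfBusiness: "AUTO",
  jurisdiction: "GA"
};

const reportSpecificParams = {
  lossDateFrom: "2021-01-01",
  lossDateTo: "2021-12-31",
  premiumDateFrom: "2021-01-01",
  premiumDateTo: "2021-12-31"
};

// report-specific values extend (and may override) the common set
function buildSubscriptionParams(common, specific) {
  return { ...common, ...specific };
}
```

Groupings like Territory or Coverage would sit alongside the filters here, since per the note above they shape the report rather than filter it.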

Editing Subscription

  • create/read/update/delete subscriptions

  • self-service or governed thing?

  • right now, sign up thru stat reporter for reports a Carrier wants run

  • AAIS does it for them or on their own - something to be done

  • Part of governance of openIDL (members, credentialing, )

  • Audit log - auditability of subscriptions - managing subscriptions as part of openIDL - AAIS thing, funct of openIDL 

  • openIDL not a stat reporter - is there a specific designation? AAIS is a stat reporter working thru openIDL; if others join, they could be doing stat reporting on openIDL; there will be a "Stat Reporter" as intermediary, 

  • defines a seat in openIDL network (how to say "AAIS is doing X")

  • DH - Trv joins openIDL, selects which stat agent they would do stat reporting through - could be report by report but guess all-or-nothing

  • PA - not all or nothing as AAIS doesn't do all lines (work with AAIS, then Verisk, ISO - can't be complete)

  • DH - don't do MassCar and Texas w/ AAIS

  • KS - identifying report,  id stat reporter, per report detail (each report stat reported via AAIS), stat reporter per report or by line of business - per report connection covers all cases

Ending Subscription

  • Delete

  • Give subscription an end data (effective expiration on the subscription itself)

  • lead time where AAIS or Carriers want to know if they are continuing or moving to new stat agent in openIDL

  • Autorenewal
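The subscription lifecycle discussed above (create/read/update/delete, an audit log, an effective expiration on the subscription itself, and auto-renewal) could be sketched as follows; the record shape is an assumption:

```javascript
// Sketch of a governed subscription record with an audit trail,
// an effective end date, and optional auto-renewal. Shapes are illustrative.
function createSubscription(carrier, reportId, endDate, autoRenew) {
  return {
    carrier, reportId, endDate, autoRenew,
    audit: [{ action: "create", at: new Date().toISOString() }]
  };
}

// ISO date strings compare correctly as strings
function isActive(sub, onDate) {
  return onDate <= sub.endDate || sub.autoRenew;
}

function endSubscription(sub, at) {
  sub.endDate = at;
  sub.autoRenew = false;
  sub.audit.push({ action: "end", at });
  return sub;
}
```

Every mutation appends to `audit`, which is the auditability-of-subscriptions requirement above in miniature; the lead-time notice before expiry or a stat-agent change would hang off `endDate`.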

Load Data / Assert Ready for Report

080122

  • ?? Facilitate semi-auto inquiries, metadata management scheme

  • ?? Day 1 - PDF uploaded somewhere

080222

  • KS - Homework, turn the above into arch statements or drawings/tenets, not in the requirements, feel little like requirements still, how do we add progress outside meetings?

  • PA - like about reqs - key of what genre a req comes from and a unique ID - can we get a unique ID for these elements and a table, what refs what reqs, do homework

  • KS - components or arch elements as opposed to reqs - talking solutioning, trying to take reqs and apply to scenarios, break out into a set of arch statements for each component (LD1 assert up to a date on the data, LD2), then consolidate - AAIS team to org this doc into that format (due next Mon 8/8)

  • SK - is the reqs based on discussions, done, next step to jump into solution design and arch? 

  • PA - jumping in makes sense, interested in 2 things: interactions of network and HDS, hard to think of how data load happens w/o knowing target

  • SK - deliberated reqs, organized, next step not to re-deliberate reqs but to solidify the arch or at least start on it, NOT reclassifying this into another set of reqs

  • KS - avoid that, these are functional areas sys needs to support, not get to details of tech for a while, all the ideas that need to hold true, made progress in open ended way

  • JB - top down/bottom up - some sense going back to phases of the sys we started with, keep in mind arch we are dealing with network, not centralized data center, keep in mind org funct around aspects of that network, reflect some of the initial thinking arch needs to be supported, what are the elements for producers, processors, receivers of data

  • KS - need to be tolerant of chaos, in between meetings remove chaos and refine, brainstormer, raw material

  • PA - outlined our big boxes? 

  • KS - Data formats? Stat plan

Define Format

  • What is the data? Glossary or definition? What is being loaded (stat report well-defined)

  • Assumption - stat plan transactional data, metadata is handled by spec docs as yet to be written

  • Data existing in HDS, what schema says, there to fulfill stat report, this is just data thats there, period and quant/qual of data designed to do stat report, for this purpose just a database

  • Minimal data catalog - whats the latest, define whats there (not stat report per se), whats in there is determined thru the funct described (time period, #, etc.) - diff between schema for a db and querying it, format for what could be in there

  • Minimal form of data catalog - info about whats in the data

  • Schema is set but might evolve - "type of data loaded" - could say "not making assertions this data is good for a specific data call but to the best of our ability it is good to X date"

  • KS - must be able to develop report from extracted data
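The minimal data catalog idea above ("info about what's in the data", a good-through date rather than per-report assertions) might look like this; the fields are illustrative:

```javascript
// Sketch of a minimal data-catalog entry for the HDS: not a stat report,
// just info about what's in the data. Field names are assumptions.
const catalogEntry = {
  dataset: "stat-plan-transactions",
  lineOfBusiness: "AUTO",
  earliestTransaction: "2019-01-01",
  latestTransaction: "2022-06-30",
  recordCount: 1250000,
  // "not making assertions this data is good for a specific data call,
  // but to the best of our ability it is good to X date"
  goodThrough: "2022-05-31"
};

// a data call could check coverage without querying the data itself
function coversPeriod(entry, from, to) {
  return entry.earliestTransaction <= from && to <= entry.goodThrough;
}
```

This keeps the distinction drawn above between the schema for the database and querying it: the catalog describes what could be in there, the EP decides what to extract.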

Load Function

  • Deeper in process of data you have getting into openIDL, details of managing

  • Process, raw data in carrier DB, turned into some "load candidate", proposed to be loaded into system, needs to go thru edit package

  • DH - before HDS?

  • KS - from your raw data to accepted HDS data (load function) and will inc other pieces like edit package

  • DH - internal loading to the carrier

  • KS - carrier resp for turning data into intake format (stat plan)

  • DR - req for "heres what data should look like to be ingested" - 

  • data model - stat plan day 1, day 2... data model

  • KS - process of taking it in, do work to make more workable in the middle, dont commit to saying "what you put in front end is exactly what ends up in HDS" - right now not putting it exactly, turning it into at least a diff syntax and never will be 1:1, semantically close, 

  • DH - more sense for decoding

  • KS - load funct part of openIDL, carrier entry point, what carrier putting into load func is stat plan, THEN run thru edit package, review/edit (a la SDMA), "go" and then pushed thru HDS - carrier not doing transform, carrier loading thru UI (SDMA), may even be SDMA (repurposed) to load HDS at end of day

  • DH - HDS w/in carrier node?

  • KS - adapter package - need to support keeping data in carrier world, and don't want everyone to write their own edit package and load process; agree on something that runs in your world that is a lightweight edit package

  • DR - simplify, essentially a data model, how does it lie in HDS, may or may not be a different input data model that is whats loaded, once in HDS and "loaded" should conform and have any edit packages already run on it, all running on carrier side, dont want it going out and back - caveat, edit packages are shallow tests, not looking at rollup or reconciliations, "is it in the format intended?"

  • KS - row by row edits, not across rows, had to have x w/o errors, etc. - syntactical and internal, "if you pick this loss record cant have a premium"

  • DR - sanity checks and housekeeping 

  • after edit, push to HDS (tbd format, close to stat plan day 1)

  • PA - extensibility, adding more to end of stat plan in the future

Transform

  • whatever we need, might do some small decoding, definitely turn it from flat text to TBD (database model in HDS)

  • normalization? some light transformation in the beginning

  • assumes not collapsing records, like stat plan same level of granularity every record input is record in HDS (time being)? 1:1

  • decoding has reference data to lookup
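The load/transform step above (flat stat-plan text to a 1:1 HDS record, with light decoding against reference data) can be sketched as follows. The field positions and code table are invented for illustration, not the actual stat plan layout:

```javascript
// Sketch: one fixed-width stat-plan line becomes one HDS record (1:1),
// with a small decode via lookup reference data. Positions and the code
// table below are hypothetical, not the real stat plan.
const COVERAGE_CODES = { "01": "BI", "02": "PD" };  // invented reference data

function transformLine(line) {
  return {
    state: line.slice(0, 2),
    zipCode: line.slice(2, 7),
    // decode where we can; keep the raw code when no mapping exists
    coverage: COVERAGE_CODES[line.slice(7, 9)] || line.slice(7, 9),
    earnedPremium: parseInt(line.slice(9, 17), 10) / 100  // implied decimals
  };
}
```

Semantically the record stays close to the stat plan (different syntax, never exactly 1:1 in form), which matches the note above that the carrier's entry point is the stat plan and the HDS holds the transformed result.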

Edit Package

  • Big (all of SDMA)

  • when we discuss loading data, is it already edited and run thru the SDMA rulebase and good to go, or raw untested data

  • ASSUMING thru the edit

  • Can tell how good the data is and through when

  • pointer to SDMA functionality:

  • PA - SDMA - business level rules, large manual process for reconciliation BEFORE turning in reports (today), business and schema testing (does data match rules and schema? cross field edits)

  • KS - cross field edits - loss records, diff coverages, do have a publishable set of 1000s of rules; if used, SDMA will just work, just plug SDMA in - can and has been pulled out, proved it could be done, rules could be run as an ETL process - haven't done; back and forth and fixing of records not part of it, run the rules as ETL process
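Row-by-row edits as described above (syntactical and cross-field within a single record, no cross-record reconciliation) might be structured like this. The rules shown are examples, not AAIS's published rulebase:

```javascript
// Sketch of a row-level edit package: each rule checks one record and
// returns a stable error code. The specific rules are illustrative only.
const editRules = [
  { code: "E001", message: "zipCode must be 5 digits",
    test: r => /^\d{5}$/.test(r.zipCode) },
  // cross-field edit within one record, per the example above:
  // "if you pick this loss record, can't have a premium"
  { code: "E002", message: "loss record cannot carry premium",
    test: r => !(r.recordType === "LOSS" && r.earnedPremium > 0) }
];

function runEdits(record) {
  // returns error codes with remediation-friendly messages, SDMA-style
  return editRules.filter(rule => !rule.test(record))
                  .map(({ code, message }) => ({ code, message }));
}
```

Because each rule sees only one record, the whole set can run as an ETL pass during load, which is the "rules as ETL process" option KS describes.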

Data Attestation

  • do we have an automated way to attest to data?

  • cannot attest completeness

  • Provide data attestation function.  Carrier attests to data for a particular date.  Attestation parameters? Data attested, time frame (last data of complete transactional data), level of data (must define for attestation: like stat reporting day 0, 1, 2)

  • different attestation for claims and premium data

  • Must have data formats / levels defined for attestation

  • on extraction - check last attested date.  If last attested date meets requirement of data call.

  • attesting to the quality of the data (meets 5% error constraint for data from x to y dates)

Raw Notes

  • Have it or don't by time period

  • Assumption - run report, everyone is always up to date with data, loading thru stat plan, data has been fixed in edit process, ask for 2021 data its there

  • Automated query cant tell if data is there, may have transax that haven't processed, dont know complete until someone says complete

  • Never in position to say "complete" due to late transax

  • If someone queries data on Dec 31, midday. not complete - transax occur that day but get loaded Jan 3 - never a time where it is "COMPLETE"

  • Time complete = when requested - 2 ways - 1 whenever Trav writes data, "data is good as of X date" metadata attached, Trav writes business rules for that date, OR business logic on extract "as long as date is one day earlier" = data valid as of transax written

  • Manual insertion - might not put more data in there, assume complete as of this date

  • Making req on Dec 31, may not have Dec data in there (might be Nov as of Dec 31)

  • Request itself - I have to have data up to this date - every query will have diff param, data it wants, cant say "I have data for all purposes as of this date"

  • 2 dates: 12/31 load date and the effective date of information (thru Nov 30)

  • Point - could use metadata about insertion OR the actual data, could use one, both or either

  • Data bi-temporal, need both dates, could do both or either, could say if Trv wrote data on Jan 3, assumption all thru 12/31 is good

  • May not be valid, mistake in a load, errors back and fixing it - need to assert MANUALLY the data is complete as of a certain time

  • 3-4 days to load a month's data, at the end of the job, some assertion as to when data is complete

  • most likely as this gets implemented it will be a job that does the loading, not someone attesting to data as of this date - where manual attestation becomes less valuable over time

  • as loads written (biz rule, etc.) If we load on X date it is valid - X weeks, business rule, not manual attestation - maybe using last transax date is just as good - if Dec 31 is last tranx date, not valid yet - if Dec 31 is last transax date then Jan 1

  • Data for last year - build into system you cant have that for a month 

  • Start with MANUAL attestation and move towards automated

  • Data thru edit and used for SR, data trailing by 2 years

  • doesn't need to be trailing 

  • submission deadline to get data in within 2 years then reconciliation, these reports are trailing - uncomfortable with this constraint

  • our ? is the data good, are we running up to this end date, not so much about initial transax than claims process

  • May have report that wants 2021 data in 2023 but 2021 data updated in 2022

  • Attestation is rolling, constantly changing; edit package and SDMA is not reconciliation, it is business logic - doesn't have to be trailing

  • As loading data, whats the last date loaded, attestation date

  • sticky - go back x years a report might want, not sure you can attest to 

  • decoupling attestation from a given report (data current as of x date), 

  • everything up to the date my attestation is up to date in the system

  • "Data is good through x date" not attesting to period

  • Monkey Wrench: Policy data, our data is good as of Mar 2022 all 2021 data is up to date BUT Loss (incurred and paid) could go 10 years into future

  • some should be Biz Logic built into extract pattern - saying in HDS, good to what we know as of this date, not saying complete but "good to what we know" - if we want to do something with EP, "I will only use data greater than X months old" as policy evolves

  • Loss exposure - all losses resolved, 10 years ahead of date of assertion, as of this date go back 10 years

  • decouple this from any specific data call or stat report - on the report writer 

  • 2 assertion dates - one for policy vs one for claim

  • not saying good complete data, saying accurate to best of knowledge at date x

  • only thing changing is loss side

  • saying data is accurate to this point in time, as of this date we dont have any claim transax on this policy as of this date

  • adding "comfort level" to extraction? - when you req data you will not req for policies in last 5 years - but if I am Eric, wants to understand market, cares about attestation I can give in March
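The two-date attestation idea above (separate good-through dates for policy/premium vs loss data, checked at extraction time against what the data call needs) can be sketched as follows; names and shapes are assumptions:

```javascript
// Sketch: carrier attests "accurate to the best of our knowledge as of X",
// with separate dates for policy/premium and loss data. Illustrative only.
const attestation = {
  carrier: "TRV",
  attestedAt: "2022-03-15",
  policyDataGoodThrough: "2021-12-31",
  lossDataGoodThrough: "2021-12-31"
};

// on extraction: check the last attested dates against the data call's needs
// (ISO date strings compare correctly as strings)
function meetsDataCall(att, call) {
  return att.policyDataGoodThrough >= call.policyDataThrough &&
         att.lossDataGoodThrough >= call.lossDataThrough;
}
```

This decouples attestation from any specific data call or stat report: the carrier asserts dates once, and each extraction decides for itself whether those dates suffice.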

Exception Handling in LOADING

  • Account for exception processing

    • What is an exception? 

    • PA - loss & premium records, putting stat plan in JSON, older data didn't ask for VIN, some data fields optional

    • KS - exceptions can be expected, capturing & managing situations to be dealt with, not "happy path", need to have error codes and remediation steps, documentation for what they all mean and what to do about them (SDMA has internal to edit package) - things like "cant get it in edit package b/c file not correct", etc. - standard way of notifying exceptions throughout system, consistent, exception received and what to do about it

    • PA - ETL stuff, exceptions based on S&S topics, what's the generalized way to handle? or specific exception cases?

    • KS - arch needs way to report and document and address/remediate exceptions (consistent, notifying, dealing)

    • PA - options: 

      • messaging format, 

      • db keeping log of all messages

      • hybrid approach of both

    • KS - immediate feedback and non-sequential (messaging or notification feedback)

    • JB - data loading transfer of data or into HDS? 

    • KS - data loading starts with intake file in current statplan format, ends when data in HDS

    • JB - lot of exceptions local to this process loading data, reported to anyone or resolved or level of implementation of who is reporting data,

    • KS - some user interface, allows you to load a file and provide feedback, but a lot is asynchronous, no feedback from UI

    • JB - gen approach to be shared across 

    • KS - consistent way to handle across system (sync/asynch, UI vs notification)

    • PA - 2 lambda functions loaded in, 2 S&S topics (1 topic per lambda), seems like nice granular feedback; as we get more lambdas throughout the node this would be unwieldy, master topic to subscribe to resources
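The consistent exception handling KS describes (a standard shape with an error code and documented remediation, plus one log/notification path regardless of which stage raised it) might look like this sketch; stage names and codes are invented:

```javascript
// Sketch: a consistent exception shape for the load process, plus a single
// log/notification path. Stage names and error codes are hypothetical.
function makeException(stage, code, detail) {
  return {
    stage,                           // e.g. "intake", "edit-package", "hds-load"
    code,                            // stable code with documented remediation steps
    detail,
    raisedAt: new Date().toISOString()
  };
}

class ExceptionLog {
  constructor() { this.entries = []; }       // db-style log of all messages
  publish(exc) { this.entries.push(exc); }   // hook for message-style notification
  byStage(stage) { return this.entries.filter(e => e.stage === stage); }
}
```

A hybrid of PA's options: every exception is published (messaging) and retained (log), so synchronous UI feedback and asynchronous notification both read the same records.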