openIDL - Technical Backlog

This page covers the backlog for openIDL.

#	Date	Item	Description
1	18-JAN-21	Extraction pattern - tech	What technology should the extraction pattern use: map/reduce, something optimized for scale, GraphQL, or others? Context: The extraction pattern currently uses a map/reduce function in MongoDB. This locks us into MongoDB and runs in a closed environment without the access to the outside world that we’ll need for correlating other data, such as census data. The extraction capability must be reimagined. The current approach is suitable only for a simple data source layout: it can't do cross lookups, validation, or reference data checks, and it works only with MongoDB, not with other sources such as AAIS' data lake (Hadoop/Cloudera). GraphQL seems like a strong candidate; a POC is required to validate this hypothesis. Discuss any other candidates that should be considered.
2	18-JAN-21	How to assert data integrity	How can we assert data integrity? Options include computing a checksum after a record is locked and written to the chain, and storing the acknowledgement from HLF in a control DB, mapped to the corresponding record set.
3	18-JAN-21	How to assert data quality	How do we run technical and business validation on the data and certify it? Context: Technical rules may include JSON schema validation, format checks, and cardinality checks. Business rules may include enum checks, field-to-transaction-to-record-set-to-dataset validations, and reference data checks. Error threshold: determine the acceptable error threshold; NAIC allows for up to a 5% error rate. Timing: when should validation be applied? Either as the data arrives in the HDS (staging→core) or just before extraction; the latter adds time pressure, loses time that could be used to rectify errors, and risks timeouts on the extraction API.
4	18-JAN-21	Common Rule Set	Is it possible to provide a common set of rules that all carriers can run against their data before making it available for extraction? Context: Assumption: rules are standardized across openIDL members for a given use case, for example the rules related to Auto stat reporting.
5	24-JAN-21	Is the HDS persistent or temporary?
6	18-JAN-21	Data quality error threshold	Current practice allows an error rate of up to 5%. Should openIDL allow this? If so, how should it be designed and implemented?
7	18-JAN-21	Reference data validation	Where should the reference data service be hosted: within the member's enterprise or within the node? It must be applied before extraction (tenet).
8	18-JAN-21	Reference data lookup services/APIs	Which lookup services/APIs should be used? For example, USPS state/ZIP validation, Carfax, etc.
9	18-JAN-21	Reference data lookup services/APIs - pricing model	Who pays (assumption: whoever owns the data pays), and how are consumers charged (via assigned accounts, via a centralized billing account prorated by consumption, etc.)? Who signs the vendor contracts?
10	18-JAN-21	Separating the Hyperledger Fabric Network from the data access	Can a carrier participate in the network from a hosted node without putting its data there? That is, can we give a carrier access to the network without hosting its data access portion in the same node? The HLF runtimes would not need to run at the carrier; only a simple API would be made available for extraction.
11	18-JAN-21	Simplify the technical footprint	Can we simplify the architecture so that fewer technologies are required?
12	18-JAN-21	Hosted nodes	Should we consider hosted nodes for the HLF network instead of requiring all carriers who desire data privacy to host the network?
13	19-JAN-21	HDS vs Interface Spec for nodes	Does the HDS need to be uniform across nodes, or is it enough to have a uniform interface spec and communicate via a service?
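Item 1 asks how to move extraction off MongoDB's map/reduce. As a minimal sketch, assuming extraction logic can be expressed as plain map and reduce functions over an iterable of records, the same logic could in principle run over a MongoDB cursor, an export from a data lake, or rows returned by a GraphQL resolver. The names `extract`, `premium_by_state`, and `total`, and the `state`/`premium` fields, are hypothetical and not part of openIDL.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, Any]
MapFn = Callable[[Record], Iterator[Tuple[str, float]]]
ReduceFn = Callable[[str, List[float]], float]


def extract(records: Iterable[Record], map_fn: MapFn, reduce_fn: ReduceFn) -> Dict[str, float]:
    """Group mapped (key, value) pairs by key, then reduce each group.

    `records` can be any iterable (a list, a database cursor, a file reader),
    so the extraction logic is not tied to one data store.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


# Hypothetical map step: emit (state, premium) for each record.
def premium_by_state(record: Record) -> Iterator[Tuple[str, float]]:
    yield record["state"], float(record["premium"])


# Hypothetical reduce step: total the values for one key.
def total(_key: str, values: List[float]) -> float:
    return sum(values)


if __name__ == "__main__":
    sample = [
        {"state": "NY", "premium": 100.0},
        {"state": "NJ", "premium": 50.0},
        {"state": "NY", "premium": 25.0},
    ]
    print(extract(sample, premium_by_state, total))  # {'NY': 125.0, 'NJ': 50.0}
```

Because `map_fn` is ordinary code rather than a function executed inside the database, it is not confined to the data store's closed environment and could, for example, call a reference data service for a cross lookup. Whether this holds up at scale is the kind of question the POC would need to answer.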
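For item 2, one possible shape is sketched below, assuming a record set can be serialized to JSON: compute a SHA-256 checksum over a canonical encoding (sorted keys, compact separators) and store it in a control DB next to the acknowledgement returned by HLF. The acknowledgement is represented as an opaque string, SQLite stands in for whatever control DB is chosen, and the table layout and the functions `register` and `verify` are hypothetical.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, List

Record = Dict[str, Any]


def record_set_checksum(records: List[Record]) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def open_control_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) control table mapping record sets to checksums."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS control ("
        " record_set_id TEXT PRIMARY KEY,"
        " checksum TEXT NOT NULL,"
        " hlf_ack TEXT NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, record_set_id: str,
             records: List[Record], hlf_ack: str) -> str:
    """Store the checksum alongside the HLF acknowledgement (an opaque string here)."""
    checksum = record_set_checksum(records)
    conn.execute(
        "INSERT INTO control (record_set_id, checksum, hlf_ack) VALUES (?, ?, ?)",
        (record_set_id, checksum, hlf_ack),
    )
    conn.commit()
    return checksum


def verify(conn: sqlite3.Connection, record_set_id: str, records: List[Record]) -> bool:
    """True only if the record set is registered and its checksum still matches."""
    row = conn.execute(
        "SELECT checksum FROM control WHERE record_set_id = ?", (record_set_id,)
    ).fetchone()
    return row is not None and row[0] == record_set_checksum(records)
```

The canonical encoding is the important design choice: without `sort_keys=True`, two records with identical content but different key order would produce different checksums and fail verification.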
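Items 3 and 4 separate technical rules from business rules and ask whether one rule set could be shared by all carriers. A minimal sketch, assuming each rule can be written as a function that returns an error message or `None`: because the rule list is plain data, it could be versioned and distributed so that every carrier runs identical checks. The field names, the policy-number pattern, and the three-state enum are illustrative placeholders, not actual openIDL or Auto stat reporting rules.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence

Record = Dict[str, Any]
Check = Callable[[Record], Optional[str]]  # returns an error message, or None if OK


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str  # "technical" or "business"
    check: Check


def matches(field: str, pattern: str) -> Check:
    """Technical rule (format): field must be a string fully matching `pattern`."""
    regex = re.compile(pattern)

    def check(record: Record) -> Optional[str]:
        value = record.get(field)
        if not isinstance(value, str) or regex.fullmatch(value) is None:
            return f"{field}: missing or badly formatted"
        return None

    return check


def min_items(field: str, minimum: int) -> Check:
    """Technical rule (cardinality): field must be a list with at least `minimum` items."""
    def check(record: Record) -> Optional[str]:
        value = record.get(field)
        if not isinstance(value, list) or len(value) < minimum:
            return f"{field}: expected at least {minimum} item(s)"
        return None

    return check


def one_of(field: str, allowed: Sequence[str]) -> Check:
    """Business rule (enum): field must be one of the allowed values."""
    def check(record: Record) -> Optional[str]:
        value = record.get(field)
        if value not in allowed:
            return f"{field}: {value!r} is not an allowed value"
        return None

    return check


# Illustrative shared rule set; a real one would be agreed per use case.
RULES: List[Rule] = [
    Rule("policy number format", "technical", matches("policy_number", r"[A-Z]{2}\d{6}")),
    Rule("at least one coverage", "technical", min_items("coverages", 1)),
    Rule("state code", "business", one_of("state", ("NY", "NJ", "CT"))),
]


def validate(record: Record, rules: List[Rule] = RULES) -> List[str]:
    """Apply every rule and collect the error messages, tagged by rule kind."""
    errors: List[str] = []
    for rule in rules:
        message = rule.check(record)
        if message is not None:
            errors.append(f"[{rule.kind}] {message}")
    return errors
```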
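Item 6 asks how a 5% error threshold might be implemented. The sketch below assumes the rate is measured per record (a record counts as failed if it has at least one validation error) over one batch; whether it should instead be counted per field, per transaction, or per dataset is one of the open design questions. `error_rate`, `within_threshold`, and `MAX_ERROR_RATE` are hypothetical names.

```python
from typing import Sequence

# The 5% figure cited in this backlog for NAIC reporting.
MAX_ERROR_RATE = 0.05


def error_rate(errors_per_record: Sequence[int]) -> float:
    """Fraction of records with at least one validation error."""
    if not errors_per_record:
        return 0.0
    failed = sum(1 for count in errors_per_record if count > 0)
    return failed / len(errors_per_record)


def within_threshold(errors_per_record: Sequence[int], max_rate: float = MAX_ERROR_RATE) -> bool:
    """True if the batch's error rate does not exceed `max_rate`."""
    return error_rate(errors_per_record) <= max_rate
```

Treating an empty batch as passing (rate 0.0) is itself a policy choice that would need to be agreed.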
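Items 8 and 9 ask which external lookups to call and who pays for them. A sketch, assuming a state/ZIP check can be hidden behind a small interface and that the service can return the set of ZIP codes for a state: `InMemoryStateZipLookup` is a hypothetical stand-in for a real service such as USPS (whose actual API is not modeled here), and caching results per state with `functools.lru_cache` limits how often a metered service would be called. The table contents are illustrative only.

```python
from functools import lru_cache
from typing import Callable, Dict, FrozenSet, Protocol, Set


class StateZipLookup(Protocol):
    """Hypothetical interface a real reference data service would be wrapped behind."""
    def zips_for_state(self, state: str) -> Set[str]: ...


class InMemoryStateZipLookup:
    """Stand-in for an external, possibly metered, service; counts calls made."""

    def __init__(self, table: Dict[str, Set[str]]) -> None:
        self._table = table
        self.calls = 0

    def zips_for_state(self, state: str) -> Set[str]:
        self.calls += 1
        return self._table.get(state, set())


def make_zip_validator(lookup: StateZipLookup) -> Callable[[str, str], bool]:
    """Return a validator that queries `lookup` at most once per state."""

    @lru_cache(maxsize=None)
    def zips(state: str) -> FrozenSet[str]:
        return frozenset(lookup.zips_for_state(state))

    def is_valid(state: str, zip_code: str) -> bool:
        return zip_code in zips(state)

    return is_valid
```

Where this wrapper runs (member enterprise or node, per item 7) does not change its shape; only the `lookup` implementation passed in would differ.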
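Item 9 lists a centralized billing account prorated to consumption as one charging option. A minimal sketch of the proration arithmetic, assuming consumption is measured as a call count per consumer and the bill is in integer cents: each consumer first gets the floor of its proportional share, and the cents lost to rounding go to the consumers with the largest remainders, so the shares always sum to the bill. `prorate_cents` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def prorate_cents(total_cents: int, usage: Dict[str, int]) -> Dict[str, int]:
    """Split `total_cents` across consumers in proportion to their usage counts.

    Each consumer first gets the floor of its exact share; the cents lost to
    rounding are then handed out one each to the largest remainders, so the
    shares always sum to `total_cents`.
    """
    total_usage = sum(usage.values())
    if total_usage <= 0:
        raise ValueError("usage must include at least one call")
    shares: Dict[str, int] = {}
    remainders: List[Tuple[int, str]] = []
    for consumer, calls in usage.items():
        share, remainder = divmod(total_cents * calls, total_usage)
        shares[consumer] = share
        remainders.append((remainder, consumer))
    leftover = total_cents - sum(shares.values())
    # Ties on the remainder are broken by consumer name (descending) via tuple sort.
    for _, consumer in sorted(remainders, reverse=True)[:leftover]:
        shares[consumer] += 1
    return shares
```

For example, a bill of 1000 cents split across three consumers with equal usage yields two shares of 333 cents and one of 334.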
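Item 13 asks whether the HDS itself must be uniform or only its interface. As a sketch of the second option, assuming an extraction request can be described by a small request type: two nodes store their data differently, yet a caller written against the `HdsService` protocol works with either. `ExtractionRequest`, `HdsService`, `FlatHds`, `PartitionedHds`, `count_records`, and the `lob`/`year` fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class ExtractionRequest:
    line_of_business: str
    year: int


class HdsService(Protocol):
    """The uniform interface spec; each node may store data however it likes."""
    def extract(self, request: ExtractionRequest) -> List[Record]: ...


class FlatHds:
    """One node's layout: all records in a single flat list, filtered on demand."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def extract(self, request: ExtractionRequest) -> List[Record]:
        return [
            r for r in self._records
            if r["lob"] == request.line_of_business and r["year"] == request.year
        ]


class PartitionedHds:
    """Another node's layout: records pre-partitioned by (line of business, year)."""

    def __init__(self, partitions: Dict[Tuple[str, int], List[Record]]) -> None:
        self._partitions = partitions

    def extract(self, request: ExtractionRequest) -> List[Record]:
        return list(self._partitions.get((request.line_of_business, request.year), []))


def count_records(node: HdsService, request: ExtractionRequest) -> int:
    """A caller that depends only on the interface, not on the storage layout."""
    return len(node.extract(request))
```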