2023-03-30 Infrastructure Working Group Agenda and Meeting Notes

March 30, 2023

When:
Thursday, March 30, 2023
9:00-10:00 AM PDT / 12:00-1:00 PM EDT
Zoom. Please register in advance for this meeting:
https://zoom.us/meeting/register/tJwpcO2spjkpE9a1HXBeyBxz7TM_Dvo8Ne8j

 

Attendees:

  • Sean Bohan (openIDL)

  • Nathan Southern (openIDL)

  • Peter Antley (AAIS)

  • Ken Sayers (AAIS)

  • Surya Lanka (Chainyard)

  • Yanko Zhelyazkov (Senofi)

  • Allen Thompson (Hanover)

  • Adnan Choudhury (Chainyard)

  • Aashish Shrestha (Chainyard)

  • Jeff Braswell (openIDL)

  • Faheem Zakaria (Hanover)

 

 

Agenda Items:

  • Antitrust

  • ND POC: Learnings, What Worked, What Didn't, What Was Changed

  • Next topics

  • AOB

Minutes:

  • openIDL ND POC Changes

    • 3 areas: infrastructure, Fabric, and application points of view

  • Infrastructure

    • added code to deploy the Lambda function for report processing and the upload UI

    • S3 bucket for the CloudFront distribution and AWS Certificate Manager

    • upgraded Kubernetes to 1.23 (1.22 was being deprecated)

    • test environment updated

    • added cert-manager for SSL certs for all endpoints

    • updated self-managed node groups to EKS managed node groups

      • self-managed groups used launch configurations, which were being deprecated in December - had to move them to EKS managed node groups (launch templates, not launch configurations)

    • added Jenkins pipeline to deploy the upload UI app config

    • Upload UI was part of ND POC

  • KS

    • some things are specific to ND; may take them out of the base code and treat them as a reference

  • Surya

    • Fabric

    • in the HLF deployment, upgraded the images from 2.2.3 to 2.2.9

    • picked up the bug fixes that are part of 2.2.9

    • fixed the chaincode to run after a peer restart

      • previously, whenever there was a peer restart, the chaincode had to be upgraded to make it work; now, when the peer restarts and comes back up, it can run chaincode without issues - fixed

    • not related to the ND POC, but in general

    • Application

    • ND POC - needed to deploy the incidence manager to the analytics node

    • with those changes done, deployed the data manager to the analytics node

    • new params in the application component configs

    • automated generation of app component config files through Ansible templates

    • Ansible takes the app config files and loads them into one based on node type

    • integrated through the Jenkins pipeline

    • during node setup, or when there are new changes, can just trigger the pipeline to add or update app config files in the cluster
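The per-node-type config assembly described above can be sketched in a few lines (a minimal illustration with hypothetical keys and node types; the actual flow uses Ansible templates driven by a Jenkins pipeline, not Python):

```python
# Sketch of merging a base app config with node-type overrides, mimicking
# what the Ansible templates generate per node. All keys, node types, and
# component names here are illustrative, not the real openIDL schema.
BASE_CONFIG = {"log_level": "info"}

NODE_OVERRIDES = {
    "analytics": {"components": ["data-manager", "report-processor"]},
    "carrier": {"components": ["upload-ui"]},
}

def render_config(node_type: str) -> dict:
    """Load the app config files into one based on node type."""
    return {**BASE_CONFIG, **NODE_OVERRIDES.get(node_type, {})}
```

Unknown node types simply fall back to the base config, which mirrors the idea of one shared template plus per-node overlays.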

  • Why was the upgrade performed? For specific bugs, or just in case? And what problems occurred?

  • Surya

    • with respect to bugs, it is necessary to keep the code up to date; we were using the 2.2.3 version of Fabric, after which there are patch releases .4 through .10

    • keep the code up to date with respect to the 2.2 versions

  • JB

    • was the latest version of Fabric related to the smaller max PDC size?

  • Surya - the limit was already there due to CouchDB

    • 2.5 has support

  • Adnan

    • re: PDC size, Fabric has a default value; it is static, not configurable

    • do not want to go over the max

    • a larger dataset in the PDC reduces performance due to larger transactions

  • Aashish

    • CouchDB 4 is planning to make the limit a set limit (8 MB)

    • future versions would encounter problems
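As a rough illustration of the size concern, a write path could guard against the document limit before storing results (the 8 MB figure is taken from the discussion; the function name is a hypothetical stand-in, not openIDL code):

```python
import json

# Assumed CouchDB document size limit from the discussion (8 MB).
MAX_DOC_BYTES = 8 * 1024 * 1024

def fits_in_couchdb(doc: dict) -> bool:
    """True if the JSON-serialized document stays under the limit."""
    return len(json.dumps(doc).encode("utf-8")) < MAX_DOC_BYTES
```

A guard like this is why oversized PDC payloads have to be batched rather than written as a single document.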

  • KS - concern about the latest version?

  • Yanko - target specific fixes; be cautious

    • new versions may introduce other issues

    • wondering what the specific bugs were

  • KS - we don't want to get 2-3 versions behind; catching up becomes an impossible lift; stay within a certain number of Fabric versions

  • YZ - minor fixes and features

    • major versions require regression testing, etc.

  • AS - did not update the minor version, just the latest patch version

  • KS - 2.2.3 to 2.2.9

  • AS - the next would be 2.4; the feature we wanted, deleting PDC data by triggering functions, is in 2.5

  • KS - carriers don't want data lingering in the PDC

  • results of extractions, once in the report, should not remain in the PDC

  • KS - didn't test for some sizing

    • some unnecessary logging

    • there were things to fix inside, and it required beefing up machines

    • ND is an outlier - passing 100k rows, different than stat reporting

    • more than running-out-of-memory issues, there were problems with timeouts, restarting pods, etc.

  • Aashish

    • haven't touched performance tests

    • code optimization around loops too

  • Adnan

    • most taken care of by resource restructuring

    • specifically, when running the EP in Mongo, the resources were not enough

    • after beefing up resources, some loops and logs needed to be reprogrammed

    • longer timeouts, etc.

    • 1MM rows of data, give proper time, etc. 

  • KS

    • had to batch stuff in PDC

  • AC

    • data was going over the config value (max size) for PDCs

    • chopped it up per the config value and saved the results in the PDC one by one

    • each batch is then taken for further processing by the next set of processes
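The batching AC describes, chopping results by a configured value and saving them to the PDC one at a time, can be sketched as follows (a Python stand-in with a hypothetical `put_private_data` callback and key names; real chaincode would be Go or Node):

```python
def save_to_pdc(records: list, batch_size: int, put_private_data) -> int:
    """Chop `records` into batches no larger than the configured value and
    save each batch under its own key, one by one, for later processing.
    `put_private_data` is a hypothetical stand-in for the chaincode's
    private-data write. Returns the number of batches written."""
    count = 0
    for i in range(0, len(records), batch_size):
        put_private_data(f"result_batch_{count}", records[i:i + batch_size])
        count += 1
    return count
```

Keeping each write under the configured limit is what avoids the PDC/CouchDB size issues discussed above, at the cost of more transactions.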

  • YZ

    • performance problems were due to the chaincode and how it processed data

  • AC

    • due to the unique nature of the data itself, had to recalibrate a few things

    • comparison of data needed to be efficient

  • YZ

    • report processor - the report being created

    • what were the problems on the network side - did it need a fix or extra effort to make it work?

  • AC

    • saw timeouts

    • transaction timeouts

    • needed to increase the resources where we saw them

    • made sure the node with the peer was not overloaded

  • SL

    • document memory issue

    • processing larger data in the PDC caused CouchDB issues with peers (a size issue)

    • some app components were getting killed due to "out of memory"

    • multiple cases where processing takes longer and the component gets killed due to OOM

  • YZ

    • was there analysis on why it was running out of memory?

  • AC - found the issue

    • openIDL has one status DB saving the status of each data call

    • if a data call fails for some reason, the scheduler comes in and tries to finish the job, coming back every 30 or 45 minutes

    • in the test environment

    • some data calls did not complete - saw the test environment doing transactions even though no data calls were running
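The scheduler behavior described here can be sketched as a simple sweep over the status DB (field names and statuses are assumptions for illustration; per the notes, the real scheduler runs every 30-45 minutes):

```python
def retry_unfinished(status_db: dict, run_job) -> list:
    """Sweep the status DB and retry every data call that has not
    completed. This sweep is why the test environment kept generating
    transactions even when no new data calls were issued. `status_db`
    maps data-call id -> status; returns the ids that were retried."""
    retried = []
    for call_id, status in status_db.items():
        if status != "completed":
            run_job(call_id)
            retried.append(call_id)
    return retried
```

A stuck entry in the status DB therefore gets retried on every pass until its status is updated, which matches the observed background transaction traffic.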

  • YZ - the issue was on the application side, not Fabric

  • JB

    • Resource sizing for the HDS in AWS is distinct from the resource requirements for Fabric nodes, is it not?

    • If the HDS AWS resource for a carrier is different than the Fabric node resource, the Fabric node resource would not need to be so large?

    • the different resource sizing requirements were for the application, not the node itself

  • AS - for a typical data call, is the size of data less than ND?

  • KS - the EP would be hundreds of records; small results at extraction

  • PA - we will be somewhere in between; it comes back as JSON, formatted in the data layer as a string of JSONs

    • under a 1000-element JSON

  • AS - will cut down a lot of processing

  • KS - stat reports are similar; we will have other situations

    • the MS POC might have a similar result set

  • PA - much bigger; not just drivers, the whole state

  • AS - number of carriers? per state?

  • PA - 100+ separate entities recognized by NAIC

    • 200 by AAIS

    • fewer than 3k total

  • KS - the way we expect to see it unfold: loading on behalf of carriers into a multi-tenant node

  • PA - load testing to see how much we can fit

    • lots of carriers in a single node

    • 200+ carriers

    • a lot of small mutuals; carrier node 4 has 100 carriers in the same table

    • for a data call with aggregate info, won't have all the primary keys

  • KS - individual nodes for fewer than 20 carriers for a while

  • AS - multi-tenant node size - for MS?

  • KS - stat reporting will stretch the multi-tenant node

  • PA - didn't have any of these carriers working with us

  • KS - putting data in was a huge win

  • KS - cool thing

    • adding an HDS to the analytics node made it possible

    • it wasn't considered in the original design

    • it allowed the DOT to load data that could be used by the reporting processor

    • the ability to quickly pivot and create different reports was a big win as well

    • data quality issues

    • bad VIN situation - vehicles not on the list because they were not insurable

  • KS - how do we merge?

  • YZ - create tickets in GitHub with the problem, the solution, and how it was approached; once approved, whoever has addressed it can create a branch to get it reviewed and merged

  • KS

    • how do we decide

  • YZ - focus on the application-side issues; a few problems

    • processing data, etc.

    • network side - we are going with the operator, so we probably won't use those changes; not relevant to the new openIDL

    • ex: "performance problem with processing data during a data call" as the title of a GitHub issue

  • AS - can move the things on Trello tracking this over as issues in GitHub

  • JB - any reason to make a repo for ND app-specific stuff, as a baseline concept?

  • KS - are we refactoring a repo? roadmap, backlog 

  • YZ - merge the fixes into main, then refactor later

  • KS - grab Trello items, pick what applies, merge and refactor

  • PA - Mason and PA working on ETL, are we making a new repo? stay where we are?

  • KS - ETL will stick around - keep working there

 

Action items: