SBIR-STTR Award

Large-scale data integration and harmonization to accurately predict sites facing future health-based drinking water crises
Award last edited on: 8/8/2022

Sponsored Program
SBIR
Awarding Agency
NIH : NIEHS
Total Award Amount
$256,579
Award Phase
1
Solicitation Topic Code
113
Principal Investigator
Nathan L Tintle

Company Information

Superior Statistical Research LLC

1606 4th Avenue SE
Sioux Center, IA 51250
   (712) 635-4811
   N/A
   superiorstatresearch.com
Location: Single
Congr. District: 04
County: Sioux

Phase I

Contract Number: 1R43ES033134-01
Start Date: 4/1/2021    Completed: 9/30/2022
Phase I year
2021
Phase I Amount
$256,579
Up to 45 million people per year in the U.S. are directly impacted by health-based drinking water problems. This leads to at least 16 million cases of acute gastroenteritis directly linked to pollution at community water systems, with tens of millions more directly impacted by chemical and organic pollutants. Impacts are further exacerbated in locations dealing with water scarcity, in under-served populations, and within other vulnerable populations already suffering from health disparities. Many of these water problems are the direct result of managerial negligence, inconsistent monitoring, and a lack of the ability to anticipate where problems may arise next. While the reasons for drinking water problems are complex, if we could anticipate where health-based drinking water problems were to occur in the future, it could have an immediate and positive impact on tens of millions of Americans annually. Interestingly, extensive data about water quality and the performance of municipal water systems already exists in large, disparate databases. These databases are largely ignored and, when used, are typically used only anecdotally and retroactively. Preliminary evidence suggests that these existing databases, which contain histories of administrative violations and sub-threshold water-quality results, can be mined to accurately predict future drinking water crises. The Superior Statistical Research R&D team is an internationally recognized group of water experts with cross-cutting expertise in statistics/data analysis/modelling/computing, water-quality monitoring of biological and chemical contaminants, and the ability to clearly and compellingly translate water-quality and health information to actionable steps for individuals, organizations and communities. In this Phase I project, we will show that it is possible to predict water-related, health-based problem areas utilizing already collected, historical data on water quality and municipal water system performance.
We will begin by harmonizing the disparate water quality and municipal water system performance data in two different states (Michigan and Iowa). We will then utilize machine-learning techniques to predict health-based violation histories and will evaluate our methods by comparing predicted violations to actual health-based violations in the previous 5 years. Finally, we will identify at least 10 municipalities determined by our algorithm to be at the highest risk for future health-based water problems and will do systematic sampling to confirm our model-based predictions. We will then demonstrate how these predictions can be leveraged for profitability by exploring how our model-based predictions can be presented to customers in an economical, usable form. Proof of our concept and profitability models in two states (Phase I) will set us up for widespread (multi-state) database harmonization and improvement of the proposed machine-learning/modelling effort in Phase II. With multi-state harmonized datasets, identification of key data gaps in particular states/areas, and proven financial models, our technology will ultimately lead to dramatic reductions in the number of health-based drinking water problems annually.
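
The evaluation step described in the abstract (comparing predicted violations to actual health-based violations, then selecting the highest-risk municipalities) might be sketched as follows. This is a minimal illustration only, not the awardee's method: the abstract does not name a metric, so the rank-based ROC AUC used here is an assumption suggested by the "ROC Curve" entry in the project terms, and the function names, system identifiers, and scores are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based ROC AUC: the fraction of (violator, non-violator) pairs
    in which the violator received the higher risk score; ties count half.
    labels use 1 for an observed health-based violation, 0 otherwise."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_risk(scores_by_system: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n systems with the highest predicted risk,
    breaking ties alphabetically by identifier for reproducibility."""
    ranked = sorted(scores_by_system.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical predicted risk scores and observed outcomes for four systems.
predicted = {"SYS-A": 0.9, "SYS-B": 0.8, "SYS-C": 0.3, "SYS-D": 0.1}
observed = {"SYS-A": 1, "SYS-B": 0, "SYS-C": 1, "SYS-D": 0}

ids = sorted(predicted)
auc = roc_auc([predicted[i] for i in ids], [observed[i] for i in ids])
print("AUC:", auc)
print("Highest-risk systems:", top_risk(predicted, n=2))
```

An AUC near 0.5 would indicate predictions no better than chance, while values closer to 1.0 would indicate that systems with actual violations tend to receive higher predicted risk. The pairwise loop is quadratic and is chosen here for readability rather than speed.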

Public Health Relevance Statement:
Project Narrative: Up to 45 million people per year in the U.S. are directly impacted by health-based drinking water problems, but predicting where and when these problems will occur remains a large and complex obstacle. Current approaches react to health-based water-quality violations in community water systems after they occur, rather than proactively anticipating where problems will occur in the future. The overall goal of this project is to leverage large and disparate historical datasets of water quality to accurately predict locations of future health-based water-quality violations, validate the predictions, and commercialize our proprietary predictions as a practical and cost-saving approach to anticipating and heading off future health-based water problems.

Project Terms:
Algorithms ; Biological Monitoring ; Biologic Monitoring ; biomonitoring ; Serinus ; Canaries ; Cities ; Coal ; Communities ; Community Surveys ; Data Analyses ; Data Analysis ; data interpretation ; Filtration ; Filtration Fractionation ; Focus Groups ; Future ; Gastroenteritis ; Goals ; Government ; Health ; Recording of previous events ; History ; Human ; Modern Man ; Iowa ; Lead ; Pb element ; heavy metal Pb ; heavy metal lead ; Methods ; Michigan ; Persons ; Negligence ; Public Health ; Records ; Research ; research and development ; Development and Research ; R & D ; R&D ; ROC Curve ; ROC Analyses ; receiver operating characteristic analyses ; receiver operating characteristic curve ; Safety ; statistics ; Surveys ; Survey Instrument ; Technology ; Testing ; Translating ; Water ; Hydrogen Oxide ; Price ; pricing ; Cost Savings ; Data Set ; Dataset ; base ; rural area ; rural location ; rural region ; improved ; Site ; Area ; Acute ; Phase ; Link ; Ensure ; Chemicals ; Individual ; Trust ; Databases ; Data Bases ; data base ; Exposure to ; machine learned ; Machine Learning ; Pollution ; Complex ; water sampling ; Techniques ; System ; Location ; inner city ; American ; Performance ; water quality ; drinking water ; pollutant ; Municipalities ; member ; economic impact ; Modeling ; Sampling ; water monitoring ; water quality monitoring ; water testing ; vulnerable group ; Vulnerable Populations ; Provider ; disparity in health ; health disparity ; Address ; Data ; International ; Monitor ; Pathway interactions ; pathway ; willingness to pay ; predictive modeling ; computer based prediction ; prediction model ; data integration ; Underserved Population ; under served group ; under served people ; under served population ; underserved group ; underserved people ; innovation ; innovate ; innovative ; advocacy organizations ; commercialization ; high risk ; Lead levels ; Pb levels ; lead level ; level of lead ; large scale data ; large scale data sets ; large scale datasets ; data harmonization ; harmonized data ;

Phase II

Contract Number: ----------
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
----
Phase II Amount
----