Data is a critical and highly valuable commodity, driving meaningful change in our society, especially when it pertains to patient care and biomedical research. Currently, institutions pay inordinate sums to increase, regain, and complement their data panels. As an extra burden, data legislation and privacy protection regulations introduce barriers to forming effective partnerships between business, clinical, research, and educational organizations. As a result, approximately 80% of medical data today cannot be readily shared because it contains personal, protected, or sensitive information, and it remains unstructured and untapped after it is created. There is a growing and urgent unmet need for technology solutions that balance the interests of research and commercial organizations by supporting flexible general-purpose analytics while guaranteeing privacy protection. There are no effective mechanisms to enable data sharing without risking either inappropriate release of sensitive information or potential degradation of the information content. The few currently available protocols and algorithms for modeling, processing, interrogating, and ultimately sharing large sensitive data (e.g., thousands to millions of records with thousands of heterogeneous features) all share significant limitations, and their practical use still lags behind research progress. Two major gaps in the data sharing industry are i) the inability to return de-identified clones of the raw data, and ii) insufficient scalability for production deployments. GrayRain, LLC is an early-stage Software-as-a-Service company developing a novel platform for statistical obfuscation and de-identification of sensitive structured information (numerical and categorical tabular data) and unstructured information (e.g., clinical text, doctors' and nurses' notes, and clinical images such as MRI and PET). The core of GrayRain's technology is the novel patented statistical obfuscation algorithm, DataSifter.
The technology proposed in this STTR Phase I application will significantly increase the number of secure data transactions in the healthcare sector and beyond, enabling data sharing with a fully controllable risk of identification of any sensitive information, including, but not limited to, PHI (personal health information), demographic information, or socioeconomic status. GrayRain's technology is able to produce de-identified clones of raw tabular data, addressing a major limitation encountered across existing data anonymization protocols. As for scalability, the main goal of this STTR Phase I is to establish the feasibility of using GrayRain to accurately and efficiently de-identify and share large-scale complex EHR data repositories with a controlled risk of disclosing protected or personal health information.
Public Health Relevance Statement: PROJECT NARRATIVE
Sharing of Big Biomedical Data, including millions of records with thousands of complex
sensitive features, provides research and commercial organizations enormous opportunities for
disruptive healthcare innovations.
There is a growing and urgent unmet need for technology solutions that allow sharing of critical
proprietary data across stakeholders while complying with strict privacy
regulations (e.g., HIPAA and GDPR).
GrayRain LLC proposes to productize the DataSifter, our unique patented statistical obfuscation
technique, as a novel, scalable, cost-effective, reliable and efficient tool for the obfuscation of
sensitive electronic health data, to balance these competing interests by supporting general-
purpose analytics while guaranteeing privacy protection.
Project Terms: