SBIR-STTR Award

Statistical Methods for Incomplete Data with Measurement Errors
Award last edited on: 1/11/18

Sponsored Program
SBIR
Awarding Agency
NIH : NIGMS
Total Award Amount
$1,512,415
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Edward C Chao

Company Information

Data Numerica Institute Inc (AKA: Data Numerica)

6120 149th Avenue SE
Bellevue, WA 98006
(425) 591-7944
echao@datanumerica.com
www.datanumerica.com
Location: Single
Congr. District: 09
County: King

Phase I

Contract Number: 1R43GM100573-01
Start Date: 6/1/12    Completed: 2/28/13
Phase I year
2012
Phase I Amount
$198,601
Missing data and measurement errors are common problems in statistical data analysis. We are interested in experimental and observational studies in which data are missing or measured with error. Examples include health surveys containing non-responders or missing items, and surrogate marker data with measurement errors. Applications include longitudinal clinical trials, multilevel community studies, and health surveys. The incomplete data may enter a model as the response or as predictors and may be non-ignorably missing; that is, the problems include missing responses, missing covariates, and covariate measurement errors. The most complicated scenario combines these difficulties, e.g. a missing response with covariate measurement errors. The results from this project include innovative statistical methods, case studies, tools, solutions, and publications. These resources will be incorporated into our Longit Informatics Center for sharing and illustration. The Longit Informatics Center is an online data analysis environment in which subscribers can access many statistical packages and dynamic graphics for data analysis. The ultimate results of this project will be two statistical packages added to Longit: 1) MiMe: statistical methods for missing data and measurement errors, and 2) Laso: joint modeling methods for longitudinal and survival outcomes in the study of surrogate markers for clinical event times. These packages include innovative statistical methods, sensitivity analysis, and graphical methods. No commercial software currently handles cases as complicated as those Laso addresses.

Public Health Relevance:
This project aims to develop statistical methods and tools for analyzing incomplete data, including missing data and data with measurement errors.

Phase II

Contract Number: 2R44GM100573-02A1
Start Date: 6/1/12    Completed: 4/30/17
Phase II year
2015
(last award dollars: 2016)
Phase II Amount
$1,313,814

Missing data, censored data, and surrogate markers are common incomplete-data problems in biomedical data analysis. In this project, we are interested in statistical methods for experimental, observational, and genetic studies that involve missing data, measurement errors, and surrogate markers. Examples include health surveys containing non-responders or missing items, and surrogate marker data with measurement errors. Applications include longitudinal clinical trials, multilevel community studies, genetic marker studies, and health surveys. The incomplete data may enter a model as the response or as predictors and may be non-ignorably missing; that is, the problems include missing responses, missing covariates, and covariate measurement errors. The most complicated scenario combines these difficulties, e.g. a missing response with covariate measurement errors, or censored data with surrogate markers and measurement errors. The ultimate results of this project will be two statistical packages aimed at longitudinal and survival responses: 1) MiMe: statistical methods for missing data and measurement errors, and 2) Laso: joint modeling methods for longitudinal and survival outcomes in the study of surrogate markers for clinical event times. Functional and structural approaches will be developed, and they are applicable to many other areas, e.g. genetic marker association studies. The results from this project include innovative statistical methods, sensitivity analysis, graphical methods, case studies, software tools, and publications. An R version will be available; advanced users may use it for comparison studies against other approaches or customize it for further extensions. A second version will incorporate the tools from this research into our online data analysis platform, the Longit Informatics Center. Subscribers can access many statistical packages, modules, and dynamic graphics in Longit for data analysis.
For commercialization, we will deliver online and offline versions, i.e. internet, intraweb, and desktop versions. We will also license our API version for integration with other analytic systems in business and other non-biomedical fields. One example is integrating Longit with Alteryx, a commercial data mining tool for big data analysis.

Public Health Relevance:
This project aims to develop statistical methods and user-friendly software tools for analyzing incomplete data involving missing data, surrogate markers, and measurement errors. The outcome response could be time-independent data, longitudinal data, or survival data in behavioral studies, cancer studies, AIDS studies, health surveys, etc. We will deliver desktop, intraweb, and web versions, and users may integrate and customize our software API in their own analytic systems for biomedical studies or business data mining.

Project Terms:
Acquired Immunodeficiency Syndrome; Adverse effects; Area; attenuation; Big Data; Body Weight decreased; Businesses; Case Study; case-based; Clinical; Clinical Research; Clinical Trials; commercialization; Communities; Computer software; Computerized Medical Record; computing resources; cost; Data; Data Analyses; data mining; design; Diagnosis; Diagnostic; Disease; Dropout; Eating; Evaluation; Event; Genetic; genetic association; Genetic Markers; graphical user interface; Health Surveys; Imagery; Individual; Influentials; Informatics; innovation; Intelligence; interest; Internet; Investigation; Joints; Licensing; Longitudinal Studies; Malignant Neoplasms; Measurement; Measures; method development; Methods; Modeling; nutrition; Nutritional Study; Observational Study; Outcome; parallel computer; Patient Self-Report; Phase; phase 1 study; Physical activity; prototype; public health relevance; Publications; Records; Research; research study; Resource Sharing; response; simulation; Software Tools; Statistical Computing; Statistical Methods; Study Subject; Surrogate Markers; Surveys; System; Testing; Time; tool