SBIR-STTR Award

Extracting Semantic Knowledge from Clinical Reports
Award last edited on: 6/2/09

Sponsored Program
SBIR
Awarding Agency
NIH : NLM
Total Award Amount
$952,683
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Patrick W Jamieson

Company Information

Logical Semantics Inc (AKA: Medical Reporting Solutions Inc)

714 North Senate Avenue Suite 100
Indianapolis, IN 46202
   (317) 863-2723
   pjamieson@logicalsemantics.com
   www.logicalsemantics.com
Location: Single
Congr. District: 07
County: Marion

Phase I

Contract Number: 1R43LM008974-01
Start Date: 00/00/00    Completed: 00/00/00
Phase I year
2005
Phase I Amount
$100,000
Electronic medical record (EMR) systems contain a wealth of clinical data that is invaluable for biomedical research. However, data mining and clinical discovery are held back because there are no satisfactory methods for building coherent, specialized knowledge bases that represent the information in free-text medical records. Medical Reporting Solutions, Inc. has developed advanced technology for constructing specialized semantic knowledge bases, which we propose to extend, refine, and test. These knowledge bases will encode the clinical information in medical reports and enable automated natural language processing systems to extract clinical knowledge. Our research and development uses methods from corpus linguistics and sentential logic to represent the knowledge in free-text medical reports in an efficient, codeable manner. We have created tools to map sentences in a medical domain to unique codeable propositions. Our method for creating knowledge ontologies makes it easy for biomedical researchers to get semantic information at the appropriate level of detail. The knowledge base and mapping tables allow us to analyze medical reports in near real-time. One knowledge base, under development, is derived from hundreds of thousands of reports in the radiology domain, and we intend to analyze other medical domains using the methods we have pioneered. Our Phase I project plan includes further improving our knowledge editing tools, substantially enlarging our semantic knowledge base to cover at least 60-70% of the radiology domain, and extensively testing our knowledge representation schema against actual radiology reports. We plan to make the knowledge base freely accessible to the biomedical research community, while providing commercial services to codify free-text reports found in EMRs.
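
The abstract describes mapping sentences to unique codeable propositions through a knowledge base and mapping tables, but gives no implementation details. The following is only a minimal sketch of that general idea, assuming a simple exact-match lookup on normalized sentences. The table contents, the proposition codes (PROP-0001 and so on), and the function names are hypothetical illustrations, not the awardee's actual tools or data.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical mapping table: normalized sentence -> proposition code.
# Sentences and codes are illustrative placeholders only.
MAPPING_TABLE = {
    "the lungs are clear": "PROP-0001",
    "no pleural effusion is seen": "PROP-0002",
    "there is no pleural effusion": "PROP-0002",  # paraphrase, same proposition
}


def normalize(sentence: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", sentence.lower())
    return " ".join(text.split())


def split_sentences(report: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def code_report(report: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each sentence with its proposition code, or None if unmapped."""
    return [(s, MAPPING_TABLE.get(normalize(s))) for s in split_sentences(report)]


if __name__ == "__main__":
    sample = "The lungs are clear. There is no pleural effusion. Heart size is normal."
    for sentence, code in code_report(sample):
        print(code, "|", sentence)
```

In this sketch, different wordings of the same finding share one code, and sentences missing from the table come back as None so they could be queued for expert review. A dictionary lookup is one plausible reason such analysis could run in near real-time, though the abstract does not say how the actual system achieves this.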

Thesaurus Terms:
automated medical record system, computer data analysis, computer program /software, computer system design /evaluation, informatics, information retrieval, clinical research, human data

Phase II

Contract Number: 9R44RR024929-02
Start Date: 00/00/00    Completed: 00/00/00
Phase II year
2008
(last award dollars: 2009)
Phase II Amount
$852,683

Analyzing and processing free-text medical reports for data mining and clinical data interchange is one of the most challenging problems in medical informatics, yet it is crucial for continued research advances and improvements in clinical care. Natural language processing (NLP) is an important enabling technology, but it has been held back because understanding human language is difficult and requires extensive domain knowledge. In Phase I, we developed new statistical and machine learning methods that apply domain-specific knowledge to the semantic analysis of free-text radiology reports. The methods enabled the creation of two new prototype applications: a SNOMED CT (Systematized Nomenclature of Medicine--Clinical Terms) coding service called SnomedCoder, and a text mining tool for analyzing a large corpus of medical reports, called DataMiner. In Phase II, we will accomplish the following specific aims: 1) Improve the semantic extraction methods developed in Phase I, 2) Expand the semantic knowledge base and classify at least two million new unique sentences from multiple medical institutions, 3) Provide a SNOMED CT auto coding service (alpha service) to participating Indiana Health Information Exchange hospitals, and 4) Build a commercial version of the DataMiner software, and test its functionality with researchers at the Regenstrief Institute. These scientific innovations will revolutionize the ability of health care researchers to analyze vast repositories of clinical information currently locked up in electronic medical records, and to correlate this data with new biomedical discoveries in proteomics and genomics. The ability to codify text rapidly will extend the potential for clinical decision support beyond its narrow base of numeric and structured medical data, and enable SNOMED CT to become a useful coding standard.
Phase III will offer coding and data mining services to healthcare payers (both private and government), pharmaceutical companies, and academic researchers. A key advantage of our approach over other NLP systems is that we attempt to codify all the information in the report, not just a limited subset, and insist on expert validation, which provides a high degree of confidence in the accuracy of the coded data.

Thesaurus Terms:
There Are No Thesaurus Terms On File For This Project.