SBIR-STTR Award

Automatic Extraction of Financial Data from Text
Award last edited on: 6/26/2015

Sponsored Program
SBIR
Awarding Agency
NSF
Total Award Amount
$1,145,998
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Hassan Alam

Company Information

BCL Technologies (AKA: BCL Computers-Asta-Blu)

3031 Tisch Way Suite 1000
San Jose, CA 95128
   (408) 225-2679
   info@bcltechnologies.com
   www.bcltechnologies.com
Location: Single
Congr. District: 16
County: Santa Clara

Phase I

Contract Number: ----------
Start Date: ----    Completed: ----
Phase I year
2013
Phase I Amount
$179,999
The innovation is the development of a linguistically driven machine learning system that will extract financial data from financial text, such as 10-Q documents, with an accuracy of over 85%. To be useful to analysts, each item of financial data needs to be a triple of "Financial Concept", "Numeric Value" and "Date Range." Because sentences in the financial domain are complex, detecting the Financial Concept and attaching it to the correct Numeric Value and Date Range remains a challenge. Current financial extraction systems record an accuracy of less than 50%. The proposed method will use a combination of Financial Named Entity Recognition, Semantic Nearest Neighbor location and Support Vector Machines to raise the accuracy of Financial Concept detection, attachment and semantic tagging to 85%. By combining these methods, the Phase II research will develop an end-to-end 'Automatic Extraction of Financial Data from Text' system that is usable by computerized systems. At the end of Phase I, the proposed method will demonstrate the feasibility of financial data extraction on the Notes section of 10-Q documents. The Phase II system will be designed to scale up to handle very large data sets, including non-American English documents, in near real time.
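
The triple described above, and the step of attaching a concept to its nearest value and date range, can be illustrated with a minimal sketch. It is not the project's method: a hypothetical three-entry lexicon (`CONCEPTS`) and two simple regular expressions stand in for the Financial Named Entity Recognition and Support Vector Machine components, which the abstract does not specify, and "nearest" is plain character distance rather than semantic distance. The names `FinancialFact` and `extract_facts` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class FinancialFact:
    """One extracted triple: Financial Concept, Numeric Value, Date Range."""
    concept: str
    value: float
    date_range: str


# Hypothetical toy lexicon; a real system would use a trained recognizer.
CONCEPTS = ["net revenue", "operating income", "net income"]

# Dollar amounts such as "$12.5 million" (assumed surface form).
NUMBER = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?", re.I)
# Reporting periods such as "three months ended March 31, 2013" (assumed form).
PERIOD = re.compile(r"(?:three|six|nine) months ended \w+ \d{1,2}, \d{4}", re.I)
SCALE = {"million": 1e6, "billion": 1e9}


def extract_facts(sentence: str) -> List[FinancialFact]:
    """Attach each known concept to its nearest amount and nearest period."""
    lower = sentence.lower()
    numbers = list(NUMBER.finditer(sentence))
    periods = list(PERIOD.finditer(sentence))
    facts: List[FinancialFact] = []
    if not numbers or not periods:
        return facts
    for concept in CONCEPTS:
        pos = lower.find(concept)
        if pos < 0:
            continue
        # Nearest neighbor by character offset from the concept mention.
        num = min(numbers, key=lambda m: abs(m.start() - pos))
        period = min(periods, key=lambda m: abs(m.start() - pos))
        scale = SCALE.get((num.group(2) or "").lower(), 1.0)
        value = float(num.group(1).replace(",", "")) * scale
        facts.append(FinancialFact(concept, value, period.group(0)))
    return facts


if __name__ == "__main__":
    text = ("Net revenue was $12.5 million and net income was $1.2 million "
            "for the three months ended March 31, 2013.")
    for fact in extract_facts(text):
        print(fact)
```

Character distance fails on the complex sentences the abstract mentions (for example, when a value precedes its concept across a clause boundary), which is the kind of attachment error a learned model would be expected to reduce.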

The broader/commercial impact of the Automatic Extraction of Financial Data from Text system is the availability of relevant financial information in computer-readable format with high accuracy in near real time. Currently, data embedded in financial text are extracted manually by hundreds of people working for data warehouses. This manual effort takes on the order of weeks, making the bulk of the data unavailable in easily computer-usable form in real time. The benefit of Automatic Extraction of Financial Data from Text will be in three areas: 1. Algorithmic trading programs will be able to use all data published worldwide immediately after the data is published; 2. Financial data warehouses will be able to provide a much larger set of data concepts - there are 18,498 concepts in the US Generally Accepted Accounting Principles taxonomy versus fewer than 180 available in commercial data warehouses; 3. There will be increased transparency in the financial market as financial information embedded in the text becomes computer readable. In 2012, algorithmic trading was estimated to exceed $5 trillion in value, with 750 billion shares traded, generating a profit of over $600 million. The impact of financial transparency is an intangible benefit that will improve financial market efficiency.

Phase II

Contract Number: ----------
Start Date: ----    Completed: ----
Phase II year
2014
(last award dollars: 2017)
Phase II Amount
$965,999

The broader impact/commercial potential of this Small Business Innovation Research (SBIR) Phase II project will result from the availability of relevant financial information in structured, computer-usable format with high accuracy in near real time. Currently, data embedded in financial text are extracted manually by hundreds of people working for data warehouses. This manual effort takes on the order of weeks, making the bulk of the data unavailable in easily computer-usable form in real time. The benefits of this project will be focused in three areas: (i) algorithmic trading programs will be able to use all data published worldwide immediately after the data is published; (ii) financial data warehouses will be able to provide an order of magnitude larger set of data concepts (from < 200 to > 3000); (iii) there will be increased transparency in the financial markets as financial information embedded in text becomes computer-readable. In 2012, algorithmic trading was estimated to exceed $5 trillion in value, with 750 billion shares traded, generating a profit of over $600 million. Financial transparency is an intangible benefit that will improve financial market efficiency.

This Small Business Innovation Research (SBIR) Phase II project will develop automated methods to extract and tag relevant financial concepts from the free text of financial documents such as annual reports, press releases and analyst reports. The extracted financial concepts will be semantically mapped (tagged) to a financial taxonomy such as US-GAAP or IFRS for standardized analysis. There is a growing need in current financial markets for accurate and timely access to relevant financial information to support trading and analysis decisions. At the end of this project, the company's technology is expected to be able to provide such information to analysts and decision makers in a timely fashion.

The primary goals of this project will be to: (i) build an end-to-end prototype system for automatically extracting financial data from text; (ii) extend the scope of the technology to reach a broader range of real-world applications; (iii) increase accuracy; and (iv) reduce the processing time for financial data extraction. The technology will employ machine learning and natural language processing techniques for financial concept annotation, extraction and semantic tagging to achieve these goals.
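
The semantic tagging step, in which an extracted concept phrase is mapped to a taxonomy element, can be sketched as follows. This is an illustration under stated assumptions, not the project's technique: token-overlap (Jaccard) similarity is swapped in for the machine learning tagger, the three-entry `TAXONOMY` table is a hypothetical miniature with simplified labels, and the function name `tag_concept` and the 0.3 threshold are arbitrary choices.

```python
from typing import Dict, Optional, Set

# Hypothetical miniature taxonomy: element name -> simplified label.
# A real taxonomy such as US-GAAP contains thousands of elements.
TAXONOMY: Dict[str, str] = {
    "Revenues": "revenues",
    "NetIncomeLoss": "net income loss",
    "OperatingIncomeLoss": "operating income loss",
}


def _tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization."""
    return set(text.lower().split())


def tag_concept(phrase: str,
                taxonomy: Dict[str, str] = TAXONOMY,
                threshold: float = 0.3) -> Optional[str]:
    """Return the element whose label best overlaps the phrase, or None."""
    phrase_tokens = _tokens(phrase)
    best_element: Optional[str] = None
    best_score = 0.0
    for element, label in taxonomy.items():
        label_tokens = _tokens(label)
        union = phrase_tokens | label_tokens
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(phrase_tokens & label_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_element, best_score = element, score
    # Leave the phrase untagged when no label is similar enough.
    return best_element if best_score >= threshold else None


if __name__ == "__main__":
    for phrase in ["net income", "operating income", "goodwill impairment"]:
        print(phrase, "->", tag_concept(phrase))
```

Returning `None` below the threshold keeps an unmatched phrase untagged instead of forcing it onto the closest element, a conservative choice when accuracy is a stated goal.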