SBIR-STTR Award

Fast Biosequence Annotation via Reconfigurable Hardware
Award last edited on: 3/28/19

Sponsored Program
STTR
Awarding Agency
NIH : NCHGR
Total Award Amount
$1,892,727
Award Phase
2
Solicitation Topic Code
-----

Principal Investigator
Jeremy D Buhler

Company Information

Becs Technology Inc

9487 Dielman Rock Island Industrial Drive
St. Louis, MO 63132
   (314) 567-0088
   tim@becs.com
   www.becs.com

Research Institution

Washington University

Phase I

Contract Number: 1R42HG003225-01
Start Date: 9/30/04    Completed: 8/31/06
Phase I year
2004
Phase I Amount
$156,462
Databases of biological sequences have proven valuable for understanding the organization of human and other genomes, for unraveling the etiology of genetic disorders, and for studying medically important pathogens. Unfortunately, these databases are growing at an exponential rate, posing a severe problem for bio-sequence annotation. Specialized hardware implementations of sequence comparison can dramatically accelerate BLAST and other algorithms used to annotate sequences. However, the utility of these hardware accelerators is limited by inflexible functionality, lack of a clear upgrade path to faster components, and limitations on the rate at which bio-sequence data can be streamed to the comparison logic. This proposal seeks to construct a novel hardware-based bio-sequence accelerator, the "smart disk" engine, which addresses the limitations of existing accelerators. The proposed engine combines the flexibility of reconfigurable FPGA logic, which can be reprogrammed at will and easily upgraded while running at hardware speeds, with an innovative architecture that ties the comparison logic closely to an array of hard disks, guaranteeing massive data bandwidth into the comparison hardware. The smart disk engine is designed to accelerate all stages of BLAST-like similarity search algorithms, not just the final Smith-Waterman stage. Phase I of this fast-track STTR application proposes to build the initial prototype of the smart disk engine, to implement comparison logic mirroring the stages of the widely used BLAST pipeline, to construct a transparent software front end to the engine, and finally to evaluate the performance of the prototype on large-scale tasks in bio-sequence annotation.
Key innovations in this phase will be the integration of FPGA logic with the mass storage system, modeling of the new architecture's performance, and development of software control that can rapidly reprogram the FPGAs to construct comparison pipelines optimized for different types of comparison (BLASTN, BLASTP, etc.). At the end of this phase, the combined hardware and software of the smart disk prototype should implement at least BLASTN- and BLASTP-like computations, run at least 30x faster than 2003-era commodity general-purpose processors, and successfully hide the complexity of the engine's hardware from the biological end user.
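
For readers unfamiliar with the staged structure of a BLAST-like search, the idea of a cheap filtering stage feeding the final Smith-Waterman stage can be sketched in software. This is only an illustrative simplification, not the project's FPGA comparison logic: the function names (find_seeds, smith_waterman_score, search), the word length k, the scoring values, and the threshold are all hypothetical, and the ungapped-extension stage of real BLAST is omitted.

```python
from collections import defaultdict


def find_seeds(query, subject, k=3):
    """Stage 1 (seed matching): return (i, j) pairs where the k-mer at
    query[i:] equals the k-mer at subject[j:]."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Final stage: best local alignment score with a linear gap penalty.
    Scoring values are illustrative assumptions, not BLAST defaults."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, database, k=3, threshold=6):
    """Run the expensive alignment only on subjects that pass the seed filter.
    `database` maps a sequence name to its sequence string."""
    hits = []
    for name, subject in database.items():
        if not find_seeds(query, subject, k):
            continue  # filtered out cheaply; no alignment is computed
        score = smith_waterman_score(query, subject)
        if score >= threshold:
            hits.append((name, score))
    return hits
```

In this sketch most database sequences are discarded by the seed stage, so the cost of the filtering stages dominates as the database grows, which is consistent with the abstract's emphasis on accelerating every stage rather than only the final one.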

Thesaurus Terms:
computational biology, computer assisted sequence analysis, computer system design/evaluation, computer system hardware, data management, high throughput technology, molecular biology information system, computer program/software, functional/structural genomics, nucleic acid sequence, protein sequence, proteomics, bioengineering/biomedical engineering

Phase II

Contract Number: 5R42HG003225-02
Start Date: 9/30/04    Completed: 8/31/10
Phase II year
2005
(last award dollars: 2009)
Phase II Amount
$1,736,265
