The sequencing of long contiguous regions of human genomic DNA is soon expected to proceed at a rate of several million bases per day. Many genes embedded in these sequences are important for applications in genetic medicine, drug design, and the environment. However, standard gene-finding programs are not capable of accurately locating and describing genes, and a more powerful technology is needed. In this project, previously developed software that combines pattern recognition and homology information will be used to identify, model, and properly parse genes from long stretches of genomic DNA sequence. Because the software is available only as research code and not as a stand-alone, user-friendly package, a suitable graphical user interface will be built, and various database issues will be resolved to provide a commercially viable product. Phase I will develop a suitable client-server architecture for the system, develop a user-friendly Java graphical interface for workstations and personal computers, develop an application programming interface to embed the program in a user-configured pipeline, refine search routines and database structure to enhance the efficiency of the program's database search component, build or find a database of full-length cDNAs, and make provisions for users to include data from in-house, proprietary databases.
Commercial Applications and Other Benefits as described by the awardee: These software tools should allow the sequence data produced by DOE and other Federal agencies to be used by the private sector for important medical and environmental applications such as diagnostics, drug design, and bioremediation.