Database system employing protein function hierarchies for viewing biomolecular sequence data
Patent 6023659, issued on February 8, 2000. Estimated expiration date: March 6, 2017. The estimated expiration date is calculated from simple USPTO term provisions. It does not account for terminal disclaimers, term adjustments, failure to pay maintenance fees, or other factors that might affect the term of a patent.
Disclosed is a relational database system for storing biomolecular sequence information in a manner that allows sequences to be catalogued and searched according to one or more protein function hierarchies. The hierarchies allow searches for sequences based upon a protein's biological function or molecular function. Also disclosed is a mechanism for automatically grouping new sequences into protein function hierarchies. This mechanism uses descriptive information obtained from "external hits," which are matches of stored sequences against gene sequences stored in an external database such as GenBank. The descriptive information provided by the external database is evaluated according to a specific algorithm and used to automatically group the external hits (or the sequences associated with the hits) into the categories of the hierarchies. Ultimately, the biomolecular sequences stored in databases of this invention are provided with both descriptive information from the external hit and category information from the relevant hierarchy or hierarchies.
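The arrangement described in the abstract can be illustrated with a minimal sketch, assuming a three-table relational layout and a simple keyword match on the external hit's description. The abstract does not spell out the "specific algorithm", so the keyword rules below are a stand-in, not the patented method. All table names, column names, category names, and function names (`build_db`, `add_sequence`, `sequences_under`) are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: one table for hierarchy categories (self-referencing
# parent_id), one for stored sequences with the description taken from their
# external hit, and a link table assigning sequences to categories.
SCHEMA = """
CREATE TABLE category (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES category(id),
    hierarchy TEXT NOT NULL,
    name      TEXT NOT NULL
);
CREATE TABLE sequence (
    id              INTEGER PRIMARY KEY,
    residues        TEXT NOT NULL,
    hit_description TEXT
);
CREATE TABLE sequence_category (
    sequence_id INTEGER REFERENCES sequence(id),
    category_id INTEGER REFERENCES category(id),
    PRIMARY KEY (sequence_id, category_id)
);
"""

# Assumption: plain keyword matching stands in for the evaluation algorithm.
KEYWORD_RULES = {"kinase": "Kinase", "protease": "Protease"}


def build_db():
    """Create an in-memory database with a tiny example hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?, ?)",
        [
            (1, None, "molecular function", "Enzyme"),
            (2, 1, "molecular function", "Kinase"),
            (3, 1, "molecular function", "Protease"),
        ],
    )
    return conn


def add_sequence(conn, residues, hit_description):
    """Store a sequence and group it by keywords in its hit description."""
    cur = conn.execute(
        "INSERT INTO sequence (residues, hit_description) VALUES (?, ?)",
        (residues, hit_description),
    )
    seq_id = cur.lastrowid
    text = hit_description.lower()
    for keyword, category_name in KEYWORD_RULES.items():
        if keyword in text:
            conn.execute(
                "INSERT INTO sequence_category "
                "SELECT ?, id FROM category WHERE name = ?",
                (seq_id, category_name),
            )
    return seq_id


def sequences_under(conn, category_name):
    """Return ids of sequences in a category or any of its descendants."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM category WHERE name = ?
            UNION ALL
            SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
        )
        SELECT DISTINCT sc.sequence_id
        FROM sequence_category sc
        JOIN subtree ON sc.category_id = subtree.id
        ORDER BY sc.sequence_id
        """,
        (category_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, a search on a parent category such as "Enzyme" also returns sequences grouped under its child categories, which is the practical benefit of cataloguing sequences by hierarchy rather than by a flat label. A sequence whose hit description matches no rule is stored but left uncategorized.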
Other References
The Institute for Genomic Research, TIGR Database. Internet--http://www.tigr.org (Downloaded Dec. 23, 1997)
National Center for Biotechnology Information (Index). Internet--http://www3.ncbi.nlm.nih.gov/ (Downloaded Dec. 23, 1997)
National Center for Biotechnology Information; ENTREZ. Internet--http://www3.ncbi.nlm.nih.gov/entrez (Downloaded Dec. 23, 1997)
National Center for Biotechnology Information; UniGene. Internet--http://www3.ncbi.nlm.nih.gov/unigene (Downloaded Dec. 23, 1997)
National Center for Biotechnology Information; GenBank. Internet--http://www3.ncbi.nlm.nih.gov/web/genbank (Downloaded Dec. 23, 1997)
EcoCyc: Electronic Encyclopedia of E. coli Genes and Metabolism. Internet--http://www.ai.sri.com/ecocyc/ecocyc.html (Downloaded Feb. 28, 1997)
Taxonomy of Genes (EcoCyc). Internet--http://www.ai.sri.com:1555/class-subs?object=Genes (Downloaded Mar. 19, 1997)
GenProtEC: E. coli genome and proteome database. Internet--http://www.mbl.edu/html/ecoli.html (Downloaded Feb. 28, 1997)
Keele, J.W., "A Conceptual Database Model for Genomic Research," Journal of Computational Biology, vol. 1, No. 1, pp. 65-76, (1994)
Fickett, J.W., "Finding Genes by Computer: the State of the Art," TIG vol. 12, No. 8, pp. 316-320 (1996)
"GDB 6.0 Goals," http://gdb.gdbnet.ad.jp/gdb/docs/gdb6-goals.htm, pp. 1-5 (1995)
Matsubara, K. and Okubo, K. "Identification of New Genes by Systematic Analysis of cDNAs and Database Construction", Current Opinion in Biotechnology, 1993, No. 4, pp. 672-677
Kanehisa, M. "Toward Pathway Engineering: A New Database of Genetic and Molecular Pathways." Science and Technology Japan, 1995, No. 59, pp. 34-38
Gaasterland, T. and Sensen, C. "Using Multiple Tools for Automated Genome Interpretation in an Integrated Environment", Trends In Genetics, Jan. 1996
Adams, M.D. et al. "Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project", Science, Jun. 21, 1991, vol. 252, pp. 1651-1656