BIOCCELERATOR:

A CURRENTLY AVAILABLE SOLUTION FOR FAST PROFILE AND SMITH-WATERMAN SEARCHES.

Leon Esterman,
Israeli National Node (INN)


Background.

DNA and protein databases and sequence analysis programs have become critical tools in biological, medical and genetic research. The load on INN's server for frequently requested services such as database searches, FASTA, TFASTA, Profilesearch and multiple sequence alignment is rapidly increasing with the ever-expanding size of the databases. The same situation probably occurs at every EMBnet site.


Bioccelerator.

The amount of data available is growing exponentially, and considerable computational resources are needed to find meaning in this information. Biologists are aware of the need for more processing power and are examining solutions based on RISC workstations and on massively parallel computing [1]. Workstations are relatively cheap but will quickly be overwhelmed by the data being generated. Massively parallel computers will be able to support future needs, but they are expensive.

The BIOCCELERATOR, on the other hand, is an accelerator board designed to support the specific functions needed to run a DNA or protein sequence search extremely fast. Developed and built by Compugen Ltd. of Petach-Tikva, Israel, and tested extensively by the Biological Computing Division of the Weizmann Institute, the BIOCCELERATOR is the simplest and most cost-effective solution available today for rigorous sequence analysis. A high-end BIOCCELERATOR completes, in seconds, runs that can take hours or days on fast workstations. The BIOCCELERATOR is as fast as, or faster than, the massively parallel solutions available, but it costs less than one tenth as much and is more convenient to use.

Algorithms supported by the BIOCCELERATOR include the Smith-Waterman algorithm [2] and PROFILESEARCH [3]. The BIOCCELERATOR is the only currently available solution for fast profile searches. Smith-Waterman has been shown by various studies [4] to be the most sensitive algorithm for searching protein databases. It is the only algorithm that allows an unlimited number of gaps in the alignment. Biologists have known this for a long time, but the algorithm was used infrequently because of the large amount of time each run required. With the BIOCCELERATOR, it has become feasible to run such searches on a regular basis and thus increase the chances of detecting significant homologies. PROFILESEARCH is an extension of the Smith-Waterman algorithm. Instead of using a single sequence as a query, a profile is constructed from a multiple alignment of related proteins. This profile reflects information known about a family of proteins, not just one member of the family. PROFILESEARCH is one of the most sensitive methods for finding new members of protein families.
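The relationship between the two algorithms can be illustrated with a minimal sketch. It is not the BIOCCELERATOR implementation. It assumes a linear gap penalty and simple match/mismatch scores, where real searches use substitution matrices and usually separate gap-opening and gap-extension penalties. The profile scorer averages match/mismatch scores over an alignment column, which is a simplification of the position-specific scoring described in [3]. All function names and parameter values are hypothetical.

```python
def local_align_score(n_positions, target, position_score, gap=-2):
    """Best local alignment score using the Smith-Waterman recurrence.

    position_score(i, residue) scores query position i against a target
    residue. Assumption: linear gap penalty (each gapped position costs `gap`).
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(n_positions):
        curr = [0] * (len(target) + 1)
        for j, residue in enumerate(target, start=1):
            curr[j] = max(
                0,                                         # start a new local alignment
                prev[j - 1] + position_score(i, residue),  # align position i with residue
                prev[j] + gap,                             # gap in the target
                curr[j - 1] + gap,                         # gap in the query
            )
            best = max(best, curr[j])
        prev = curr
    return best


def sequence_scorer(query, match=2, mismatch=-1):
    """Score against a single query sequence (plain Smith-Waterman)."""
    return lambda i, residue: match if query[i] == residue else mismatch


def profile_scorer(aligned, match=2, mismatch=-1):
    """Score against a gap-free multiple alignment of equal-length sequences.

    Simplification: each column's score is the average match/mismatch score
    over the family members at that column.
    """
    n = len(aligned)

    def score(i, residue):
        column = [seq[i] for seq in aligned]
        return sum(match if c == residue else mismatch for c in column) / n

    return score


def smith_waterman(query, target, gap=-2):
    return local_align_score(len(query), target, sequence_scorer(query), gap)


def profile_search(aligned, target, gap=-2):
    return local_align_score(len(aligned[0]), target, profile_scorer(aligned), gap)


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACT"))               # one gap is tolerated
    print(profile_search(["ACGT", "ACGA"], "ACGA"))    # profile of two family members
```

The only difference between the two searches is the scoring function passed to the shared recurrence: a single query scores each position from one residue, while a profile scores it from a whole alignment column. This is why a profile can recognise a sequence that resembles the family as a whole more strongly than it resembles any one member.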

The BIOCCELERATOR was designed to be completely compatible with the widely used Genetics Computer Group's (GCG) Wisconsin Sequence Analysis Package. For users of the GCG package, the BIOCCELERATOR is transparent, employing the same file and database formats. For the numerous sites that are GCG-literate, the BIOCCELERATOR is a simple solution that doesn't require re-educating users.

The Weizmann Institute was the natural beta site for the BIOCCELERATOR. The Institute's Biological Computing Division (BCD) hosts the Israeli National Node (INN) of EMBnet and provides computing services to over 1000 users in Israel.


BCD provides an international e-mail service based on the BIOCCELERATOR for performing rigorous database searches. For information send a blank e-mail message to bicserv@sgbcd.weizmann.ac.il.


With the phenomenal growth in cDNA sequencing, use of the Smith-Waterman algorithm has also grown. Because of its ability to handle gaps, the Smith-Waterman algorithm can accurately retrieve sequences when cDNA queries are used. cDNA sequence fragments require sensitive algorithms because of the need to retrieve the parent genomic sequences, in which the cDNA fragments are surrounded by non-coding regions.

The BIOCCELERATOR, along with NCBI and EMBL, was used last year to perform the database comparison that was instrumental in finding the gene BRCA1 [5]. For academic institutions, the BIOCCELERATOR provides a comprehensive solution for speeding up the most time-consuming database searches and enables researchers to work almost interactively instead of waiting hours for results.

In 1993, the EMBnet BRIDGE project supported the introduction of this technology.


References:
  • 1. Guy Bottu (1994) Fundamentals of database similarity searching methods. embnet.news 1(1): 2-3
  • 2. Temple F. Smith and Michael S. Waterman (1981) Comparison of biosequences. Advances in Applied Mathematics 2: 482-489
  • 3. Michael Gribskov, Andrew D. McLachlan, and David Eisenberg (1987) Profile analysis: detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 84: 4355-4358.
  • 4. Saul B. Needleman and Christian D. Wunsch (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48: 443-453
  • 5. Mark H. Skolnick et al. (1994) A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266: 66-71.
