embnet.news_Development

The Sequence Retrieval System (SRS) on the World Wide Web

T. Etzold, EMBL

Introduction

SRS started in 1989 out of frustration. At that time almost all relevant data was contained in a plethora of different flat file libraries or databanks. This situation is not much different today, despite relational and object oriented databases being increasingly used.

Within each databank the annotation format was more or less consistent, but almost all databanks had, and still have, formats that differ from one another. Databanks started cross-referencing each other, but there was no tool to make use of these references. The initial project dealt with sequence databanks only but was soon extended to support all flat file databanks in biology. To date more than 50 databases are supported by the SRS system.

The System

In biology, as well as in other fields, cross-indexing information is the major source of knowledge and understanding. Devising a cross-referencing strategy that applies to many different types of information is a major challenge. One way of answering this challenge is to define an explicit description language. The SRS system has such a feature built in, in the form of the ODD (Object Design and Definition) Language. ODD provides a common description interface for specifying a databank's format and file organization. With these tools at hand an indexing and retrieval system was developed with the following characteristics:
  • Support for several different index types (e.g. strings and numbers).
  • The possibility to index subentries (e.g. sequence features) so that they can be retrieved independently.
  • Cross-references can be indexed to generate links between databanks.
  • Indexing makes links bi-directional and allows them to be combined to create a network of databanks in which it is possible to navigate from one databank to any other.
  • Only simple queries, such as searches by accession number, author name or sequence length, are supported, but they are extremely fast and can be combined with logical operators.

    Before any query can be run, the entire set of databases must be indexed. The indexer recognises the native databank format and file organization, parses every entry, extracts words and puts them into one of several indexes. Indexes may be created for each individual data field.
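The indexing step described above can be sketched as follows. This is a minimal illustration, not the SRS implementation: the two-letter line codes, the `//` entry terminator, the sample entries and all function names are assumptions made for the example, and ODD itself is not modelled. Each data field gets its own word index, and query results are plain sets, so they can be combined with logical operators.

```python
import re
from collections import defaultdict

# Hypothetical line codes mapped to index names (an assumption for this sketch).
FIELD_CODES = {"AC": "accession", "DE": "description", "RA": "author"}

# Made-up entries in a simplified flat file layout; '//' ends an entry.
SAMPLE = """\
AC   P00001
DE   Cytochrome c
RA   Smith J., Jones A.
//
AC   P00002
DE   Cytochrome b
RA   Jones A.
//
"""

def parse_entries(text):
    """Yield each entry as a list of its non-empty lines."""
    entry = []
    for line in text.splitlines():
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        elif line.strip():
            entry.append(line)
    if entry:
        yield entry

def build_indexes(text):
    """Return {field name: {word: set of entry numbers}}, one index per field."""
    indexes = {name: defaultdict(set) for name in FIELD_CODES.values()}
    for entry_no, entry in enumerate(parse_entries(text)):
        for line in entry:
            code, _, value = line.partition(" ")
            field = FIELD_CODES.get(code)
            if field is None:
                continue  # line type that this sketch does not index
            for word in re.findall(r"[A-Za-z0-9]+", value.lower()):
                indexes[field][word].add(entry_no)
    return indexes

def lookup(indexes, field, word):
    """Return the set of entry numbers whose field contains the word."""
    return set(indexes[field].get(word.lower(), set()))

indexes = build_indexes(SAMPLE)
# Logical operators map onto set operations: & (and), | (or), - (but not).
jones_not_smith = lookup(indexes, "author", "jones") - lookup(indexes, "author", "smith")
```

Only a string index is shown here; a numeric index, for example on sequence length, would need a sorted structure that supports range lookups.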

    The current release works on all major UNIX platforms, VMS and 32-bit DOS, and supports almost 50 different databanks such as Swissprot, EMBL, GenBank and PDB. It has two user interfaces: a command line interface and an SRSWWW server.

    The SRSWWW server

    The World Wide Web server is currently the main interface to SRS. It has unique features that distinguish it from other servers. Almost all HTML (HyperText Markup Language) files are created by programs just before being displayed. This ensures that the pages are always up to date, and it removes the need to edit HTML files whenever a databank is modified, added to or removed from SRS. Generating HTML on the fly also simplifies installation at different sites that hold different sets of databanks.
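To illustrate why generating pages on the fly avoids manual HTML edits, here is a small sketch in which the page is rebuilt from whatever databanks are installed at the moment of the request. The function name, the `/query?lib=` URL scheme and the entry counts are hypothetical and are not taken from SRSWWW.

```python
from html import escape
from urllib.parse import quote

def render_databank_page(site_name, databanks):
    """Build an HTML page from the databanks installed right now.

    databanks maps a databank name to its number of entries.
    """
    rows = []
    for name in sorted(databanks):
        href = "/query?lib=" + quote(name)  # hypothetical URL scheme
        rows.append(
            f'<li><a href="{href}">{escape(name)}</a> '
            f"({databanks[name]} entries)</li>"
        )
    body = "\n".join(rows)
    return (
        f"<html><head><title>{escape(site_name)}</title></head>\n"
        f"<body><h1>{escape(site_name)}</h1>\n"
        f"<ul>\n{body}\n</ul></body></html>"
    )

# Illustrative data only; a real site would read this from its installation.
page = render_databank_page("Demo SRS site", {"SWISSPROT": 3, "EMBL": 5})
```

Adding or removing a databank changes only the data passed to the function, so two sites with different databank sets can run the same code unchanged.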

    The protocol used for the World Wide Web is called HTTP (Hypertext Transfer Protocol) and consists of a simple query followed by a stream of replies. The WWW server process ends after it has replied to a query and therefore does not remember previous queries. For this reason HTTP is called stateless.

    The SRSWWW server, however, is stateful and achieves this by maintaining a file with the user context. All queries are saved so that they can be reinspected or combined in query expressions. This is an essential extension to the normal WWW server, but maintains compatibility with the existing browsers.
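One way to keep state across stateless requests with a per-user context file is sketched below. The JSON file layout, the class and function names, and the `Q1 & Q2` notation for combining earlier queries are all assumptions made for illustration; the source does not describe the actual SRSWWW file format.

```python
import json
import tempfile
from pathlib import Path

class UserContext:
    """Query history for one user, stored in a file between requests."""

    def __init__(self, directory, user_id):
        self.path = Path(directory) / f"{user_id}.json"

    def load(self):
        """Return the saved history, or an empty list for a new user."""
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def save_query(self, expression, entry_ids):
        """Append a query and its result; return its 1-based query number."""
        history = self.load()
        history.append({"expression": expression, "entries": sorted(entry_ids)})
        self.path.write_text(json.dumps(history))
        return len(history)

    def combine(self, first, second, operator):
        """Combine two earlier queries with '&', '|' or '!' (but not)."""
        history = self.load()
        a = set(history[first - 1]["entries"])
        b = set(history[second - 1]["entries"])
        results = {"&": a & b, "|": a | b, "!": a - b}
        return self.save_query(f"Q{first} {operator} Q{second}", results[operator])

def handle_request(directory, user_id, expression, entry_ids):
    """Serve one request; nothing is kept in memory once it returns."""
    return UserContext(directory, user_id).save_query(expression, entry_ids)

# Each call builds a fresh UserContext, as a newly started server process would.
with tempfile.TemporaryDirectory() as directory:
    handle_request(directory, "user1", "author:jones", {1, 2, 3})
    handle_request(directory, "user1", "description:cytochrome", {2, 3, 4})
    UserContext(directory, "user1").combine(1, 2, "&")
    final_history = UserContext(directory, "user1").load()
```

Because the state lives on the server side, the browser only needs to send something that identifies the user with each request, which is consistent with keeping ordinary browsers compatible.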

    SRSWWW supports links between databanks in two ways: through hypertext links in the entries and through navigation in the network generated by the indexing.
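The idea of bi-directional links forming a navigable network can be sketched at the level of whole databanks. In this simplified model each indexed cross-reference becomes an edge usable in both directions, and a breadth-first search finds a chain of databanks from one to another. The cross-reference pairs and function names are illustrative assumptions; real links connect individual entries rather than whole databanks.

```python
from collections import defaultdict, deque

def build_link_graph(cross_references):
    """Build an undirected graph from (source, target) databank pairs."""
    graph = defaultdict(set)
    for source, target in cross_references:
        graph[source].add(target)
        graph[target].add(source)  # indexing makes the link usable in reverse
    return graph

def find_path(graph, start, goal):
    """Return the shortest chain of databanks from start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in seen:
                continue
            if neighbour == goal:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None

# Illustrative cross-references, each declared in one direction only.
graph = build_link_graph([
    ("SWISSPROT", "EMBL"),
    ("SWISSPROT", "PDB"),
    ("SWISSPROT", "PROSITE"),
])
route = find_path(graph, "PDB", "EMBL")
```

Although no pair above mentions a link from PDB to EMBL, the reversed and combined links let the search route through SWISSPROT.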

    The SRSWWW server is currently installed on 8 EMBnet nodes and one site in the USA. A status page is automatically generated at least once per day that lists all the servers and all databanks served.

    If you use the SRS system and have created parsers for publicly available biological databases please contact etzold@embl-heidelberg.de. I would love to add your parsers to the distribution.

    Access points:

  • EMBL, Heidelberg
  • EBI, Hinxton UK
  • Biozentrum, Basel CH
  • CAOS/CAMM Center, University of Nijmegen, The Netherlands
  • CNRS - INSERM, France
  • BEN, the Belgian EMBnet Node
  • Norwegian EMBnet node, Oslo
  • Swedish EMBnet node, Uppsala
  • Skirball Institute, New York
    Distribution

    Current major version: 4.0

    anonymous ftp to ftp.embl-heidelberg.de for UNIX or VMS.

    anonymous ftp to ftp.ebi.ac.uk for UNIX or VMS.

    Reference:

    SRS-an indexing and retrieval tool for flat file data libraries. T. Etzold and P. Argos. CABIOS 9, 1993, pp.49-57

    The use of WWW in Biological Research. R. Doelz and T. Etzold. Computer Networks and ISDN Systems 27 (2), 1994. Electronic publication accessible via http://www.elsevier.nl/

    Acknowledgement:

    Part of the SRS development program has been financed and carried out by grants provided under the Biomed/EU agreement, and project funds from the EMBnet project (BRIDGE/EU agreement). The initial port to the UNIX operating system was done as part of the EMBnet project in Switzerland, funded by the Schweizerische Nationalfonds.
