The Sequence Retrieval System (SRS) on the World Wide Web
T. Etzold, EMBL
Introduction
SRS started in 1989 out of frustration. At that time almost
all relevant data was contained in a plethora of different
flat file libraries or databanks. This situation is not much
different today, despite the increasing use of relational
and object-oriented databases.
Within each databank the annotation format was more or less
consistent, but almost all databanks had, and still have,
formats that differ from one another. Databanks started
cross-referencing each other, but there was no tool to use
these references. The
initial project dealt with sequence databanks only but was
soon extended to support all flat file databanks in biology.
To date more than 50 databases are supported by the SRS
system. (See here for an example.)
The System
In biology, as well as in other fields, cross-indexing
information is the major source of knowledge and
understanding. How to devise a cross-referencing strategy
that applies to many different types of information is a
major challenge. One way of answering this challenge is to
define an explicit description language. The SRS system has
such a language built in, in the form of the ODD (Object
Design and Definition) Language. ODD provides a common
interface for describing a databank's format and file
organization. With these tools at hand an indexing and
retrieval system was developed with the following
characteristics:
Only simple queries, such as searches by accession number,
author name or sequence length, are supported, but they are
extremely fast and can be combined with logical operators.
To allow queries, the entire set of databases must be
indexed. The indexing recognises the native databank format
and file organization, parses every entry, extracts words
and puts them into one of several indexes. Indexes may be
created for each individual data field.
The current release works on all major UNIX platforms, VMS
and 32-bit DOS, and supports almost 50 different databanks
such as Swissprot, EMBL, GenBank and PDB. It has two user
interfaces: a command line interface and an SRSWWW server.
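The indexing and query combination described above can be
sketched roughly as follows. This is a minimal illustration
in Python, not the SRS implementation: the two-letter line
codes, the "//" entry terminator, the sample entries and the
function names are all assumptions made for the example, and
ODD itself is not modelled.

```python
import re
from collections import defaultdict

# Hypothetical line codes mapped to index names (an assumption loosely
# modelled on EMBL-style flat files; this is not an ODD definition).
FIELD_CODES = {"AC": "accession", "RA": "author", "DE": "description"}

# Invented sample entries, for illustration only.
SAMPLE = """\
AC   P00001
RA   Smith J., Jones K.
DE   Cytochrome c
//
AC   P00002
RA   Jones K.
DE   Hemoglobin alpha chain
//
"""


def parse_entries(text):
    """Split a flat file into entries, each terminated by a '//' line."""
    entry = []
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        elif line.strip():
            entry.append(line)
    if entry:
        yield entry


def build_indexes(text):
    """Return {field: {word: set(entry numbers)}}, one index per field."""
    indexes = defaultdict(lambda: defaultdict(set))
    for number, entry in enumerate(parse_entries(text)):
        for line in entry:
            code, _, value = line.partition(" ")
            field = FIELD_CODES.get(code)
            if field is None:
                continue  # line type without an index
            for word in re.findall(r"\w+", value.lower()):
                indexes[field][word].add(number)
    return indexes


def lookup(indexes, field, word):
    """Simple query: numbers of the entries whose field contains word."""
    return set(indexes.get(field, {}).get(word.lower(), set()))


if __name__ == "__main__":
    idx = build_indexes(SAMPLE)
    jones = lookup(idx, "author", "jones")
    smith = lookup(idx, "author", "smith")
    # Logical operators map onto set operations on the hit lists.
    print("jones AND smith:", jones & smith)
    print("jones NOT smith:", jones - smith)
```

Because each simple query is a single index lookup that
returns a set of entry numbers, combining queries with AND,
OR and NOT reduces to set intersection, union and difference.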
The SRSWWW server
The World Wide Web server is currently the main interface to
SRS. It has unique features that distinguish it from other
servers. Almost all HTML (HyperText Markup Language) files
are created by programs just before being displayed. This
ensures that the pages are always up to date and removes the
need to edit HTML files whenever a databank is modified,
added or removed within SRS. Generating HTML on the fly also
simplifies installation at different sites having different
sets of databanks.
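As a rough illustration of generating pages on the fly, the
Python sketch below builds a databank selection page from
whatever list of databanks is installed at a site. The
function name, the page layout and the /srs/query URL are
hypothetical and are not taken from SRSWWW.

```python
from html import escape
from urllib.parse import quote


def render_databank_page(databanks):
    """Build an HTML page listing the databanks installed at this site."""
    items = "\n".join(
        # The link target is a made-up URL scheme used only for this sketch.
        f'<li><a href="/srs/query?lib={quote(name)}">{escape(name)}</a></li>'
        for name in sorted(databanks)
    )
    return (
        "<html><body><h1>Databanks</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


if __name__ == "__main__":
    # Adding or removing a databank changes the page without any
    # hand-edited HTML file being touched.
    print(render_databank_page(["Swissprot", "EMBL", "PDB"]))
```

Since the page is derived from the installed databanks at
request time, two sites with different databank sets can run
the same program unchanged.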
The protocol used for the World Wide Web is called HTTP
(Hypertext Transfer Protocol) and consists of a simple query
followed by a stream of replies. A WWW server process exits
after it has replied to a query and therefore does not
remember previous queries. For this reason HTTP is called
stateless. The SRSWWW server, however, is stateful and
achieves this by maintaining a file with the user context.
All queries are saved so that they can be reinspected or
combined in query expressions. This is an essential
extension to the normal WWW server, yet it remains
compatible with existing browsers.
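The idea of keeping state in a per-user file can be sketched
as follows. This is a minimal Python illustration under
stated assumptions: the JSON file layout, the class and
method names and the 1-based query numbering are invented
for the example and do not describe the actual SRSWWW
context file.

```python
import json
import tempfile
from pathlib import Path


class UserContext:
    """Persist a user's query history in a file between requests."""

    def __init__(self, directory, user_id):
        if not user_id.isalnum():
            raise ValueError("user_id must be alphanumeric")
        self.path = Path(directory) / f"{user_id}.json"

    def load(self):
        """Return the saved query history, or an empty list for a new user."""
        if self.path.exists():
            return json.loads(self.path.read_text())
        return []

    def save_query(self, expression, hits):
        """Append a query and its hits; return its 1-based query number."""
        history = self.load()
        history.append({"expression": expression, "hits": sorted(hits)})
        self.path.write_text(json.dumps(history))
        return len(history)

    def combine(self, first, second, operator):
        """Combine two saved queries with 'and', 'or' or 'not'."""
        history = self.load()
        a = set(history[first - 1]["hits"])
        b = set(history[second - 1]["hits"])
        return {"and": a & b, "or": a | b, "not": a - b}[operator]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        # Each request builds a fresh object, mimicking a server process
        # that has forgotten everything; only the file carries the state.
        UserContext(directory, "user1").save_query("author:jones", {0, 1})
        UserContext(directory, "user1").save_query("author:smith", {0})
        print(UserContext(directory, "user1").combine(1, 2, "not"))
```

Nothing in this scheme changes what the browser has to do,
which is in line with the compatibility noted above.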
SRSWWW supports links between databanks both through
hypertext links in the entries and by allowing navigation in
the network generated by the indexing.
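Navigation in a network of cross-references can be pictured
as a graph traversal. The Python sketch below is only an
illustration of that idea: the link table, the entry
identifiers and the function name are invented, and the way
SRS actually stores and follows links is not described in
this text.

```python
from collections import deque

# Hypothetical cross-reference index: (databank, entry id) mapped to the
# set of (databank, entry id) pairs it refers to. All ids are invented.
LINKS = {
    ("SWISSPROT", "sp1"): {("EMBL", "em1"), ("PDB", "pdb1")},
    ("SWISSPROT", "sp2"): {("EMBL", "em2")},
    ("EMBL", "em1"): {("GENBANK", "gb1")},
}


def follow_links(entries, target_databank, links=LINKS):
    """Return the entries of target_databank reachable from entries."""
    seen = set(entries)
    queue = deque(entries)
    found = set()
    while queue:
        entry = queue.popleft()
        for linked in links.get(entry, ()):
            if linked in seen:
                continue
            seen.add(linked)
            if linked[0] == target_databank:
                found.add(linked)  # reached the target databank
            else:
                queue.append(linked)  # keep walking through this databank
    return found


if __name__ == "__main__":
    start = [("SWISSPROT", "sp1")]
    # GenBank is reached indirectly, through the EMBL entry.
    print(follow_links(start, "GENBANK"))
```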
The SRSWWW server is currently installed on 8 EMBnet nodes
and one site in the USA. A status page listing all the
servers and all databanks served is generated automatically
at least once per day.
If you use the SRS system and have created parsers for
publicly available biological databases, please contact
etzold@embl-heidelberg.de. I would love to add your parsers
to the distribution.
Access points:
Distribution
Current major version: 4.0
Reference:
SRS-an indexing and retrieval tool for flat file data
libraries. T. Etzold and P. Argos.
CABIOS 9, 1993, pp.49-57
The use of WWW in Biological Research. R. Doelz and T.
Etzold. Computer Networks and ISDN Systems 27 (2), 1994.
Electronic publication accessible via
http://www.elsevier.nl/
Acknowledgement:
Part of the SRS development program has been carried out
with financing from grants provided under the Biomed/EU
agreement and from project funds of the EMBnet project
(BRIDGE/EU agreement). The initial port to the UNIX
operating system was done as part of the EMBnet project in
Switzerland, funded by the Schweizerische Nationalfonds.