Using the cumulative file to update a local database copy has the advantage that no further data processing is needed and, hence, there is no risk of errors arising through the manipulation of incremental update files. At many sites, however, the available network bandwidth makes it impractical to transfer the cumulative file by ftp from the database provider site daily or even weekly. Downloading incremental files requires much less bandwidth but more local maintenance effort, since the incremental files have to be integrated into a cumulative file to present the data as a single file for conversion into GCG format or indexing by SRS. To facilitate this processing step, and to provide a reliable mechanism to regenerate the cumulative file locally from incremental updates, we have developed the SynCron tools.
The version number (currently, 005) will change as the programs are updated.
SynCron makes use of transaction listings, which the EBI has made available for the nucleotide database since January 1996. The EBI supplies a listing for each of its update files that describes the update, insert and delete operations on the database represented in the flat-file updates. The core program of SynCron merges updates into the cumulative file following the instructions given in the transaction listings. Two additional utilities are provided to verify that the resulting cumulative file contains the correct entries. Currently, the accession number, entry name, version, division and date stamp of an entry are validated. In the future, we may include the NID of an entry (a unique identifier for the sequence), or a checksum for the sequence to prove its identity.
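The merge step described above can be sketched as follows. This is a minimal illustration, not the SynCron implementation: the `Transaction` record, the `apply_transactions` function, the in-memory dictionary keyed by accession number, and the sample accession numbers are all assumptions made for this example. The actual layout of the EBI transaction listings is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One operation from a (hypothetical) transaction listing."""
    operation: str   # "insert", "update" or "delete"
    accession: str   # accession number of the affected entry
    entry: str = ""  # flat-file text of the entry; unused for deletes


def apply_transactions(cumulative, transactions):
    """Return a new accession -> entry mapping with all transactions applied in order."""
    merged = dict(cumulative)  # leave the caller's mapping untouched
    for t in transactions:
        if t.operation == "insert":
            if t.accession in merged:
                raise ValueError(f"insert of existing entry {t.accession}")
            merged[t.accession] = t.entry
        elif t.operation == "update":
            if t.accession not in merged:
                raise ValueError(f"update of missing entry {t.accession}")
            merged[t.accession] = t.entry
        elif t.operation == "delete":
            if t.accession not in merged:
                raise ValueError(f"delete of missing entry {t.accession}")
            del merged[t.accession]
        else:
            raise ValueError(f"unknown operation {t.operation!r}")
    return merged


# Made-up accession numbers and entry texts, for illustration only.
cumulative = {"X00001": "entry text, version 1", "X00002": "entry text, version 1"}
updates = [
    Transaction("update", "X00001", "entry text, version 2"),
    Transaction("delete", "X00002"),
    Transaction("insert", "X00003", "entry text, version 1"),
]
merged = apply_transactions(cumulative, updates)
```

Raising an error on an inconsistent operation, rather than silently skipping it, reflects the goal stated above: the local copy should only ever be the exact result of the listed operations.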
 #######    1   +--+
 #     # -----> |  |
 #######        +--+
cumulative   cumulative
   data         list

                     +-+      2                     +---+
                  +  | |   ----->                   |   |  <.....
                     +-+                            +---+       :
                 incremental                        merged      :
                 list (EBI)                          list       :
                                                                :  I
                                                                :
 #######     ###     +-+      3      ##########     +---+       :
 #     #  +  # #  +  | |   ----->    #        #  +  |   |  <....:
 #######     ###     +-+             ##########     +---+       :
cumulative   incremental               merged       merged      :
   data      data + list                data         list       :
                                                                :  II
                                                                :
                                                    +---+       :
                                                    |   |  <....:
                                                    +---+
                                                 cumulative
                                                 list (EBI)
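The two comparisons marked I and II in the diagram both amount to checking one listing against another. A minimal sketch of such a check is given below, covering the fields named earlier (accession number, entry name, version, division, date stamp). The `ListRecord` type, the `compare_lists` function and all sample values are hypothetical and do not reflect the actual format of the list files or the interface of the SynCron utilities.

```python
from typing import NamedTuple


class ListRecord(NamedTuple):
    """One entry of a (hypothetical) parsed list file."""
    accession: str
    entry_name: str
    version: int
    division: str
    date: str


def compare_lists(local, reference):
    """Return a list of discrepancies between a local and a reference listing."""
    local_by_acc = {r.accession: r for r in local}
    ref_by_acc = {r.accession: r for r in reference}
    problems = []
    # Entries present on one side only.
    for acc in sorted(ref_by_acc.keys() - local_by_acc.keys()):
        problems.append(f"{acc}: missing from local copy")
    for acc in sorted(local_by_acc.keys() - ref_by_acc.keys()):
        problems.append(f"{acc}: not in reference list")
    # Entries present on both sides: compare every field except the accession.
    for acc in sorted(local_by_acc.keys() & ref_by_acc.keys()):
        mine, theirs = local_by_acc[acc], ref_by_acc[acc]
        for field in ListRecord._fields[1:]:
            if getattr(mine, field) != getattr(theirs, field):
                problems.append(
                    f"{acc}: {field} is {getattr(mine, field)!r}, "
                    f"expected {getattr(theirs, field)!r}"
                )
    return problems


# Made-up records, for illustration only.
local = [
    ListRecord("X00001", "HSABC1", 2, "HUM", "01-JAN-1996"),
    ListRecord("X00003", "MMXYZ3", 1, "ROD", "02-JAN-1996"),
]
reference = [
    ListRecord("X00001", "HSABC1", 3, "HUM", "01-JAN-1996"),
    ListRecord("X00002", "ECDEF2", 1, "PRO", "02-JAN-1996"),
]
problems = compare_lists(local, reference)
```

An empty result would indicate that the two listings agree on every validated field.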
The SynCron package also includes tools to assist file transfer. If you already have a working configuration for obtaining the update files, there is no need to change it, except that you will need to add a procedure that fetches the list files in addition to the data files.
List files are available . The naming scheme is the same as for incremental and cumulative update files with the extension ".lis". All customization for local file names and paths can be done in a general configuration file, as explained in the installation instructions for the package.
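For sites that extend an existing transfer procedure, deriving the list file name from the data file name could look like the sketch below. It assumes that "the same naming scheme with the extension .lis" means replacing the final extension of the data file name. The helper names and the sample file name are invented for this example, and real update file names may differ.

```python
from pathlib import PurePosixPath


def list_file_for(data_file: str) -> str:
    """Replace the final extension of an update file name with '.lis' (assumed scheme)."""
    return str(PurePosixPath(data_file).with_suffix(".lis"))


def files_to_fetch(data_files):
    """Return each data file followed by its matching list file."""
    wanted = []
    for name in data_files:
        wanted.append(name)
        wanted.append(list_file_for(name))
    return wanted


# Hypothetical file name, for illustration only.
fetch_plan = files_to_fetch(["example_update.dat"])
```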
Using SynCron, it should be possible to keep a local copy of the EMBL Nucleotide Sequence Database that exactly matches the contents of the database in operation at the EBI for external services, with manual intervention required only in exceptional cases such as network failure. We hope thereby to help users improve and guarantee the quality of the EMBL Nucleotide Sequence Database updates obtained by electronic transfer.