A Colour INteractive Editor for Multiple Alignments - CINEMA

Attwood, T.K.[1], Payne, A.W.R.[2], Michie, A.D.[1] and Parry-Smith, D.J.[3]

1. Dept. of Biochemistry and Molecular Biology, University College London, London WC1E 6BT, UK
2. Cyberdynamics (UK) Ltd., 38 Lordsbury Field, Wallington, Surrey SM6 9PE, UK
3. Dept. of Molecular Sciences, Pfizer Central Research, Sandwich, Kent CT13 9NJ, UK

Introduction

The World Wide Web has become the most popular vehicle for distributing bio-information. The `information superhighway' was initially used primarily to ship data, but we are now seeing a shift away from data dissemination per se to its use in transmitting concepts. This is true, for example, in the pharmaceutical industry, where extraction of information about potential structural or functional sites from sequence data is an essential component of drug-discovery protocols. Thus, while the provision of, and links between, databases has revolutionised the way we access data, visualisation and interactive manipulation of data are now key goals in allowing users to get the most from their bio-information.

For the sequence analyst, a vital tool is an alignment editor. Numerous programs are now available, either in a stand-alone form or as components of larger packages. The programs range from fully-manual to fully-automatic, but results from automatic procedures almost invariably require manual editing. This often presents problems, as there is currently no standard format for output, storage and distribution of alignments.

Java, an object-oriented network programming language that allows interactive use of software over the Web, begins to address some of these problems. Java-capable browsers may run applets on a variety of platforms (applets are small applications launched from a server via HTML pages). To an extent, this obviates the need to distribute code, as software is loaded on-the-fly from the server, and cached for that session by the browser. Executable code will thus run on virtually all desktop platforms without modification: when modifications are made and the source recompiled, the program should run everywhere.

To address the torrent of genome data, new-generation tools are required to deliver up-to-date information to the community via user-friendly, interactive interfaces. To this end, we have developed CINEMA, a tool both for local alignment construction and modification, and for visualisation and manipulation of alignments resident at different sites on the Internet.

The program

CINEMA is embedded in a comprehensive help file, so instructions on its use are immediately to hand. The applet is demonstrated with an alignment of lysozyme sequences, which are easily purged from the display to allow input of user-specified files: these may be loaded directly from local sequence or alignment databases, from a temporary directory, or they may be input by the user (via cut-and-paste or file-upload facilities). The program thus provides the flexibility to extend and enhance pre-existing alignments, and to generate alignments from scratch.

Navigation around the display is effected by means of scroll bars, and gaps are inserted or deleted using a click-and-drag mouse action. Other facilities include group editing (to allow simultaneous gap insertion into sets of sequences), sequence re-ordering and removal, and variation of font sizes and colours. By default, alignments are coloured according to residue property groups consistent with those of physical modelling components and graphics packages, viz.: acidic=red; basic=blue; polar neutral=green; hydrophobic aliphatic=white; hydrophobic aromatic=purple; P,G=brown; C=yellow.
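The default colouring can be pictured as a simple lookup from residue to colour. The following is a minimal sketch only, not the applet's actual code: the class and method names are hypothetical, and the assignment of individual residues to each property group (for example, H as basic and Y as aromatic) is an assumption, since only P, G and C are named explicitly above. The grey fallback for gaps and unrecognised symbols is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps one-letter residue codes to the default colours
// listed in the text. Group membership is an assumption of this sketch.
final class ResidueColours {
    private static final Map<Character, String> COLOURS = new HashMap<>();

    // Register every residue in the given string under one colour name.
    private static void assign(String residues, String colour) {
        for (char c : residues.toCharArray()) {
            COLOURS.put(c, colour);
        }
    }

    static {
        assign("DE", "red");       // acidic
        assign("HKR", "blue");     // basic (assumed membership)
        assign("STNQ", "green");   // polar neutral (assumed membership)
        assign("AVLIM", "white");  // hydrophobic aliphatic (assumed membership)
        assign("FWY", "purple");   // hydrophobic aromatic (assumed membership)
        assign("PG", "brown");     // P and G, as stated in the text
        assign("C", "yellow");     // C, as stated in the text
    }

    // Returns the colour name for a residue; gaps and unknown symbols
    // fall back to grey (an assumption, not taken from the text).
    static String colourFor(char residue) {
        return COLOURS.getOrDefault(Character.toUpperCase(residue), "grey");
    }

    public static void main(String[] args) {
        // Print the colour assigned to each position of a short aligned fragment.
        for (char c : "KVFG-C".toCharArray()) {
            System.out.println(c + " -> " + colourFor(c));
        }
    }
}
```

Keeping the scheme in a table of this kind would make it straightforward to offer alternative colourings, which fits the user-variable colours mentioned above.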

Output options allow files to be saved to a temporary directory for future program input, or alignments may be mailed back (in text or PostScript formats). Alternatively, results may be output in GIF format for display within the browser.

Conclusion and future directions

Interactive editors are essential where alignments from automatic programs require manual adjustment. The use of colour then permits rapid interpretation of the results, as different properties are depicted in an immediately informative way, no matter how large the alignment. Such tools thus offer a rapid and informed means of selecting residues suitable for mutagenesis studies by revealing regions crucial to the structure and/or function of a protein: e.g., critically conserved residues can be seen at a glance; unusual mutations may stand proud against a smooth backdrop of conservation; and mutational hotspots are readily pin-pointed.

CINEMA is the first component of a modular network-oriented analysis package. A structure display module is now well-advanced; this links the sequence to features of biological interest, by allowing visualisation of conserved motifs in a 3D context. Other emerging tools include hydropathy plots and diagonal plots. Our intention is to extend the applet flexibly, by dynamically loading new classes to `plug in' additional functionality. With this open approach, our aim is to permit multiple centres to develop and deploy extensions to the applet through construction of custom `pluglets', allowing the package to grow rapidly. Such cooperation avoids costly duplication of effort, encourages global collaboration, and allows convergence to a set of standards.
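The idea of plugging in functionality by loading classes at run time can be illustrated with standard Java reflection. This is a sketch under stated assumptions rather than the CINEMA design: the `Pluglet` interface, the `PlugletLoader` class and the `GapCountPluglet` example are all hypothetical names, and the sketch loads a class by name from the local class path rather than from a remote server.

```java
// Hypothetical contract that every extension would implement.
interface Pluglet {
    String name();
    String run(String[] alignment);
}

// Hypothetical loader: instantiates a pluglet from its class name at run time.
final class PlugletLoader {
    static Pluglet load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass rejects classes that do not implement Pluglet.
            return cls.asSubclass(Pluglet.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load pluglet: " + className, e);
        }
    }
}

// Example extension: counts gap characters ('-') across all aligned sequences.
final class GapCountPluglet implements Pluglet {
    @Override
    public String name() {
        return "gap-count";
    }

    @Override
    public String run(String[] alignment) {
        int gaps = 0;
        for (String sequence : alignment) {
            for (char c : sequence.toCharArray()) {
                if (c == '-') {
                    gaps++;
                }
            }
        }
        return "gaps=" + gaps;
    }
}

final class PlugletDemo {
    public static void main(String[] args) {
        // The host only needs the class name, which could come from configuration.
        Pluglet p = PlugletLoader.load(GapCountPluglet.class.getName());
        System.out.println(p.name() + ": " + p.run(new String[] {"AC-GT", "A--GT"}));
    }
}
```

Because the host depends only on the interface, a new tool written at another centre could in principle be added without recompiling the host, which is the kind of independent development the `pluglet' approach is meant to encourage.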

The individual features of this program are not new or remarkable in themselves. What is striking is that alignment manipulation takes place in real time, that users may swap data with the applet, and that the applet may be enhanced and expanded without the need to distribute code.

Availability

CINEMA is accessible via UCL's Bioinformatics server. For security reasons, use of the Internet is not a universal solution. We are therefore making the applet available for installation on organisations' Intranets.

