Due to the rapid progress in genome analysis, complete genome sequences
of many organisms have become available. This enables us to study biological
function at the genome scale. Proteins play critical roles in many biological
processes. To elucidate the molecular function of proteins, we
need to know their amino acid sequences and structures. However,
sequence and structural information alone is not enough to infer protein
function, because a protein is a microscopic entity whose behavior obeys
the laws of thermodynamics. Thus, thermodynamic knowledge of proteins
is as fundamental as sequence and structural information. Although the
databases for sequence and structure are well established, available
databases for thermodynamic quantities on proteins and their interactions
are scarce.
Thermodynamic data for protein-nucleic acid interactions are very useful
for understanding the principles of molecular recognition. Although
a large number of interacting systems have been structurally characterized,
the mechanism of specific molecular recognition is still poorly understood.
The integration of the structural and thermodynamic knowledge of molecular
recognition would help us to delineate the molecular mechanism of affinity
and specificity of interactions. Gene regulation is achieved by a complex
network of many transcription factors, cofactors and target genes. There
are some databases describing binary relations about molecular interactions.
However, the presence or absence of interactions critically depends on
thermodynamic quantities, namely, binding constant, protein concentration
and environmental conditions including temperature, pH and ion concentrations
in cells. Hence, thermodynamic data on interacting systems are sorely
needed to understand quantitatively the processes involved in gene regulation.
The resulting knowledge will also lead to a wide spectrum of applications
such as the design of novel nucleic acid binding proteins, predictive
methods for the target sites, and the quantitative simulation of gene
regulation networks. To this end, we started collecting thermodynamic
data on protein-nucleic acid interactions from published articles,
constructed an online database, ProNIT: Thermodynamic Database for Protein-Nucleic
Acid Interactions, and opened it to the public through the Internet in 2000.
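The dependence of binding on the dissociation constant, protein concentration and temperature described above can be illustrated with a minimal sketch. It assumes simple 1:1 binding at equilibrium, a free protein concentration approximately equal to the total concentration, and a 1 M standard state; the function names are hypothetical and are not part of ProNIT.

```python
import math

# Gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kJ/mol, from dG = RT ln(Kd).

    Assumes Kd is expressed in mol/L relative to a 1 M standard state.
    A smaller Kd (tighter binding) gives a more negative dG.
    """
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


def fraction_bound(protein_conc_molar, kd_molar):
    """Fraction of nucleic acid sites occupied for simple 1:1 binding.

    Assumes the free protein concentration is close to the total
    concentration (protein in excess over binding sites).
    """
    return protein_conc_molar / (kd_molar + protein_conc_molar)


if __name__ == "__main__":
    kd = 1e-9  # illustrative value only, not taken from the database
    for conc in (1e-10, 1e-9, 1e-8):
        print(f"[P] = {conc:.0e} M -> fraction bound = {fraction_bound(conc, kd):.2f}")
    print(f"dG at 25 C = {binding_free_energy(kd):.1f} kJ/mol")
```

Under these assumptions, the same site goes from about 9% to about 91% occupied when the protein concentration rises from one tenth of Kd to ten times Kd, which is why a binary "interacts / does not interact" record cannot capture the behavior of a regulatory system.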
Content of ProNIT
ProNIT currently contains more than 4900 data entries. Each entry of the database contains
the following information. Protein information: name, source, fragment
and sequence of the protein, EC, PIR, and PDB codes, information about
monomeric or oligomeric state, ProTherm number, and details of each mutation
(mutant residue, position, secondary structure and accessibility
at the mutation site); Nucleic acid information: name, source and sequence
of the nucleic acid, information on mutation and sequence of mutant
nucleic acid, GenBank accession number and NDB code; Complex information:
codes for PDB and NDB, links to protein-nucleic acid complex structures,
and description about conformational changes of protein and nucleic
acid upon binding; Experimental information: temperature, pH, buffers,
ions, additives and experimental method; Binding thermodynamic data:
dissociation constant (Kd), ΔG, ΔH and ΔCp for wild-type and mutant entities,
stoichiometry of binding and activity (Km and kcat); Literature information:
journal name, authors, publication year, keywords and remarks.
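As a rough illustration of how such an entry might be represented in a program, the sketch below models a small subset of the fields listed above. The class name, field names and units (kcal/mol) are assumptions made for this example and do not reflect the actual ProNIT schema; the values in the usage lines are placeholders, not database records.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindingEntry:
    """Hypothetical, simplified record covering a subset of the fields above."""
    # Protein and nucleic acid information
    protein_name: str
    nucleic_acid_name: str
    protein_mutation: Optional[str] = None   # None means wild type
    pdb_code: Optional[str] = None
    # Experimental information
    temperature_c: Optional[float] = None
    ph: Optional[float] = None
    method: Optional[str] = None
    # Binding thermodynamic data (units assumed: M and kcal/mol)
    kd_molar: Optional[float] = None
    delta_g: Optional[float] = None
    delta_h: Optional[float] = None
    # Literature information
    reference: Optional[str] = None

    def is_mutant(self) -> bool:
        return self.protein_mutation is not None


def delta_delta_g(wild: BindingEntry, mutant: BindingEntry) -> Optional[float]:
    """Change in binding free energy on mutation: dG(mutant) - dG(wild).

    Returns None when either value is missing, since many entries
    report only some of the thermodynamic quantities.
    """
    if wild.delta_g is None or mutant.delta_g is None:
        return None
    return mutant.delta_g - wild.delta_g


if __name__ == "__main__":
    # Placeholder values for illustration only.
    wt = BindingEntry("example protein", "example DNA", delta_g=-10.0)
    mut = BindingEntry("example protein", "example DNA",
                       protein_mutation="X1Y", delta_g=-8.5)
    print(delta_delta_g(wt, mut))
```

Keeping the thermodynamic fields optional mirrors the fact that a given experiment typically measures only some of Kd, ΔG, ΔH and ΔCp.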
ProNIT is implemented in 3DinSight, an integrated relational database
for the structure, function and properties
of biomolecules. 3DinSight has been designed to integrate PDB structures,
their non-redundant subsets, structures of protein-nucleic acid complexes,
PROSITE motif, ligand information, mutations, disease-related mutations,
amino-acid sequences and thermodynamic data into a relational database
(SYBASE). These data are connected in relational tables, and their correspondence
can be efficiently searched by flexible queries. The visualization tools
depict the relations in 3D space and as graph plots; e.g., motif sites,
mutation sites and ligand-binding sites are automatically mapped onto the
structures and can be viewed by 3D viewers such as RasMol and VRML.
The relevant objects in RasMol and VRML images are hyperlinked to the
corresponding document data so that the documents can be easily viewed
by clicking on them. The entries in ProNIT are linked to the
Protein-Nucleic Acid Complex Database built within 3DinSight, where
complex structures are classified according to the recognition motif
and other characteristics, and one can examine the complex structures
and sequence-dependent conformation and flexibility of DNA molecules
in each complex. The ProNIT data are also linked to the Base-Amino
Acid Interaction Database available via 3DinSight, in which specific
base-amino acid interaction pairs can be analyzed in detail. If
users want to examine the specific base-amino acid interactions involved
in a complex and compare them with the binding thermodynamic data, they
can search for the pairs by specifying atom, residue and distance criteria.
The specific base-amino acid pairs are automatically highlighted in
the complex and visualized by the 3D viewers. 3DinSight has several
form-based WWW interfaces with search, display and sorting options
that allow users to retrieve relevant information according to their
purpose and convenience.
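The search for base-amino acid pairs by atom, residue and distance criteria can be pictured with the following sketch. It is not the 3DinSight implementation: the `Atom` type, the function name and the brute-force pairwise scan are assumptions chosen for clarity, and the coordinates in the usage lines are made up.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record from a complex structure."""
    chain_kind: str       # "protein" or "nucleic"
    residue: str          # e.g. "ARG" for a protein, "G" for a base
    residue_number: int
    atom_name: str
    x: float
    y: float
    z: float


def find_base_amino_acid_pairs(
    atoms: List[Atom],
    amino_acid: str,
    base: str,
    cutoff: float,
    protein_atom: Optional[str] = None,
    base_atom: Optional[str] = None,
) -> List[Tuple[Atom, Atom, float]]:
    """Return (protein atom, base atom, distance) for all pairs within cutoff.

    Residue criteria are mandatory; atom-name criteria are optional.
    Results are sorted by increasing distance.
    """
    protein_atoms = [
        a for a in atoms
        if a.chain_kind == "protein" and a.residue == amino_acid
        and (protein_atom is None or a.atom_name == protein_atom)
    ]
    base_atoms = [
        a for a in atoms
        if a.chain_kind == "nucleic" and a.residue == base
        and (base_atom is None or a.atom_name == base_atom)
    ]
    pairs = []
    for p in protein_atoms:
        for n in base_atoms:
            d = math.dist((p.x, p.y, p.z), (n.x, n.y, n.z))
            if d <= cutoff:
                pairs.append((p, n, d))
    return sorted(pairs, key=lambda pair: pair[2])


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    atoms = [
        Atom("protein", "ARG", 10, "NH1", 0.0, 0.0, 0.0),
        Atom("nucleic", "G", 5, "O6", 3.0, 0.0, 0.0),
        Atom("nucleic", "G", 5, "N7", 0.0, 4.0, 3.0),
    ]
    for p, n, d in find_base_amino_acid_pairs(atoms, "ARG", "G", cutoff=3.5):
        print(p.residue, p.atom_name, n.residue, n.atom_name, round(d, 2))
```

A pairwise scan is adequate for a single complex; a production system would more likely use a spatial index or precomputed contact tables.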
We will continue collecting data
and improving the database. Collecting data from the literature is very
laborious work. In the future, therefore, we would like to implement
a system by which we can collect experimental data directly from researchers.
Please send your opinions and suggestions to firstname.lastname@example.org,
which will help us to improve the database.