The online server DBS-Pred provides two types of predictions.
First, it calculates the residue composition of a given sequence and
predicts its likelihood of binding DNA, as described in the Methods
section. Second, it predicts the probable locations of binding sites at
a sensitivity level chosen by the user. The network weights for both
predictions were obtained by training on all the available data for a
sufficiently long time, using different biases in the error function.
The use of a biased error function during neural network training
enabled us to provide three levels of sensitivity in the prediction
of binding sites. Online predictions may, for example, be made at high
sensitivity, which is useful for eliminating highly unlikely binding
sites. Default predictions of binding sites are biased towards
over-prediction, whereas a third, stricter level is also provided.
Predictions made with the strict option return only the sites with a high
probability of binding and may miss a significant number of sites.
Prediction accuracy scores of binding sites at the sensitive, medium and
strict levels for the whole data set of 62 proteins are presented in
Table 1. It may be noted that this method of online prediction is useful
when the protein of interest shows no significant homology to any known
DNA-binding protein sequence.
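
The effect of a biased error function can be illustrated with a minimal sketch. The exact form used to train DBS-Pred is not given here, so the weighted squared error below, the function name `biased_squared_error`, and the bias values in `LEVEL_BIAS` are all assumptions chosen for illustration only.

```python
def biased_squared_error(targets, outputs, bias=1.0):
    """Sum of squared errors with errors on binding residues scaled by `bias`.

    targets: 1 for a binding residue, 0 for a non-binding residue.
    outputs: network outputs in the range [0, 1].
    A larger bias penalizes missed binding sites more heavily, which
    pushes a trained network toward higher sensitivity (assumed form).
    """
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    total = 0.0
    for target, output in zip(targets, outputs):
        weight = bias if target == 1 else 1.0
        total += weight * (target - output) ** 2
    return total


# Hypothetical bias values for the three sensitivity levels (not from DBS-Pred).
LEVEL_BIAS = {"sensitive": 3.0, "medium": 1.5, "strict": 1.0}

targets = [1, 0, 0, 1]
outputs = [0.2, 0.1, 0.6, 0.9]
errors = {
    level: biased_squared_error(targets, outputs, bias)
    for level, bias in LEVEL_BIAS.items()
}
```

For the same network outputs, the error grows with the bias because the missed binding residue (output 0.2 for target 1) is weighted more heavily. Training one network per bias value would therefore yield one set of weights per sensitivity level.
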
Probabilities of DNA-binding obtained by this predictor for the 62
sequences used in this paper are provided on the website, which enables
users to estimate the degree of confidence in their predictions. For the
composition-based prediction, the amino acid composition of proteins is
used as the main feature determining binding. Using neural networks
trained to separate nearly 1000 DNA-binding proteins from a control data
set nearly three times that size, we could obtain a predicted
probability of DNA-binding for a given amino acid sequence. When proteins
with more than 50% predicted probability of binding are designated
DNA-binding proteins, we obtain the accuracy scores for this data set
presented in Table 2.
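
As a rough illustration of the composition-based step, the sketch below computes the fractional amino acid composition of a sequence and applies the 50% cutoff to a predicted probability. The function names are hypothetical, and the trained neural network that maps composition to a probability is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def residue_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return {aa: residues.count(aa) / n for aa in AMINO_ACIDS}


def is_dna_binding(probability, cutoff=0.5):
    """Designate a protein as DNA-binding if its probability exceeds the cutoff."""
    return probability > cutoff


composition = residue_composition("AAKK")
# The 20 composition values would be the input features to the network;
# the probability below is a made-up placeholder for the network's output.
placeholder_probability = 0.73
designation = is_dna_binding(placeholder_probability)
```
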
Table 1: Expected binding-site accuracy scores for
DBS-Pred (DNA-binding site prediction server), obtained for the data
set of 62 DNA-binding proteins.
Table 2: Expected prediction results for the composition-based prediction of binding sequences in DBS-Pred. The following table shows the results obtained on a cross-validation data set when a predicted probability of 0.5 was used as the cutoff to designate protein sequences as DNA-binding. A detailed histogram with larger available datasets at all probability cutoffs will be provided soon.
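
How scores depend on the probability cutoff can be sketched as follows. The score definitions are the standard sensitivity, specificity and overall accuracy, which may differ from the exact scores reported for DBS-Pred, and the probabilities and labels are invented toy values rather than server results.

```python
def scores_at_cutoff(probabilities, labels, cutoff=0.5):
    """Compute sensitivity, specificity and accuracy at a probability cutoff.

    probabilities: predicted probability of DNA-binding for each sequence.
    labels: True for known DNA-binding proteins, False for controls.
    """
    tp = fp = tn = fn = 0
    for probability, is_binding in zip(probabilities, labels):
        predicted_binding = probability > cutoff
        if predicted_binding and is_binding:
            tp += 1
        elif predicted_binding:
            fp += 1
        elif is_binding:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy data for illustration only.
toy_probabilities = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]

scores_by_cutoff = {
    cutoff: scores_at_cutoff(toy_probabilities, toy_labels, cutoff)
    for cutoff in (0.3, 0.5, 0.8)
}
```

Raising the cutoff trades sensitivity for specificity, which is the same trade-off a user makes when choosing between sensitive and strict predictions.
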
Analysis and Prediction of DNA-binding proteins and their binding residues based on Composition, Sequence and Structural Information,
Shandar Ahmad, M. Michael Gromiha and Akinori Sarai, Bioinformatics 20 (2004) 477-486