The online server DBS-Pred provides two types of predictions. First, it calculates the residue composition of a given sequence and predicts the likelihood that the protein binds DNA, as described in the methods section. Second, it predicts the probable locations of binding sites at a sensitivity level chosen by the user. The network weights for both predictions were obtained by training on all the available data for a sufficiently long time, using different biases in the error function.

The use of a biased error function during neural network training enabled us to provide three levels of sensitivity in the prediction of binding sites. Online predictions may, for example, be made at high sensitivity, which is useful for eliminating highly unlikely binding sites. The default predictions of binding sites are allowed to be biased towards over-prediction, whereas a third, stricter level of prediction is also provided. Predictions made with the strict option return only the sites with a high probability of binding and may miss a significant number of sites. Prediction accuracy scores of binding sites at the sensitive, medium and strict levels for the whole data set of 62 proteins are presented in Table 1. Note that this method of online prediction will be useful when the query protein shows no significant homology with any DNA-binding protein sequence. The probabilities of DNA-binding obtained by this predictor for the 62 sequences used in this paper are provided on the website, which enables users to estimate the degree of confidence in their predictions.

For the composition-based prediction, the amino acid composition of a protein is used as the main feature determining binding. Using neural networks trained to separate nearly 1000 DNA-binding proteins from a control data set nearly three times that size, we obtain a predicted probability of DNA-binding for a given amino acid sequence. When proteins with more than 50% probability of binding are designated DNA-binding proteins, we obtain the accuracy scores for this data set presented in Table 2.
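
The composition-based step can be illustrated with a minimal sketch. It assumes that composition means the fraction of each of the 20 standard residues in the sequence; the exact input encoding of the DBS-Pred network is not specified on this page. The names `amino_acid_composition` and `is_dna_binding` are hypothetical and are not part of the server.

```python
from collections import Counter

# The 20 standard residues in one-letter code (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def is_dna_binding(probability, cutoff=0.5):
    """Designate a protein as DNA-binding if its probability exceeds the cutoff."""
    return probability > cutoff


# Hypothetical example sequence; a trained network would map this
# 20-element vector to a probability of DNA-binding.
composition = amino_acid_composition("MKRKAAKR")
```

The trained network itself is not reproduced here; only the feature vector it would receive and the 50% designation rule are shown.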


Table 1: Expected binding-site prediction accuracy scores for DBS-Pred (DNA-binding site prediction server), obtained for the data set of 62 DNA-binding proteins.

Level        Accuracy (%)   Sensitivity (%)   Specificity (%)
Sensitive    58.0           84.1              54.5
Medium       74.9           67.9              75.8
Strict       88.1           26.9              96.3
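
The trade-off between sensitivity and specificity in Table 1 comes from biasing the error function during training. The exact form of the bias is not given on this page. As a rough sketch, assume the bias is a weight applied to the squared error on true binding residues, so that a larger weight penalizes missed binding sites more heavily and pushes the network towards over-prediction. The function `biased_squared_error` and its `binding_weight` parameter are hypothetical.

```python
def biased_squared_error(targets, outputs, binding_weight=1.0):
    """Mean squared error with extra weight on binding residues (target 1).

    This weighting scheme is an assumption made for illustration; it is
    not taken from the DBS-Pred implementation.
    """
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    total = 0.0
    for target, output in zip(targets, outputs):
        weight = binding_weight if target == 1 else 1.0
        total += weight * (target - output) ** 2
    return total / len(targets)


# One binding residue (target 1) and one non-binding residue (target 0),
# both given a low output of 0.2 by a hypothetical network.
unbiased = biased_squared_error([1, 0], [0.2, 0.2])
biased = biased_squared_error([1, 0], [0.2, 0.2], binding_weight=3.0)
```

With `binding_weight` greater than 1, the missed binding residue dominates the error, which is the kind of pressure that would favor a more sensitive predictor.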


Table 2: Expected prediction results for the composition-based prediction of DNA-binding sequences in DBS-Pred. The table shows the results obtained on a cross-validation data set when a predicted probability of 0.5 was used as the cutoff to designate protein sequences as DNA-binding. A detailed histogram with larger available datasets at all probability cutoffs will be provided soon.

Accuracy (%)   Sensitivity (%)   Specificity (%)   Net Prediction, (Sens. + Spec.)/2
64.5           68.6              63.4              66.1
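
The scores in Tables 1 and 2 can be related to raw prediction counts with a short sketch. It assumes the standard definitions of accuracy, sensitivity and specificity, which are not spelled out on this page; the net prediction follows the formula given in the table header. The function name `prediction_scores` and the example counts are hypothetical.

```python
def prediction_scores(tp, tn, fp, fn):
    """Return percentage scores from true/false positive/negative counts.

    Standard definitions are assumed; net prediction is the mean of
    sensitivity and specificity, as in the Table 2 header.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = 100.0 * tp / positives
    specificity = 100.0 * tn / negatives
    accuracy = 100.0 * (tp + tn) / (positives + negatives)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "net_prediction": (sensitivity + specificity) / 2.0,
    }


# Hypothetical counts: 100 binding proteins and 300 control proteins.
scores = prediction_scores(tp=70, tn=180, fp=120, fn=30)
```

Because the control set is larger than the binding set, accuracy is dominated by specificity, which is why the net prediction is reported alongside it. Applied to the rounded sensitivity and specificity in Table 2, the formula gives 66.0; the reported 66.1 presumably reflects unrounded values.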

Please Cite:
Analysis and Prediction of DNA-binding proteins and their binding residues based on Composition, Sequence and Structural Information,
Shandar Ahmad, M. Michael Gromiha and Akinori Sarai, Bioinformatics 20 (2004) 477-486