Prediction of Protein-protein Interactions Based on Feature Selection and Data Balancing

[ Vol. 20, Issue 3 ]

Author(s):

Liang Liu, Wen-Cong Lu, Yu-Dong Cai, Kai-Yan Feng, Chunrong Peng and Yubei Zhu

Pages 336-345 (10)

Abstract:


Computational approaches complement experimental ones by analyzing protein-protein interactions (PPIs) from a different angle, and they can efficiently determine whether two proteins interact. In this paper, the K-nearest neighbors (KNN) algorithm is applied to predict PPIs, with each protein encoded by the physical and chemical properties of its residues, its predicted secondary structure, and its amino acid composition. Minimum-redundancy maximum-relevance (mRMR) feature selection is adopted to select a compact feature set whose features are considered important for determining PPI-ness. Because the negative dataset (non-interacting protein pairs) is much larger than the positive dataset (interacting protein pairs), the negative dataset is divided into 5 portions, and each portion is combined with the positive dataset for one prediction. Five predictions are thus performed, and the final result is obtained by voting. The prediction achieves an overall accuracy of 0.8369 with a sensitivity of 0.7356. The predictor, developed in this research to predict fruit fly PPI-ness, is publicly available at http://chemdata.shu.edu.cn/ppip.
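The data-balancing scheme described in the abstract (split the negatives into 5 portions, pair each portion with the full positive set, predict 5 times, then vote) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy 2-D feature vectors, the Euclidean distance, k = 3, the shuffled round-robin split, and simple majority voting are all assumptions made for illustration, and the mRMR feature-selection step is omitted.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def split_negatives(negatives, n_parts=5, seed=0):
    """Shuffle the negative pairs and deal them into n_parts disjoint portions."""
    shuffled = list(negatives)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def ensemble_predict(positives, negatives, x, n_parts=5, k=3, seed=0):
    """Run one KNN per (all positives + one negative portion), then majority-vote."""
    ballots = []
    for portion in split_negatives(negatives, n_parts, seed):
        train_X = list(positives) + portion
        train_y = [1] * len(positives) + [0] * len(portion)
        ballots.append(knn_predict(train_X, train_y, x, k))
    # 1 = interacting pair, 0 = non-interacting pair
    return 1 if sum(ballots) > n_parts / 2 else 0


# Toy data (hypothetical): 4 positive pairs and 20 negative pairs, so the
# negatives outnumber the positives 5 to 1 before balancing.
positives = [(1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (1.0, 0.8)]
negatives = [(0.02 * i, 0.01 * i) for i in range(20)]

near_positive = ensemble_predict(positives, negatives, (0.95, 1.0))
near_negative = ensemble_predict(positives, negatives, (0.1, 0.05))
print(near_positive, near_negative)
```

With 5 portions, each of the five classifiers sees a roughly balanced training set while every negative example is still used exactly once; an odd number of voters also avoids ties in the final vote.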

Keywords:

Bioinformatics, feature selection, KNNs, protein-protein interactions, unbalanced data, mRMR (minimum-redundancy maximum-relevance), PPI-nesses, negative dataset

Affiliation:

Department of Chemistry, College of Sciences, Shanghai University, 99 Shang-Da Road, Shanghai, 200444, People’s Republic of China.


