Maqsood Hayat and Asifullah Khan Pages 411 - 421 ( 11 )
Outer membrane proteins (OMPs) play important roles in cell biology. In addition, OMPs are targeted by multiple drugs. The identification of OMPs from genomic sequences and successful prediction of their secondary and tertiary structures is a challenging task due to short membrane-spanning regions with high variation in properties. Therefore, an effective and accurate silico method for discrimination of OMPs from their primary sequences is needed. In this paper, we have analyzed the performance of various machine learning mechanisms for discriminating OMPs such as: Genetic Programming, K-nearest Neighbor, and Fuzzy K-nearest Neighbor (Fuzzy K-NN) in conjunction with discrete methods such as: Amino acid composition, Amphiphilic Pseudo amino acid composition, Split amino acid composition (SAAC), and hybrid versions of these methods. The performance of the classifiers is evaluated by two datasets using 5-fold crossvalidation. After the simulation, we have observed that Fuzzy K-NN using SAAC based-features makes it quite effective in discriminating OMPs. Fuzzy K-NN achieves the highest success rates of 99.00% accuracy for discriminating OMPs from non-OMPs and 98.77% and 98.28% accuracies from &lapha;-helix membrane and globular proteins, respectively on dataset1. While on dataset2, Fuzzy K-NN achieves 99.55%, 99.90%, and 99.81% accuracies for discriminating OMPs from non- OMPs, α-helix membrane, and globular proteins, respectively. It is observed that the classification performance of our proposed method is satisfactory and is better than the existing methods. Thus, it might be an effective tool for high throughput innovation of OMPs.
AAC, Am-PseAAC, SAAC, genetic programming, K-nearest neighbor, Fuzzy K-NN
Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), P.O. 45650, Nilore, Islamabad, Pakistan.