Bing Niu, Xiao-Cheng Yuan, Preston Roeper, Qiang Su, Chun-Rong Peng, Jing-Yuan Yin, Juan Ding, HaiPeng Li and Wen-Cong Lu. Pages 290-298 (9)
Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method for predicting cleavage sites in proteins is therefore crucial in the search for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm (GA). Thirty important biochemical features were selected, based on a jackknife test, from the original data set containing 4,248 features. Using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, an increase in accuracy over the original data set of 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
Correlation-based feature subset (CfsSubset), genetic algorithm (GA), AdaBoost, feature selection, HIV protease, Chou’s distorted key theory, HIV inhibitor, HIV-1 protease specificity, Genetic Algorithms method, jackknife test
College of Life Science, Shanghai University, 333 Nan-Cheng Road, Shanghai, 200444, People's Republic of China.
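
The correlation-based feature subset selection named in the abstract scores a candidate subset by rewarding features that correlate with the class and penalizing features that correlate with each other. The following is a minimal sketch of that merit score only, not the authors' implementation. It assumes Pearson correlation as the correlation measure (the article does not state which measure was used), the function names `pearson` and `cfs_merit` are hypothetical, and the genetic algorithm search over subsets and the AdaBoost classifier are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    columns: list of feature columns (each a list of numbers)
    labels:  class labels encoded as numbers
    subset:  indices into `columns` for the candidate subset
    r_cf is the mean absolute feature-class correlation,
    r_ff is the mean absolute feature-feature correlation.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Toy data (assumed for illustration): feature 1 duplicates feature 0,
# and feature 2 is uncorrelated with the class.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]

for subset in ([0], [0, 1], [0, 2]):
    print(subset, round(cfs_merit(columns, labels, subset), 4))
```

On this toy data, adding the redundant copy leaves the merit unchanged and adding the uninformative feature lowers it, which is the behavior a subset search would exploit when reducing a large feature set to a small one.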