Hui-Ling Huang, Fang-Lin Chang, Shinn-Jang Ho, Li-Sun Shu, Wen-Lin Huang and Shinn-Ying Ho Pages 299 - 308 ( 10 )
Numerous prediction methods of DNA-binding domains/proteins were proposed by identifying informative features and designing effective classifiers. These researches reveal that the DNA-protein binding mechanism is complicated and existing accurate predictors such as support vector machine (SVM) with position specific scoring matrices (PSSMs) are regarded as black-box methods which are not easily interpretable for biologists. In this study, we propose an ensemble fuzzy rule base classifier consisting of a set of interpretable fuzzy rule classifiers (iFRCs) using informative physicochemical properties as features. In designing iFRCs, feature selection, membership function design, and fuzzy rule base generation are all simultaneously optimized using an intelligent genetic algorithm (IGA). IGA maximizes prediction accuracy, minimizes the number of features selected, and minimizes the number of fuzzy rules to generate an accurate and concise fuzzy rule base. Benchmark datasets of DNA-binding domains are used to evaluate the proposed ensemble classifier of 30 iFRCs. Each iFRC has a mean test accuracy of 77.46%, and the ensemble classifier has a test accuracy of 83.33%, where the method of SVM with PSSMs has the accuracy of 82.81%. The physicochemical properties of the first two ranks according to their contribution are positive charge and Van Der Waals volume. Charge complementarity between protein and DNA is thought to be important in the first step of recognition between protein and DNA. The amino acid residues of binding peptides have larger Van Der Waals volumes and positive charges than those of non-binding ones. The proposed knowledge acquisition method by establishing a fuzzy rule-based classifier can also be applicable to predict and analyze other protein functions from sequences.
DNA-binding domains, feature selection, fuzzy rules, genetic algorithm, knowledge acquisition, physicochemical properties, support vector machine, intelligent genetic algorithm (IGA), Benchmark dataset, Charge complementarity
Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan.