This paper used an amino acid location-based sequence encoding as a feature extraction technique to identify single-chain antibody molecules that bind to the B-lymphocyte stimulator (BLyS) antigen. The data were manually derived from the text of European patent EP2275449B1, then cleaned and made suitable for machine learning models. The accuracy, precision and recall achieved across the individual descriptors (Membrane and Soluble) were above 80% for Logistic Regression, KNN, KSVM, and Random Forest. Naïve Bayes scored much lower on every metric except precision. The promising accuracy achieved from such a minimal dataset has significant implications for the drug discovery process, including considerable savings in time and resources.
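The abstract does not spell out the encoding or the evaluation code, so the following is only a minimal sketch of what a position-based sequence encoding and the three reported metrics might look like. It assumes a one-hot encoding over the 20 standard amino acids at each sequence position, which may differ from the paper's actual scheme. The names `encode_by_position` and `binary_metrics` and the example inputs are hypothetical and are not taken from the paper or the patent.

```python
# Hypothetical sketch: position-wise one-hot encoding of an amino acid
# sequence, plus accuracy/precision/recall for binary binder labels.
# The encoding scheme is an assumption, not the paper's confirmed method.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_by_position(sequence, length):
    """Return a flat 0/1 vector with one 20-wide slot per position.

    Sequences longer than `length` are truncated; shorter ones are
    zero-padded. Residues outside the standard 20 leave their slot empty.
    """
    vector = [0] * (length * len(AMINO_ACIDS))
    for pos, residue in enumerate(sequence[:length]):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            vector[pos * len(AMINO_ACIDS) + idx] = 1
    return vector


def binary_metrics(y_true, y_pred):
    """Compute (accuracy, precision, recall) for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

Vectors produced this way could be passed to any of the classifiers named in the abstract, with the metrics computed separately for the Membrane and Soluble labels.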
Original language: English
Title of host publication: The 10th International Conference on Information Communication and Management (ICICM 2020)
Place of Publication: Paris, France
Number of pages: 5
ISBN (Electronic): 978-1-4503-8770-5
Publication status: Accepted/In press - 28 Jun 2020

    Research areas

  • machine learning, Antigen, Antibody, Amino Acid Sequence, Infectious disease

ID: 4176312