TY - JOUR
T1 - Identification of significant risks in pediatric acute lymphoblastic leukemia (ALL) through machine learning (ML) approach
AU - Mahmood, Nasir
AU - Shahid, Saman
AU - Bakhshi, Taimur
AU - Riaz, Sehar
AU - Ghufran, Hafiz
AU - Yaqoob, Muhammad
PY - 2020/11/1
Y1 - 2020/11/1
N2 - Pediatric acute lymphoblastic leukemia (ALL) was analyzed using machine learning (ML) techniques to determine the significance of clinical and phenotypic variables, as well as environmental conditions, that can identify the underlying causes of childhood ALL. Fifty pediatric patients (n = 50) diagnosed with ALL were included according to the inclusion and exclusion criteria. Clinical variables comprised blood biochemistry (CBC, LFTs, RFTs) results and the distribution of ALL type, i.e., T-ALL or B-ALL. Phenotypic data included the age and sex of the child and consanguinity, while environmental factors included habitat, socioeconomic status, and access to filtered drinking water. Fifteen different features/attributes were collected for each case individually. To retrieve the most useful discriminating attributes, four supervised ML algorithms were used: classification and regression trees (CART), random forest (RM), gradient boosted machine (GM), and the C5.0 decision tree algorithm. To determine the accuracy of the derived CART algorithm on future data, ten-fold cross-validation was performed on the present data set. ALL was common in male children below 5 years of age who belonged to middle-class families in rural areas. B-ALL was more frequent than T-ALL. Consanguinity was present in 54% of cases. Low levels of platelets and hemoglobin and high levels of white blood cells were reported in pediatric ALL patients. CART provided the best and most complete fit for the entire data set, yielding a 99.83% model fit accuracy and a misclassification of 0.17% on the entire sample space, while C5.0 reported 98.6%, random forest 94.44%, and gradient boosted machine 95.61% fitting. The variable importance of each primary discriminating attribute was platelets 43%, hemoglobin 24%, white blood cells 4%, and sex of the child 4%. An overall accuracy of 87.4% was recorded for the classifier. Platelet count abnormality can be considered a major factor in predicting pediatric ALL. Machine learning algorithms can be applied efficiently to provide details for prognosis and better treatment outcomes.
AB - Pediatric acute lymphoblastic leukemia (ALL) was analyzed using machine learning (ML) techniques to determine the significance of clinical and phenotypic variables, as well as environmental conditions, that can identify the underlying causes of childhood ALL. Fifty pediatric patients (n = 50) diagnosed with ALL were included according to the inclusion and exclusion criteria. Clinical variables comprised blood biochemistry (CBC, LFTs, RFTs) results and the distribution of ALL type, i.e., T-ALL or B-ALL. Phenotypic data included the age and sex of the child and consanguinity, while environmental factors included habitat, socioeconomic status, and access to filtered drinking water. Fifteen different features/attributes were collected for each case individually. To retrieve the most useful discriminating attributes, four supervised ML algorithms were used: classification and regression trees (CART), random forest (RM), gradient boosted machine (GM), and the C5.0 decision tree algorithm. To determine the accuracy of the derived CART algorithm on future data, ten-fold cross-validation was performed on the present data set. ALL was common in male children below 5 years of age who belonged to middle-class families in rural areas. B-ALL was more frequent than T-ALL. Consanguinity was present in 54% of cases. Low levels of platelets and hemoglobin and high levels of white blood cells were reported in pediatric ALL patients. CART provided the best and most complete fit for the entire data set, yielding a 99.83% model fit accuracy and a misclassification of 0.17% on the entire sample space, while C5.0 reported 98.6%, random forest 94.44%, and gradient boosted machine 95.61% fitting. The variable importance of each primary discriminating attribute was platelets 43%, hemoglobin 24%, white blood cells 4%, and sex of the child 4%. An overall accuracy of 87.4% was recorded for the classifier. Platelet count abnormality can be considered a major factor in predicting pediatric ALL. Machine learning algorithms can be applied efficiently to provide details for prognosis and better treatment outcomes.
KW - Classification and regression trees (CART)
KW - Environmental factors
KW - Hemoglobin
KW - Machine learning (ML)
KW - Pediatric ALL
KW - Platelets
KW - alanine aminotransferase
KW - alkaline phosphatase
KW - aspartate aminotransferase
KW - creatinine
KW - drinking water
KW - hemoglobin
KW - uric acid
KW - acute lymphoblastic leukemia
KW - adult
KW - alanine aminotransferase blood level
KW - alkaline phosphatase blood level
KW - article
KW - aspartate aminotransferase blood level
KW - blood biochemistry
KW - cancer diagnosis
KW - cancer prognosis
KW - cancer risk
KW - childhood leukemia
KW - classification and regression trees
KW - classifier
KW - clinical article
KW - consanguinity
KW - controlled study
KW - creatinine blood level
KW - cross validation
KW - decision tree
KW - environmental exposure
KW - environmental factor
KW - female
KW - gradient boosted machine
KW - habitat
KW - human
KW - leukocyte count
KW - machine learning
KW - male
KW - outcome assessment
KW - platelet count
KW - priority journal
KW - random forest
KW - rural area
KW - social status
KW - uric acid blood level
U2 - 10.1007/s11517-020-02245-2
DO - 10.1007/s11517-020-02245-2
M3 - Article
VL - 58
SP - 2631
EP - 2640
JO - Medical and Biological Engineering and Computing
JF - Medical and Biological Engineering and Computing
SN - 0140-0118
IS - 11
ER -