Abstract
Tuberculosis (TB) remains a major global health challenge driven by persistent Mycobacterium tuberculosis infection and increasing drug resistance. Phytochemicals represent a structurally diverse and underexplored chemical space for anti-TB drug discovery, yet systematic prioritization strategies integrating machine learning and structure-based validation are limited. A curated phenotypic anti-TB dataset of 425,180 compounds was used to train ensemble ExtraTrees models based on ECFP4 fingerprints and physicochemical descriptors. The models achieved strong predictive performance (ROC-AUC up to 0.983; MCC up to 0.871). SHAP analysis enabled mechanistic interpretation by identifying the key molecular descriptors and fingerprint features driving anti-TB activity predictions. The validated ensemble was applied to screen 4,707 phytochemicals, yielding 3,209 predicted actives, of which 778 satisfied applicability domain criteria. High-confidence candidates were subsequently evaluated by molecular docking against twelve structurally validated essential M. tuberculosis targets spanning cell wall biosynthesis, energy metabolism, nucleotide synthesis, and cofactor pathways. Docking analysis identified 486 phytochemicals with favorable predicted binding affinities, including 193 compounds exhibiting multi-target engagement. Several top-ranked candidates reproduced canonical interaction patterns of co-crystallized inhibitors, supporting mechanistic plausibility. This integrated chemoinformatics and structure-based framework enables robust prioritization of phytochemicals with biologically meaningful and multi-target antitubercular potential. The study provides a computationally grounded strategy for accelerating lead identification against drug-resistant TB.
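Two quantities in the abstract can be illustrated with a minimal Python sketch: the Matthews correlation coefficient (MCC) used to report model performance, and an applicability domain check of the kind used to filter predicted actives. The MCC formula is standard. The applicability domain rule shown here (nearest-neighbour Tanimoto similarity to the training set, with a cutoff of 0.3) is an assumption chosen for illustration, since the abstract does not state which criterion the study used. The function names and the representation of fingerprints as sets of "on" bit indices are likewise hypothetical.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_domain(query, training, threshold=0.3):
    """Assumed rule: a compound is in-domain if its most similar training
    compound reaches the similarity threshold (0.3 is a placeholder value)."""
    best = max((tanimoto(query, t) for t in training), default=0.0)
    return best >= threshold
```

Under this assumed rule, a predicted active would be kept only if `in_domain` returns `True` for its fingerprint. In practice the bit sets would come from a cheminformatics toolkit rather than being written by hand.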
Citation
ID: 7906
Ref Key: sajal2026ensemble