PathogPDx

Development and validation of a machine learning-based diagnostic system for 22 pediatric respiratory pathogens

Respiratory tract infections (RTIs), which are caused by a wide range of pathogens, are a significant cause of morbidity and mortality in children. Although these infections present with similar clinical symptoms, the optimal treatment depends on the causative pathogen. This study aimed to develop and validate an interpretable machine learning (ML) model for early detection of diverse respiratory pathogens in pediatric patients.

We conducted a large-scale, multicenter study involving 134,500 hospitalized children across three clinical centers and two large databases, with data collected from January 2015 to December 2024. The Pathogen Diagnostic System for Pediatric Respiratory Infections (Pathog-PDx) integrates multiple algorithms to identify respiratory pathogens early in pediatric patients.

The system uses 40 routinely available clinical and laboratory features from electronic health records (EHR), supporting more targeted diagnostic investigation and individualized clinical management. Predictive performance was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
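The three metrics named above can be computed directly from true labels and predicted probabilities. The following is a minimal sketch in plain Python, not the study's evaluation code; the function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = pathogen present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

print(roc_auc(labels, scores))                  # 0.9375
print(sensitivity_specificity(labels, scores))  # (0.75, 0.75)
```

For a multi-pathogen model, one common convention is to compute these metrics per pathogen in a one-vs-rest fashion and then average them; whether Pathog-PDx does so is not stated here.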

Prospective validation was additionally carried out on an independent cohort of 1,338 pediatric patients, enrolled between January and December 2024, to assess the real-world applicability of the model. The model's architecture uses ensemble learning, combining the predictions of multiple base models through a voting mechanism. This yields more robust predictions and reduces the errors that a single algorithm might make.
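The description does not specify the voting scheme or the base models. As one possible reading, the sketch below shows soft voting, in which the class probabilities from each base model are averaged and the class with the highest mean wins. The three base-model outputs and the class names are hypothetical values chosen for illustration.

```python
from typing import Dict, List


def soft_vote(model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities across base models (soft voting)."""
    if not model_probs:
        raise ValueError("At least one base model output is required")
    classes = list(model_probs[0])
    n_models = len(model_probs)
    return {c: sum(p[c] for p in model_probs) / n_models for c in classes}


def predict_label(model_probs: List[Dict[str, float]]) -> str:
    """Return the class with the highest averaged probability."""
    averaged = soft_vote(model_probs)
    return max(averaged, key=averaged.get)


# Hypothetical outputs of three base models for one patient.
base_outputs = [
    {"viral": 0.6, "bacterial": 0.3, "fungal": 0.1},
    {"viral": 0.4, "bacterial": 0.5, "fungal": 0.1},
    {"viral": 0.5, "bacterial": 0.3, "fungal": 0.2},
]

print(predict_label(base_outputs))  # viral
```

Here the second model alone would have predicted "bacterial", but the averaged probabilities favor "viral", which illustrates how an ensemble can override a single model's error.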

Model interpretability was evaluated using SHapley Additive exPlanations (SHAP), which quantify the contribution of each feature to the final prediction. This makes the model's decision-making process transparent to clinicians, fostering trust and adoption in clinical settings.

Feature importance analysis revealed that laboratory values including white blood cell count, neutrophil percentage, and C-reactive protein levels played crucial roles in distinguishing between viral and bacterial infections.
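To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every feature subset. This is an illustration of the underlying attribution principle, not the SHAP library and not the Pathog-PDx model: the linear `toy_risk_score`, its coefficients, the patient values, and the baseline are all hypothetical, and replacing "absent" features with baseline values is one common convention among several.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Set


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    features = list(x)
    n = len(features)

    def value(coalition: Set[str]) -> float:
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return predict(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk_score(features: Dict[str, float]) -> float:
    """Hypothetical linear score; coefficients are made up for illustration."""
    return (
        0.02 * features["wbc"]
        + 0.01 * features["neutrophil_pct"]
        + 0.005 * features["crp"]
    )


patient = {"wbc": 15.0, "neutrophil_pct": 80.0, "crp": 40.0}    # hypothetical
reference = {"wbc": 8.0, "neutrophil_pct": 50.0, "crp": 5.0}    # hypothetical

phi = shapley_values(toy_risk_score, patient, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that SHAP plots rely on. Exact enumeration needs 2^n model evaluations, so it is practical only for a handful of features; with 40 features, approximations such as those implemented in the SHAP library are needed.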

We developed Pathog-PDx to accurately distinguish 22 pathogen subtypes. Compared to conventional models, Pathog-PDx demonstrated enhanced performance in identifying both single and mixed infections. Across internal and independent external validation cohorts, the model achieved mean AUCs of 0.875 for viral, 0.860 for bacterial, and 0.903 for fungal infections. For instance, classification performance for the influenza virus was high (AUC = 0.946; sensitivity = 0.880; specificity = 0.860). The model also performed well in prospective validation, supporting its clinical applicability.

The implementation of this diagnostic system could significantly reduce diagnostic delays, improve antimicrobial stewardship, and ultimately enhance patient outcomes through targeted therapies. The model's ability to distinguish between viral, bacterial, and fungal pathogens with high accuracy provides clinicians with valuable information for making treatment decisions, particularly regarding antibiotic prescription.

To facilitate clinical implementation, the model has been deployed as a web-based decision support system with capabilities for real-time data analysis, which is freely accessible at: https://pathogpdx.zzu.edu.cn