Assessing Machine Learning Classifiers in COVID-19: The Role of Clinical, Laboratory, and Radiological Features in Predicting Oxygen Saturation
Publisher: Brieflands
Abstract
Background: Oxygen saturation is a vital parameter for evaluating the severity of COVID-19 in hospitalized patients; levels below 90% indicate respiratory distress and a potential need for intensive care.
Objectives: This study develops machine learning (ML) models that integrate computed tomography (CT)-based features with clinical and laboratory data to predict binary oxygen saturation outcomes in COVID-19 patients.
Patients and Methods: We conducted a retrospective study of 1,008 COVID-19 patients admitted between October 2020 and May 2021, using 70% of the data for training and 30% for testing. The classifiers were linear support vector machine (SVM), SVM with a radial basis function (RBF) kernel, logistic regression, random forest (RF), naive Bayes, and XGBoost. Performance was assessed by the validation area under the curve (AUC) and the range of AUC values across 10-fold cross-validation. Features were selected through a multi-step process combining importance ranking with stability analysis; the top three features with stability ≥ 0.7 were chosen for model development, a combination that yielded the highest AUC among those tested.
Results: Linear classifiers performed well in the Clinical and Laboratory Models, whereas non-linear classifiers excelled in the CT-Based and Integrated Models. In the Clinical Model, logistic regression achieved an AUC of 0.82, with age, gender, and fever as significant features. In the Laboratory Model, linear SVM (AUC = 0.82) identified white blood cell (WBC) count as the key feature. In the CT-Based Model, random forest (AUC = 0.87) highlighted mean lesion volume. In the Integrated Model, the top classifier, SVM with an RBF kernel (AUC = 0.89), identified WBC count and mean non-lesion lung volume (NLLV) as critical features.
Conclusion: Linear classifiers effectively predict oxygen saturation from clinical and laboratory data, while non-linear classifiers perform best in the CT-based and integrated models, highlighting the need to tailor ML approaches to different data types in COVID-19 patient care.
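The evaluation protocol summarized above (a 70/30 train/test split, a validation AUC, and the range of AUC values across 10-fold cross-validation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic data, a hand-rolled logistic regression (one of the six classifiers listed), and a rank-based (Mann-Whitney) AUC, and it assumes that cross-validation is run with stratified folds on the training portion. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for yi, s in zip(labels, scores) if yi == 1]
    neg = [s for yi, s in zip(labels, scores) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=300):
    # Unregularized logistic regression fitted by batch gradient descent on the mean log-loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def split_70_30(X, y, seed=0):
    # Random hold-out split: 70% training, 30% testing.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = (7 * len(idx)) // 10
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


def stratified_folds(y, k=10, seed=0):
    # Deal each class round-robin into k folds so every fold holds both outcomes.
    rng, folds = random.Random(seed), [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(members)
        for rank, i in enumerate(members):
            folds[rank % k].append(i)
    return folds


def cv_auc_range(X, y, k=10, seed=0):
    # Train on k-1 folds, score the held-out fold, and report (min AUC, max AUC).
    aucs = []
    for fold in stratified_folds(y, k, seed):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in fold], predict_proba(model, [X[i] for i in fold])))
    return min(aucs), max(aucs)


def synthetic_data(n=200, seed=1):
    # Synthetic stand-in for three predictors; the positive class is shifted upward.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        X.append([rng.gauss(0.8 * label, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


X, y = synthetic_data()
Xtr, ytr, Xte, yte = split_70_30(X, y)
model = fit_logistic(Xtr, ytr)
test_auc = auc(yte, predict_proba(model, Xte))
lo, hi = cv_auc_range(Xtr, ytr)
print(f"validation AUC = {test_auc:.2f}; 10-fold CV AUC range = {lo:.2f} to {hi:.2f}")
```

Reporting the minimum and maximum fold AUCs alongside the single hold-out AUC gives a rough sense of how sensitive a classifier is to the particular training subset; the other classifiers named in the study would slot in where `fit_logistic` and `predict_proba` are called.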
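The feature-selection step (importance ranking combined with stability analysis, keeping the top three features with stability ≥ 0.7) might look roughly like the sketch below. The abstract does not specify the importance measure or how stability was computed, so both are assumptions here: importance is a simple standardized mean difference between outcome classes, and stability is the fraction of bootstrap resamples in which a feature ranks among the top three. The feature names `f0` to `f5` and the data are synthetic, and the study's final check (comparing AUC across feature combinations) is not reproduced.

```python
import random
import statistics


def importance(column, y):
    # ASSUMPTION: a univariate stand-in for importance ranking,
    # |mean(positives) - mean(negatives)| divided by the column's overall SD.
    pos = [v for v, yi in zip(column, y) if yi == 1]
    neg = [v for v, yi in zip(column, y) if yi == 0]
    sd = statistics.pstdev(column)
    if sd == 0 or not pos or not neg:
        return 0.0
    return abs(statistics.fmean(pos) - statistics.fmean(neg)) / sd


def stability_select(X, y, names, top_k=3, n_boot=100, threshold=0.7, seed=0):
    # ASSUMPTION: stability = fraction of bootstrap resamples in which a feature
    # ranks among the top_k by importance.
    rng = random.Random(seed)
    n, p = len(X), len(names)
    hits = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        yb = [y[i] for i in idx]
        scores = [importance([X[i][j] for i in idx], yb) for j in range(p)]
        for j in sorted(range(p), key=lambda col: scores[col], reverse=True)[:top_k]:
            hits[j] += 1
    stability = {name: h / n_boot for name, h in zip(names, hits)}
    # Keep features that clear the threshold, most stable first, and return at most top_k.
    stable = sorted((nm for nm in names if stability[nm] >= threshold),
                    key=lambda nm: stability[nm], reverse=True)
    return stable[:top_k], stability


# Synthetic demo: f0-f2 differ between outcome classes, f3-f5 are pure noise.
rng = random.Random(42)
names = [f"f{j}" for j in range(6)]
X, y = [], []
for _ in range(300):
    label = 1 if rng.random() < 0.4 else 0
    X.append([rng.gauss(1.0 * label if j < 3 else 0.0, 1.0) for j in range(6)])
    y.append(label)

selected, stability = stability_select(X, y, names)
print("selected:", selected)
print("stability:", stability)
```

Requiring a feature to rank highly across many resamples, rather than once on the full data, is what filters out predictors whose apparent importance depends on a few patients; the 0.7 threshold and top-three cutoff mirror the values stated in the abstract.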