Enhancing Predictive Analytics for Fetal Health Using Feature Selection
The primary goal of this project was to improve fetal health prediction by refining the input features used in machine learning models. Starting with a dataset of 28 features, we applied dimensionality reduction techniques to limit overfitting and cut unnecessary diagnostic costs.
Project Overview:
- Feature Reduction: Used Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) to identify the most informative inputs for fetal health prediction. Reduced the number of features from 28 to 16 while retaining about 95% of the variance in the dataset.
- Dataset Utilization: Worked with a dataset of 2,126 cardiotocogram (CTG) recordings classified by expert obstetricians, originally compiled by Ayres de Campos et al. (2000).
- Model Training and Testing: After feature reduction, trained multiple machine learning models on the streamlined dataset to compare their accuracy, precision, recall, and F1 scores.
- Cost-Effective Prediction: Demonstrated that fewer diagnostic features can still yield high accuracy, reducing the need for expensive medical equipment and making fetal health assessment more accessible.
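
The reduction and comparison steps above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code. It assumes a synthetic stand-in dataset with the same shape (2,126 rows, 28 features) and three hypothetical classes, and the two classifiers and the macro averaging are illustrative choices. Note that PCA keeps linear combinations of the original features (components) rather than a subset of the original columns, so the number of components retained on synthetic data will not necessarily be 16.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the CTG data (not the real dataset).
# Shape matches the text: 2126 samples, 28 features; 3 classes are hypothetical.
X, y = make_classification(
    n_samples=2126, n_features=28, n_informative=10, n_redundant=8,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(model):
    """Scale, keep enough principal components for 95% variance, fit, score."""
    # A float n_components in (0, 1) tells PCA to keep the smallest number of
    # components whose cumulative explained variance reaches that fraction.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95), model)
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    return {
        "n_components": int(pipe.named_steps["pca"].n_components_),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative model choices; the original text does not name the models used.
results = {
    "logistic_regression": evaluate(LogisticRegression(max_iter=1000)),
    "random_forest": evaluate(
        RandomForestClassifier(n_estimators=200, random_state=0)
    ),
}
for name, scores in results.items():
    print(name, scores)
```

Fitting the scaler and PCA inside the pipeline means they are learned from the training split only, which avoids leaking test-set information into the reduced features.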
This initiative not only made fetal health predictions more efficient but also paved the way for more scalable and cost-effective prenatal care solutions.