Challenges and solutions in data collection and model evaluation in supervised machine learning: a review article
Author | Saeedeh Aliakbari | en |
Author | Payman Hejazi | en |
Author | Zeinab Hormozi-Moghaddam | en |
Orcid | Payman Hejazi [0000-0002-4121-7471] | en |
Issued Date | 2023-12-31 | en |
Abstract | Introduction: Machine learning is a complex process that involves specifying a model and training it on a large volume of data. In the past, work in this field focused mainly on improving model structures and algorithms, but recently more emphasis has been placed on the quality and quantity of data. This article aims to provide an overview of the problems in data collection and to offer solutions to them. Materials and Methods: In this review, the challenges that researchers face in collecting data and evaluating supervised machine-learning models were examined. Documents published from 2001 to 2023 were retrieved from the PubMed, Scopus, and Science Direct databases and the Google Scholar search engine. After screening, a total of 17 full-text articles were reviewed and included in the study. Results: The findings indicate that researchers in supervised machine learning studies face four challenges in data collection: an insufficient number of samples, unrepresentative training data, poor data quality, and irrelevant features. In model evaluation, they face four further challenges: overfitting, lack of generalizability, insufficient data for validation, and mismatched data. Conclusion: Increasing the sample size, using a random selection algorithm, data cleansing, applying the appropriate statistical test, feature selection, feature extraction, using a simpler model, K-fold cross-validation, and data processing are among the measures that contribute to a better-performing model. | en |
DOI | https://doi.org/ | en |
Keyword | Supervised Machine Learning | en |
Keyword | Data Collection | en |
Keyword | Model Evaluation | en |
Keyword | Supervised Machine Learning (Persian: یادگیری ماشین نظارت شده) | en |
Keyword | Data Collection (Persian: جمع‌آوری داده) | en |
Keyword | Model Evaluation (Persian: ارزیابی مدل) | en |
Publisher | Brieflands | en |
Title | Challenges and solutions in data collection and model evaluation in supervised machine learning: a review article | en |
Type | Review Article | en |