Journal article details
Journal of Big Data
Using Big Data-machine learning models for diabetes prediction and flight delays analytics
Thérence Nibareke [1], Jalal Laassiri [1]
[1] Informatics Systems and Optimization Laboratory, Ibn Tofail University, Kenitra, Morocco;
Keywords: Big Data; Hadoop; Spark; HBase; Machine learning; Data analytics; Accuracy; K-Nearest Neighbor; K-means
DOI: 10.1186/s40537-020-00355-0
Source: Springer
【 Abstract 】

Introduction: Large volumes of data are now generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, keep increasing, so the tools and models used to process them have to be optimized. In this paper we apply and compare three Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes, and we perform analytics on flight delays. The main contribution of this paper is an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. For the flight delay analysis, we produced a dashboard that can help managers of flight companies have a 360° view of their flights and take strategic decisions.

Case description: We applied three Machine Learning algorithms to predict diabetes and compared their performance to see which model gives the best results. We performed analytics on flight datasets to support decision making and to predict flight delays.

Discussion and evaluation: The experiment shows that Linear Regression, Naïve Bayes, and Decision Tree give the same accuracy (0.766), but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delay analytics, the model could show, for example, the airport that recorded the most flight delays.

Conclusions: Several tools and machine learning models for big data analytics have been discussed in this paper. We conclude that, even for the same datasets, the model used for prediction has to be chosen carefully. In future work, we will test different models in other fields (climate, banking, insurance).
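
As a rough illustration of the two kinds of analysis the abstract mentions, the sketch below compares several classifiers by accuracy and error rate and finds the airport with the most delayed flights. It is not the authors' code: the model names, the toy labels, the flight records, and the 15-minute delay threshold are all hypothetical and chosen only to make the example runnable.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true labels."""
    return 1.0 - accuracy(y_true, y_pred)


def rank_models(y_true, predictions):
    """Return (model name, accuracy) pairs, best accuracy first."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def airport_with_most_delays(flights, threshold_minutes=15):
    """Return (airport, delayed flight count) for the airport with the most
    delayed departures, or None if no flight exceeds the threshold.

    `flights` is an iterable of (origin airport, delay in minutes) pairs.
    The default threshold is an assumption, not a value from the paper.
    """
    delayed = Counter(origin for origin, delay in flights
                      if delay > threshold_minutes)
    return delayed.most_common(1)[0] if delayed else None


# Hypothetical toy data: 1 = diabetic, 0 = not diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 1],
    "model_b": [1, 0, 1, 1, 0, 0, 1, 0],
    "model_c": [0, 0, 1, 1, 0, 1, 1, 0],
}

# Hypothetical flight records: (origin airport, departure delay in minutes).
flights = [("JFK", 30), ("ORD", 5), ("JFK", 20),
           ("ORD", 45), ("ATL", 0), ("JFK", 10)]

if __name__ == "__main__":
    for name, score in rank_models(y_true, predictions):
        print(name, "accuracy:", score,
              "error:", error_rate(y_true, predictions[name]))
    print("most delayed airport:", airport_with_most_delays(flights))
```

Models that tie on accuracy, as the three in the abstract do, need a second metric to separate them, which is why the sketch reports the error rate next to the accuracy instead of ranking on one number alone.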

【 License 】

CC BY   
