In recent years, large scale computer systems have emerged to meet the demands of high storage, supercomputing, and applications using very large data sets. The emergence of Cloud Computing offers the potentiel for analysis and processing of large data sets.
Mapreduce is the most popular programming model which is used to support the development of such applications. It was initially designed by Google for building large datacenters on a large scale, to provide Web search services with rapid response and high availability.
In this paper we will test the clustering algorithm K-means Clustering in a Cloud Computing. This algorithm is implemented on MapReduce. It has been chosen for its characteristics that are representative of many iterative data analysis algorithms. Then, we modify the framework CloudSim to simulate the MapReduce execution of K-means Clustering on different Cloud Computing, depending on their size and characteristics of target platforms.
The experiment show that the implementation of K-means Clustering gives good results especially for large data set and the Cloud infrastructure has an influence on these results.