Journal Article Details
PeerJ
Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics
article
Aditya A. Shastri [1], Kapil Ahuja [1], Milind B. Ratnaparkhe [2], Yann Busnel [3]
[1] Math of Data Science & Simulation (MODSS) Lab, Indian Institute of Technology Indore; [2] ICAR-Indian Institute of Soybean Research; [3] Network Systems, Cybersecurity and Digital Law Department, Institut Mines-Telecom Atlantique
Keywords: Spectral clustering; Hierarchical clustering; Similarity measures; Probabilistic sampling; Pivotal sampling; Vector quantization; Phenotypic data
DOI: 10.7717/peerj.11927
Subject classification: Social Sciences, Humanities and Arts (General)
Source: Inra
【 Abstract 】

Phenotypic characteristics of a plant species refer to its physical properties as cataloged by plant biologists at different research centers around the world. Clustering species based upon their phenotypic characteristics is used to obtain diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data, but it suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose the use of the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, for effective comparison, another sampling technique called Vector Quantization (VQ) is adapted for this data as well. VQ has recently generated promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and defining probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we tested it on phenotypic data obtained from about 2,400 Soybean species. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other proposed competitive clustering-with-sampling algorithms (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that our algorithm is up to 45% more accurate than HC. The computational complexity of our algorithm is more than an order of magnitude lower than that of HC.
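
To make the sampling step concrete, the following is a minimal sketch of generic sequential pivotal sampling, not the authors' implementation. The abstract does not state how the paper defines its inclusion probabilities, so the values passed in below are purely hypothetical, and the function name `pivotal_sample` is an illustrative choice. The sketch assumes each probability lies in [0, 1] and that the probabilities sum to an integer n, in which case exactly n units are selected.

```python
import random


def pivotal_sample(probs, rng=None, eps=1e-9):
    """Sequential pivotal sampling (generic sketch).

    probs: inclusion probabilities in [0, 1], assumed to sum to an integer n.
    Returns the sorted indices of the n selected units.
    """
    rng = rng or random.Random()
    p = [float(x) for x in probs]

    def decided(x):
        # A unit is decided once its probability has reached 0 or 1.
        return x <= eps or x >= 1 - eps

    i = 0  # index of the unit currently "in play"
    for j in range(1, len(p)):
        if decided(p[i]):
            i = j
            continue
        if decided(p[j]):
            continue
        total = p[i] + p[j]
        if total < 1:
            # One unit absorbs the combined probability, the other drops to 0.
            if rng.random() < p[i] / total:
                p[i], p[j] = total, 0.0
            else:
                p[i], p[j] = 0.0, total
        else:
            # One unit is selected (probability 1), the other keeps the remainder.
            if rng.random() < (1 - p[j]) / (2 - total):
                p[i], p[j] = 1.0, total - 1
            else:
                p[i], p[j] = total - 1, 1.0
        if decided(p[i]):
            i = j
    return [k for k, x in enumerate(p) if x >= 1 - eps]


if __name__ == "__main__":
    # Hypothetical inclusion probabilities for five units; they sum to 2,
    # so every draw selects exactly two units.
    print(pivotal_sample([0.2, 0.3, 0.5, 0.4, 0.6], rng=random.Random(1)))
```

Each pairwise "duel" preserves the expected value of both probabilities, so every unit is still selected with its prescribed probability while the sample size stays fixed. In a pipeline like the one the abstract describes, the selected indices would then be passed to the clustering step.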

【 License 】

CC BY   
