Journal article

【Abstract】

Background: Unsupervised clustering is a common and exceptionally useful tool for analyzing large biological datasets. However, clustering requires choosing an algorithm and its hyperparameters up front, and these choices can bias the final cluster labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, but doing so can be cumbersome and slow.

Results: We present hypercluster, a Python package and SnakeMake pipeline for flexible, parallelized evaluation and selection of clustering results. Users can efficiently evaluate a wide range of clustering results from multiple models and hyperparameters to identify an optimal model.

Conclusions: Hypercluster improves the ease of use, robustness and reproducibility of unsupervised clustering in high-throughput biology. Hypercluster is available on pip and bioconda; installation instructions, documentation and example workflows can be found at https://github.com/ruggleslab/hypercluster.
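The core idea in the abstract, fitting many model and hyperparameter combinations and then selecting one, can be illustrated with plain scikit-learn. This is a minimal sketch, not hypercluster's own API: the search space, the helper names `expand_grid`, `evaluate_all` and `pick_best`, and the use of the silhouette score as the only selection criterion are all illustrative assumptions.

```python
from itertools import product

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical search space (an assumption for illustration):
# each model class maps to a grid of hyperparameter values.
SEARCH_SPACE = {
    KMeans: {"n_clusters": [2, 3, 4, 5], "n_init": [10], "random_state": [0]},
    AgglomerativeClustering: {"n_clusters": [2, 3, 4, 5], "linkage": ["ward", "average"]},
}


def expand_grid(grid):
    """Yield every combination of hyperparameters in a {name: [values]} dict."""
    names = sorted(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def evaluate_all(data, search_space):
    """Fit every model/hyperparameter combination and score its labels."""
    results = []
    for model_cls, grid in search_space.items():
        for params in expand_grid(grid):
            labels = model_cls(**params).fit_predict(data)
            # The silhouette score is undefined when all points share one label.
            if len(set(labels)) < 2:
                continue
            score = silhouette_score(data, labels)
            results.append((model_cls.__name__, params, score, labels))
    return results


def pick_best(results):
    """Return the (name, params, score, labels) tuple with the highest silhouette."""
    return max(results, key=lambda r: r[2])


if __name__ == "__main__":
    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)
    results = evaluate_all(X, SEARCH_SPACE)
    name, params, score, _labels = pick_best(results)
    print(f"{len(results)} clusterings evaluated")
    print(f"best: {name} {params} silhouette={score:.3f}")
```

This sketch runs every fit sequentially in one process. According to the abstract, hypercluster parallelizes this evaluation through a SnakeMake pipeline, which the example does not attempt to reproduce; a single internal metric is also only one possible way to rank results.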

【License】

CC BY

BMC Bioinformatics
Hypercluster: a flexible tool for parallelized unsupervised clustering optimization

Lili Blumenberg¹, Kelly V. Ruggles¹
¹ Institute of Systems Genetics, New York University Grossman School of Medicine, New York, NY 10016, USA; Department of Medicine, New York University Grossman School of Medicine, New York, NY 10016, USA
Keywords: Machine learning; Unsupervised clustering; Hyperparameter optimization; Scikit-learn; Python; SnakeMake
DOI: 10.1186/s12859-020-03774-1
Source: Springer