# Hypercluster

A package for clustering optimization with sklearn.
Requirements:
- pandas
- numpy
- scipy
- matplotlib
- seaborn
- scikit-learn
- hdbscan

Optional: snakemake
## Install

With pip:

```bash
pip install hypercluster
```

or with conda:

```bash
conda install hypercluster
# or
conda install -c conda-forge -c bioconda hypercluster
```
If you are having problems installing with conda, try changing your channel priority. Priority of conda-forge > bioconda > defaults is recommended. To check channel priority:

```bash
conda config --get channels
```

It should look like:

```
--add channels 'defaults'     # lowest priority
--add channels 'bioconda'
--add channels 'conda-forge'  # highest priority
```

If it doesn't look like that, try:

```bash
conda config --add channels bioconda
conda config --add channels conda-forge
```
## Docs

https://hypercluster.readthedocs.io/en/latest/index.html

It will also be useful to check out sklearn's pages on clustering and evaluation metrics.
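As a minimal illustration of the two metric families hypercluster distinguishes (this uses plain sklearn, not hypercluster itself, and the variable names are just for the example):

```python
# Metrics that need a gold standard vs. inherent metrics computed
# from the data and predicted labels alone.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

data, true_labels = make_blobs(random_state=0)
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)

ari = adjusted_rand_score(true_labels, pred)  # needs ground-truth labels
sil = silhouette_score(data, pred)            # inherent: data + labels only
print(ari, sil)
```

Hypercluster's `evaluate` accepts both kinds at once (see the quickstart below), via `hypercluster.constants.need_ground_truth` and `hypercluster.constants.inherent_metrics`.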
## Examples

https://github.com/liliblu/hypercluster/tree/dev/examples
## Quickstart with SnakeMake

Default `config.yml` and `hypercluster.smk` are in the snakemake repo above. Edit the `config.yml` file, or pass config values as command-line arguments:

```bash
snakemake -s hypercluster.smk --configfile config.yml --config input_data_files=test_data input_data_folder=.
```
Example editing with python:

```python
import os
import yaml

with open('config.yml', 'r') as fh:
    config = yaml.safe_load(fh)

input_data_prefix = 'test_data'
config['input_data_folder'] = os.path.abspath('.')
config['input_data_files'] = [input_data_prefix]
config['read_csv_kwargs'] = {input_data_prefix: {'index_col': [0]}}

with open('config.yml', 'w') as fh:
    yaml.dump(config, stream=fh)
```
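For reference, after an edit like the one above, the touched entries of `config.yml` would look roughly like this (a sketch: only the keys set above are shown, and the default file contains additional settings):

```yaml
input_data_folder: /path/to/working/dir   # from os.path.abspath('.')
input_data_files:
  - test_data
read_csv_kwargs:
  test_data:
    index_col:
      - 0
```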
Then call snakemake:

```bash
snakemake -s hypercluster.smk
```
Or submit the snakemake scheduler as an sbatch job, e.g. with BigPurple Slurm:

```bash
module add slurm
sbatch snakemake_submit.sh
```
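The real `snakemake_submit.sh` is in the scRNA-seq example below; as a rough, hypothetical sketch of what such a wrapper typically contains (the job name, resource values, and jobs limit here are assumptions, not the repo's values):

```shell
#!/bin/bash
#SBATCH --job-name=hypercluster_scheduler
#SBATCH --time=24:00:00
#SBATCH --mem=4G

# Run the snakemake scheduler itself as a job; it then submits one
# sbatch job per rule instance, using the resources in cluster.json.
snakemake -s hypercluster.smk --configfile config.yml \
    --jobs 100 \
    --cluster-config cluster.json \
    --cluster "sbatch --mem={cluster.mem} --time={cluster.time}"
```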
Examples for `snakemake_submit.sh` and `cluster.json` are in the scRNA-seq example.
## Quickstart with python

```python
import pandas as pd
from sklearn.datasets import make_blobs
import hypercluster

data, labels = make_blobs()
data = pd.DataFrame(data)
labels = pd.Series(labels, index=data.index, name='labels')

# With a single clustering algorithm
clusterer = hypercluster.AutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()

# With a range of algorithms
clusterer = hypercluster.MultiAutoClusterer()
clusterer.fit(data).evaluate(
    methods=hypercluster.constants.need_ground_truth + hypercluster.constants.inherent_metrics,
    gold_standard=labels,
)
clusterer.visualize_evaluations()
```