clustpy.hierarchical package

Submodules

clustpy.hierarchical.diana module

@authors: Collin Leiber

class clustpy.hierarchical.diana.Diana(n_clusters: int | None = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')[source]

Bases: BaseEstimator, ClusterMixin

The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy by considering the pairwise dissimilarity of objects. It recursively splits the cluster with the maximum dissimilarity, where the dissimilarity is based on a specified distance metric (e.g., Euclidean distance).
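To make the splitting step concrete, here is a minimal pure-Python sketch of a single divisive split in the style described by Kaufman and Rousseeuw. The object with the highest average dissimilarity to the rest seeds a "splinter" group, and objects keep moving over while they are, on average, closer to the splinter group than to the objects left behind. The function name `split_cluster` and the toy data are assumptions made for illustration. This is not clustpy's implementation, and it is hard-wired to Euclidean distance.

```python
from math import dist


def split_cluster(points, members):
    """Split one cluster into (remaining, splinter); both are lists of indices.

    Illustrative sketch only, not clustpy's code.
    """

    def avg_dissimilarity(i, group):
        # Average Euclidean distance from object i to the other objects in group.
        others = [j for j in group if j != i]
        if not others:
            return 0.0
        return sum(dist(points[i], points[j]) for j in others) / len(others)

    remaining = list(members)
    # The object that is on average farthest from all others seeds the splinter group.
    seed = max(remaining, key=lambda i: avg_dissimilarity(i, remaining))
    remaining.remove(seed)
    splinter = [seed]

    while len(remaining) > 1:
        # Positive gain: the object is closer to the splinter group than to the rest.
        gains = {
            i: avg_dissimilarity(i, remaining) - avg_dissimilarity(i, splinter)
            for i in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        remaining.remove(best)
        splinter.append(best)

    return remaining, splinter


# Toy data (made up): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
remaining, splinter = split_cluster(points, range(len(points)))
print(sorted(remaining), sorted(splinter))
```

On this toy data the split separates the three points near the origin from the three points near (10, 10). A full divisive run would repeat the split on whichever cluster currently has the largest dissimilarity.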

Parameters:

n_clusters (int | None) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The distance threshold defines the minimum diameter that is considered for a split. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: “euclidean”)
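The interplay between n_clusters and distance_threshold can be sketched as a small stopping rule. The helper names `diameter` and `keep_splitting` are hypothetical and only mirror the parameter descriptions above. The handling of clusters with diameter 0 is an assumption of this sketch, not documented behaviour.

```python
from itertools import combinations
from math import dist


def diameter(cluster):
    """Largest pairwise Euclidean distance in a cluster (0.0 for a single object)."""
    return max((dist(a, b) for a, b in combinations(cluster, 2)), default=0.0)


def keep_splitting(clusters, n_clusters=None, distance_threshold=0):
    """Decide whether another split is needed (illustrative sketch only)."""
    if n_clusters is not None:
        # Mirrors the documented constraint on the two parameters.
        if distance_threshold != 0:
            raise ValueError("distance_threshold must be 0 if n_clusters is specified")
        return len(clusters) < n_clusters
    # Threshold mode: continue until the maximum diameter is below the threshold.
    largest = max(diameter(c) for c in clusters)
    # Assumption of this sketch: a cluster with diameter 0 is never split further.
    return largest > 0 and largest >= distance_threshold


# Toy clusters (made up); each has a diameter of sqrt(2), roughly 1.414.
clusters = [
    [(0, 0), (0, 1), (1, 0)],
    [(10, 10), (10, 11), (11, 10)],
]
print(keep_splitting(clusters, n_clusters=3))
print(keep_splitting(clusters, distance_threshold=2.0))
```

With two clusters present, asking for three clusters requests another split, while a threshold of 2.0 stops the procedure because no diameter reaches it.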

labels_

The final labels

Type: np.ndarray

tree_

The resulting cluster tree

Type: list

References

Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.

fit(X: ndarray, y: ndarray | None = None) → Diana[source]

Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.

Parameters:

X (np.ndarray) – The given data set
y (np.ndarray | None) – The labels (can be ignored) (default: None)

Returns:

self – this instance of the Diana algorithm

Return type:

Diana

prune_tree(level)[source]

Prune the tree at a specified level of the cluster hierarchy. Returns the labels as if the clustering procedure had stopped at that level. The resulting number of clusters is level + 1.

Parameters: level (int) – The level at which the tree should be pruned. Must be larger than 0
Returns: labels_pruned – The pruned cluster labels
Return type: np.ndarray
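The relationship between level and the number of clusters can be illustrated by replaying a recorded split history. The representation below, a list in which entry k holds the object indices moved to a new cluster by split k + 1, is a hypothetical stand-in chosen for this sketch. It is not the format of tree_, and `prune_labels` is not part of clustpy.

```python
def prune_labels(n_objects, splits, level):
    """Replay the first `level` splits and return one label per object.

    Illustrative sketch only; `splits` is a hypothetical split history.
    """
    if level <= 0 or level > len(splits):
        raise ValueError("level must be between 1 and the number of recorded splits")
    labels = [0] * n_objects  # before any split, all objects share one cluster
    for new_label, moved in enumerate(splits[:level], start=1):
        # Each split moves part of one existing cluster into a new cluster.
        for i in moved:
            labels[i] = new_label
    return labels


# Hypothetical history for six objects: three successive splits.
splits = [[3, 4, 5], [5], [1]]
for level in (1, 2, 3):
    pruned = prune_labels(6, splits, level)
    print(level, pruned, len(set(pruned)))
```

Each replayed split adds exactly one cluster, which is why stopping after level splits yields level + 1 clusters.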

Module contents

class clustpy.hierarchical.Diana(n_clusters: int | None = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')[source]
