clustpy.hierarchical package

Submodules

clustpy.hierarchical.diana module

@authors: Collin Leiber

class clustpy.hierarchical.diana.Diana(n_clusters: int = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')

Bases: BaseEstimator, ClusterMixin

The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy from the pairwise dissimilarities of the objects. It recursively splits the cluster with the maximum dissimilarity between its objects, where the dissimilarity is measured with a specified distance metric (e.g., Euclidean distance).
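To make the splitting step concrete, the following is a minimal pure-Python sketch of a single divisive split as described in the Kaufman and Rousseeuw reference. It is an illustration under stated assumptions, not clustpy's implementation: the helper names average_dissimilarity and split_cluster are hypothetical, and dist is assumed to be a full square matrix of pairwise distances.

```python
def average_dissimilarity(i, group, dist):
    """Mean distance from object i to the other members of group (0.0 if there are none)."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(dist[i][j] for j in others) / len(others)


def split_cluster(cluster, dist):
    """Split one cluster into (remaining, splinter). Hypothetical helper, not clustpy API."""
    remaining = list(cluster)
    # Seed the splinter group with the object that is, on average, farthest from the rest.
    seed = max(remaining, key=lambda i: average_dissimilarity(i, remaining, dist))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # Gain of moving i: how much closer it is to the splinter group than to its own group.
        gains = {
            i: average_dissimilarity(i, remaining, dist)
            - average_dissimilarity(i, splinter, dist)
            for i in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # nobody is closer to the splinter group any more
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


# Toy 1-D data with two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
remaining, splinter = split_cluster(range(len(points)), dist)
```

On this toy data the split separates the objects at 10.0 and 11.0 from the three objects near 0. Applying the same step repeatedly to the resulting clusters yields the top-down hierarchy.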

Parameters:

n_clusters (int) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The minimum cluster diameter that is still considered for splitting. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: euclidean)
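The interplay of n_clusters and distance_threshold can be sketched as a stopping rule. The snippet below is a hedged illustration in pure Python, not clustpy's code: diameter and next_cluster_to_split are hypothetical helpers, the diameter is taken to be the largest pairwise distance within a cluster, and treating a diameter equal to the threshold as a stop is an assumption about the boundary case.

```python
def diameter(cluster, dist):
    """Largest pairwise distance inside a cluster (0.0 for a singleton)."""
    return max((dist[i][j] for i in cluster for j in cluster), default=0.0)


def next_cluster_to_split(clusters, dist, n_clusters=None, distance_threshold=0.0):
    """Return the index of the cluster to split next, or None if splitting should stop."""
    # Stop once the requested number of clusters has been reached.
    if n_clusters is not None and len(clusters) >= n_clusters:
        return None
    # Otherwise pick the cluster with the largest diameter.
    widest = max(range(len(clusters)), key=lambda c: diameter(clusters[c], dist))
    # Assumption: a diameter equal to the threshold also stops the splitting.
    if diameter(clusters[widest], dist) <= distance_threshold:
        return None
    return widest


points = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
clusters = [[0, 1, 2], [3, 4]]  # diameters 2.0 and 1.0

by_count = next_cluster_to_split(clusters, dist, n_clusters=2)
by_threshold = next_cluster_to_split(clusters, dist, distance_threshold=1.5)
at_threshold = next_cluster_to_split(clusters, dist, distance_threshold=2.0)
```

Here by_count is None because two clusters already exist, by_threshold selects the first cluster because its diameter of 2.0 exceeds 1.5, and at_threshold is None because no cluster is wider than 2.0.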

labels_

The final labels

Type: np.ndarray

tree_

The resulting cluster tree

Type: BinaryClusterTree

References

Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.

fit(X: ndarray, y: ndarray = None) → Diana

Run the clustering on the input data set. The resulting cluster labels are stored in the labels_ attribute.

Parameters:

X (np.ndarray) – the given data set
y (np.ndarray) – the labels (can be ignored)

Returns:

self – this instance of the Diana algorithm

Return type:

Diana

flat_clustering(n_leaf_nodes_to_keep: int) → ndarray

Transform the predicted labels into a flat clustering result by keeping only n_leaf_nodes_to_keep leaf nodes in the tree. Returns the labels as if the clustering procedure had stopped at the specified number of leaf nodes. Note that each leaf node corresponds to a cluster.

Parameters: n_leaf_nodes_to_keep (int) – The number of leaf nodes to keep in the cluster tree
Returns: labels_pruned – The new cluster labels
Return type: np.ndarray
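The idea behind flat_clustering can be illustrated by replaying only the first splits of a divisive hierarchy. The following pure-Python sketch is an assumption-laden illustration, not the BinaryClusterTree logic used by clustpy: flat_labels is a hypothetical helper, and splits is assumed to list, in the order the splits were performed, the objects that were split off into a new cluster.

```python
def flat_labels(n_objects, splits, n_leaf_nodes_to_keep):
    """Labels obtained by replaying only the first n_leaf_nodes_to_keep - 1 splits."""
    labels = [0] * n_objects  # before any split, everything is in cluster 0
    kept = splits[: n_leaf_nodes_to_keep - 1]
    for new_label, moved in enumerate(kept, start=1):
        # Each split creates exactly one new leaf, so it gets the next free label.
        for i in moved:
            labels[i] = new_label
    return labels


# Hypothetical split history for five objects: first {3, 4} is split off,
# then object 0 leaves the remaining group, then object 4 leaves {3, 4}.
splits = [[3, 4], [0], [4]]

labels_two = flat_labels(5, splits, n_leaf_nodes_to_keep=2)
labels_four = flat_labels(5, splits, n_leaf_nodes_to_keep=4)
```

With two leaf nodes kept, only the first split is visible, so labels_two is [0, 0, 0, 1, 1]. With four leaf nodes kept, all three splits apply and labels_four is [2, 0, 0, 1, 3]. The concrete label values returned by clustpy may differ. Only the grouping is meant to be illustrative.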

Module contents

class clustpy.hierarchical.Diana(n_clusters: int = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')

Bases: BaseEstimator, ClusterMixin

The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy from the pairwise dissimilarities of the objects. It recursively splits the cluster with the maximum dissimilarity between its objects, where the dissimilarity is measured with a specified distance metric (e.g., Euclidean distance).

Parameters:

n_clusters (int) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The minimum cluster diameter that is still considered for splitting. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: euclidean)

labels_

The final labels

Type: np.ndarray

tree_

The resulting cluster tree

Type: BinaryClusterTree

References

Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.

fit(X: ndarray, y: ndarray = None) → Diana

Run the clustering on the input data set. The resulting cluster labels are stored in the labels_ attribute.

Parameters:

X (np.ndarray) – the given data set
y (np.ndarray) – the labels (can be ignored)

Returns:

self – this instance of the Diana algorithm

Return type:

Diana

flat_clustering(n_leaf_nodes_to_keep: int) → ndarray

Transform the predicted labels into a flat clustering result by keeping only n_leaf_nodes_to_keep leaf nodes in the tree. Returns the labels as if the clustering procedure had stopped at the specified number of leaf nodes. Note that each leaf node corresponds to a cluster.

Parameters: n_leaf_nodes_to_keep (int) – The number of leaf nodes to keep in the cluster tree
Returns: labels_pruned – The new cluster labels
Return type: np.ndarray