clustpy.hierarchical package
Submodules
clustpy.hierarchical.diana module
@authors: Collin Leiber
- class clustpy.hierarchical.diana.Diana(n_clusters: int = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')
Bases: BaseEstimator, ClusterMixin
The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy by considering the pairwise dissimilarity of objects. It recursively splits the cluster with the maximum dissimilarity (i.e., the largest diameter), where the dissimilarity is based on a specified distance metric (e.g., Euclidean distance).
- Parameters:
n_clusters (int) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The distance threshold defines the minimum diameter that is considered. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: euclidean)
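The interplay of n_clusters and distance_threshold can be illustrated with a minimal pure-Python sketch. It is not the clustpy implementation: the helper names diameter and next_cluster_to_split are hypothetical, the Euclidean metric is hard-coded, and treating a diameter equal to the threshold as a stop condition is an assumption.

```python
from itertools import combinations
from math import dist


def diameter(points, members):
    """Largest pairwise Euclidean distance among the given member indices."""
    return max(
        (dist(points[i], points[j]) for i, j in combinations(members, 2)),
        default=0.0,
    )


def next_cluster_to_split(points, clusters, n_clusters=None, distance_threshold=0.0):
    """Return the index of the cluster to split next, or None to stop."""
    # Stop once the requested number of clusters has been reached
    if n_clusters is not None and len(clusters) >= n_clusters:
        return None
    diameters = [diameter(points, members) for members in clusters]
    widest = max(range(len(clusters)), key=diameters.__getitem__)
    # Assumption: a diameter equal to the threshold also stops the process
    if diameters[widest] <= distance_threshold:
        return None
    return widest


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (9.0, 5.0)]
clusters = [[0, 1], [2, 3]]  # diameters 1.0 and 4.0
print(next_cluster_to_split(points, clusters, n_clusters=3))            # 1
print(next_cluster_to_split(points, clusters, n_clusters=2))            # None
print(next_cluster_to_split(points, clusters, distance_threshold=4.5))  # None
```

With n_clusters set, the threshold stays at 0, so splitting ends only when the requested number of clusters exists or no cluster can be divided further.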
- labels_
The final labels
- Type:
np.ndarray
- tree_
The resulting cluster tree
- Type:
BinaryClusterTree
References
Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.
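A single divisive step, as described by Kaufman and Rousseeuw, can be sketched as follows. This is an illustration under stated assumptions rather than the clustpy code: the function names average_dissimilarity and split_cluster are hypothetical, and d is assumed to be a full symmetric distance matrix given as a list of lists.

```python
from math import dist


def average_dissimilarity(d, i, group):
    """Mean distance from object i to the other objects in group."""
    others = [j for j in group if j != i]
    return sum(d[i][j] for j in others) / len(others) if others else 0.0


def split_cluster(d, members):
    """Split members into (remaining, splinter) groups."""
    remaining = list(members)
    # Seed the splinter group with the object that is, on average,
    # farthest from all other objects in the cluster
    seed = max(remaining, key=lambda i: average_dissimilarity(d, i, remaining))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # A positive gain means the object is closer to the splinter group
        # than to the rest of its current group
        gains = {
            i: average_dissimilarity(d, i, remaining)
            - average_dissimilarity(d, i, splinter)
            for i in remaining
        }
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.0)]
d = [[dist(p, q) for q in points] for p in points]
remaining, splinter = split_cluster(d, range(len(points)))
print(sorted(remaining), sorted(splinter))  # [0, 1, 2] [3, 4]
```

Repeating this step on the cluster with the largest diameter yields the binary hierarchy.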
- fit(X: ndarray, y: ndarray = None) → Diana
Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.
- Parameters:
X (np.ndarray) – the given data set
y (np.ndarray) – the labels (can be ignored)
- Returns:
self – this instance of the Diana algorithm
- Return type:
Diana
- flat_clustering(n_leaf_nodes_to_keep: int) → ndarray
Transform the predicted labels into a flat clustering result by keeping only n_leaf_nodes_to_keep leaf nodes in the tree. Returns the labels as if the clustering procedure had stopped at the specified number of leaf nodes. Note that each leaf node corresponds to a cluster.
- Parameters:
n_leaf_nodes_to_keep (int) – The number of leaf nodes to keep in the cluster tree
- Returns:
labels_pruned – The new cluster labels
- Return type:
np.ndarray
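The idea behind flat_clustering can be sketched without the BinaryClusterTree class. The representation below is hypothetical and not the clustpy data structure: splits is assumed to record, in order, which sample indices moved into a new cluster at each split, and the label numbering is chosen for illustration only.

```python
def flat_labels(n_samples, splits, n_leaf_nodes_to_keep):
    """Labels as if splitting had stopped at n_leaf_nodes_to_keep leaves.

    splits is a hypothetical record of the hierarchy: the k-th entry lists
    the sample indices that moved into a new cluster at the k-th split.
    """
    if n_leaf_nodes_to_keep < 1:
        raise ValueError("n_leaf_nodes_to_keep must be at least 1")
    labels = [0] * n_samples
    # Each split adds one leaf, so k leaves require the first k - 1 splits
    applied = splits[: n_leaf_nodes_to_keep - 1]
    for new_label, moved in enumerate(applied, start=1):
        for i in moved:
            labels[i] = new_label
    return labels


splits = [[3, 4, 5], [5], [0]]
print(flat_labels(6, splits, 1))  # [0, 0, 0, 0, 0, 0]
print(flat_labels(6, splits, 2))  # [0, 0, 0, 1, 1, 1]
print(flat_labels(6, splits, 3))  # [0, 0, 0, 1, 1, 2]
print(flat_labels(6, splits, 4))  # [3, 0, 0, 1, 1, 2]
```

Keeping fewer leaves merges the most recently created clusters back into their parents, which is why one fitted tree can serve several values of n_leaf_nodes_to_keep.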
Module contents
- class clustpy.hierarchical.Diana(n_clusters: int = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')
Bases: BaseEstimator, ClusterMixin
The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy by considering the pairwise dissimilarity of objects. It recursively splits the cluster with the maximum dissimilarity (i.e., the largest diameter), where the dissimilarity is based on a specified distance metric (e.g., Euclidean distance).
- Parameters:
n_clusters (int) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The distance threshold defines the minimum diameter that is considered. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: euclidean)
- labels_
The final labels
- Type:
np.ndarray
- tree_
The resulting cluster tree
- Type:
BinaryClusterTree
References
Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.
- fit(X: ndarray, y: ndarray = None) → Diana
Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.
- Parameters:
X (np.ndarray) – the given data set
y (np.ndarray) – the labels (can be ignored)
- Returns:
self – this instance of the Diana algorithm
- Return type:
Diana
- flat_clustering(n_leaf_nodes_to_keep: int) → ndarray
Transform the predicted labels into a flat clustering result by keeping only n_leaf_nodes_to_keep leaf nodes in the tree. Returns the labels as if the clustering procedure had stopped at the specified number of leaf nodes. Note that each leaf node corresponds to a cluster.
- Parameters:
n_leaf_nodes_to_keep (int) – The number of leaf nodes to keep in the cluster tree
- Returns:
labels_pruned – The new cluster labels
- Return type:
np.ndarray