clustpy.hierarchical package
Submodules
clustpy.hierarchical.diana module
@authors: Collin Leiber
- class clustpy.hierarchical.diana.Diana(n_clusters: int | None = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')[source]
Bases: BaseEstimator, ClusterMixin
The DIvisive ANAlysis (DIANA) clustering algorithm. DIANA builds a top-down clustering hierarchy from the pairwise dissimilarities of the objects. It recursively splits the cluster with the maximum dissimilarity, where dissimilarity is measured with the specified distance metric (e.g., Euclidean distance).
- Parameters:
n_clusters (int) – The number of clusters. If n_clusters is None, the tree is constructed until the maximum diameter is below distance_threshold (default: None)
distance_threshold (float) – The minimum cluster diameter that is still considered for splitting. Must be 0 if n_clusters is specified (default: 0)
construct_full_tree (bool) – Defines whether the full tree should be constructed after n_clusters has been reached (default: False)
metric (str) – Metric used to compute the dissimilarity. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or “precomputed” (see scipy.spatial.distance.pdist) (default: euclidean)
- labels_
The final labels
- Type:
np.ndarray
- tree_
The resulting cluster tree
- Type:
list
References
Kaufman, Rousseeuw “Divisive Analysis (Program DIANA)” Chapter six from Finding Groups in Data: An Introduction to Cluster Analysis. 1990.
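The splitting procedure described above can be illustrated with a minimal pure-Python sketch. This is not clustpy's implementation: the function names (`diameter`, `avg_dist`, `split`, `diana_sketch`) are hypothetical, only the Euclidean metric is handled, and the splinter-group rule follows the textbook description of DIANA. The stopping rule mirrors the n_clusters and distance_threshold parameters as documented here.

```python
from math import dist


def diameter(cluster, points):
    """Largest pairwise Euclidean distance inside a cluster (0 for singletons)."""
    return max((dist(points[i], points[j]) for i in cluster for j in cluster),
               default=0.0)


def avg_dist(i, group, points):
    """Average distance from point i to the members of group, excluding i itself."""
    others = [j for j in group if j != i]
    return sum(dist(points[i], points[j]) for j in others) / len(others) if others else 0.0


def split(cluster, points):
    """One divisive step: grow a splinter group out of `cluster`."""
    remaining = list(cluster)
    # Seed the splinter group with the point that is on average farthest from the rest.
    seed = max(remaining, key=lambda i: avg_dist(i, remaining, points))
    remaining.remove(seed)
    splinter = [seed]
    while len(remaining) > 1:
        # A positive gain means the point is closer to the splinter group
        # than to the points it currently shares a cluster with.
        gains = {i: avg_dist(i, remaining, points) - avg_dist(i, splinter, points)
                 for i in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        remaining.remove(best)
        splinter.append(best)
    return remaining, splinter


def diana_sketch(points, n_clusters=None, distance_threshold=0.0):
    """Hypothetical helper: split the widest cluster until a stop rule applies."""
    clusters = [list(range(len(points)))]
    while n_clusters is None or len(clusters) < n_clusters:
        widest = max(clusters, key=lambda c: diameter(c, points))
        # Assumption: stop once the widest cluster is no larger than the threshold.
        if len(widest) < 2 or diameter(widest, points) <= distance_threshold:
            break
        clusters.remove(widest)
        clusters.extend(split(widest, points))
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_fixed = diana_sketch(points, n_clusters=2)
labels_threshold = diana_sketch(points, distance_threshold=2.0)
```

On this toy data both calls separate the three points near the origin from the three points near (10, 10): the first stops because two clusters were requested, the second because neither remaining cluster has a diameter above 2.0.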
- fit(X: ndarray, y: ndarray | None = None) → Diana[source]
Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.
- Parameters:
X (np.ndarray) – the given data set
y (np.ndarray) – the labels (can be ignored)
- Returns:
self – this instance of the Diana algorithm
- Return type:
Diana
- prune_tree(level)[source]
Prune the tree at a specified cluster hierarchy level. Returns labels as if the clustering procedure had stopped at the specified level. The resulting number of clusters will be level + 1.
- Parameters:
level (int) – The level at which the tree should be pruned. Must be larger than 0
- Returns:
labels_pruned – The pruned cluster labels
- Return type:
np.ndarray
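The relation between the pruning level and the number of clusters can be sketched as follows. The `splits` record and the `labels_at_level` helper are hypothetical and do not reflect the actual structure of tree_; the sketch only shows why replaying the first `level` splits of a top-down hierarchy yields level + 1 clusters.

```python
def labels_at_level(n_points, splits, level):
    """Replay the first `level` splits of a top-down hierarchy.

    `splits` is a hypothetical record, NOT clustpy's `tree_` format: entry k
    lists the point indices moved into a new cluster by the (k + 1)-th split.
    """
    if level <= 0:
        raise ValueError("level must be larger than 0")
    labels = [0] * n_points  # before any split, every point is in cluster 0
    for new_label, moved in enumerate(splits[:level], start=1):
        for i in moved:
            labels[i] = new_label
    return labels


# Six points: the first split separates {3, 4, 5}, the second splits off {5}.
splits = [[3, 4, 5], [5]]
level_1 = labels_at_level(6, splits, level=1)  # [0, 0, 0, 1, 1, 1]
level_2 = labels_at_level(6, splits, level=2)  # [0, 0, 0, 1, 1, 2]
```

Each replayed split introduces exactly one new label, so level 1 gives two clusters and level 2 gives three.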
Module contents
- class clustpy.hierarchical.Diana(n_clusters: int | None = None, distance_threshold: float = 0, construct_full_tree: bool = False, metric: str = 'euclidean')[source]
Same class as clustpy.hierarchical.diana.Diana, documented in the clustpy.hierarchical.diana module section above.