clustpy.density package

Submodules

clustpy.density.multi_density_dbscan module

@authors: Collin Leiber

class clustpy.density.multi_density_dbscan.MultiDensityDBSCAN(k: int = 15, var: float = 2.5, min_cluster_size: int = 2)

Bases: BaseEstimator, ClusterMixin

The Multi Density DBSCAN algorithm. First, the density of every data point is calculated. Afterwards, clusters are expanded starting with the densest point. Density is defined as the average distance to the k nearest neighbors.
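The density definition above can be sketched in a few lines. This is a minimal illustration using only the standard library, not the library's implementation: the helper name `knn_average_distances`, the use of Euclidean distance, and the reading that a smaller average distance marks a denser point are assumptions made for this example.

```python
import math


def knn_average_distances(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbors.

    Hypothetical helper for illustration; the point itself is excluded,
    matching the description of the k parameter.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    averages = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        averages.append(sum(dists[:k]) / k)
    return averages


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
densities = knn_average_distances(points, k=2)

# Assumption: the "densest" point is the one with the smallest average distance,
# so cluster expansion would start there.
densest = min(range(len(points)), key=densities.__getitem__)
```

In this toy data set the point at the origin has the two closest neighbors, so it would be the starting point under the stated assumption, while the isolated point at (10, 10) has by far the largest value.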

Parameters:
  • k (int) – The number of nearest neighbors. Does not include the object itself (default: 15)

  • var (float) – Defines the factor by which the density of a point may deviate from the average cluster density (default: 2.5)

  • min_cluster_size (int) – The minimum cluster size (if a cluster is smaller, all contained points will be labeled as noise) (default: 2)
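The effect of min_cluster_size can be illustrated as a post-processing step on a label list. This is only a sketch under assumptions: the noise label `-1` follows the scikit-learn convention and is not stated above, the helper `relabel_small_clusters` is hypothetical, and whether the library renumbers the remaining clusters afterwards is not covered here.

```python
from collections import Counter

NOISE = -1  # assumed noise label (scikit-learn convention), not taken from the text above


def relabel_small_clusters(labels, min_cluster_size):
    """Mark every point of a cluster with fewer than min_cluster_size members as noise."""
    sizes = Counter(label for label in labels if label != NOISE)
    return [
        label if label != NOISE and sizes[label] >= min_cluster_size else NOISE
        for label in labels
    ]


labels = [0, 0, 0, 1, 2, 2]
cleaned = relabel_small_clusters(labels, min_cluster_size=2)
```

Here cluster 1 contains a single point, so with min_cluster_size=2 that point is relabeled as noise and the other two clusters are kept.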

Attributes:
  • n_clusters_ (int) – The identified number of clusters

  • labels_ (np.ndarray) – The final labels

  • cluster_densities_ (list) – The final cluster densities

References

Ashour, Wesam, and Saad Sunoallah. “Multi density DBSCAN.” International Conference on Intelligent Data Engineering and Automated Learning. Springer, Berlin, Heidelberg, 2011.

fit(X: ndarray, y: ndarray | None = None) → MultiDensityDBSCAN

Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.

Parameters:
  • X (np.ndarray) – the given data set

  • y (np.ndarray) – the labels (can be ignored)

Returns:

self – this instance of the Multi Density DBSCAN algorithm

Return type:

MultiDensityDBSCAN
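A typical call sequence might look like the following sketch. The constructor arguments, the fit signature and the attribute names are taken from the documentation above; the synthetic data and the helper names `make_two_blobs` and `run_multi_density_dbscan` are assumptions made for illustration. The numpy and clustpy imports are deferred into the function so that the data helper can be used on its own.

```python
import random


def make_two_blobs(n_per_blob=100, seed=0):
    """Hypothetical helper: one tight and one loose 2-D Gaussian blob."""
    rng = random.Random(seed)
    tight = [[rng.gauss(0.0, 0.2), rng.gauss(0.0, 0.2)] for _ in range(n_per_blob)]
    loose = [[rng.gauss(8.0, 1.0), rng.gauss(8.0, 1.0)] for _ in range(n_per_blob)]
    return tight + loose


def run_multi_density_dbscan(points):
    """Fit MultiDensityDBSCAN with its documented defaults and return the fitted attributes."""
    # Deferred imports: only needed when the clustering is actually run
    import numpy as np
    from clustpy.density import MultiDensityDBSCAN

    X = np.asarray(points, dtype=float)
    # fit returns the estimator itself, so construction and fitting can be chained
    model = MultiDensityDBSCAN(k=15, var=2.5, min_cluster_size=2).fit(X)
    return model.n_clusters_, model.labels_, model.cluster_densities_
```

Calling `run_multi_density_dbscan(make_two_blobs())` returns the number of clusters, the label array and the list of cluster densities; the y argument of fit is left at its default because it is not needed for clustering.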

Module contents

class clustpy.density.MultiDensityDBSCAN(k: int = 15, var: float = 2.5, min_cluster_size: int = 2)

Bases: BaseEstimator, ClusterMixin

The Multi Density DBSCAN algorithm. First, the density of every data point is calculated. Afterwards, clusters are expanded starting with the densest point. Density is defined as the average distance to the k nearest neighbors.

Parameters:
  • k (int) – The number of nearest neighbors. Does not include the object itself (default: 15)

  • var (float) – Defines the factor by which the density of a point may deviate from the average cluster density (default: 2.5)

  • min_cluster_size (int) – The minimum cluster size (if a cluster is smaller, all contained points will be labeled as noise) (default: 2)

Attributes:
  • n_clusters_ (int) – The identified number of clusters

  • labels_ (np.ndarray) – The final labels

  • cluster_densities_ (list) – The final cluster densities

References

Ashour, Wesam, and Saad Sunoallah. “Multi density DBSCAN.” International Conference on Intelligent Data Engineering and Automated Learning. Springer, Berlin, Heidelberg, 2011.

fit(X: ndarray, y: ndarray | None = None) → MultiDensityDBSCAN

Initiate the actual clustering process on the input data set. The resulting cluster labels will be stored in the labels_ attribute.

Parameters:
  • X (np.ndarray) – the given data set

  • y (np.ndarray) – the labels (can be ignored)

Returns:

self – this instance of the Multi Density DBSCAN algorithm

Return type:

MultiDensityDBSCAN