clustpy.metrics package

Submodules

clustpy.metrics.clustering_metrics module

clustpy.metrics.clustering_metrics.fair_normalized_mutual_information(labels_true: ndarray, labels_pred: ndarray)[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the fair normalized mutual information score, often simply called FNMI. A value of 1 indicates a perfect clustering result, while a value of 0 indicates a totally random result. The FNMI penalizes results where the number of predicted clusters diverges from the ground truth number of clusters. To this end, it takes the normalized mutual information from sklearn and scales the value using the predicted and ground truth numbers of clusters.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

fnmi – The score between the two input label sets.

Return type:

float

References

Amelio, Alessia, and Clara Pizzuti. “Is normalized mutual information a fair measure for comparing community detection methods?.” Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2015.
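
Examples

A minimal usage sketch; the labels below are illustrative, and the exact score of the second call is not shown as it depends on the FNMI scaling:

>>> import numpy as np
>>> from clustpy.metrics import fair_normalized_mutual_information
>>> labels_true = np.array([0, 0, 1, 1])
>>> fair_normalized_mutual_information(labels_true, labels_true)  # perfect match -> 1.0
>>> labels_pred = np.array([0, 0, 1, 2])
>>> fair_normalized_mutual_information(labels_true, labels_pred)  # extra cluster -> penalized below the plain NMI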

clustpy.metrics.clustering_metrics.information_theoretic_external_cluster_validity_measure(labels_true: ndarray, labels_pred: ndarray, scale: bool = True) float[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the Information-Theoretic External Cluster-Validity Measure, often simply called DOM. A lower value indicates a better clustering result. If the result is scaled, this method will return a value between 1.0 (perfect match) and 0.0 (arbitrary result). An advantage of this metric is that it also works when the number of predicted clusters differs from the number of ground truth clusters.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

  • scale (bool) – Scale the result to (0, 1], where 1 indicates a perfect match and 0 indicates an arbitrary result (default: True)

Returns:

dom – The validity between the two input label sets.

Return type:

float

References

Byron E. Dom. 2002. “An information-theoretic external cluster-validity measure.” In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence (UAI’02).
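
Examples

A minimal usage sketch, assuming the default scaled variant, in which identical labelings should yield 1.0:

>>> import numpy as np
>>> from clustpy.metrics import information_theoretic_external_cluster_validity_measure as dom
>>> labels = np.array([0, 0, 1, 1])
>>> dom(labels, labels)  # perfect match -> 1.0 when scale=True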

clustpy.metrics.clustering_metrics.unsupervised_clustering_accuracy(labels_true: ndarray, labels_pred: ndarray) float[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the clustering accuracy. Returns a value between 0.0 (arbitrary result) and 1.0 (perfect match). Since the id of a cluster is not fixed in a clustering setting, the clustering accuracy evaluates each possible combination of the predicted cluster ids with the ground truth labels.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

acc – The accuracy between the two input label sets.

Return type:

float

References

Yang, Yi, et al. “Image clustering using local discriminant models and global integration.” IEEE Transactions on Image Processing 19.10 (2010): 2761-2773.
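
Examples

A short sketch of the id invariance described above; swapping the cluster ids does not change the score:

>>> import numpy as np
>>> from clustpy.metrics import unsupervised_clustering_accuracy
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([1, 1, 0, 0])  # same partition, cluster ids swapped
>>> unsupervised_clustering_accuracy(labels_true, labels_pred)  # -> 1.0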

clustpy.metrics.clustering_metrics.variation_of_information(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the variation of information between the ground truth labels and the predicted labels. The minimum value of 0.0 corresponds to a perfect match. Implemented as defined in https://en.wikipedia.org/wiki/Variation_of_information

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

vi – The variation of information

Return type:

float

References

Meilă, Marina. “Comparing clusterings by the variation of information.” Learning theory and kernel machines. Springer, Berlin, Heidelberg, 2003. 173-187.
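
Following the definition on the referenced page, the variation of information can be expressed through the entropies H and the mutual information I as VI(U, V) = H(U) + H(V) - 2 * I(U, V). A short sketch of the boundary case described above:

>>> import numpy as np
>>> from clustpy.metrics import variation_of_information
>>> labels = np.array([0, 0, 1, 1])
>>> variation_of_information(labels, labels)  # identical labelings -> 0.0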

clustpy.metrics.confusion_matrix module

class clustpy.metrics.confusion_matrix.ConfusionMatrix(labels_true: ndarray, labels_pred: ndarray)[source]

Bases: object

Create a Confusion Matrix of predicted and ground truth labels. Each row corresponds to a ground truth label and each column to a predicted label. The number in each cell (i, j) indicates how many objects with ground truth label i have been assigned predicted label j.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

confusion_matrix

The confusion matrix

Type:

np.ndarray
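
Examples

A minimal usage sketch; the matrix in the comment follows from the row/column convention above, assuming label ids are enumerated in sorted order:

>>> import numpy as np
>>> from clustpy.metrics import ConfusionMatrix
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([0, 1, 1, 1])
>>> cm = ConfusionMatrix(labels_true, labels_pred)
>>> cm.confusion_matrix  # expected: [[1, 1], [0, 2]]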

plot(show_text: bool = True, figsize: tuple = (10, 10), cmap: str = 'YlGn', textcolor: str = 'black', vmin: int = 0, vmax: int | None = None) None[source]

Plot the confusion matrix.

Parameters:
  • show_text (bool) – Show the value in each cell as text (default: True)

  • figsize (tuple) – Tuple indicating the height and width of the plot (default: (10, 10))

  • cmap (str) – Colormap used for the plot (default: “YlGn”)

  • textcolor (str) – Color of the text. Only relevant if show_text is True (default: “black”)

  • vmin (int) – Minimum possible value within a cell of the confusion matrix. If None, it will be set as the minimum value within the confusion matrix. Used to choose the color from the colormap (default: 0)

  • vmax (int) – Maximum possible value within a cell of the confusion matrix. If None, it will be set as the maximum value within the confusion matrix. Used to choose the color from the colormap (default: None)

rearrange(inplace: bool = True) ndarray[source]

Rearrange the confusion matrix in such a way that the sum of the diagonal is maximized, so that the best matching combination of labels is shown. Uses the Hungarian Method to identify the best match. If inplace is set to True, this method changes the original confusion matrix; otherwise, the rearranged matrix is only returned.

Parameters:

inplace (bool) – Should the new confusion matrix overwrite the original one (default: True)

Returns:

rearranged_confusion_matrix – The rearranged confusion matrix. If the number of ground truth labels is larger than the number of predicted labels, the resulting confusion matrix will be quadratic with multiple all-zero columns.

Return type:

np.ndarray
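
Examples

A short sketch of rearrange on a labeling with swapped cluster ids, assuming the Hungarian matching described above:

>>> import numpy as np
>>> from clustpy.metrics import ConfusionMatrix
>>> labels_true = np.array([0, 0, 1, 1])
>>> labels_pred = np.array([1, 1, 0, 0])  # cluster ids swapped
>>> cm = ConfusionMatrix(labels_true, labels_pred)
>>> cm.rearrange()  # columns reordered to maximize the diagonal, here [[2, 0], [0, 2]]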

clustpy.metrics.multipe_labelings_scoring module

class clustpy.metrics.multipe_labelings_scoring.MultipleLabelingsConfusionMatrix(labels_true: ~numpy.ndarray, labels_pred: ~numpy.ndarray, metric: ~collections.abc.Callable = <function normalized_mutual_info_score>, remove_noise_spaces: bool = True, metric_params: dict = {})[source]

Bases: ConfusionMatrix

A Multi Labelings Confusion Matrix is a special type of Confusion Matrix where each cell corresponds to the clustering score between one ground truth label set and one predicted label set. Therefore, the shape is equal to (number of ground truth labelings, number of predicted labelings). The scoring metric can be freely selected by the user and is invoked as follows: metric(labels_gt, labels_pred). Additional parameters for the chosen metric can be set using the metric_params dictionary. The default metric is the ‘normalized mutual information’ from sklearn.metrics.normalized_mutual_info_score.

The Multi Labelings Confusion Matrix can also be used to calculate the average redundancy of a set of labels. To do so, it is recommended to set metric=clustpy.metrics.variation_of_information and aggregate the confusion matrix with aggregation_strategy=”mean_wo_diag”.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

  • metric (Callable) – The chosen scoring metric (default: sklearn.metrics.normalized_mutual_info_score)

  • metric_params (dict) – Additional parameters for the scoring metric (default: {})

confusion_matrix

The confusion matrix

Type:

np.ndarray

Examples

>>> # Calculate average redundancy
>>> import numpy as np
>>> from clustpy.metrics import MultipleLabelingsConfusionMatrix
>>> from clustpy.metrics import variation_of_information as vi
>>> labels = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
...                    [0, 0, 0, 0, 1, 1, 1, 1],
...                    [0, 0, 1, 1, 1, 1, 1, 1],
...                    [1, 2, 3, 4, 5, 6, 7, 8]]).T
>>> mlcm = MultipleLabelingsConfusionMatrix(labels, labels, metric=vi)
>>> mlcm.aggregate("mean_wo_diag")

aggregate(aggregation_strategy: str = 'max') float[source]

Aggregate the Multiple Labelings Confusion Matrix to a single value. Different strategies of aggregations are possible:

  • “max”: Choose for each ground truth set of labels the predicted set of labels with the maximum value (prediction labeling can be used multiple times).

  • “min”: Choose for each ground truth set of labels the predicted set of labels with the minimum value (prediction labeling can be used multiple times).

  • “permut-max”: Assign each ground truth labeling one predicted labeling, so that the sum of the combinations is maximized (prediction labeling can only be assigned to one ground truth labeling).

  • “permut-min”: Assign each ground truth labeling one predicted labeling, so that the sum of the combinations is minimized (prediction labeling can only be assigned to one ground truth labeling).

  • “mean”: Calculate the mean value of all values in the confusion matrix.

  • “mean_wo_diag”: Calculate the mean value ignoring the diagonal, e.g. to obtain the average redundancy. Note: the confusion matrix must be quadratic!

In the end all results (except for ‘mean’) are divided by the number of ground truth labelings.

Parameters:

aggregation_strategy (str) – The aggregation strategy (default: “max”)

Returns:

score – The resulting aggregated score

Return type:

float

Examples

>>> import numpy as np
>>> from clustpy.metrics import MultipleLabelingsConfusionMatrix
>>> mlcm = MultipleLabelingsConfusionMatrix(np.array([0, 1]), np.array([0, 1]))
>>> # Overwrite confusion matrix (for demonstration purposes only)
>>> mlcm.confusion_matrix = np.array([[0., 0.1, 0.2],
...                                   [1, 0.9, 0.8],
...                                   [0, 0.2, 0.3]])
>>> mlcm.aggregate("max") == 1.5 / 3  # True
>>> mlcm.aggregate("min") == 0.8 / 3  # True
>>> mlcm.aggregate("permut-max") == 1.4 / 3  # True
>>> mlcm.aggregate("permut-min") == 0.9 / 3  # True
>>> mlcm.aggregate("mean") == 3.5 / 9  # True

plot(show_text: bool = True, figsize: tuple = (10, 10), cmap: str = 'YlGn', textcolor: str = 'black', vmin: float = 0.0, vmax: float = 1.0) None[source]

Plot the Multiple Labelings Confusion Matrix. Same plot as for a regular Confusion Matrix, but vmax defaults to 1.0 as this is usually the maximum value of clustering metrics.

Parameters:
  • show_text (bool) – Show the value in each cell as text (default: True)

  • figsize (tuple) – Tuple indicating the height and width of the plot (default: (10, 10))

  • cmap (str) – Colormap used for the plot (default: “YlGn”)

  • textcolor (str) – Color of the text. Only relevant if show_text is True (default: “black”)

  • vmin (float) – Minimum possible value within a cell of the confusion matrix. If None, it will be set as the minimum value within the confusion matrix. Used to choose the color from the colormap (default: 0.0)

  • vmax (float) – Maximum possible value within a cell of the confusion matrix. If None, it will be set as the maximum value within the confusion matrix. Used to choose the color from the colormap (default: 1.0)

class clustpy.metrics.multipe_labelings_scoring.MultipleLabelingsPairCountingScores(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True)[source]

Bases: PairCountingScores

Obtain all parameters that are necessary to calculate the pair-counting scores ‘jaccard’, ‘rand’, ‘precision’, ‘recall’ and ‘f1’. These parameters are the number of ‘true positives’, ‘false positives’, ‘false negatives’ and ‘true negatives’. The resulting object can call all pair-counting score methods. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

n_tp

The number of true positives

Type:

int

n_fp

The number of false positives

Type:

int

n_fn

The number of false negatives

Type:

int

n_tn

The number of true negatives

Type:

int

References

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multipe_labelings_scoring.is_multi_labelings_n_clusters_correct(labels_true: ndarray, labels_pred: ndarray, check_subset: bool = True, remove_noise_spaces: bool = True) bool[source]

Check if the number of clusters of two sets of labelings matches. The parameter check_subset defines if it is sufficient that the numbers of clusters of a subset of the predicted label sets (n_clusters_pred) are equal to the numbers of clusters of the true label sets (n_clusters_true). E.g., assume n_clusters_true is [4, 3, 1] and n_clusters_pred is [4, 2, 1]. In this case is_multi_labelings_n_clusters_correct(labels_true, labels_pred) will be False. Now assume n_clusters_true is still [4, 3, 1] but n_clusters_pred is [4, 3, 2, 1]. In this case is_multi_labelings_n_clusters_correct(labels_true, labels_pred) will be False if check_subset is False and True otherwise.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • check_subset (bool) – Boolean defines if it is sufficient if a subset of n_clusters_pred is equal to n_clusters_true (default: True)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

is_equal – Boolean indicating if the number of clusters of labels_true and labels_pred matches

Return type:

bool
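
Examples

A minimal sketch of the subset behavior described above; the label arrays are illustrative, and the subset comparison is assumed to ignore the order of the labelings:

>>> import numpy as np
>>> from clustpy.metrics import is_multi_labelings_n_clusters_correct
>>> labels_true = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # n_clusters_true = [2, 2]
>>> labels_pred = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 2], [1, 1, 0]])  # n_clusters_pred = [2, 2, 3]
>>> is_multi_labelings_n_clusters_correct(labels_true, labels_pred, check_subset=True)  # [2, 2] is contained -> True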

clustpy.metrics.multipe_labelings_scoring.multiple_labelings_pc_f1_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the f1 score for multiple labelings. F1 score = 2 * precision * recall / (precision + recall). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.
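
The following sketch illustrates the “at least one label set” rule shared by all multiple_labelings_pc_* scores; it is a conceptual illustration, not the library's implementation:

>>> import numpy as np
>>> labels = np.array([[0, 0], [0, 1], [1, 1]])  # 3 samples, 2 labelings (columns)
>>> bool((labels[0] == labels[1]).any())  # samples 0 and 1 share a cluster in the first labeling -> True
>>> bool((labels[0] == labels[2]).any())  # samples 0 and 2 agree in no labeling -> False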

clustpy.metrics.multipe_labelings_scoring.multiple_labelings_pc_jaccard_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the jaccard score for multiple labelings. Jaccard score = n_tp / (n_tp + n_fp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multipe_labelings_scoring.multiple_labelings_pc_precision_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the precision for multiple labelings. Precision score = n_tp / (n_tp + n_fp). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The precision

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multipe_labelings_scoring.multiple_labelings_pc_rand_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the rand score for multiple labelings. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multipe_labelings_scoring.multiple_labelings_pc_recall_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the recall for multiple labelings. Recall score = n_tp / (n_tp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The recall

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multipe_labelings_scoring.remove_noise_spaces_from_labels(labels: ndarray) ndarray[source]

Remove optional noise spaces (n_clusters=1) from labels. If outliers are present (label=-1) but all non-outlier labels (label>=0) are equal, the label column will still be regarded as a noise space.

Parameters:

labels (np.ndarray) – The input labels

Returns:

labels_new – The output labels

Return type:

np.ndarray
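
Examples

A minimal sketch of the two cases described above: a constant column and a column that is constant apart from outliers are both removed:

>>> import numpy as np
>>> from clustpy.metrics import remove_noise_spaces_from_labels
>>> labels = np.array([[0, 0, -1],
...                    [1, 0, 2],
...                    [0, 0, -1],
...                    [1, 0, 2]])
>>> # Column 1 has a single cluster and column 2 is constant apart from outliers (-1),
>>> # so both are regarded as noise spaces and only column 0 remains
>>> remove_noise_spaces_from_labels(labels)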

clustpy.metrics.pair_counting_scores module

class clustpy.metrics.pair_counting_scores.PairCountingScores(labels_true: ndarray, labels_pred: ndarray)[source]

Bases: object

Obtain all parameters that are necessary to calculate the pair-counting scores ‘jaccard’, ‘rand’, ‘precision’, ‘recall’ and ‘f1’. These parameters are the number of ‘true positives’, ‘false positives’, ‘false negatives’ and ‘true negatives’. The resulting object can call all pair-counting score methods.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

n_tp

The number of true positives

Type:

int

n_fp

The number of false positives

Type:

int

n_fn

The number of false negatives

Type:

int

n_tn

The number of true negatives

Type:

int

References

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.
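
The four counts are defined over all pairs of samples: a pair is a true positive if both labelings place the two samples in the same cluster, a false positive if only the prediction does, a false negative if only the ground truth does, and a true negative otherwise. The following is a conceptual sketch of this counting, not the library's (presumably vectorized) implementation:

>>> import numpy as np
>>> from itertools import combinations
>>> def pair_counts(labels_true, labels_pred):
...     # Count all sample pairs by the agreement of the two labelings
...     n_tp = n_fp = n_fn = n_tn = 0
...     for i, j in combinations(range(len(labels_true)), 2):
...         same_true = labels_true[i] == labels_true[j]
...         same_pred = labels_pred[i] == labels_pred[j]
...         if same_true and same_pred:
...             n_tp += 1
...         elif same_pred:
...             n_fp += 1
...         elif same_true:
...             n_fn += 1
...         else:
...             n_tn += 1
...     return n_tp, n_fp, n_fn, n_tn
>>> pair_counts(np.array([0, 0, 1, 1]), np.array([0, 0, 1, 0]))
(1, 2, 1, 2)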

f1() float[source]

Calculate the f1 score. F1 score = 2 * precision * recall / (precision + recall).

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

jaccard() float[source]

Calculate the jaccard score. Jaccard score = n_tp / (n_tp + n_fp + n_fn).

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

precision() float[source]

Calculate the precision. Precision score = n_tp / (n_tp + n_fp).

Returns:

score – The precision score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

rand() float[source]

Calculate the rand score. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn).

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

recall() float[source]

Calculate the recall. Recall score = n_tp / (n_tp + n_fn).

Returns:

score – The recall score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

clustpy.metrics.pair_counting_scores.pc_f1_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the f1 score. F1 score = 2 * precision * recall / (precision + recall). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.
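
Examples

A short usage sketch: identical labelings produce no false positives or false negatives, so precision, recall, and hence the F1 score are all 1.0:

>>> import numpy as np
>>> from clustpy.metrics import pc_f1_score
>>> labels = np.array([0, 0, 1, 1])
>>> pc_f1_score(labels, labels)  # identical labelings -> 1.0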

clustpy.metrics.pair_counting_scores.pc_jaccard_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the jaccard score. Jaccard score = n_tp / (n_tp + n_fp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pair_counting_scores.pc_precision_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the precision. Precision score = n_tp / (n_tp + n_fp). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The precision score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pair_counting_scores.pc_rand_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the rand score. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pair_counting_scores.pc_recall_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the recall. Recall score = n_tp / (n_tp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The recall score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

Module contents

class clustpy.metrics.ConfusionMatrix(labels_true: ndarray, labels_pred: ndarray)[source]

Bases: object

Create a Confusion Matrix of predicted and ground truth labels. Each row corresponds to a ground truth label and each column to a predicted label. The number in each cell (i, j) indicates how many objects with ground truth label i have been assigned predicted label j.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

confusion_matrix

The confusion matrix

Type:

np.ndarray

plot(show_text: bool = True, figsize: tuple = (10, 10), cmap: str = 'YlGn', textcolor: str = 'black', vmin: int = 0, vmax: int | None = None) None[source]

Plot the confusion matrix.

Parameters:
  • show_text (bool) – Show the value in each cell as text (default: True)

  • figsize (tuple) – Tuple indicating the height and width of the plot (default: (10, 10))

  • cmap (str) – Colormap used for the plot (default: “YlGn”)

  • textcolor (str) – Color of the text. Only relevant if show_text is True (default: “black”)

  • vmin (int) – Minimum possible value within a cell of the confusion matrix. If None, it will be set as the minimum value within the confusion matrix. Used to choose the color from the colormap (default: 0)

  • vmax (int) – Maximum possible value within a cell of the confusion matrix. If None, it will be set as the maximum value within the confusion matrix. Used to choose the color from the colormap (default: None)

rearrange(inplace: bool = True) ndarray[source]

Rearrange the confusion matrix in such a way that the sum of the diagonal is maximized, so that the best matching combination of labels is shown. Uses the Hungarian Method to identify the best match. If inplace is set to True, this method changes the original confusion matrix; otherwise, the rearranged matrix is only returned.

Parameters:

inplace (bool) – Should the new confusion matrix overwrite the original one (default: True)

Returns:

rearranged_confusion_matrix – The rearranged confusion matrix. If the number of ground truth labels is larger than the number of predicted labels, the resulting confusion matrix will be quadratic with multiple all-zero columns.

Return type:

np.ndarray

class clustpy.metrics.MultipleLabelingsConfusionMatrix(labels_true: ~numpy.ndarray, labels_pred: ~numpy.ndarray, metric: ~collections.abc.Callable = <function normalized_mutual_info_score>, remove_noise_spaces: bool = True, metric_params: dict = {})[source]

Bases: ConfusionMatrix

A Multi Labelings Confusion Matrix is a special type of Confusion Matrix where each cell corresponds to the clustering score between one ground truth label set and one predicted label set. Therefore, the shape is equal to (number of ground truth labelings, number of predicted labelings). The scoring metric can be freely selected by the user and is invoked as follows: metric(labels_gt, labels_pred). Additional parameters for the chosen metric can be set using the metric_params dictionary. The default metric is the ‘normalized mutual information’ from sklearn.metrics.normalized_mutual_info_score.

The Multi Labelings Confusion Matrix can also be used to calculate the average redundancy of a set of labels. To do so, it is recommended to set metric=clustpy.metrics.variation_of_information and aggregate the confusion matrix with aggregation_strategy=”mean_wo_diag”.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

  • metric (Callable) – The chosen scoring metric (default: sklearn.metrics.normalized_mutual_info_score)

  • metric_params (dict) – Additional parameters for the scoring metric (default: {})

confusion_matrix

The confusion matrix

Type:

np.ndarray

Examples

>>> # Calculate average redundancy
>>> import numpy as np
>>> from clustpy.metrics import MultipleLabelingsConfusionMatrix
>>> from clustpy.metrics import variation_of_information as vi
>>> labels = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
...                    [0, 0, 0, 0, 1, 1, 1, 1],
...                    [0, 0, 1, 1, 1, 1, 1, 1],
...                    [1, 2, 3, 4, 5, 6, 7, 8]]).T
>>> mlcm = MultipleLabelingsConfusionMatrix(labels, labels, metric=vi)
>>> mlcm.aggregate("mean_wo_diag")

aggregate(aggregation_strategy: str = 'max') float[source]

Aggregate the Multiple Labelings Confusion Matrix to a single value. Different strategies of aggregations are possible:

  • “max”: Choose for each ground truth set of labels the predicted set of labels with the maximum value (prediction labeling can be used multiple times).

  • “min”: Choose for each ground truth set of labels the predicted set of labels with the minimum value (prediction labeling can be used multiple times).

  • “permut-max”: Assign each ground truth labeling one predicted labeling, so that the sum of the combinations is maximized (prediction labeling can only be assigned to one ground truth labeling).

  • “permut-min”: Assign each ground truth labeling one predicted labeling, so that the sum of the combinations is minimized (prediction labeling can only be assigned to one ground truth labeling).

  • “mean”: Calculate the mean value of all values in the confusion matrix.

  • “mean_wo_diag”: Calculate the mean value ignoring the diagonal, e.g. to obtain the average redundancy. Note: the confusion matrix must be quadratic!

In the end all results (except for ‘mean’) are divided by the number of ground truth labelings.

Parameters:

aggregation_strategy (str) – The aggregation strategy (default: “max”)

Returns:

score – The resulting aggregated score

Return type:

float

Examples

>>> import numpy as np
>>> from clustpy.metrics import MultipleLabelingsConfusionMatrix
>>> mlcm = MultipleLabelingsConfusionMatrix(np.array([0, 1]), np.array([0, 1]))
>>> # Overwrite confusion matrix (for demonstration purposes only)
>>> mlcm.confusion_matrix = np.array([[0., 0.1, 0.2],
...                                   [1, 0.9, 0.8],
...                                   [0, 0.2, 0.3]])
>>> mlcm.aggregate("max") == 1.5 / 3  # True
>>> mlcm.aggregate("min") == 0.8 / 3  # True
>>> mlcm.aggregate("permut-max") == 1.4 / 3  # True
>>> mlcm.aggregate("permut-min") == 0.9 / 3  # True
>>> mlcm.aggregate("mean") == 3.5 / 9  # True

plot(show_text: bool = True, figsize: tuple = (10, 10), cmap: str = 'YlGn', textcolor: str = 'black', vmin: float = 0.0, vmax: float = 1.0) None[source]

Plot the Multiple Labelings Confusion Matrix. Same plot as for a regular Confusion Matrix, but vmax defaults to 1.0 as this is usually the maximum value of clustering metrics.

Parameters:
  • show_text (bool) – Show the value in each cell as text (default: True)

  • figsize (tuple) – Tuple indicating the height and width of the plot (default: (10, 10))

  • cmap (str) – Colormap used for the plot (default: “YlGn”)

  • textcolor (str) – Color of the text. Only relevant if show_text is True (default: “black”)

  • vmin (float) – Minimum possible value within a cell of the confusion matrix. If None, it will be set as the minimum value within the confusion matrix. Used to choose the color from the colormap (default: 0.0)

  • vmax (float) – Maximum possible value within a cell of the confusion matrix. If None, it will be set as the maximum value within the confusion matrix. Used to choose the color from the colormap (default: 1.0)

class clustpy.metrics.MultipleLabelingsPairCountingScores(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True)[source]

Bases: PairCountingScores

Obtain all parameters that are necessary to calculate the pair-counting scores ‘jaccard’, ‘rand’, ‘precision’, ‘recall’ and ‘f1’. These parameters are the number of ‘true positives’, ‘false positives’, ‘false negatives’ and ‘true negatives’. The resulting object can call all pair-counting score methods. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

n_tp

The number of true positives

Type:

int

n_fp

The number of false positives

Type:

int

n_fn

The number of false negatives

Type:

int

n_tn

The number of true negatives

Type:

int

References

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

class clustpy.metrics.PairCountingScores(labels_true: ndarray, labels_pred: ndarray)[source]

Bases: object

Obtain all parameters that are necessary to calculate the pair-counting scores ‘jaccard’, ‘rand’, ‘precision’, ‘recall’ and ‘f1’. These parameters are the number of ‘true positives’, ‘false positives’, ‘false negatives’ and ‘true negatives’. The resulting object can call all pair-counting score methods.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

n_tp

The number of true positives

Type:

int

n_fp

The number of false positives

Type:

int

n_fn

The number of false negatives

Type:

int

n_tn

The number of true negatives

Type:

int

References

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

f1() float[source]

Calculate the f1 score. F1 score = 2 * precision * recall / (precision + recall).

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

jaccard() float[source]

Calculate the jaccard score. Jaccard score = n_tp / (n_tp + n_fp + n_fn).

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

precision() float[source]

Calculate the precision. Precision score = n_tp / (n_tp + n_fp).

Returns:

score – The precision score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

rand() float[source]

Calculate the rand score. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn).

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

recall() float[source]

Calculate the recall. Recall score = n_tp / (n_tp + n_fn).

Returns:

score – The recall score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

clustpy.metrics.fair_normalized_mutual_information(labels_true: ndarray, labels_pred: ndarray)[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the fair normalized mutual information score, often simply called FNMI. A value of 1 indicates a perfect clustering result, while a value of 0 indicates a totally random result. The FNMI penalizes results where the number of predicted clusters diverges from the ground truth number of clusters. To this end, it takes the normalized mutual information from sklearn and scales the value using the predicted and ground truth numbers of clusters.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

fnmi – The score between the two input label sets.

Return type:

float

References

Amelio, Alessia, and Clara Pizzuti. “Is normalized mutual information a fair measure for comparing community detection methods?.” Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 2015.

clustpy.metrics.information_theoretic_external_cluster_validity_measure(labels_true: ndarray, labels_pred: ndarray, scale: bool = True) float[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the Information-Theoretic External Cluster-Validity Measure, often simply called DOM. A lower value indicates a better clustering result. If the result is scaled, this method will return a value between 1.0 (perfect match) and 0.0 (arbitrary result). An advantage of this metric is that it also works when the number of predicted clusters differs from the number of ground truth clusters.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

  • scale (bool) – Scale the result to (0, 1], where 1 indicates a perfect match and 0 indicates an arbitrary result (default: True)

Returns:

dom – The validity between the two input label sets.

Return type:

float

References

Byron E. Dom. 2002. “An information-theoretic external cluster-validity measure.” In Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence (UAI’02).

clustpy.metrics.is_multi_labelings_n_clusters_correct(labels_true: ndarray, labels_pred: ndarray, check_subset: bool = True, remove_noise_spaces: bool = True) bool[source]

Check if the number of clusters of two sets of labelings matches. The parameter check_subset defines if it is sufficient that the numbers of clusters of a subset of the predicted label sets (n_clusters_pred) are equal to the numbers of clusters of the true label sets (n_clusters_true). E.g., assume n_clusters_true is [4, 3, 1] and n_clusters_pred is [4, 2, 1]. In this case is_multi_labelings_n_clusters_correct(labels_true, labels_pred) will be False. Now assume n_clusters_true is still [4, 3, 1] but n_clusters_pred is [4, 3, 2, 1]. In this case is_multi_labelings_n_clusters_correct(labels_true, labels_pred) will be False if check_subset is False and True otherwise.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • check_subset (bool) – Boolean defines if it is sufficient if a subset of n_clusters_pred is equal to n_clusters_true (default: True)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

is_equal – Boolean indicating if the number of clusters of labels_true and labels_pred matches

Return type:

bool

clustpy.metrics.multiple_labelings_pc_f1_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the f1 score for multiple labelings. F1 score = 2 * precision * recall / (precision + recall). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multiple_labelings_pc_jaccard_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the jaccard score for multiple labelings. Jaccard score = n_tp / (n_tp + n_fp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multiple_labelings_pc_precision_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the precision for multiple labelings. Precision score = n_tp / (n_tp + n_fp). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The precision

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multiple_labelings_pc_rand_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the rand score for multiple labelings. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.multiple_labelings_pc_recall_score(labels_true: ndarray, labels_pred: ndarray, remove_noise_spaces: bool = True) float[source]

Calculate the recall for multiple labelings. Recall score = n_tp / (n_tp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown. In contrast to common pair-counting calculations, a match between two samples counts if it occurs in at least one label set.

Parameters:
  • labels_true (np.ndarray) – The true set of labelings. Shape must match (n_samples, n_subspaces)

  • labels_pred (np.ndarray) – The predicted set of labelings. Shape must match (n_samples, n_subspaces)

  • remove_noise_spaces (bool) – Defines if optional noise spaces should be ignored when calculating the score (default: True)

Returns:

score – The recall

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Achtert, Elke, et al. “Evaluation of clusterings–metrics and visual support.” 2012 IEEE 28th International Conference on Data Engineering. IEEE, 2012.

clustpy.metrics.pc_f1_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the f1 score. F1 score = 2 * precision * recall / (precision + recall). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The f1 score

Return type:

float

References

Van Rijsbergen, C. J. (1979). Information Retrieval (2nd ed.). Butterworth-Heinemann.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pc_jaccard_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the jaccard score. Jaccard score = n_tp / (n_tp + n_fp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The jaccard score

Return type:

float

References

Jaccard, Paul. “Lois de distribution florale dans la zone alpine.” Bull Soc Vaudoise Sci Nat 38 (1902): 69-130.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pc_precision_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the precision. Precision score = n_tp / (n_tp + n_fp). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The precision score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pc_rand_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the rand score. Rand score = (n_tp + n_tn) / (n_tp + n_fp + n_fn + n_tn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The rand score

Return type:

float

References

Rand, William M. “Objective criteria for the evaluation of clustering methods.” Journal of the American Statistical association 66.336 (1971): 846-850.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.pc_recall_score(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the recall. Recall score = n_tp / (n_tp + n_fn). In the clustering domain the calculation is based on pair-counting as the true label ids are unknown.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

score – The recall score

Return type:

float

References

Allen, Kent, et al. “Machine literature searching VIII. Operational criteria for designing information retrieval systems.” American Documentation (pre-1986) 6.2 (1955): 93.

and

Pfitzner, Darius, Richard Leibbrandt, and David Powers. “Characterization and evaluation of similarity measures for pairs of clusterings.” Knowledge and Information Systems 19 (2009): 361-394.

clustpy.metrics.remove_noise_spaces_from_labels(labels: ndarray) ndarray[source]

Remove optional noise spaces (n_clusters=1) from labels. If outliers are present (label=-1) but all non-outlier labels (label>=0) are equal, the label column will still be regarded as a noise space.

Parameters:

labels (np.ndarray) – The input labels

Returns:

labels_new – The output labels

Return type:

np.ndarray

clustpy.metrics.unsupervised_clustering_accuracy(labels_true: ndarray, labels_pred: ndarray) float[source]

Evaluate the quality of predicted labels by comparing them to the ground truth labels using the clustering accuracy. Returns a value between 0.0 (arbitrary result) and 1.0 (perfect match). Since the id of a cluster is not fixed in a clustering setting, the clustering accuracy evaluates each possible combination of the predicted cluster ids with the ground truth labels.

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

acc – The accuracy between the two input label sets.

Return type:

float

References

Yang, Yi, et al. “Image clustering using local discriminant models and global integration.” IEEE Transactions on Image Processing 19.10 (2010): 2761-2773.

clustpy.metrics.variation_of_information(labels_true: ndarray, labels_pred: ndarray) float[source]

Calculate the variation of information between the ground truth labels and the predicted labels. The minimum value of 0.0 corresponds to a perfect match. Implemented as defined in https://en.wikipedia.org/wiki/Variation_of_information

Parameters:
  • labels_true (np.ndarray) – The ground truth labels of the data set

  • labels_pred (np.ndarray) – The labels as predicted by a clustering algorithm

Returns:

vi – The variation of information

Return type:

float

References

Meilă, Marina. “Comparing clusterings by the variation of information.” Learning theory and kernel machines. Springer, Berlin, Heidelberg, 2003. 173-187.