madmom.ml.gmm

This module contains the functionality needed for fitting and scoring Gaussian Mixture Models (GMMs), as used e.g. in madmom.features.beats.

The needed functionality is taken from sklearn.mixture.GMM which is released under the BSD license and was written by these authors:

This version works with sklearn v0.16 (and hopefully onwards). All commits until 0650d5502e01e6b4245ce99729fc8e7a71aacff3 are incorporated.

madmom.ml.gmm.logsumexp(arr, axis=0)

Computes the sum of arr assuming arr is in the log domain.

Parameters:

arr : numpy array

Input data [log domain].

axis : int, optional

Axis to operate on.

Returns:

numpy array

log(sum(exp(arr))) while minimizing the possibility of over/underflow.

Notes

Function copied from sklearn.utils.extmath.
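
The over/underflow protection mentioned above can be illustrated with a minimal NumPy sketch. The helper name logsumexp_sketch is hypothetical and this is not the madmom or sklearn source; it only shows the usual trick of subtracting the maximum before exponentiating.

```python
import numpy as np


def logsumexp_sketch(arr, axis=0):
    """Illustrative log(sum(exp(arr))) along `axis` (not madmom's code)."""
    arr = np.asarray(arr, dtype=float)
    # Move the requested axis to the front so the reduction runs over axis 0.
    arr = np.moveaxis(arr, axis, 0)
    # Subtracting the maximum keeps the largest exponent at exp(0) = 1,
    # so the sum can neither overflow nor underflow to zero.
    vmax = arr.max(axis=0)
    return np.log(np.sum(np.exp(arr - vmax), axis=0)) + vmax


log_probs = np.array([-1000.0, -1000.0])
with np.errstate(divide="ignore"):
    # The naive computation underflows: exp(-1000) is 0 in double precision.
    naive = np.log(np.sum(np.exp(log_probs)))
stable = logsumexp_sketch(log_probs)
```

Here the naive result is -inf, whereas the shifted version returns -1000 + log(2).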

madmom.ml.gmm.pinvh(a, cond=None, rcond=None, lower=True)

Compute the (Moore-Penrose) pseudo-inverse of a Hermitian matrix.

Calculate a generalized inverse of a symmetric matrix using its eigenvalue decomposition and including all ‘large’ eigenvalues.

Parameters:

a : array, shape (N, N)

Real symmetric or complex Hermitian matrix to be pseudo-inverted.

cond, rcond : float or None

Cutoff for ‘small’ eigenvalues. Singular values smaller than rcond * largest_eigenvalue are considered zero. If None or -1, suitable machine precision is used.

lower : boolean

Whether the pertinent array data is taken from the lower or upper triangle of a.

Returns:

B : array, shape (N, N)

The pseudo-inverse of the matrix a.

Raises:

LinAlgError

If the eigenvalue computation does not converge.

Notes

Function copied from sklearn.utils.extmath.
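
As a rough illustration of the eigenvalue-based approach described above, the following sketch inverts only the 'large' eigenvalues and zeroes the rest. The name pinvh_sketch and the specific default cutoff are assumptions made for this example; the actual function may choose its cutoff differently.

```python
import numpy as np


def pinvh_sketch(a, rcond=None):
    """Illustrative pseudo-inverse of a real symmetric / Hermitian matrix."""
    a = np.asarray(a)
    # eigh reads only one triangle of `a` (the lower one by default).
    eigvals, eigvecs = np.linalg.eigh(a)
    if rcond is None:
        # Assumed cutoff for this sketch: a multiple of machine epsilon.
        rcond = 1e3 * np.finfo(eigvals.dtype).eps
    # Keep only eigenvalues that are 'large' relative to the largest one.
    large = np.abs(eigvals) > rcond * np.max(np.abs(eigvals))
    inv_vals = np.zeros_like(eigvals)
    inv_vals[large] = 1.0 / eigvals[large]
    # Reassemble V * diag(1 / lambda) * V^H.
    return (eigvecs * inv_vals) @ eigvecs.conj().T


a = np.array([[2.0, 2.0], [2.0, 2.0]])  # rank-1 symmetric matrix
b = pinvh_sketch(a)
```

For this singular input an ordinary inverse does not exist, but the result still satisfies the defining property a @ b @ a == a up to rounding.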

madmom.ml.gmm.log_multivariate_normal_density(x, means, covars, covariance_type='diag')

Compute the log probability under a multivariate Gaussian distribution.

Parameters:

x : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

means : array_like, shape (n_components, n_features)

List of n_features-dimensional mean vectors for n_components Gaussians. Each row corresponds to a single mean vector.

covars : array_like

List of n_components covariance parameters for each Gaussian. The shape depends on covariance_type:

  • (n_components, n_features) if ‘spherical’,
  • (n_features, n_features) if ‘tied’,
  • (n_components, n_features) if ‘diag’,
  • (n_components, n_features, n_features) if ‘full’.

covariance_type : {‘diag’, ‘spherical’, ‘tied’, ‘full’}

Type of the covariance parameters. Defaults to ‘diag’.

Returns:

lpr : array_like, shape (n_samples, n_components)

Array containing the log probabilities of each data point in x under each of the n_components multivariate Gaussian distributions.
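
To make the input and output shapes concrete, here is a minimal sketch of the ‘diag’ case only, written directly from the Gaussian log density formula. The function name log_density_diag_sketch is hypothetical and the code is not taken from madmom.

```python
import numpy as np


def log_density_diag_sketch(x, means, covars):
    """Log density of each row of x under each diagonal-covariance Gaussian."""
    x = np.asarray(x, dtype=float)            # (n_samples, n_features)
    means = np.asarray(means, dtype=float)    # (n_components, n_features)
    covars = np.asarray(covars, dtype=float)  # (n_components, n_features)
    n_features = x.shape[1]
    # diff[i, k, d] = x[i, d] - means[k, d]
    diff = x[:, np.newaxis, :] - means[np.newaxis, :, :]
    # Squared Mahalanobis distance; for diagonal covariances this is a
    # per-feature division by the variance.
    maha = np.sum(diff ** 2 / covars[np.newaxis, :, :], axis=2)
    # Log determinant of a diagonal matrix is the sum of the log variances.
    log_det = np.sum(np.log(covars), axis=1)
    return -0.5 * (n_features * np.log(2 * np.pi) + log_det + maha)


x = np.array([[0.0, 0.0], [1.0, 1.0]])
means = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
covars = np.ones((3, 2))
lpr = log_density_diag_sketch(x, means, covars)  # shape (2, 3)
```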

class madmom.ml.gmm.GMM(n_components=1, covariance_type='full')

Gaussian Mixture Model

Representation of a Gaussian mixture model probability distribution. This class allows for easy evaluation of, sampling from, and maximum-likelihood estimation of the parameters of a GMM distribution.

Initializes parameters such that every mixture component has zero mean and identity covariance.

Parameters:

n_components : int, optional

Number of mixture components. Defaults to 1.

covariance_type : {‘diag’, ‘spherical’, ‘tied’, ‘full’}

String describing the type of covariance parameters to use. Defaults to ‘full’.

See also

sklearn.mixture.GMM

Attributes

weights_ : array, shape (n_components,)

Mixing weights for each mixture component.

means_ : array, shape (n_components, n_features)

Mean parameters for each mixture component.

covars_ : array

Covariance parameters for each mixture component. The shape depends on covariance_type:

  • (n_components, n_features) if ‘spherical’,
  • (n_features, n_features) if ‘tied’,
  • (n_components, n_features) if ‘diag’,
  • (n_components, n_features, n_features) if ‘full’.

converged_ : bool

True when convergence was reached in fit(), False otherwise.

score_samples(x)

Return the per-sample likelihood of the data under the model.

Compute the log probability of x under the model and return the posterior distribution (responsibilities) of each mixture component for each element of x.

Parameters:

x : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:

log_prob : array_like, shape (n_samples,)

Log probabilities of each data point in x.

responsibilities : array_like, shape (n_samples, n_components)

Posterior probabilities of each mixture component for each observation.
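
The relationship between the two return values can be sketched as follows, assuming the per-component log densities are already available (for example from log_multivariate_normal_density). The helper score_samples_sketch is hypothetical and only illustrates the computation; it is not the method's actual implementation.

```python
import numpy as np


def score_samples_sketch(lpr, weights):
    """lpr: (n_samples, n_components) per-component log densities."""
    # Weighted log densities: log(w_k) + log N(x_i | mu_k, Sigma_k).
    weighted = np.asarray(lpr, dtype=float) + np.log(weights)
    # A stable log-sum-exp over components gives the per-sample log probability.
    vmax = weighted.max(axis=1, keepdims=True)
    log_prob = np.log(np.sum(np.exp(weighted - vmax), axis=1)) + vmax[:, 0]
    # Responsibilities: normalise in the log domain, then exponentiate.
    responsibilities = np.exp(weighted - log_prob[:, np.newaxis])
    return log_prob, responsibilities


lpr = np.log(np.array([[0.2, 0.2], [0.1, 0.3]]))
weights = np.array([0.5, 0.5])
log_prob, resp = score_samples_sketch(lpr, weights)
```

Each row of the responsibilities sums to one, since it is a posterior distribution over the mixture components for that data point.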

score(x)

Compute the log probability under the model.

Parameters:

x : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

Returns:

log_prob : array_like, shape (n_samples,)

Log probabilities of each data point in x.

fit(x, random_state=None, tol=0.001, min_covar=0.001, n_iter=100, n_init=1, params='wmc', init_params='wmc')

Estimate model parameters with the expectation-maximization algorithm.

An initialization step is performed before entering the EM algorithm. To skip this step, set the keyword argument init_params to the empty string ‘’. Likewise, to perform only the initialization, set n_iter=0.

Parameters:

x : array_like, shape (n_samples, n_features)

List of n_features-dimensional data points. Each row corresponds to a single data point.

random_state : RandomState or int seed, optional (None by default)

A random number generator instance.

min_covar : float, optional

Floor on the diagonal of the covariance matrix to prevent overfitting.

tol : float, optional

Convergence threshold. EM iterations will stop when average gain in log-likelihood is below this threshold.

n_iter : int, optional

Number of EM iterations to perform.

n_init : int, optional

Number of initializations to perform; the best result is kept.

params : str, optional

Controls which parameters are updated in the training process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars.

init_params : str, optional

Controls which parameters are updated in the initialization process. Can contain any combination of ‘w’ for weights, ‘m’ for means, and ‘c’ for covars.
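
To show how tol, min_covar and n_iter interact, here is a compact EM sketch for diagonal covariances in plain NumPy. It is not madmom's implementation: the function name em_diag_sketch, the seed argument and the initialization scheme (random data points as means, uniform weights, global variance) are assumptions made for this example, and the params, init_params and n_init options are omitted.

```python
import numpy as np


def em_diag_sketch(x, n_components, n_iter=100, tol=1e-3, min_covar=1e-3,
                   seed=0):
    """Illustrative EM for a diagonal-covariance GMM (not madmom's code)."""
    x = np.asarray(x, dtype=float)
    n_samples, n_features = x.shape
    rng = np.random.RandomState(seed)
    # Initialization (assumed scheme for this sketch).
    means = x[rng.choice(n_samples, n_components, replace=False)]
    weights = np.full(n_components, 1.0 / n_components)
    covars = np.tile(x.var(axis=0) + min_covar, (n_components, 1))
    prev_ll, converged = None, False
    for _ in range(n_iter):
        # E-step: per-component log densities, then responsibilities.
        diff = x[:, None, :] - means[None, :, :]
        lpr = -0.5 * (n_features * np.log(2 * np.pi)
                      + np.log(covars).sum(axis=1)
                      + (diff ** 2 / covars).sum(axis=2))
        weighted = lpr + np.log(weights)
        vmax = weighted.max(axis=1, keepdims=True)
        log_prob = np.log(np.exp(weighted - vmax).sum(axis=1)) + vmax[:, 0]
        resp = np.exp(weighted - log_prob[:, None])
        # Stop when the change in average log-likelihood drops below tol.
        ll = log_prob.mean()
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            converged = True
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 10 * np.finfo(float).eps
        weights = nk / nk.sum()
        means = resp.T @ x / nk[:, None]
        # min_covar acts as a floor on the variances.
        covars = resp.T @ (x ** 2) / nk[:, None] - means ** 2 + min_covar
    return weights, means, covars, converged


rng = np.random.RandomState(42)
x = np.vstack([rng.normal(-5.0, 1.0, (200, 1)),
               rng.normal(5.0, 1.0, (200, 1))])
weights, means, covars, converged = em_diag_sketch(x, n_components=2)
```

With n_iter=0 the loop body never runs, so the sketch returns the initial parameters unchanged, mirroring the initialization-only behaviour described above.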