madmom.audio.spectrogram

This module contains spectrogram-related functionality.

madmom.audio.spectrogram.spec(stft)

Computes the magnitudes of the complex Short Time Fourier Transform of a signal.

Parameters:

stft : numpy array

Complex STFT of a signal.

Returns:

spec : numpy array

Magnitude spectrogram.
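
As a minimal sketch of what "magnitudes of the complex STFT" means, the snippet below takes the element-wise absolute value of a small complex array with NumPy. The helper name magnitude_spectrogram and the sample values are made up for illustration; this is not madmom's own code.

```python
import numpy as np


def magnitude_spectrogram(stft):
    """Return the element-wise magnitude of a complex STFT array."""
    return np.abs(np.asarray(stft))


# Two frames with three frequency bins each (made-up values).
stft = np.array([[3 + 4j, 0 + 1j, 1 + 0j],
                 [0 + 0j, 6 - 8j, -2 + 0j]])
mag = magnitude_spectrogram(stft)
print(mag)
```

The result is real-valued and has the same shape as the input (frames x bins).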

class madmom.audio.spectrogram.Spectrogram(stft, **kwargs)

A Spectrogram represents the magnitude spectrogram of an audio.stft.ShortTimeFourierTransform.

Parameters:

stft : audio.stft.ShortTimeFourierTransform instance

Short Time Fourier Transform.

kwargs : dict, optional

If no audio.stft.ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.

Attributes

stft	(`audio.stft.ShortTimeFourierTransform` instance) Underlying ShortTimeFourierTransform instance.
frames	(`audio.signal.FramedSignal` instance) Underlying FramedSignal instance.

diff(**kwargs)

Return the difference of the magnitude spectrogram.

Parameters:

kwargs : dict

Keyword arguments passed to SpectrogramDifference.

Returns:

diff : SpectrogramDifference instance

The differences of the magnitude spectrogram.

filter(**kwargs)

Return a filtered version of the magnitude spectrogram.

Parameters:

kwargs : dict

Keyword arguments passed to FilteredSpectrogram.

Returns:

filt_spec : FilteredSpectrogram instance

Filtered version of the magnitude spectrogram.

log(**kwargs)

Return a logarithmically scaled version of the magnitude spectrogram.

Parameters:

kwargs : dict

Keyword arguments passed to LogarithmicSpectrogram.

Returns:

log_spec : LogarithmicSpectrogram instance

Logarithmically scaled version of the magnitude spectrogram.

class madmom.audio.spectrogram.SpectrogramProcessor(**kwargs)

SpectrogramProcessor class.

process(data, **kwargs)

Create a Spectrogram from the given data.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to Spectrogram.

Returns:

spec : Spectrogram instance

Spectrogram.

class madmom.audio.spectrogram.FilteredSpectrogram(spectrogram, filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)

FilteredSpectrogram class.

Parameters:

spectrogram : Spectrogram instance

Spectrogram.

filterbank : audio.filters.Filterbank, optional

Filterbank class or instance; if a class is given (rather than an instance), one will be created with the given type and parameters.

num_bands : int, optional

Number of filter bands (per octave, depending on the type of the filterbank).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
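
To illustrate the idea behind filtering and norm_filters, the sketch below treats a filterbank as a (bins x bands) matrix, normalizes each band to area 1, and applies it to a spectrogram with a matrix product. The filterbank weights, the spectrogram values and the helper normalize_filters are hypothetical; madmom's actual filterbank construction is more involved.

```python
import numpy as np


def normalize_filters(filterbank):
    """Scale each filter (column) so that its weights sum to 1."""
    area = filterbank.sum(axis=0)
    area[area == 0] = 1.0  # leave empty filters untouched
    return filterbank / area


# Hypothetical filterbank: 6 frequency bins mapped onto 2 bands.
filterbank = np.array([[1.0, 0.0],
                       [2.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0],
                       [0.0, 1.0],
                       [0.0, 0.0]])
filterbank = normalize_filters(filterbank)

# Spectrogram with 2 frames and 6 bins (made-up magnitudes).
spec = np.array([[4.0, 4.0, 4.0, 2.0, 6.0, 9.0],
                 [0.0, 8.0, 0.0, 1.0, 1.0, 9.0]])

# Filtering as a matrix product: (frames x bins) . (bins x bands).
filt_spec = spec.dot(filterbank)
print(filt_spec)
```

With area-normalized filters, each band value is a weighted average of the bins it covers, so wide and narrow bands stay on a comparable scale.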

class madmom.audio.spectrogram.FilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)

FilteredSpectrogramProcessor class.

Parameters:

filterbank : audio.filters.Filterbank

Filterbank used to filter a spectrogram.

num_bands : int

Number of bands (per octave).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
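
The effect of unique_filters can be pictured with a small sketch: when band center frequencies are spaced more finely than the FFT bins, several bands map to the same bin, and removing duplicates leaves fewer bands. The helper band_bins, the 50 Hz bin spacing and the nearest-bin mapping are assumptions chosen for illustration, not a description of madmom's internals.

```python
import numpy as np


def band_bins(center_frequencies, bin_frequencies, unique_filters=True):
    """Map band center frequencies to the index of the closest FFT bin."""
    center_frequencies = np.asarray(center_frequencies, dtype=float)
    bin_frequencies = np.asarray(bin_frequencies, dtype=float)
    # distance of every bin (rows) to every center frequency (columns)
    distance = np.abs(bin_frequencies[:, np.newaxis] - center_frequencies)
    bins = distance.argmin(axis=0)
    if unique_filters:
        # drop bands that fall onto an already used bin
        bins = np.unique(bins)
    return bins


# Coarse frequency resolution: one bin every 50 Hz.
bin_frequencies = np.arange(0, 500, 50.0)
# 12 bands per octave from 100 Hz to 200 Hz (13 center frequencies).
centers = 100.0 * 2.0 ** (np.arange(13) / 12.0)

all_bins = band_bins(centers, bin_frequencies, unique_filters=False)
unique_bins = band_bins(centers, bin_frequencies)
print(len(all_bins), len(unique_bins))
```

In this toy setup 13 requested bands collapse onto 3 distinct bins, which is the kind of duplication at low frequencies that the parameter description refers to.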

process(data, **kwargs)

Create a FilteredSpectrogram from the given data.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to FilteredSpectrogram.

Returns:

filt_spec : FilteredSpectrogram instance

Filtered spectrogram.

class madmom.audio.spectrogram.LogarithmicSpectrogram(spectrogram, mul=1.0, add=1.0, **kwargs)

LogarithmicSpectrogram class.

Parameters:

spectrogram : Spectrogram instance

Spectrogram.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
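
The roles of mul and add can be sketched as follows: the magnitudes are multiplied by mul, add is added, and the logarithm is taken. The snippet assumes a base-10 logarithm and uses a hypothetical helper log_scale with made-up input values; the base actually used by the library should be checked against its source.

```python
import numpy as np


def log_scale(spec, mul=1.0, add=1.0):
    """Scale magnitudes as log10(mul * spec + add) (base 10 is an assumption)."""
    return np.log10(mul * np.asarray(spec, dtype=float) + add)


spec = np.array([[0.0, 9.0, 99.0]])
log_spec = log_scale(spec)
print(log_spec)
```

With add=1, a magnitude of 0 maps to 0 instead of minus infinity, which keeps silent bins well defined.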

class madmom.audio.spectrogram.LogarithmicSpectrogramProcessor(mul=1.0, add=1.0, **kwargs)

Logarithmic Spectrogram Processor class.

Parameters:

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)

Perform logarithmic scaling of a spectrogram.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to LogarithmicSpectrogram.

Returns:

log_spec : LogarithmicSpectrogram instance

Logarithmically scaled spectrogram.

static add_arguments(parser, log=None, mul=None, add=None)

Add spectrogram scaling related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

log : bool, optional

Take the logarithm of the spectrogram.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

Returns:

argparse argument group

Spectrogram scaling argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.
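
The "only if not None" behaviour can be sketched with the standard argparse module. The function add_scaling_arguments, the group title and the flag names --log, --mul and --add are hypothetical stand-ins chosen for this example; the real command line options may be named differently.

```python
import argparse


def add_scaling_arguments(parser, log=None, mul=None, add=None):
    """Add scaling options to `parser`, skipping those left at None."""
    group = parser.add_argument_group('spectrogram scaling arguments')
    if log is not None:
        group.add_argument('--log', action='store_true', default=log,
                           help='take the logarithm of the spectrogram')
    if mul is not None:
        group.add_argument('--mul', type=float, default=mul,
                           help='multiply magnitudes with this factor '
                                '[default=%(default)s]')
    if add is not None:
        group.add_argument('--add', type=float, default=add,
                           help='add this value before taking the logarithm '
                                '[default=%(default)s]')
    return group


parser = argparse.ArgumentParser()
# Only `mul` is given, so only --mul is registered.
group = add_scaling_arguments(parser, mul=1.0)
args = parser.parse_args(['--mul', '5'])
print(args)
```

A program can use this pattern to expose only the options that make sense for it while still sharing one helper.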

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogram(spectrogram, **kwargs)

LogarithmicFilteredSpectrogram class.

Parameters:

spectrogram : FilteredSpectrogram instance

Filtered spectrogram.

kwargs : dict, optional

If no FilteredSpectrogram instance was given, one is instantiated with these additional keyword arguments and logarithmically scaled afterwards, i.e. passed to LogarithmicSpectrogram.

Notes

For the filtering and scaling parameters, please refer to FilteredSpectrogram and LogarithmicSpectrogram.
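
The order of operations described above (filter first, then scale logarithmically) can be sketched in a few lines of NumPy. The helper log_filtered, the tiny filterbank and the base-10 logarithm are assumptions made for this illustration.

```python
import numpy as np


def log_filtered(spec, filterbank, mul=1.0, add=1.0):
    """Filter a spectrogram, then apply log10(mul * x + add)."""
    filtered = np.dot(spec, filterbank)       # (frames x bins) . (bins x bands)
    return np.log10(mul * filtered + add)     # base 10 is an assumption


# Hypothetical filterbank: bins 0 and 1 form band 0, bin 2 forms band 1.
filterbank = np.array([[1.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0]])
spec = np.array([[4.0, 5.0, 99.0],
                 [0.0, 0.0, 0.0]])
log_filt_spec = log_filtered(spec, filterbank)
print(log_filt_spec)
```

Because the logarithm is not linear, swapping the two steps would generally give a different result.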

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, mul=1.0, add=1.0, **kwargs)

Logarithmic Filtered Spectrogram Processor class.

Parameters:

filterbank : audio.filters.Filterbank

Filterbank used to filter a spectrogram.

num_bands : int

Number of bands (per octave).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)

Perform filtering and logarithmic scaling of a spectrogram.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to LogarithmicFilteredSpectrogram.

Returns:

log_filt_spec : LogarithmicFilteredSpectrogram instance

Logarithmically scaled filtered spectrogram.

class madmom.audio.spectrogram.SpectrogramDifference(spectrogram, diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, **kwargs)

SpectrogramDifference class.

Parameters:

spectrogram : Spectrogram instance

Spectrogram.

diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
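
One plausible reading of diff_ratio is sketched below: find the first sample at which the STFT window exceeds the given ratio of its maximum, measure the distance from there to the window center, and convert that distance to frames using the hop size. The helper diff_frames_from_ratio, the Hann window and the frame and hop sizes are assumptions for illustration and may differ from what the library does internally.

```python
import numpy as np


def diff_frames_from_ratio(diff_ratio, hop_size, frame_size, window=np.hanning):
    """Derive a frame offset from the window shape and a height ratio."""
    win = window(frame_size)
    # first sample at which the window rises above ratio * maximum height
    sample = int(np.argmax(win > float(diff_ratio) * np.max(win)))
    # distance from that sample to the window center, in samples
    diff_samples = len(win) / 2.0 - sample
    # convert to frames; the offset must be at least one frame
    return int(max(1, round(diff_samples / hop_size)))


print(diff_frames_from_ratio(0.5, hop_size=441, frame_size=2048))
print(diff_frames_from_ratio(0.5, hop_size=100, frame_size=2048))
```

Under these assumptions, a smaller hop size yields a larger frame offset, because more frames fit into the same stretch of the window.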

Notes

The SuperFlux algorithm [R1] uses a maximum-filtered spectrogram (diff_max_bins of 3) together with a 24-band logarithmic filterbank to calculate the difference spectrogram with a diff_ratio of 0.5.

The effect of this maximum filter applied to the spectrogram is that the magnitudes are “widened” in frequency direction, i.e. the following difference calculation is less sensitive to frequency fluctuations. This effect is exploited in onset detection to suppress false-positive energy fragments originating from vibrato.
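
The widening effect can be sketched with plain NumPy: the earlier frame is maximum-filtered along the frequency axis before it is subtracted from the current frame. The helpers max_filter_bins and spectrogram_difference, the edge padding, the zeroed leading frames and the toy input are assumptions for illustration only, and diff_frames is assumed to be at least 1.

```python
import numpy as np


def max_filter_bins(spec, width):
    """Maximum filter of the given width along the frequency axis (axis 1)."""
    pad = width // 2
    # Repeat the edge bins so the output keeps the input shape (a choice made here).
    padded = np.pad(spec, ((0, 0), (pad, pad)), mode='edge')
    shifted = [padded[:, i:i + spec.shape[1]] for i in range(width)]
    return np.max(shifted, axis=0)


def spectrogram_difference(spec, diff_frames=1, diff_max_bins=None,
                           positive_diffs=False):
    """Difference to the (optionally max-filtered) diff_frames-th previous frame."""
    spec = np.asarray(spec, dtype=float)
    reference = spec
    if diff_max_bins is not None and diff_max_bins > 1:
        reference = max_filter_bins(spec, diff_max_bins)
    diff = np.zeros_like(spec)  # the first diff_frames frames stay 0
    diff[diff_frames:] = spec[diff_frames:] - reference[:-diff_frames]
    if positive_diffs:
        diff = np.maximum(diff, 0)
    return diff


# A single partial that moves up by one bin between frames (vibrato-like).
spec = np.array([[0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])
plain = spectrogram_difference(spec, positive_diffs=True)
widened = spectrogram_difference(spec, diff_max_bins=3, positive_diffs=True)
print(plain[1], widened[1])
```

Without the maximum filter, the shifted partial shows up as a positive difference in the second frame; with a width of 3 bins it is absorbed by the widened previous frame and the positive difference vanishes.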

References

[R1] Sebastian Böck and Gerhard Widmer, “Maximum Filter Vibrato Suppression for Onset Detection”, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.

positive_diff()

Positive diff.

class madmom.audio.spectrogram.SpectrogramDifferenceProcessor(diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, stack_diffs=None, **kwargs)

Difference Spectrogram Processor class.

Parameters:

diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

stack_diffs : numpy stacking function, optional

If ‘None’, only the differences are returned. If set, the diffs are stacked with the underlying spectrogram data according to the stack function:

np.vstack: the differences and spectrogram are stacked vertically, i.e. in time direction;

np.hstack: the differences and spectrogram are stacked horizontally, i.e. in frequency direction;

np.dstack: the differences and spectrogram are stacked in depth, i.e. they are returned as a 3D representation with depth as the third dimension.
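
The three stacking options are easiest to tell apart by the shape of the result. The sketch below uses placeholder arrays of shape (frames, bins); the order in which spectrogram and differences are passed to the stacking function is an assumption made here.

```python
import numpy as np

# Placeholder data: 4 frames with 3 frequency bins each.
spec = np.ones((4, 3))
diff = np.zeros((4, 3))

stacked_time = np.vstack((spec, diff))    # more frames: time direction
stacked_freq = np.hstack((spec, diff))    # more bins: frequency direction
stacked_depth = np.dstack((spec, diff))   # new third axis: depth

print(stacked_time.shape, stacked_freq.shape, stacked_depth.shape)
```

Stacking in frequency direction keeps one row per frame, which is convenient when each frame is later treated as a single feature vector.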

process(data, **kwargs)

Perform a temporal difference calculation on the given data.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to SpectrogramDifference.

Returns:

diff : SpectrogramDifference instance

Spectrogram difference.

static add_arguments(parser, diff=None, diff_ratio=None, diff_frames=None, diff_max_bins=None, positive_diffs=None)

Add spectrogram difference related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

diff : bool, optional

Take the difference of the spectrogram.

diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

Returns:

argparse argument group

Spectrogram difference argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.

Only the diff_frames parameter behaves differently: it is included if either the diff_ratio is set or a value != ‘None’ is given.

class madmom.audio.spectrogram.SuperFluxProcessor(**kwargs)

Spectrogram processor which sets the default values suitable for the SuperFlux algorithm.

class madmom.audio.spectrogram.MultiBandSpectrogram(spectrogram, crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)

MultiBandSpectrogram class.

Parameters:

spectrogram : Spectrogram instance

Spectrogram.

crossover_frequencies : list or numpy array

List of crossover frequencies at which the spectrogram is split into multiple bands.

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Notes

The MultiBandSpectrogram is implemented as a Spectrogram which uses an audio.filters.RectangularFilterbank to combine multiple frequency bins.
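
A rough sketch of how crossover frequencies can define rectangular bands: the frequency range between fmin and fmax is cut at each crossover frequency, and every bin inside a segment contributes equally to that band. The helper rectangular_filterbank, the half-open band edges and the 100 Hz bin spacing are assumptions for illustration, not the library's implementation.

```python
import numpy as np


def rectangular_filterbank(bin_frequencies, crossover_frequencies,
                           fmin=30.0, fmax=17000.0, norm_filters=True):
    """Build a (bins x bands) matrix of rectangular filters."""
    bin_frequencies = np.asarray(bin_frequencies, dtype=float)
    edges = np.concatenate(([fmin], np.sort(crossover_frequencies), [fmax]))
    fb = np.zeros((len(bin_frequencies), len(edges) - 1))
    for band, (low, high) in enumerate(zip(edges[:-1], edges[1:])):
        # bins in [low, high) belong to this band (boundary handling assumed)
        fb[(bin_frequencies >= low) & (bin_frequencies < high), band] = 1.0
    if norm_filters:
        area = fb.sum(axis=0)
        area[area == 0] = 1.0  # avoid dividing empty bands by zero
        fb /= area
    return fb


# 10 bins spaced 100 Hz apart, split at 300 Hz and 600 Hz.
bin_frequencies = np.arange(0, 1000, 100.0)
spec = np.ones((1, 10))

summed = spec.dot(rectangular_filterbank(
    bin_frequencies, [300, 600], fmin=100, fmax=1000, norm_filters=False))
averaged = spec.dot(rectangular_filterbank(
    bin_frequencies, [300, 600], fmin=100, fmax=1000))
print(summed, averaged)
```

Two crossover frequencies give three bands. Without normalization each band sums its bins; with normalization each band averages them.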

class madmom.audio.spectrogram.MultiBandSpectrogramProcessor(crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)

Spectrogram processor which combines the spectrogram magnitudes into multiple bands.

Parameters:

crossover_frequencies : list or numpy array

List of crossover frequencies at which a spectrogram is split into the individual bands.

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

process(data, **kwargs)

Return a multi-band representation of the given data.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to MultiBandSpectrogram.

Returns:

multi_band_spec : MultiBandSpectrogram instance

Spectrogram split into multiple bands.

class madmom.audio.spectrogram.StackedSpectrogramProcessor

Deprecated in v0.13, will be removed in v0.14.

Functionality added to SpectrogramDifferenceProcessor as stack_diffs argument.