madmom.audio.spectrogram

This module contains spectrogram related functionality.

madmom.audio.spectrogram.spec(stft)

Computes the magnitudes of the complex Short Time Fourier Transform of a signal.

Parameters:

stft : numpy array: Complex STFT of a signal.

Returns:

spec : numpy array: Magnitude spectrogram.
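What `spec` computes can be illustrated with a NumPy-only sketch: the element-wise magnitude of a complex STFT. The framing parameters (2048-sample frames, hop size 441, Hann window) and the helper `stft_sketch` are assumptions for illustration, not madmom's STFT implementation.

```python
import numpy as np


def stft_sketch(signal, frame_size=2048, hop_size=441):
    """Hypothetical helper: frame the signal, apply a Hann window, take the rFFT."""
    window = np.hanning(frame_size)
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([signal[i * hop_size:i * hop_size + frame_size]
                       for i in range(num_frames)])
    # one row per frame, one column per non-negative frequency bin
    return np.fft.rfft(frames * window, axis=1)


def spec(stft):
    """Magnitudes of a complex STFT."""
    return np.abs(stft)


sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)       # 1 s of a 440 Hz sine
magnitudes = spec(stft_sketch(signal))
print(magnitudes.shape)                      # (96, 1025)
```

The phase information is discarded, so the result is real-valued and non-negative; the strongest bin of each frame is the one closest to 440 Hz.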

class madmom.audio.spectrogram.Spectrogram(stft, **kwargs)

A Spectrogram represents the magnitude spectrogram of an audio.stft.ShortTimeFourierTransform.

Parameters:

stft : audio.stft.ShortTimeFourierTransform instance: Short Time Fourier Transform.
kwargs : dict, optional: If no audio.stft.ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a Spectrogram from an audio.stft.ShortTimeFourierTransform (or anything it can be instantiated from):

>>> spec = Spectrogram('tests/data/audio/sample.wav')
>>> spec  
Spectrogram([[ 3.15249,  4.00272, ...,  0.03634,  0.03671],
             [ 4.28429,  2.85158, ...,  0.0219 ,  0.02227],
             ...,
             [ 4.92274, 10.27775, ...,  0.00607,  0.00593],
             [ 9.22709,  9.6387 , ...,  0.00981,  0.00984]], dtype=float32)

num_frames: Number of frames.

num_bins: Number of bins.

bin_frequencies: Bin frequencies.
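For a linear (unfiltered) spectrogram, the bin frequencies follow directly from the frame size and the sample rate. A short sketch with NumPy's `np.fft.rfftfreq`, assuming a frame size of 2048 samples and a sample rate of 44.1 kHz (both assumed values):

```python
import numpy as np

frame_size = 2048        # assumed STFT frame size [samples]
sample_rate = 44100.0    # assumed sample rate [Hz]

# center frequency of each of the frame_size // 2 + 1 non-negative DFT bins
bin_frequencies = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
num_bins = len(bin_frequencies)

print(num_bins)              # 1025
print(bin_frequencies[1])    # approx. 21.533 Hz, i.e. sample_rate / frame_size
print(bin_frequencies[-1])   # approx. 22050 Hz, the Nyquist frequency
```

The bins are spaced linearly at sample_rate / frame_size, which is why low frequencies are resolved only coarsely.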

diff(**kwargs)

Return the difference of the magnitude spectrogram.

Parameters:

kwargs : dict: Keyword arguments passed to SpectrogramDifference.

Returns:

diff : SpectrogramDifference instance: The differences of the magnitude spectrogram.

filter(**kwargs)

Return a filtered version of the magnitude spectrogram.

Parameters:

kwargs : dict: Keyword arguments passed to FilteredSpectrogram.

Returns:

filt_spec : FilteredSpectrogram instance: Filtered version of the magnitude spectrogram.

log(**kwargs)

Return a logarithmically scaled version of the magnitude spectrogram.

Parameters:

kwargs : dict: Keyword arguments passed to LogarithmicSpectrogram.

Returns:

log_spec : LogarithmicSpectrogram instance: Logarithmically scaled version of the magnitude spectrogram.

class madmom.audio.spectrogram.SpectrogramProcessor(**kwargs)

Processor which creates a magnitude Spectrogram from the given data.

process(data, **kwargs)

Create a Spectrogram from the given data.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict: Keyword arguments passed to Spectrogram.

Returns:

spec : Spectrogram instance: Spectrogram.

class madmom.audio.spectrogram.FilteredSpectrogram(spectrogram, filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)

Magnitude spectrogram whose frequency bins are combined into bands by a filterbank.

Parameters:

spectrogram : Spectrogram instance: Spectrogram.
filterbank : audio.filters.Filterbank, optional: Filterbank class or instance; if a class is given (rather than an instance), one will be created with the given type and parameters.
num_bands : int, optional: Number of filter bands (per octave, depending on the type of the filterbank).
fmin : float, optional: Minimum frequency of the filterbank [Hz].
fmax : float, optional: Maximum frequency of the filterbank [Hz].
fref : float, optional: Tuning frequency of the filterbank [Hz].
norm_filters : bool, optional: Normalize the filter bands of the filterbank to area 1.
unique_filters : bool, optional: Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
kwargs : dict, optional: If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a FilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, a madmom.audio.filters.LogarithmicFilterbank with 12 bands per octave is used.

>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec  
FilteredSpectrogram([[ 5.66156, 6.30141, ..., 0.05426, 0.06461],
                     [ 8.44266, 8.69582, ..., 0.07703, 0.0902 ],
                     ...,
                     [10.04626, 1.12018, ..., 0.0487 , 0.04282],
                     [ 8.60186, 6.81195, ..., 0.03721, 0.03371]],
                    dtype=float32)

The resulting spectrogram has fewer frequency bins, with the centers of the bins aligned logarithmically (lower frequency bins still have a linear spacing due to the coarse resolution of the DFT at low frequencies):

>>> spec.shape
(281, 81)
>>> spec.num_bins
81
>>> spec.bin_frequencies  
array([    43.06641,    64.59961,    86.13281,   107.66602,
          129.19922,   150.73242,   172.26562,   193.79883, ...,
        10551.26953, 11175.73242, 11843.26172, 12553.85742,
        13285.98633, 14082.71484, 14922.50977, 15805.37109])
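How a logarithmic band layout arises can be sketched as follows: center frequencies are spaced `num_bands` per octave relative to `fref` within [`fmin`, `fmax`], then mapped to the nearest DFT bin. At low frequencies several centers land on the same bin, and keeping only the unique bins (in the spirit of `unique_filters`) leaves linearly spaced bins there. This is a simplified sketch assuming 2048-sample frames at 44.1 kHz; madmom's actual filterbank construction may differ in its details, so the counts below need not match the 81 bins shown above.

```python
import numpy as np


def log_center_frequencies(num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0):
    """Frequencies spaced num_bands per octave, aligned to fref, within [fmin, fmax]."""
    # band indices k such that fref * 2 ** (k / num_bands) lies in [fmin, fmax]
    k_min = np.ceil(num_bands * np.log2(fmin / fref))
    k_max = np.floor(num_bands * np.log2(fmax / fref))
    k = np.arange(k_min, k_max + 1)
    return fref * 2.0 ** (k / num_bands)


frame_size, sample_rate = 2048, 44100.0          # assumed STFT parameters
bin_freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)

centers = log_center_frequencies()
# map every center frequency to the closest DFT bin
bins = np.abs(bin_freqs[:, None] - centers[None, :]).argmin(axis=0)
unique_bins = np.unique(bins)                    # drop duplicates at low frequencies

print(len(centers), len(unique_bins))
```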

The filterbank used to filter the spectrogram is saved as an attribute:

>>> spec.filterbank  
LogarithmicFilterbank([[0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.],
                       ...,
                       [0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.]], dtype=float32)
>>> spec.filterbank.num_bands
81
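Applying a filterbank amounts to a matrix product: a (frames × bins) magnitude spectrogram times a (bins × bands) filterbank gives a (frames × bands) filtered spectrogram. The triangular filterbank below is a simplified stand-in built with NumPy; the helper `triangular_filterbank` is hypothetical and not part of madmom.audio.filters. With `norm=True` every filter is normalized to area 1, in the spirit of `norm_filters`.

```python
import numpy as np


def triangular_filterbank(num_bins, center_bins, norm=True):
    """Hypothetical helper: one triangular filter per consecutive triple of bins.

    center_bins must be strictly increasing bin indices; filter i rises from
    center_bins[i] to center_bins[i + 1] and falls to center_bins[i + 2].
    """
    fb = np.zeros((num_bins, len(center_bins) - 2), dtype=np.float32)
    for i, (start, center, stop) in enumerate(zip(center_bins[:-2],
                                                  center_bins[1:-1],
                                                  center_bins[2:])):
        fb[start:center + 1, i] = np.linspace(0, 1, center - start + 1)
        fb[center:stop + 1, i] = np.linspace(1, 0, stop - center + 1)
        if norm:
            fb[:, i] /= fb[:, i].sum()     # normalize each filter to area 1
    return fb


num_frames, num_bins = 100, 1025
rng = np.random.default_rng(0)
spec = rng.random((num_frames, num_bins)).astype(np.float32)

centers = np.unique(np.geomspace(2, 800, 40).astype(int))   # log-spaced bins
fb = triangular_filterbank(num_bins, centers)
filtered = spec @ fb                # (frames x bins) @ (bins x bands)
print(filtered.shape)
```

Storing the filterbank as a (bins × bands) matrix keeps the filtering step a single matrix multiplication, which is also why it can be kept around as an attribute and reused.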

The filterbank can be chosen at instantiation time:

>>> from madmom.audio.filters import MelFilterbank
>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav',
...                            filterbank=MelFilterbank, num_bands=40)
>>> type(spec.filterbank)
<class 'madmom.audio.filters.MelFilterbank'>
>>> spec.shape
(281, 40)

bin_frequencies: Bin frequencies.

class madmom.audio.spectrogram.FilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)

Processor which filters a spectrogram with a filterbank.

Parameters:

filterbank : audio.filters.Filterbank: Filterbank used to filter a spectrogram.
num_bands : int: Number of bands (per octave).
fmin : float, optional: Minimum frequency of the filterbank [Hz].
fmax : float, optional: Maximum frequency of the filterbank [Hz].
fref : float, optional: Tuning frequency of the filterbank [Hz].
norm_filters : bool, optional: Normalize the filters of the filterbank to area 1.
unique_filters : bool, optional: Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

process(data, **kwargs)

Create a FilteredSpectrogram from the given data.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict: Keyword arguments passed to FilteredSpectrogram.

Returns:

filt_spec : FilteredSpectrogram instance: Filtered spectrogram.

class madmom.audio.spectrogram.LogarithmicSpectrogram(spectrogram, log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)

Logarithmically scaled magnitude spectrogram.

Parameters:

spectrogram : Spectrogram instance: Spectrogram.
log : numpy ufunc, optional: Logarithmic scaling function to apply.
mul : float, optional: Multiply the magnitude spectrogram with this factor before taking the logarithm.
add : float, optional: Add this value before taking the logarithm of the magnitudes.
kwargs : dict, optional: If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a LogarithmicSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, np.log10 is used as the scaling function and a value of 1 is added to avoid negative values.

>>> spec = LogarithmicSpectrogram('tests/data/audio/sample.wav')
>>> spec  
LogarithmicSpectrogram([[...]], dtype=float32)
>>> spec.min()
LogarithmicSpectrogram(0., dtype=float32)
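The scaling can be sketched as log(mul · spec + add), applied element-wise. A minimal NumPy sketch using the documented defaults (`np.log10`, `mul=1`, `add=1`); the function name `log_scale` is hypothetical:

```python
import numpy as np


def log_scale(spec, log=np.log10, mul=1.0, add=1.0):
    """Logarithmically scale magnitudes: log(mul * spec + add)."""
    return log(mul * np.asarray(spec, dtype=np.float32) + add)


magnitudes = np.array([[0.0, 9.0], [99.0, 999.0]], dtype=np.float32)
scaled = log_scale(magnitudes)    # log10 of [[1, 10], [100, 1000]]
print(scaled)
print(scaled.min())               # with add=1, a zero magnitude maps to 0
```

Since magnitudes are non-negative, adding 1 before taking log10 keeps every scaled value at or above 0, which matches the minimum of 0 in the example above.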

filterbank: Filterbank.

bin_frequencies: Bin frequencies.

class madmom.audio.spectrogram.LogarithmicSpectrogramProcessor(log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)

Processor which logarithmically scales a magnitude spectrogram.

Parameters:

log : numpy ufunc, optional: Logarithmic scaling function to apply.
mul : float, optional: Multiply the magnitude spectrogram with this factor before taking the logarithm.
add : float, optional: Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)

Perform logarithmic scaling of a spectrogram.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict: Keyword arguments passed to LogarithmicSpectrogram.

Returns:

log_spec : LogarithmicSpectrogram instance: Logarithmically scaled spectrogram.

static add_arguments(parser, log=None, mul=None, add=None)

Add spectrogram scaling related arguments to an existing parser.

Parameters:

parser : argparse parser instance: Existing argparse parser object.
log : bool, optional: Take the logarithm of the spectrogram.
mul : float, optional: Multiply the magnitude spectrogram with this factor before taking the logarithm.
add : float, optional: Add this value before taking the logarithm of the magnitudes.

Returns:

argparse argument group: Spectrogram scaling argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.
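The note above describes a common argparse pattern: an option is only registered when its default value is not None. A hypothetical sketch of that pattern; the function name and the `--log`/`--mul`/`--add` flag names are illustrative assumptions, not madmom's actual command-line options.

```python
import argparse


def add_scaling_arguments(parser, log=None, mul=None, add=None):
    """Hypothetical: add only those options whose default is not None."""
    group = parser.add_argument_group('magnitude scaling arguments')
    if log is not None:
        group.add_argument('--log', dest='log', action='store_true',
                           default=log, help='take the logarithm')
    if mul is not None:
        group.add_argument('--mul', dest='mul', type=float, default=mul,
                           help='multiply by this factor [default=%(default).2f]')
    if add is not None:
        group.add_argument('--add', dest='add', type=float, default=add,
                           help='add this value [default=%(default).2f]')
    return group


parser = argparse.ArgumentParser()
add_scaling_arguments(parser, mul=1.0, add=1.0)   # no --log option is added
args = parser.parse_args(['--mul', '5'])
print(args.mul, args.add)                          # 5.0 1.0
print(hasattr(args, 'log'))                        # False
```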

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogram(spectrogram, **kwargs)

Logarithmically scaled filtered magnitude spectrogram.

Parameters:

spectrogram : FilteredSpectrogram instance: Filtered spectrogram.
kwargs : dict, optional: If no FilteredSpectrogram instance was given, one is instantiated with these additional keyword arguments and logarithmically scaled afterwards, i.e. the keyword arguments are also passed to LogarithmicSpectrogram.

Notes

For the filtering and scaling parameters, please refer to FilteredSpectrogram and LogarithmicSpectrogram.

Examples

Create a LogarithmicFilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). This is mainly a convenience class which first filters the spectrogram and then scales it logarithmically.

>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec  
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02295, 0.02719],
                                [0.97509, 0.98658, ..., 0.03223, 0.0375 ],
                                ...,
                                [1.04322, 0.32637, ..., 0.02065, 0.01821],
                                [0.98236, 0.89276, ..., 0.01587, 0.0144 ]],
                                dtype=float32)
>>> spec.shape
(281, 81)
>>> spec.filterbank  
LogarithmicFilterbank([[...]], dtype=float32)
>>> spec.min()  
LogarithmicFilteredSpectrogram(0.00831, dtype=float32)
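Because this class filters first and scales afterwards, the result generally differs from filtering an already log-scaled spectrogram. A small NumPy sketch with a hypothetical two-band rectangular filterbank shows that the order matters:

```python
import numpy as np

spec = np.array([[0.0, 8.0, 1.0, 9.0]])    # 1 frame, 4 bins (made-up values)
# hypothetical filterbank: band 0 sums bins 0-1, band 1 sums bins 2-3
fb = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=np.float64)

filter_then_log = np.log10(spec @ fb + 1)   # filter first, then scale
log_then_filter = np.log10(spec + 1) @ fb   # the other order, for comparison

print(filter_then_log)   # log10 of [[9, 11]]
print(log_then_filter)
```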

filterbank: Filterbank.

bin_frequencies: Bin frequencies.

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, mul=1.0, add=1.0, **kwargs)

Processor which filters a spectrogram and scales it logarithmically.

Parameters:

filterbank : audio.filters.Filterbank: Filterbank used to filter a spectrogram.
num_bands : int: Number of bands (per octave).
fmin : float, optional: Minimum frequency of the filterbank [Hz].
fmax : float, optional: Maximum frequency of the filterbank [Hz].
fref : float, optional: Tuning frequency of the filterbank [Hz].
norm_filters : bool, optional: Normalize the filters of the filterbank to area 1.
unique_filters : bool, optional: Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
mul : float, optional: Multiply the magnitude spectrogram with this factor before taking the logarithm.
add : float, optional: Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)

Perform filtering and logarithmic scaling of a spectrogram.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict: Keyword arguments passed to LogarithmicFilteredSpectrogram.

Returns:

log_filt_spec : LogarithmicFilteredSpectrogram instance: Logarithmically scaled filtered spectrogram.

class madmom.audio.spectrogram.SpectrogramDifference(spectrogram, diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, keep_dims=True, **kwargs)

Temporal difference of a magnitude spectrogram.

Parameters:

spectrogram : Spectrogram instance: Spectrogram.
diff_ratio : float, optional: Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
diff_frames : int, optional: Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).
diff_max_bins : int, optional: Apply a maximum filter with this width (in bins, along the frequency dimension) to the spectrogram that is subtracted when calculating the difference.
positive_diffs : bool, optional: Keep only the positive differences, i.e. set all diff values < 0 to 0.
keep_dims : bool, optional: Indicate if the dimensions (i.e. shape) of the spectrogram should be kept.
kwargs : dict, optional: If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
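The link between `diff_ratio` and `diff_frames` can be sketched like this: find the first sample at which the STFT window exceeds `diff_ratio` times its maximum, take its distance to the window center in samples, and convert that distance to frames via the hop size. This is a sketch under stated assumptions (a 2048-sample Hann window, 44.1 kHz audio, hop size = sample rate / fps, rounding to the nearest frame with a minimum of 1); the helper name is hypothetical and madmom's exact computation may differ.

```python
import numpy as np


def diff_frames_from_ratio(window, hop_size, diff_ratio=0.5):
    """Hypothetical helper: frame offset at which the window falls to diff_ratio."""
    # first sample where the window rises above diff_ratio * its maximum
    sample = np.argmax(window > diff_ratio * np.max(window))
    # distance of that sample from the window center, in samples
    diff_samples = len(window) / 2 - sample
    # convert to frames; always look back at least one frame
    return max(1, int(round(diff_samples / hop_size)))


window = np.hanning(2048)
sample_rate = 44100.0
for fps in (100, 200):
    print(fps, diff_frames_from_ratio(window, hop_size=sample_rate / fps))
```

Doubling the frame rate halves the hop size, so the same window overlap corresponds to more frames.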

Notes

The first diff_frames frames will have a value of 0.

If keep_dims is ‘True’, the returned difference has the same shape as the spectrogram. This is needed if the diffs should be stacked on top of it. If set to ‘False’, the result is diff_frames frames shorter (mostly used by the SpectrogramDifferenceProcessor, which first buffers that many frames).

The SuperFlux algorithm [1] uses a maximum filtered spectrogram with 3 diff_max_bins together with a 24 band logarithmic filterbank to calculate the difference spectrogram with a diff_ratio of 0.5.

The effect of this maximum filter is that the magnitudes are “widened” in frequency direction, i.e. the subsequent difference calculation is less sensitive to frequency fluctuations. This effect is exploited to suppress false positive energy fragments originating from vibrato.

References

[1]	Sebastian Böck and Gerhard Widmer, “Maximum Filter Vibrato Suppression for Onset Detection”, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.

Examples

To obtain the SuperFlux feature as described above, first create a logarithmically filtered and scaled spectrogram:

>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav',
...                                        num_bands=24, fps=200)
>>> spec  
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02809, 0.02672],
                                [0.92514, 0.93211, ..., 0.03607, 0.0317 ],
                                ...,
                                [1.03826, 0.767  , ..., 0.01814, 0.01138],
                                [0.98236, 0.89276, ..., 0.01669, 0.00919]],
                                dtype=float32)
>>> spec.shape
(561, 140)

Then use the temporal first order difference and apply a maximum filter spanning 3 frequency bins, keeping only the positive differences (i.e. rises in energy):

>>> superflux = SpectrogramDifference(spec, diff_max_bins=3,
...                                   positive_diffs=True)
>>> superflux  
SpectrogramDifference([[0.     , 0. , ...,  0. ,  0. ],
                       [0.     , 0. , ...,  0. ,  0. ],
                       ...,
                       [0.01941, 0. , ...,  0. ,  0. ],
                       [0.     , 0. , ...,  0. ,  0. ]], dtype=float32)
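The difference computation itself can be sketched in NumPy: maximum-filter the spectrogram along frequency, subtract that filtered version, shifted by `diff_frames`, from the current frames, optionally keep only positive values, and leave the first `diff_frames` frames at 0 so the shape is kept (cf. `keep_dims`). The helper names and the random stand-in input are assumptions; this follows the parameters described above rather than reproducing madmom's code.

```python
import numpy as np


def max_filter_freq(spec, size):
    """Maximum filter of width `size` along the frequency axis (axis 1)."""
    if size <= 1:
        return spec.copy()
    pad = size // 2
    padded = np.pad(spec, ((0, 0), (pad, size - 1 - pad)), mode='edge')
    # stack shifted views and take the element-wise maximum
    shifted = [padded[:, i:i + spec.shape[1]] for i in range(size)]
    return np.max(shifted, axis=0)


def spectrogram_difference(spec, diff_frames=1, diff_max_bins=None,
                           positive_diffs=False):
    """Temporal difference with an optional maximum filter (diff_frames >= 1)."""
    reference = spec if diff_max_bins is None else max_filter_freq(spec, diff_max_bins)
    diff = np.zeros_like(spec)
    # frame t minus the (max-filtered) frame t - diff_frames; first frames stay 0
    diff[diff_frames:] = spec[diff_frames:] - reference[:-diff_frames]
    if positive_diffs:
        np.maximum(diff, 0, out=diff)
    return diff


rng = np.random.default_rng(42)
log_spec = rng.random((561, 140)).astype(np.float32)   # stand-in input
superflux = spectrogram_difference(log_spec, diff_frames=2, diff_max_bins=3,
                                   positive_diffs=True)
print(superflux.shape, superflux.min())
```

Because the maximum-filtered reference is never smaller than the original frame, the filtered difference is never larger than the plain one, which is what damps the spurious rises caused by small frequency shifts.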

bin_frequencies: Bin frequencies.

positive_diff(): Positive diff, i.e. the differences with all negative values set to 0.

class madmom.audio.spectrogram.SpectrogramDifferenceProcessor(diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, stack_diffs=None, **kwargs)

Processor which computes the temporal difference of a spectrogram.

Parameters:

diff_ratio : float, optional: Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
diff_frames : int, optional: Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).
diff_max_bins : int, optional: Apply a maximum filter with this width (in bins, along the frequency dimension) to the spectrogram that is subtracted when calculating the difference.
positive_diffs : bool, optional: Keep only the positive differences, i.e. set all diff values < 0 to 0.
stack_diffs : numpy stacking function, optional: If ‘None’, only the differences are returned. If set, the diffs are stacked with the underlying spectrogram data according to the stack function:

np.vstack: the differences and spectrogram are stacked vertically, i.e. in time direction,
np.hstack: the differences and spectrogram are stacked horizontally, i.e. in frequency direction,
np.dstack: the differences and spectrogram are stacked in depth, i.e. returned as a 3D representation with depth as the third dimension.
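How the three stacking functions change the shape can be checked with plain NumPy arrays; the shapes below are arbitrary stand-ins for a spectrogram and its difference.

```python
import numpy as np

spec = np.ones((561, 140), dtype=np.float32)   # stand-in spectrogram
diff = np.zeros_like(spec)                     # stand-in difference

print(np.vstack((spec, diff)).shape)   # (1122, 140): stacked in time direction
print(np.hstack((spec, diff)).shape)   # (561, 280): stacked in frequency direction
print(np.dstack((spec, diff)).shape)   # (561, 140, 2): stacked in depth
```

Stacking requires the difference to have the same shape as the spectrogram, which is what keeping the dimensions provides.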

process(data, reset=True, **kwargs)

Perform a temporal difference calculation on the given data.

Parameters:

data : numpy array: Data to be processed.
reset : bool, optional: Reset the spectrogram buffer before computing the difference.
kwargs : dict: Keyword arguments passed to SpectrogramDifference.

Returns:

diff : SpectrogramDifference instance: Spectrogram difference.

Notes

If reset is ‘True’, the first diff_frames differences will be 0.
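Why the processor keeps a buffer can be sketched as follows: when frames arrive one at a time, the last `diff_frames` frames must be remembered to compute the next difference, and a reset empties that memory. The class below is a hypothetical, simplified stand-in (no maximum filter); it returns 0 until enough frames are buffered, matching the note above, and frame-wise processing then reproduces the offline result.

```python
import numpy as np
from collections import deque


class OnlineDifference:
    """Hypothetical frame-wise temporal difference using a frame buffer."""

    def __init__(self, diff_frames=1):
        self.diff_frames = diff_frames
        self.reset()

    def reset(self):
        # forget all buffered frames; the next diff_frames outputs will be 0
        self._buffer = deque(maxlen=self.diff_frames)

    def process_frame(self, frame):
        frame = np.asarray(frame, dtype=np.float32)
        if len(self._buffer) < self.diff_frames:
            diff = np.zeros_like(frame)        # not enough history yet
        else:
            diff = frame - self._buffer[0]     # oldest frame is t - diff_frames
        self._buffer.append(frame)             # maxlen discards the oldest frame
        return diff


rng = np.random.default_rng(1)
spec = rng.random((10, 4)).astype(np.float32)

# offline reference: first diff_frames rows are 0, then spec[t] - spec[t - 2]
offline = np.zeros_like(spec)
offline[2:] = spec[2:] - spec[:-2]

proc = OnlineDifference(diff_frames=2)
online = np.stack([proc.process_frame(f) for f in spec])
print(np.allclose(online, offline))   # True
```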

reset(): Reset the SpectrogramDifferenceProcessor.

static add_arguments(parser, diff=None, diff_ratio=None, diff_frames=None, diff_max_bins=None, positive_diffs=None)

Add spectrogram difference related arguments to an existing parser.

Parameters:

parser : argparse parser instance: Existing argparse parser object.
diff : bool, optional: Take the difference of the spectrogram.
diff_ratio : float, optional: Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
diff_frames : int, optional: Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).
diff_max_bins : int, optional: Apply a maximum filter with this width (in bins, along the frequency dimension) to the spectrogram that is subtracted when calculating the difference.
positive_diffs : bool, optional: Keep only the positive differences, i.e. set all diff values < 0 to 0.

Returns:

argparse argument group: Spectrogram difference argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.

Only the diff_frames parameter behaves differently: it is included if either the diff_ratio is set or a value != ‘None’ is given.

class madmom.audio.spectrogram.SuperFluxProcessor(**kwargs): Spectrogram processor which sets the default values suitable for the SuperFlux algorithm.

class madmom.audio.spectrogram.MultiBandSpectrogram(spectrogram, crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)

Magnitude spectrogram whose frequency bins are combined into multiple bands.

Parameters:

spectrogram : Spectrogram instance: Spectrogram.
crossover_frequencies : list or numpy array: List of crossover frequencies at which the spectrogram is split into multiple bands.
fmin : float, optional: Minimum frequency of the filterbank [Hz].
fmax : float, optional: Maximum frequency of the filterbank [Hz].
norm_filters : bool, optional: Normalize the filter bands of the filterbank to area 1.
unique_filters : bool, optional: Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
kwargs : dict, optional: If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Notes

The MultiBandSpectrogram is implemented as a Spectrogram which uses an audio.filters.RectangularFilterbank to combine multiple frequency bins.
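A rectangular filterbank that combines the bins between consecutive crossover frequencies can be sketched in NumPy. The helper name, the bin frequencies (2048-sample frames at 44.1 kHz), the crossover values and the band-edge handling (each bin goes to exactly one band, bins outside [fmin, fmax] are dropped) are assumptions for illustration. With `norm=True` each band averages its bins instead of summing them, in the spirit of `norm_filters`.

```python
import numpy as np


def rectangular_filterbank(bin_frequencies, crossover_frequencies,
                           fmin=30.0, fmax=17000.0, norm=True):
    """Hypothetical helper: one rectangular band between consecutive edges."""
    edges = np.concatenate(([fmin], crossover_frequencies, [fmax]))
    fb = np.zeros((len(bin_frequencies), len(edges) - 1), dtype=np.float32)
    for band, (low, high) in enumerate(zip(edges[:-1], edges[1:])):
        mask = (bin_frequencies >= low) & (bin_frequencies < high)
        fb[mask, band] = 1.0
        if norm and mask.any():
            fb[:, band] /= mask.sum()          # area 1: average instead of sum
    return fb


bin_freqs = np.fft.rfftfreq(2048, d=1.0 / 44100)
fb = rectangular_filterbank(bin_freqs, crossover_frequencies=[270, 1100])
spec = np.ones((5, len(bin_freqs)), dtype=np.float32)
multi_band = spec @ fb
print(multi_band.shape)    # (5, 3): one column per band
```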

class madmom.audio.spectrogram.MultiBandSpectrogramProcessor(crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)

Spectrogram processor which combines the spectrogram magnitudes into multiple bands.

Parameters:

crossover_frequencies : list or numpy array: List of crossover frequencies at which a spectrogram is split into the individual bands.
fmin : float, optional: Minimum frequency of the filterbank [Hz].
fmax : float, optional: Maximum frequency of the filterbank [Hz].
norm_filters : bool, optional: Normalize the filter bands of the filterbank to area 1.
unique_filters : bool, optional: Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

process(data, **kwargs)

Return a multi-band representation of the given data.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict: Keyword arguments passed to MultiBandSpectrogram.

Returns:

multi_band_spec : MultiBandSpectrogram instance: Spectrogram split into multiple bands.

class madmom.audio.spectrogram.SemitoneBandpassSpectrogram(signal, fps=50.0, fmin=27.5, fmax=4200.0)

Construct a semitone spectrogram by using a time domain filterbank of bandpass filters as described in [1].

Parameters:

signal : Signal instance: Signal.
fps : float, optional: Frame rate of the spectrogram [Hz].
fmin : float, optional: Lowest frequency of the spectrogram [Hz].
fmax : float, optional: Highest frequency of the spectrogram [Hz].
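The semitone band center frequencies between `fmin` and `fmax` can be sketched with the equal-tempered pitch formula f = 440 · 2^((p − 69) / 12) for MIDI pitch p. With the defaults of 27.5 Hz (A0, MIDI pitch 21) and 4200 Hz, this yields the 88 piano keys up to C8 (about 4186 Hz). The sketch covers only the band layout; the time-domain bandpass filtering described in [1] is not reproduced, and the `fref` parameter and the way bands are selected here are assumptions.

```python
import numpy as np


def semitone_frequencies(fmin=27.5, fmax=4200.0, fref=440.0):
    """Equal-tempered semitone frequencies within [fmin, fmax]."""
    # MIDI pitch numbers whose frequency lies in the range (69 = A4 = fref)
    p_min = int(np.ceil(69 + 12 * np.log2(fmin / fref)))
    p_max = int(np.floor(69 + 12 * np.log2(fmax / fref)))
    pitches = np.arange(p_min, p_max + 1)
    return pitches, fref * 2.0 ** ((pitches - 69) / 12.0)


pitches, freqs = semitone_frequencies()
print(len(freqs), pitches[0], pitches[-1])   # 88 21 108
```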

References

[1]	Meinard Müller, “Information Retrieval for Music and Motion”, Springer, 2007.