madmom.audio.spectrogram

This module contains spectrogram related functionality.

madmom.audio.spectrogram.spec(stft)[source]

Computes the magnitudes of the complex Short Time Fourier Transform of a signal.

Parameters:
stft : numpy array

Complex STFT of a signal.

Returns:
spec : numpy array

Magnitude spectrogram.
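As a minimal sketch of what taking the magnitudes means (plain NumPy, not madmom's own code; the toy `stft` array is made up for illustration), the magnitude spectrogram is the element-wise absolute value of the complex STFT:

```python
import numpy as np

# Hypothetical complex STFT: 3 frames x 4 frequency bins (illustrative values).
stft = np.array([[3 + 4j, 1 + 0j, 0 + 2j, 0 + 0j],
                 [0 - 1j, 6 + 8j, 1 + 1j, 2 + 0j],
                 [5 + 12j, 0 + 0j, 0 - 3j, 1 - 1j]], dtype=np.complex64)

# Element-wise absolute value: sqrt(real**2 + imag**2) for every bin.
magnitudes = np.abs(stft)

print(magnitudes.shape)  # same (frames, bins) layout as the STFT
print(magnitudes[0])     # magnitudes of the first frame
```

The result is real-valued and keeps the frames-by-bins layout of the input.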

class madmom.audio.spectrogram.Spectrogram(stft, **kwargs)[source]

A Spectrogram represents the magnitude spectrogram of an audio.stft.ShortTimeFourierTransform.

Parameters:
stft : audio.stft.ShortTimeFourierTransform instance

Short Time Fourier Transform.

kwargs : dict, optional

If no audio.stft.ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a Spectrogram from an audio.stft.ShortTimeFourierTransform (or anything it can be instantiated from):

>>> spec = Spectrogram('tests/data/audio/sample.wav')
>>> spec  
Spectrogram([[ 3.15249,  4.00272, ...,  0.03634,  0.03671],
             [ 4.28429,  2.85158, ...,  0.0219 ,  0.02227],
             ...,
             [ 4.92274, 10.27775, ...,  0.00607,  0.00593],
             [ 9.22709,  9.6387 , ...,  0.00981,  0.00984]], dtype=float32)

num_frames

Number of frames.

num_bins

Number of bins.

bin_frequencies

Bin frequencies.
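For an unfiltered spectrogram the bins are spaced linearly. The sketch below assumes a sample rate of 44100 Hz, a frame size of 2048 samples and one bin per positive DFT frequency below Nyquist; none of these values is stated in this section, but the resulting spacing of about 21.53 Hz is consistent with the low-frequency values printed in the FilteredSpectrogram example (43.07, 64.60, 86.13 Hz, ...).

```python
import numpy as np

sample_rate = 44100          # assumed sample rate [Hz]
frame_size = 2048            # assumed STFT frame size [samples]
num_bins = frame_size // 2   # assumed: bins from 0 Hz up to just below Nyquist

# Bin k is centred at k * sample_rate / frame_size.
bin_frequencies = np.arange(num_bins) * sample_rate / frame_size

print(bin_frequencies[:4])
```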

diff(**kwargs)[source]

Return the difference of the magnitude spectrogram.

Parameters:
kwargs : dict

Keyword arguments passed to SpectrogramDifference.

Returns:
diff : SpectrogramDifference instance

The differences of the magnitude spectrogram.

filter(**kwargs)[source]

Return a filtered version of the magnitude spectrogram.

Parameters:
kwargs : dict

Keyword arguments passed to FilteredSpectrogram.

Returns:
filt_spec : FilteredSpectrogram instance

Filtered version of the magnitude spectrogram.

log(**kwargs)[source]

Return a logarithmically scaled version of the magnitude spectrogram.

Parameters:
kwargs : dict

Keyword arguments passed to LogarithmicSpectrogram.

Returns:
log_spec : LogarithmicSpectrogram instance

Logarithmically scaled version of the magnitude spectrogram.

class madmom.audio.spectrogram.SpectrogramProcessor(**kwargs)[source]

SpectrogramProcessor class.

process(data, **kwargs)[source]

Create a Spectrogram from the given data.

Parameters:
data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to Spectrogram.

Returns:
spec : Spectrogram instance

Spectrogram.

class madmom.audio.spectrogram.FilteredSpectrogram(spectrogram, filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)[source]

FilteredSpectrogram class.

Parameters:
spectrogram : Spectrogram instance

Spectrogram.

filterbank : audio.filters.Filterbank, optional

Filterbank class or instance; if a class is given (rather than an instance), one will be created with the given type and parameters.

num_bands : int, optional

Number of filter bands (per octave, depending on the type of the filterbank).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a FilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, a madmom.audio.filters.LogarithmicFilterbank with 12 bands per octave is used.

>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec  
FilteredSpectrogram([[ 5.66156, 6.30141, ..., 0.05426, 0.06461],
                     [ 8.44266, 8.69582, ..., 0.07703, 0.0902 ],
                     ...,
                     [10.04626, 1.12018, ..., 0.0487 , 0.04282],
                     [ 8.60186, 6.81195, ..., 0.03721, 0.03371]],
                    dtype=float32)

The resulting spectrogram has fewer frequency bins, with the centers of the bins aligned logarithmically (lower frequency bins still have a linear spacing due to the coarse resolution of the DFT at low frequencies):

>>> spec.shape
(281, 81)
>>> spec.num_bins
81
>>> spec.bin_frequencies  
array([    43.06641,    64.59961,    86.13281,   107.66602,
          129.19922,   150.73242,   172.26562,   193.79883, ...,
        10551.26953, 11175.73242, 11843.26172, 12553.85742,
        13285.98633, 14082.71484, 14922.50977, 15805.37109])

The filterbank used to filter the spectrogram is saved as an attribute:

>>> spec.filterbank  
LogarithmicFilterbank([[0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.],
                       ...,
                       [0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.]], dtype=float32)
>>> spec.filterbank.num_bands
81

The filterbank can be chosen at instantiation time:

>>> from madmom.audio.filters import MelFilterbank
>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav',
...                            filterbank=MelFilterbank, num_bands=40)
>>> type(spec.filterbank)
<class 'madmom.audio.filters.MelFilterbank'>
>>> spec.shape
(281, 40)

bin_frequencies

Bin frequencies.
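Conceptually, filtering a spectrogram amounts to a matrix product between the (frames x bins) magnitudes and a (bins x bands) filterbank. The following is a NumPy-only sketch under that assumption, with a made-up triangular filterbank; it is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.random((5, 8)).astype(np.float32)   # toy spectrogram: 5 frames x 8 bins

# Toy filterbank mapping 8 bins onto 3 overlapping triangular bands.
filterbank = np.array([[1.0, 0.0, 0.0],
                       [0.5, 0.0, 0.0],
                       [0.0, 0.5, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.5, 0.5],
                       [0.0, 0.0, 1.0],
                       [0.0, 0.0, 0.5],
                       [0.0, 0.0, 0.0]], dtype=np.float32)

# Analogue of norm_filters=True: scale every band so its weights sum to 1.
filterbank /= filterbank.sum(axis=0)

# Each output band is a weighted sum of the bins it covers.
filtered = spec @ filterbank

print(filtered.shape)   # (frames, bands)
```

With normalized bands, a flat input spectrum stays flat after filtering, which keeps the bands comparable regardless of their width.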

class madmom.audio.spectrogram.FilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)[source]

FilteredSpectrogramProcessor class.

Parameters:
filterbank : audio.filters.Filterbank

Filterbank used to filter a spectrogram.

num_bands : int

Number of bands (per octave).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
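To see why duplicate filters can arise, consider mapping logarithmically spaced centre frequencies onto linearly spaced DFT bins. The sketch below assumes a 44100 Hz sample rate and a frame size of 2048 (neither is stated here) and uses a simple nearest-bin mapping; the actual filterbank construction may differ.

```python
import numpy as np

sample_rate, frame_size = 44100, 2048    # assumed values
bin_width = sample_rate / frame_size     # spacing of the DFT bins [Hz]

num_bands, fmin, fmax, fref = 12, 30.0, 17000.0, 440.0

# Centre frequencies spaced num_bands per octave around fref, within [fmin, fmax].
lowest = np.ceil(np.log2(fmin / fref) * num_bands)
highest = np.floor(np.log2(fmax / fref) * num_bands)
log_freqs = fref * 2.0 ** (np.arange(lowest, highest + 1) / num_bands)

# Map every centre frequency to its nearest DFT bin.
bins = np.round(log_freqs / bin_width).astype(int)

# At low frequencies several centre frequencies share one bin; keep each bin once.
unique_bins = np.unique(bins)

print(len(bins), len(unique_bins))
```

The second count is smaller than the first because neighbouring semitones below a few hundred hertz are closer together than one DFT bin.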

process(data, **kwargs)[source]

Create a FilteredSpectrogram from the given data.

Parameters:
data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to FilteredSpectrogram.

Returns:
filt_spec : FilteredSpectrogram instance

Filtered spectrogram.

class madmom.audio.spectrogram.LogarithmicSpectrogram(spectrogram, log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)[source]

LogarithmicSpectrogram class.

Parameters:
spectrogram : Spectrogram instance

Spectrogram.

log : numpy ufunc, optional

Logarithmic scaling function to apply.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a LogarithmicSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, np.log10 is used as the scaling function and a value of 1 is added to avoid negative values.

>>> spec = LogarithmicSpectrogram('tests/data/audio/sample.wav')
>>> spec  
LogarithmicSpectrogram([[...]], dtype=float32)
>>> spec.min()
LogarithmicSpectrogram(0., dtype=float32)

filterbank

Filterbank.

bin_frequencies

Bin frequencies.
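The scaling described by the `log`, `mul` and `add` parameters can be read as `log(mul * magnitudes + add)`. The helper below is a hypothetical NumPy sketch of that reading, not the class itself:

```python
import numpy as np

def log_scale(magnitudes, log=np.log10, mul=1.0, add=1.0):
    """Hypothetical helper: multiply, add, then take the logarithm."""
    return log(mul * np.asarray(magnitudes, dtype=float) + add)

magnitudes = np.array([[0.0, 9.0],
                       [99.0, 0.5]])
scaled = log_scale(magnitudes)

print(scaled)
```

With `add=1.0`, a magnitude of 0 maps to `log10(1) = 0`, so non-negative magnitudes never produce negative values.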

class madmom.audio.spectrogram.LogarithmicSpectrogramProcessor(log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)[source]

Logarithmic Spectrogram Processor class.

Parameters:
log : numpy ufunc, optional

Logarithmic scaling function to apply.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)[source]

Perform logarithmic scaling of a spectrogram.

Parameters:
data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to LogarithmicSpectrogram.

Returns:
log_spec : LogarithmicSpectrogram instance

Logarithmically scaled spectrogram.

static add_arguments(parser, log=None, mul=None, add=None)[source]

Add spectrogram scaling related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

log : bool, optional

Take the logarithm of the spectrogram.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

Returns:
argparse argument group

Spectrogram scaling argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.
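The "only if not None" convention can be illustrated with the standard library alone. The function name, option names and help texts below are hypothetical and are not the options madmom registers:

```python
import argparse

def add_scaling_arguments(parser, log=None, mul=None, add=None):
    """Hypothetical helper mirroring the 'skip if None' convention."""
    group = parser.add_argument_group('spectrogram scaling arguments')
    if log is not None:
        group.add_argument('--log', action='store_true', default=log,
                           help='take the logarithm of the spectrogram')
    if mul is not None:
        group.add_argument('--mul', type=float, default=mul,
                           help='multiply the magnitudes by this factor')
    if add is not None:
        group.add_argument('--add', type=float, default=add,
                           help='add this value before taking the logarithm')
    return group

parser = argparse.ArgumentParser()
add_scaling_arguments(parser, mul=1.0, add=1.0)   # `log` stays None: no option added
args = parser.parse_args(['--mul', '2'])

print(args)
```

Passing a default for a parameter both registers the option and sets its default value; leaving it as None keeps the option out of the command-line interface entirely.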

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogram(spectrogram, **kwargs)[source]

LogarithmicFilteredSpectrogram class.

Parameters:
spectrogram : FilteredSpectrogram instance

Filtered spectrogram.

kwargs : dict, optional

If no FilteredSpectrogram instance was given, one is instantiated with these additional keyword arguments and logarithmically scaled afterwards, i.e. passed to LogarithmicSpectrogram.

Notes

For the filtering and scaling parameters, please refer to FilteredSpectrogram and LogarithmicSpectrogram.

Examples

Create a LogarithmicFilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). This is mainly a convenience class which first filters the spectrogram and then scales it logarithmically.

>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec  
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02295, 0.02719],
                                [0.97509, 0.98658, ..., 0.03223, 0.0375 ],
                                ...,
                                [1.04322, 0.32637, ..., 0.02065, 0.01821],
                                [0.98236, 0.89276, ..., 0.01587, 0.0144 ]],
                                dtype=float32)
>>> spec.shape
(281, 81)
>>> spec.filterbank  
LogarithmicFilterbank([[...]], dtype=float32)
>>> spec.min()  
LogarithmicFilteredSpectrogram(0.00831, dtype=float32)

filterbank

Filterbank.

bin_frequencies

Bin frequencies.

class madmom.audio.spectrogram.LogarithmicFilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, mul=1.0, add=1.0, **kwargs)[source]

Logarithmic Filtered Spectrogram Processor class.

Parameters:
filterbank : audio.filters.Filterbank

Filterbank used to filter a spectrogram.

num_bands : int

Number of bands (per octave).

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

fref : float, optional

Tuning frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

mul : float, optional

Multiply the magnitude spectrogram with this factor before taking the logarithm.

add : float, optional

Add this value before taking the logarithm of the magnitudes.

process(data, **kwargs)[source]

Perform filtering and logarithmic scaling of a spectrogram.

Parameters:
data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to LogarithmicFilteredSpectrogram.

Returns:
log_filt_spec : LogarithmicFilteredSpectrogram instance

Logarithmically scaled filtered spectrogram.

class madmom.audio.spectrogram.SpectrogramDifference(spectrogram, diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, keep_dims=True, **kwargs)[source]

SpectrogramDifference class.

Parameters:
spectrogram : Spectrogram instance

Spectrogram.

diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

keep_dims : bool, optional

Indicate if the dimensions (i.e. shape) of the spectrogram should be kept.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Notes

The first diff_frames frames will have a value of 0.
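One plausible reading of the diff_ratio description is sketched below: find the first sample at which the STFT window rises above the given ratio of its maximum, measure the distance from there to the window centre, and convert that distance to frames. The function name, the Hann window default and the rounding are assumptions made for illustration, not a statement of how the library computes it.

```python
import numpy as np

def diff_frames_from_ratio(diff_ratio, hop_size, frame_size, window=np.hanning):
    """Hypothetical helper: translate a window-height ratio into a frame offset."""
    win = window(frame_size)
    # First sample at which the window exceeds the given ratio of its maximum.
    sample = int(np.argmax(win > diff_ratio * win.max()))
    # Distance from that sample to the centre of the window, in samples.
    diff_samples = frame_size / 2 - sample
    # Convert to frames; always look back at least one frame.
    return int(max(1, round(diff_samples / hop_size)))

# Assumed settings: 2048-sample frames with two different hop sizes.
print(diff_frames_from_ratio(0.5, hop_size=441, frame_size=2048))
print(diff_frames_from_ratio(0.5, hop_size=128, frame_size=2048))
```

Under this reading, a smaller hop size (higher frame rate) leads to a larger frame offset, because the same stretch of the window spans more frames.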

If keep_dims is ‘True’ the returned difference has the same shape as the spectrogram. This is needed if the diffs should be stacked on top of it. If set to ‘False’, the length will be diff_frames frames shorter (mostly used by the SpectrogramDifferenceProcessor which first buffers that many frames).

The SuperFlux algorithm [1] uses a maximum-filtered spectrogram (diff_max_bins set to 3) together with a logarithmic filterbank with 24 bands per octave to calculate the difference spectrogram with a diff_ratio of 0.5.

The effect of this maximum filter applied to the spectrogram is that the magnitudes are “widened” in frequency direction, i.e. the following difference calculation is less sensitive against frequency fluctuations. This effect is exploited to suppress false positive energy fragments originating from vibrato.
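The widening effect can be reproduced on a toy example. The sketch below is a NumPy-only approximation (edge padding, odd filter widths only, hypothetical function names) and is not the library's implementation; it shows a partial that drifts by one bin between two frames.

```python
import numpy as np

def max_filter_freq(spec, width):
    """Running maximum over `width` neighbouring bins (odd width assumed)."""
    pad = width // 2
    padded = np.pad(spec, ((0, 0), (pad, pad)), mode='edge')
    # Stack shifted copies and take the maximum across them.
    shifted = [padded[:, i:i + spec.shape[1]] for i in range(width)]
    return np.max(shifted, axis=0)

def spectral_difference(spec, diff_frames=1, diff_max_bins=None,
                        positive_diffs=False):
    """Hypothetical helper: difference to an (optionally widened) earlier frame."""
    spec = np.asarray(spec, dtype=float)
    reference = spec
    if diff_max_bins is not None and diff_max_bins > 1:
        reference = max_filter_freq(spec, diff_max_bins)
    diff = np.zeros_like(spec)   # the first diff_frames frames stay 0
    diff[diff_frames:] = spec[diff_frames:] - reference[:-diff_frames]
    if positive_diffs:
        np.maximum(diff, 0, out=diff)
    return diff

# A single partial moving from bin 1 to bin 2, as vibrato would cause.
spec = np.array([[0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0]])

plain = spectral_difference(spec, positive_diffs=True)
widened = spectral_difference(spec, diff_max_bins=3, positive_diffs=True)

print(plain[1])     # the drift shows up as a rise in energy in bin 2
print(widened[1])   # with the maximum filter the rise is suppressed
```

Without the maximum filter the drifting partial looks like new energy; comparing against the widened previous frame removes that spurious positive difference.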

References

[1] Sebastian Böck and Gerhard Widmer, “Maximum Filter Vibrato Suppression for Onset Detection”, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.

Examples

To obtain the SuperFlux feature as described above, first create a filtered and logarithmically spaced spectrogram:

>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav',
...                                       num_bands=24, fps=200)
>>> spec  
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02809, 0.02672],
                                [0.92514, 0.93211, ..., 0.03607, 0.0317 ],
                                ...,
                                [1.03826, 0.767  , ..., 0.01814, 0.01138],
                                [0.98236, 0.89276, ..., 0.01669, 0.00919]],
                                dtype=float32)
>>> spec.shape
(561, 140)

Then use the temporal first order difference and apply a maximum filter with 3 bands, keeping only the positive differences (i.e. rise in energy):

>>> superflux = SpectrogramDifference(spec, diff_max_bins=3,
...                                   positive_diffs=True)
>>> superflux  
SpectrogramDifference([[0.     , 0. , ...,  0. ,  0. ],
                       [0.     , 0. , ...,  0. ,  0. ],
                       ...,
                       [0.01941, 0. , ...,  0. ,  0. ],
                       [0.     , 0. , ...,  0. ,  0. ]], dtype=float32)

bin_frequencies

Bin frequencies.

positive_diff()[source]

Positive diff.

class madmom.audio.spectrogram.SpectrogramDifferenceProcessor(diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, stack_diffs=None, **kwargs)[source]

Difference Spectrogram Processor class.

Parameters:
diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

stack_diffs : numpy stacking function, optional

If ‘None’, only the differences are returned. If set, the diffs are stacked with the underlying spectrogram data according to the stack function:

  • np.vstack the differences and spectrogram are stacked vertically, i.e. in time direction,
  • np.hstack the differences and spectrogram are stacked horizontally, i.e. in frequency direction,
  • np.dstack the differences and spectrogram are stacked in depth, i.e. return them as a 3D representation with depth as the third dimension.
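The three stacking options differ only in the shape of the result. The sketch below uses toy arrays and stacks the spectrogram first and the differences second; that ordering is an assumption made for illustration.

```python
import numpy as np

spec = np.arange(12, dtype=np.float32).reshape(3, 4)   # 3 frames x 4 bins
diff = np.zeros_like(spec)
diff[1:] = spec[1:] - spec[:-1]                        # first-order difference

# Apply each stacking function to the (spectrogram, differences) pair.
shapes = {
    'vstack': np.vstack((spec, diff)).shape,   # stacked along the time axis
    'hstack': np.hstack((spec, diff)).shape,   # stacked along the frequency axis
    'dstack': np.dstack((spec, diff)).shape,   # stacked along a new third axis
}

for name, shape in shapes.items():
    print(name, shape)
```

Stacking horizontally keeps one row per frame, which is convenient when each frame is later fed to a model as a single feature vector.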
process(data, reset=True, **kwargs)[source]

Perform a temporal difference calculation on the given data.

Parameters:
data : numpy array

Data to be processed.

reset : bool, optional

Reset the spectrogram buffer before computing the difference.

kwargs : dict

Keyword arguments passed to SpectrogramDifference.

Returns:
diff : SpectrogramDifference instance

Spectrogram difference.

Notes

If reset is ‘True’, the first diff_frames differences will be 0.

reset()[source]

Reset the SpectrogramDifferenceProcessor.

static add_arguments(parser, diff=None, diff_ratio=None, diff_frames=None, diff_max_bins=None, positive_diffs=None)[source]

Add spectrogram difference related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

diff : bool, optional

Take the difference of the spectrogram.

diff_ratio : float, optional

Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.

diff_frames : int, optional

Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from the diff_ratio).

diff_max_bins : int, optional

Apply a maximum filter with this width (in bins in frequency dimension) to the spectrogram the difference is calculated to.

positive_diffs : bool, optional

Keep only the positive differences, i.e. set all diff values < 0 to 0.

Returns:
argparse argument group

Spectrogram difference argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.

Only the diff_frames parameter behaves differently: it is included if either the diff_ratio is set or a value other than ‘None’ is given.

class madmom.audio.spectrogram.SuperFluxProcessor(**kwargs)[source]

Spectrogram processor which sets the default values suitable for the SuperFlux algorithm.

class madmom.audio.spectrogram.MultiBandSpectrogram(spectrogram, crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)[source]

MultiBandSpectrogram class.

Parameters:
spectrogram : Spectrogram instance

Spectrogram.

crossover_frequencies : list or numpy array

List of crossover frequencies at which the spectrogram is split into multiple bands.

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

kwargs : dict, optional

If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.

Notes

The MultiBandSpectrogram is implemented as a Spectrogram which uses an audio.filters.RectangularFilterbank to combine multiple frequency bins.

class madmom.audio.spectrogram.MultiBandSpectrogramProcessor(crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)[source]

Spectrogram processor which combines the spectrogram magnitudes into multiple bands.

Parameters:
crossover_frequencies : list or numpy array

List of crossover frequencies at which a spectrogram is split into the individual bands.

fmin : float, optional

Minimum frequency of the filterbank [Hz].

fmax : float, optional

Maximum frequency of the filterbank [Hz].

norm_filters : bool, optional

Normalize the filter bands of the filterbank to area 1.

unique_filters : bool, optional

Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.

process(data, **kwargs)[source]

Return a multi-band representation of the given data.

Parameters:
data : numpy array

Data to be processed.

kwargs : dict

Keyword arguments passed to MultiBandSpectrogram.

Returns:
multi_band_spec : MultiBandSpectrogram instance

Spectrogram split into multiple bands.
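The band splitting can be pictured as a rectangular filterbank whose band edges are fmin, the crossover frequencies and fmax. The helper below is a hypothetical NumPy sketch; the bin frequencies assume a 44100 Hz sample rate and a frame size of 2048, and the half-open band edges are a choice made for this example.

```python
import numpy as np

def multi_band(spec, bin_frequencies, crossover_frequencies,
               fmin=30.0, fmax=17000.0, norm_filters=True):
    """Hypothetical helper: combine bins into bands with rectangular filters."""
    edges = np.r_[fmin, np.sort(crossover_frequencies), fmax]   # band edges [Hz]
    num_bands = len(edges) - 1
    filterbank = np.zeros((len(bin_frequencies), num_bands))
    for band in range(num_bands):
        in_band = ((bin_frequencies >= edges[band]) &
                   (bin_frequencies < edges[band + 1]))
        filterbank[in_band, band] = 1.0
    if norm_filters:
        sums = filterbank.sum(axis=0)
        filterbank /= np.where(sums > 0, sums, 1.0)   # avoid dividing by zero
    return spec @ filterbank

bin_frequencies = np.arange(1024) * 44100 / 2048   # assumed linear bin layout
spec = np.ones((2, 1024))                          # flat toy spectrogram
bands = multi_band(spec, bin_frequencies, crossover_frequencies=[270, 1500])

print(bands.shape)   # two crossover frequencies give three bands
```

With normalized filters each band holds the mean magnitude of its bins, so a flat input yields the same value in every band.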

class madmom.audio.spectrogram.SemitoneBandpassSpectrogram(signal, fps=50.0, fmin=27.5, fmax=4200.0)[source]

Construct a semitone spectrogram by using a time domain filterbank of bandpass filters as described in [1].

Parameters:
signal : Signal

Signal instance.

fps : float, optional

Frame rate of the spectrogram [Hz].

fmin : float, optional

Lowest frequency of the spectrogram [Hz].

fmax : float, optional

Highest frequency of the spectrogram [Hz].
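To give a feel for the default frequency range, the sketch below lists semitone centre frequencies between fmin and fmax. It assumes twelve-tone equal temperament with A4 = 440 Hz and MIDI note numbering, neither of which is stated for this class, and it does not implement the bandpass filters themselves.

```python
def semitone_frequencies(fmin=27.5, fmax=4200.0, fref=440.0):
    """Hypothetical helper: equal-tempered semitone frequencies in [fmin, fmax]."""
    frequencies = []
    for midi_note in range(128):
        # MIDI note 69 corresponds to the reference pitch A4.
        frequency = fref * 2.0 ** ((midi_note - 69) / 12.0)
        if fmin <= frequency <= fmax:
            frequencies.append(frequency)
    return frequencies

freqs = semitone_frequencies()
print(len(freqs), freqs[0], round(freqs[-1], 2))
```

Under these assumptions the default range spans the notes A0 to C8, i.e. the range of a standard piano keyboard.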

References

[1] Meinard Müller, “Information retrieval for music and motion”, Springer, 2007.