madmom.audio.spectrogram
This module contains spectrogram related functionality.
-
madmom.audio.spectrogram.spec(stft)
Computes the magnitudes of the complex Short Time Fourier Transform of a signal.
Parameters: - stft : numpy array
Complex STFT of a signal.
Returns: - spec : numpy array
Magnitude spectrogram.
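As a minimal illustration of what "taking the magnitudes" means, the following plain-Python sketch applies abs() to every complex STFT coefficient. It deliberately avoids madmom and NumPy; `magnitude_spectrogram` is a hypothetical helper name and not part of the library.

```python
def magnitude_spectrogram(stft):
    """Return the element-wise magnitudes of a complex STFT.

    `stft` is a list of frames, each frame a list of complex bins.
    Hypothetical helper for illustration only, not madmom's implementation.
    """
    return [[abs(value) for value in frame] for frame in stft]


# two frames with three frequency bins each
stft = [[3 + 4j, 0 + 1j, -2 + 0j],
        [0 + 0j, 1 - 1j, 6 + 8j]]
mags = magnitude_spectrogram(stft)
```

The phase information is discarded; only the length of each complex value is kept.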
-
class madmom.audio.spectrogram.Spectrogram(stft, **kwargs)
A Spectrogram represents the magnitude spectrogram of an audio.stft.ShortTimeFourierTransform.
Parameters: - stft : audio.stft.ShortTimeFourierTransform instance
Short Time Fourier Transform.
- kwargs : dict, optional
If no audio.stft.ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.
Examples
Create a Spectrogram from an audio.stft.ShortTimeFourierTransform (or anything it can be instantiated from):
>>> spec = Spectrogram('tests/data/audio/sample.wav')
>>> spec
Spectrogram([[ 3.15249, 4.00272, ..., 0.03634, 0.03671],
             [ 4.28429, 2.85158, ..., 0.0219 , 0.02227],
             ...,
             [ 4.92274, 10.27775, ..., 0.00607, 0.00593],
             [ 9.22709, 9.6387 , ..., 0.00981, 0.00984]], dtype=float32)
-
num_frames
Number of frames.
-
num_bins
Number of bins.
-
bin_frequencies
Bin frequencies.
-
diff(**kwargs)
Return the difference of the magnitude spectrogram.
Parameters: - kwargs : dict
Keyword arguments passed to SpectrogramDifference.
Returns: - diff : SpectrogramDifference instance
The differences of the magnitude spectrogram.
-
filter(**kwargs)
Return a filtered version of the magnitude spectrogram.
Parameters: - kwargs : dict
Keyword arguments passed to FilteredSpectrogram.
Returns: - filt_spec : FilteredSpectrogram instance
Filtered version of the magnitude spectrogram.
-
log(**kwargs)
Return a logarithmically scaled version of the magnitude spectrogram.
Parameters: - kwargs : dict
Keyword arguments passed to LogarithmicSpectrogram.
Returns: - log_spec : LogarithmicSpectrogram instance
Logarithmically scaled version of the magnitude spectrogram.
-
class madmom.audio.spectrogram.SpectrogramProcessor(**kwargs)
SpectrogramProcessor class.
-
process(data, **kwargs)
Create a Spectrogram from the given data.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict
Keyword arguments passed to Spectrogram.
Returns: - spec : Spectrogram instance
Spectrogram.
-
-
class madmom.audio.spectrogram.FilteredSpectrogram(spectrogram, filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)
FilteredSpectrogram class.
Parameters: - spectrogram : Spectrogram instance
Spectrogram.
- filterbank : audio.filters.Filterbank, optional
Filterbank class or instance; if a class is given (rather than an instance), one will be created with the given type and parameters.
- num_bands : int, optional
Number of filter bands (per octave, depending on the type of the filterbank).
- fmin : float, optional
Minimum frequency of the filterbank [Hz].
- fmax : float, optional
Maximum frequency of the filterbank [Hz].
- fref : float, optional
Tuning frequency of the filterbank [Hz].
- norm_filters : bool, optional
Normalize the filter bands of the filterbank to area 1.
- unique_filters : bool, optional
Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
- kwargs : dict, optional
If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
Examples
Create a FilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, a madmom.audio.filters.LogarithmicFilterbank with 12 bands per octave is used.
>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec
FilteredSpectrogram([[ 5.66156, 6.30141, ..., 0.05426, 0.06461],
                     [ 8.44266, 8.69582, ..., 0.07703, 0.0902 ],
                     ...,
                     [10.04626, 1.12018, ..., 0.0487 , 0.04282],
                     [ 8.60186, 6.81195, ..., 0.03721, 0.03371]], dtype=float32)
The resulting spectrogram has fewer frequency bins, with the centers of the bins aligned logarithmically (lower frequency bins still have a linear spacing due to the coarse resolution of the DFT at low frequencies):
>>> spec.shape
(281, 81)
>>> spec.num_bins
81
>>> spec.bin_frequencies
array([ 43.06641, 64.59961, 86.13281, 107.66602,
        129.19922, 150.73242, 172.26562, 193.79883, ...,
        10551.26953, 11175.73242, 11843.26172, 12553.85742,
        13285.98633, 14082.71484, 14922.50977, 15805.37109])
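The linear spacing at low frequencies can be reproduced with a small plain-Python sketch: logarithmically spaced target frequencies are mapped to the nearest DFT bin, and targets that land on the same bin collapse into one. The helper names are hypothetical, and the 44.1 kHz sample rate and 2048-sample frame size are assumptions inferred from the roughly 21.5 Hz spacing of the bin frequencies shown above; this is not madmom's actual filterbank construction.

```python
import math


def log_frequencies(bands_per_octave, fmin, fmax, fref=440.0):
    """Frequencies spaced `bands_per_octave` per octave, aligned to `fref`."""
    first = math.ceil(math.log2(fmin / fref) * bands_per_octave)
    last = math.floor(math.log2(fmax / fref) * bands_per_octave)
    return [fref * 2.0 ** (k / bands_per_octave) for k in range(first, last + 1)]


def unique_bins(frequencies, bin_width):
    """Map frequencies to the nearest DFT bin and drop duplicate bins."""
    bins = [round(f / bin_width) for f in frequencies]
    return sorted(set(bins))


# assumed analysis settings: 44.1 kHz sample rate, 2048-sample frames
bin_width = 44100 / 2048          # about 21.5 Hz between neighbouring DFT bins
freqs = log_frequencies(12, 30.0, 17000.0)
bins = unique_bins(freqs, bin_width)
```

Several of the lowest semitone frequencies fall onto the same DFT bin, so fewer distinct bins remain than target frequencies, and the surviving low bins are direct neighbours.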
The filterbank used to filter the spectrogram is saved as an attribute:
>>> spec.filterbank
LogarithmicFilterbank([[0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.],
                       ...,
                       [0., 0., ..., 0., 0.],
                       [0., 0., ..., 0., 0.]], dtype=float32)
>>> spec.filterbank.num_bands
81
The filterbank can be chosen at instantiation time:
>>> from madmom.audio.filters import MelFilterbank
>>> spec = FilteredSpectrogram('tests/data/audio/sample.wav', filterbank=MelFilterbank, num_bands=40)
>>> type(spec.filterbank)
<class 'madmom.audio.filters.MelFilterbank'>
>>> spec.shape
(281, 40)
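The shapes above (281 frames reduced to 81 or 40 bands) are consistent with filtering being a matrix product of a (frames x bins) spectrogram with a (bins x bands) filterbank. The sketch below illustrates that idea on toy data in plain Python; treating the operation as a plain matrix product is an assumption, and `apply_filterbank` is a hypothetical helper.

```python
def apply_filterbank(spectrogram, filterbank):
    """Multiply a (frames x bins) spectrogram by a (bins x bands) filterbank."""
    num_bands = len(filterbank[0])
    return [
        [sum(frame[b] * filterbank[b][k] for b in range(len(frame)))
         for k in range(num_bands)]
        for frame in spectrogram
    ]


# toy data: 2 frames with 4 bins, reduced to 2 bands
spectrogram = [[1.0, 2.0, 3.0, 4.0],
               [0.0, 1.0, 0.0, 1.0]]
# each column is one filter; both are normalised to sum to 1
filterbank = [[0.5, 0.0],
              [0.5, 0.0],
              [0.0, 0.5],
              [0.0, 0.5]]
filtered = apply_filterbank(spectrogram, filterbank)
```

Because each filter sums to 1, every output band is a weighted average of the bins it covers, which is what the norm_filters option describes.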
-
bin_frequencies
Bin frequencies.
-
class madmom.audio.spectrogram.FilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, **kwargs)
FilteredSpectrogramProcessor class.
Parameters: - filterbank : audio.filters.Filterbank
Filterbank used to filter a spectrogram.
- num_bands : int
Number of bands (per octave).
- fmin : float, optional
Minimum frequency of the filterbank [Hz].
- fmax : float, optional
Maximum frequency of the filterbank [Hz].
- fref : float, optional
Tuning frequency of the filterbank [Hz].
- norm_filters : bool, optional
Normalize the filter bands of the filterbank to area 1.
- unique_filters : bool, optional
Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
-
process(data, **kwargs)
Create a FilteredSpectrogram from the given data.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict
Keyword arguments passed to FilteredSpectrogram.
Returns: - filt_spec : FilteredSpectrogram instance
Filtered spectrogram.
-
class madmom.audio.spectrogram.LogarithmicSpectrogram(spectrogram, log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)
LogarithmicSpectrogram class.
Parameters: - spectrogram : Spectrogram instance
Spectrogram.
- log : numpy ufunc, optional
Logarithmic scaling function to apply.
- mul : float, optional
Multiply the magnitude spectrogram with this factor before taking the logarithm.
- add : float, optional
Add this value before taking the logarithm of the magnitudes.
- kwargs : dict, optional
If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
Examples
Create a LogarithmicSpectrogram from a Spectrogram (or anything it can be instantiated from). By default, np.log10 is used as the scaling function and a value of 1 is added to avoid negative values.
>>> spec = LogarithmicSpectrogram('tests/data/audio/sample.wav')
>>> spec
LogarithmicSpectrogram([[...]], dtype=float32)
>>> spec.min()
LogarithmicSpectrogram(0., dtype=float32)
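The scaling described by the mul and add parameters can be sketched in plain Python as log(mul * x + add). The order of operations is an assumption based on the parameter descriptions, and `log_scale` is a hypothetical helper rather than the library's code.

```python
import math


def log_scale(spectrogram, mul=1.0, add=1.0, log=math.log10):
    """Apply log(mul * x + add) to every magnitude."""
    return [[log(mul * value + add) for value in frame] for frame in spectrogram]


magnitudes = [[0.0, 9.0], [99.0, 0.5]]
scaled = log_scale(magnitudes)
```

With add=1, a magnitude of 0 maps to log10(1) = 0, so the scaled values never become negative, which matches the minimum of 0 shown in the example above.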
-
filterbank
Filterbank.
-
bin_frequencies
Bin frequencies.
-
class madmom.audio.spectrogram.LogarithmicSpectrogramProcessor(log=<ufunc 'log10'>, mul=1.0, add=1.0, **kwargs)
Logarithmic Spectrogram Processor class.
Parameters: - log : numpy ufunc, optional
Logarithmic scaling function to apply.
- mul : float, optional
Multiply the magnitude spectrogram with this factor before taking the logarithm.
- add : float, optional
Add this value before taking the logarithm of the magnitudes.
-
process(data, **kwargs)
Perform logarithmic scaling of a spectrogram.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict
Keyword arguments passed to LogarithmicSpectrogram.
Returns: - log_spec : LogarithmicSpectrogram instance
Logarithmically scaled spectrogram.
-
static add_arguments(parser, log=None, mul=None, add=None)
Add spectrogram scaling related arguments to an existing parser.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- log : bool, optional
Take the logarithm of the spectrogram.
- mul : float, optional
Multiply the magnitude spectrogram with this factor before taking the logarithm.
- add : float, optional
Add this value before taking the logarithm of the magnitudes.
Returns: - argparse argument group
Spectrogram scaling argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’.
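The "only if not None" behaviour can be illustrated with the standard argparse module. This is a sketch of the pattern only: `add_scaling_arguments` and the option names `--mul` and `--add` are hypothetical and are not claimed to match the options madmom registers.

```python
import argparse


def add_scaling_arguments(parser, mul=None, add=None):
    """Add an argument group, including only options whose default is not None.

    Hypothetical stand-in for illustration; the option names are assumptions.
    """
    group = parser.add_argument_group('spectrogram scaling arguments')
    if mul is not None:
        group.add_argument('--mul', type=float, default=mul,
                           help='multiply the magnitudes by this factor')
    if add is not None:
        group.add_argument('--add', type=float, default=add,
                           help='add this value before taking the logarithm')
    return group


parser = argparse.ArgumentParser()
add_scaling_arguments(parser, mul=1.0)      # `add` is None, so no --add option
args = parser.parse_args(['--mul', '2'])
```

Passing a default value both enables the option and sets its default, so a calling program exposes only the parameters it cares about.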
-
class madmom.audio.spectrogram.LogarithmicFilteredSpectrogram(spectrogram, **kwargs)
LogarithmicFilteredSpectrogram class.
Parameters: - spectrogram : FilteredSpectrogram instance
Filtered spectrogram.
- kwargs : dict, optional
If no FilteredSpectrogram instance was given, one is instantiated with these additional keyword arguments and logarithmically scaled afterwards, i.e. passed to LogarithmicSpectrogram.
Notes
For the filtering and scaling parameters, please refer to FilteredSpectrogram and LogarithmicSpectrogram.
Examples
Create a LogarithmicFilteredSpectrogram from a Spectrogram (or anything it can be instantiated from). This is mainly a convenience class which first filters the spectrogram and then scales it logarithmically.
>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav')
>>> spec
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02295, 0.02719],
                                [0.97509, 0.98658, ..., 0.03223, 0.0375 ],
                                ...,
                                [1.04322, 0.32637, ..., 0.02065, 0.01821],
                                [0.98236, 0.89276, ..., 0.01587, 0.0144 ]], dtype=float32)
>>> spec.shape
(281, 81)
>>> spec.filterbank
LogarithmicFilterbank([[...]], dtype=float32)
>>> spec.min()
LogarithmicFilteredSpectrogram(0.00831, dtype=float32)
-
filterbank
Filterbank.
-
bin_frequencies
Bin frequencies.
-
class madmom.audio.spectrogram.LogarithmicFilteredSpectrogramProcessor(filterbank=<class 'madmom.audio.filters.LogarithmicFilterbank'>, num_bands=12, fmin=30.0, fmax=17000.0, fref=440.0, norm_filters=True, unique_filters=True, mul=1.0, add=1.0, **kwargs)
Logarithmic Filtered Spectrogram Processor class.
Parameters: - filterbank : audio.filters.Filterbank
Filterbank used to filter a spectrogram.
- num_bands : int
Number of bands (per octave).
- fmin : float, optional
Minimum frequency of the filterbank [Hz].
- fmax : float, optional
Maximum frequency of the filterbank [Hz].
- fref : float, optional
Tuning frequency of the filterbank [Hz].
- norm_filters : bool, optional
Normalize the filter bands of the filterbank to area 1.
- unique_filters : bool, optional
Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
- mul : float, optional
Multiply the magnitude spectrogram with this factor before taking the logarithm.
- add : float, optional
Add this value before taking the logarithm of the magnitudes.
-
process(data, **kwargs)
Perform filtering and logarithmic scaling of a spectrogram.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict
Keyword arguments passed to LogarithmicFilteredSpectrogram.
Returns: - log_filt_spec : LogarithmicFilteredSpectrogram instance
Logarithmically scaled filtered spectrogram.
-
class madmom.audio.spectrogram.SpectrogramDifference(spectrogram, diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, keep_dims=True, **kwargs)
SpectrogramDifference class.
Parameters: - spectrogram : Spectrogram instance
Spectrogram.
- diff_ratio : float, optional
Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
- diff_frames : int, optional
Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from diff_ratio).
- diff_max_bins : int, optional
Apply a maximum filter of this width (in frequency bins) to the spectrogram against which the difference is calculated.
- positive_diffs : bool, optional
Keep only the positive differences, i.e. set all diff values < 0 to 0.
- keep_dims : bool, optional
Indicate if the dimensions (i.e. shape) of the spectrogram should be kept.
- kwargs : dict, optional
If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
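One plausible reading of the diff_ratio parameter is sketched below in plain Python: locate the first sample at which the STFT window exceeds diff_ratio times its maximum, measure the distance from there to the window centre, and convert that distance to frames using the hop size. This interpretation, the Hann window, the 2048-sample frame size, the 44.1 kHz sample rate, and the helper names are all assumptions for illustration.

```python
import math


def hann_window(size):
    """Symmetric Hann window (same formula as numpy.hanning)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def diff_frames_from_ratio(diff_ratio, hop_size, frame_size):
    """Estimate how many frames to look back for the difference.

    Assumed interpretation: find the first window sample that exceeds
    `diff_ratio` times the window maximum, measure its distance to the
    window centre, and express that distance in hops (at least 1).
    """
    window = hann_window(frame_size)
    threshold = diff_ratio * max(window)
    first = next(n for n, w in enumerate(window) if w > threshold)
    diff_samples = frame_size / 2 - first
    return int(max(1, round(diff_samples / hop_size)))


# 2048-sample frames at 44.1 kHz: 100 fps -> hop 441, 200 fps -> hop 220.5
frames_100fps = diff_frames_from_ratio(0.5, hop_size=441, frame_size=2048)
frames_200fps = diff_frames_from_ratio(0.5, hop_size=220.5, frame_size=2048)
```

Under these assumptions, a higher frame rate leads to a larger diff_frames value, because more hops fit into the same portion of the window.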
Notes
The first diff_frames frames will have a value of 0.
If keep_dims is ‘True’, the returned difference has the same shape as the spectrogram. This is needed if the diffs should be stacked on top of it. If set to ‘False’, the length will be diff_frames frames shorter (mostly used by the SpectrogramDifferenceProcessor, which first buffers that many frames).
The SuperFlux algorithm [1] uses a maximum-filtered spectrogram with a diff_max_bins of 3 together with a logarithmic filterbank with 24 bands per octave to calculate the difference spectrogram with a diff_ratio of 0.5.
The effect of this maximum filter applied to the spectrogram is that the magnitudes are “widened” in frequency direction, i.e. the following difference calculation is less sensitive to frequency fluctuations. This effect is exploited to suppress false positive energy fragments originating from vibrato.
References
[1] Sebastian Böck and Gerhard Widmer, “Maximum Filter Vibrato Suppression for Onset Detection”, Proceedings of the 16th International Conference on Digital Audio Effects (DAFx), 2013.
Examples
To obtain the SuperFlux feature as described above, first create a filtered and logarithmically scaled spectrogram:
>>> spec = LogarithmicFilteredSpectrogram('tests/data/audio/sample.wav', num_bands=24, fps=200)
>>> spec
LogarithmicFilteredSpectrogram([[0.82358, 0.86341, ..., 0.02809, 0.02672],
                                [0.92514, 0.93211, ..., 0.03607, 0.0317 ],
                                ...,
                                [1.03826, 0.767 , ..., 0.01814, 0.01138],
                                [0.98236, 0.89276, ..., 0.01669, 0.00919]], dtype=float32)
>>> spec.shape
(561, 140)
Then use the temporal first order difference and apply a maximum filter with 3 bands, keeping only the positive differences (i.e. rise in energy):
>>> superflux = SpectrogramDifference(spec, diff_max_bins=3, positive_diffs=True)
>>> superflux
SpectrogramDifference([[0. , 0. , ..., 0. , 0. ],
                       [0. , 0. , ..., 0. , 0. ],
                       ...,
                       [0.01941, 0. , ..., 0. , 0. ],
                       [0. , 0. , ..., 0. , 0. ]], dtype=float32)
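The vibrato suppression described in the notes can be demonstrated on toy data. The plain-Python sketch below max-filters the reference frame along frequency before subtracting it and keeps only positive differences. The helper names and the edge handling of the maximum filter (the window is simply clipped at the borders) are assumptions and may differ from the library's behaviour.

```python
def max_filter(frame, width):
    """Running maximum over `width` neighbouring bins (edges are clipped)."""
    half = width // 2
    return [max(frame[max(0, i - half):i + half + 1]) for i in range(len(frame))]


def positive_difference(spec, diff_frames=1, diff_max_bins=1):
    """Difference to the (max-filtered) frame `diff_frames` earlier.

    The first `diff_frames` rows stay 0 and negative values are set to 0.
    """
    diff = [[0.0] * len(frame) for frame in spec]
    for t in range(diff_frames, len(spec)):
        reference = max_filter(spec[t - diff_frames], diff_max_bins)
        diff[t] = [max(0.0, cur - ref) for cur, ref in zip(spec[t], reference)]
    return diff


# a partial that moves up by one bin (as in vibrato) between two frames
spec = [[0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]]
plain = positive_difference(spec, diff_max_bins=1)
widened = positive_difference(spec, diff_max_bins=3)
```

Without the maximum filter, the shifted partial shows up as a spurious rise in energy in its new bin; with a filter width of 3 bins, the widened reference frame already covers that bin and the difference stays at zero.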
-
bin_frequencies
Bin frequencies.
-
class madmom.audio.spectrogram.SpectrogramDifferenceProcessor(diff_ratio=0.5, diff_frames=None, diff_max_bins=None, positive_diffs=False, stack_diffs=None, **kwargs)
Difference Spectrogram Processor class.
Parameters: - diff_ratio : float, optional
Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
- diff_frames : int, optional
Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from diff_ratio).
- diff_max_bins : int, optional
Apply a maximum filter of this width (in frequency bins) to the spectrogram against which the difference is calculated.
- positive_diffs : bool, optional
Keep only the positive differences, i.e. set all diff values < 0 to 0.
- stack_diffs : numpy stacking function, optional
If ‘None’, only the differences are returned. If set, the diffs are stacked with the underlying spectrogram data according to the stack function:
- np.vstack: the differences and spectrogram are stacked vertically, i.e. in time direction,
- np.hstack: the differences and spectrogram are stacked horizontally, i.e. in frequency direction,
- np.dstack: the differences and spectrogram are stacked in depth, i.e. returned as a 3D representation with depth as the third dimension.
-
process(data, reset=True, **kwargs)
Perform a temporal difference calculation on the given data.
Parameters: - data : numpy array
Data to be processed.
- reset : bool, optional
Reset the spectrogram buffer before computing the difference.
- kwargs : dict
Keyword arguments passed to SpectrogramDifference.
Returns: - diff : SpectrogramDifference instance
Spectrogram difference.
Notes
If reset is ‘True’, the first diff_frames differences will be 0.
-
static add_arguments(parser, diff=None, diff_ratio=None, diff_frames=None, diff_max_bins=None, positive_diffs=None)
Add spectrogram difference related arguments to an existing parser.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- diff : bool, optional
Take the difference of the spectrogram.
- diff_ratio : float, optional
Calculate the difference to the frame at which the window used for the STFT yields this ratio of the maximum height.
- diff_frames : int, optional
Calculate the difference to the diff_frames-th previous frame (if set, this overrides the value calculated from diff_ratio).
- diff_max_bins : int, optional
Apply a maximum filter of this width (in frequency bins) to the spectrogram against which the difference is calculated.
- positive_diffs : bool, optional
Keep only the positive differences, i.e. set all diff values < 0 to 0.
Returns: - argparse argument group
Spectrogram difference argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’.
Only the diff_frames parameter behaves differently: it is included if either diff_ratio is set or a value other than ‘None’ is given.
-
class madmom.audio.spectrogram.SuperFluxProcessor(**kwargs)
Spectrogram processor which sets the default values suitable for the SuperFlux algorithm.
-
class madmom.audio.spectrogram.MultiBandSpectrogram(spectrogram, crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)
MultiBandSpectrogram class.
Parameters: - spectrogram : Spectrogram instance
Spectrogram.
- crossover_frequencies : list or numpy array
List of crossover frequencies at which the spectrogram is split into multiple bands.
- fmin : float, optional
Minimum frequency of the filterbank [Hz].
- fmax : float, optional
Maximum frequency of the filterbank [Hz].
- norm_filters : bool, optional
Normalize the filter bands of the filterbank to area 1.
- unique_filters : bool, optional
Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
- kwargs : dict, optional
If no Spectrogram instance was given, one is instantiated with these additional keyword arguments.
Notes
The MultiBandSpectrogram is implemented as a Spectrogram which uses an audio.filters.RectangularFilterbank to combine multiple frequency bins.
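The idea of combining bins into bands at given crossover frequencies can be sketched in plain Python as follows. The helper names, the handling of bins that lie exactly on a crossover frequency, and the treatment of fmin and fmax as inclusive limits are assumptions; the sketch is not the RectangularFilterbank implementation.

```python
def band_index(frequency, crossover_frequencies):
    """Index of the band a frequency falls into."""
    return sum(frequency >= c for c in crossover_frequencies)


def multi_band(frame, bin_frequencies, crossover_frequencies,
               fmin=30.0, fmax=17000.0, norm_filters=True):
    """Combine the bins of one frame into len(crossover_frequencies) + 1 bands."""
    sums = [0.0] * (len(crossover_frequencies) + 1)
    counts = [0] * len(sums)
    for value, freq in zip(frame, bin_frequencies):
        if fmin <= freq <= fmax:
            band = band_index(freq, crossover_frequencies)
            sums[band] += value
            counts[band] += 1
    if norm_filters:
        # a rectangular filter with area 1 averages the bins of its band
        return [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return sums


bin_frequencies = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
frame = [9.0, 1.0, 3.0, 2.0, 4.0, 6.0]
bands = multi_band(frame, bin_frequencies, crossover_frequencies=[250.0, 450.0])
```

Two crossover frequencies yield three bands; the bin at 0 Hz lies below fmin and is ignored, and with norm_filters enabled each band is the mean of its bins.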
-
class madmom.audio.spectrogram.MultiBandSpectrogramProcessor(crossover_frequencies, fmin=30.0, fmax=17000.0, norm_filters=True, unique_filters=True, **kwargs)
Spectrogram processor which combines the spectrogram magnitudes into multiple bands.
Parameters: - crossover_frequencies : list or numpy array
List of crossover frequencies at which a spectrogram is split into the individual bands.
- fmin : float, optional
Minimum frequency of the filterbank [Hz].
- fmax : float, optional
Maximum frequency of the filterbank [Hz].
- norm_filters : bool, optional
Normalize the filter bands of the filterbank to area 1.
- unique_filters : bool, optional
Indicate if the filterbank should contain only unique filters, i.e. remove duplicate filters resulting from insufficient resolution at low frequencies.
-
process(data, **kwargs)
Return a multi-band representation of the given data.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict
Keyword arguments passed to MultiBandSpectrogram.
Returns: - multi_band_spec : MultiBandSpectrogram instance
Spectrogram split into multiple bands.
-
class madmom.audio.spectrogram.SemitoneBandpassSpectrogram(signal, fps=50.0, fmin=27.5, fmax=4200.0)
Construct a semitone spectrogram by using a time-domain filterbank of bandpass filters as described in [1].
Parameters: - signal : Signal
Signal instance.
- fps : float, optional
Frame rate of the spectrogram [Hz].
- fmin : float, optional
Lowest frequency of the spectrogram [Hz].
- fmax : float, optional
Highest frequency of the spectrogram [Hz].
References
[1] Meinard Müller, “Information Retrieval for Music and Motion”, Springer, 2007.