madmom.audio.stft¶

This module contains Short-Time Fourier Transform (STFT) related functionality.

madmom.audio.stft.fft_frequencies(num_fft_bins, sample_rate)[source]¶

Frequencies of the FFT bins.

Parameters:

num_fft_bins : int: Number of FFT bins (i.e. half the FFT length).
sample_rate : float: Sample rate of the signal.

Returns:

fft_frequencies : numpy array: Frequencies of the FFT bins [Hz].
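The bin frequencies follow directly from the number of bins and the sample rate. The following is a minimal NumPy sketch, assuming the bins start at 0 Hz and are spaced sample_rate / (2 * num_fft_bins) apart; the helper name `fft_bin_frequencies` is hypothetical and not part of madmom:

```python
import numpy as np


def fft_bin_frequencies(num_fft_bins, sample_rate):
    """Frequencies [Hz] of the first `num_fft_bins` bins of an FFT of
    length 2 * num_fft_bins (hypothetical helper, not madmom's code)."""
    # fftfreq returns the frequencies of all bins of the full-length FFT;
    # keep only the non-negative half below the Nyquist frequency
    return np.fft.fftfreq(num_fft_bins * 2, 1.0 / sample_rate)[:num_fft_bins]


freqs = fft_bin_frequencies(1024, 44100)
```

With these values the first bin is 0 Hz, neighbouring bins are 44100 / 2048 Hz apart, and the last bin lies just below the Nyquist frequency of 22050 Hz.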

madmom.audio.stft.stft(frames, window, fft_size=None, circular_shift=False, include_nyquist=False, fftw=None)[source]¶

Calculates the complex Short-Time Fourier Transform (STFT) of the given framed signal.

Parameters:

frames : numpy array or iterable, shape (num_frames, frame_size): Framed signal (e.g. FramedSignal instance)
window : numpy array, shape (frame_size,): Window (function).
fft_size : int, optional: FFT size (should be a power of 2); if ‘None’, the ‘frame_size’ given by frames is used; if the given fft_size is greater than the ‘frame_size’, the frames are zero-padded; if it is smaller, they are truncated.
circular_shift : bool, optional: Circular shift the individual frames before performing the FFT; needed for correct phase.
include_nyquist : bool, optional: Include the Nyquist frequency bin (sample rate / 2) in returned STFT.
fftw : pyfftw.FFTW instance, optional: If a pyfftw.FFTW object is given it is used to compute the STFT with the FFTW library. Requires ‘pyfftw’.

Returns:

stft : numpy array, shape (num_frames, frame_size): The complex STFT of the framed signal.
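The interplay of window, fft_size and include_nyquist can be illustrated with plain NumPy. This is a sketch under stated assumptions, not madmom's implementation: the function name `stft_sketch` is hypothetical, the circular shift and the FFTW path are left out, and an even fft_size is assumed so that the last rfft bin is the Nyquist bin.

```python
import numpy as np


def stft_sketch(frames, window, fft_size=None, include_nyquist=False):
    """Windowed FFT of each frame (illustrative, assumes even fft_size)."""
    frames = np.asarray(frames, dtype=float)
    frame_size = frames.shape[1]
    if fft_size is None:
        fft_size = frame_size
    # rfft zero-pads frames shorter than fft_size and truncates longer ones
    spectrum = np.fft.rfft(frames * window, n=fft_size, axis=1)
    # rfft returns fft_size // 2 + 1 bins; the last one is the Nyquist bin
    if not include_nyquist:
        spectrum = spectrum[:, :-1]
    return spectrum.astype(np.complex64)


# 3 frames of 8 samples each, Hann-windowed
frames = np.arange(24, dtype=float).reshape(3, 8)
window = np.hanning(8)
X = stft_sketch(frames, window)
X_padded = stft_sketch(frames, window, fft_size=16, include_nyquist=True)
```

In this sketch `X` has 4 bins per frame (half the frame size), while `X_padded` has 9 bins per frame (half the padded FFT size plus the Nyquist bin).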

madmom.audio.stft.phase(stft)[source]¶

Returns the phase of the complex STFT of a signal.

Parameters:	stft : numpy array, shape (num_frames, frame_size) The complex STFT of a signal.
Returns:	phase : numpy array Phase of the STFT.
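As a small illustration, assuming the phase is the element-wise complex argument in radians (consistent with the values within [-π, π] shown in the Phase example further down), the computation can be sketched with NumPy on made-up values:

```python
import numpy as np

# two frames with three complex STFT bins each (made-up values)
stft = np.array([[1 + 0j, 0 + 1j, -1 + 0j],
                 [1 + 1j, 0 - 2j, 3 + 0j]], dtype=np.complex64)

# the phase is the argument (angle) of each complex bin, in radians
phase = np.angle(stft)
```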

madmom.audio.stft.local_group_delay(phase)[source]¶

Returns the local group delay of the phase of a signal.

Parameters:	phase : numpy array, shape (num_frames, frame_size) Phase of the STFT of a signal.
Returns:	lgd : numpy array Local group delay of the phase.
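One common way to compute a local group delay is to unwrap the phase along the frequency axis and take the difference between neighbouring bins. The sketch below assumes this definition and sets the highest bin to 0; the function name `local_group_delay_sketch` is hypothetical and the code is not taken from madmom:

```python
import numpy as np


def local_group_delay_sketch(phase):
    """Phase difference between neighbouring frequency bins
    (illustrative; assumes shape (num_frames, num_bins))."""
    # unwrap along the frequency axis to remove 2*pi jumps
    unwrapped = np.unwrap(phase, axis=1)
    lgd = np.zeros_like(unwrapped)
    # difference between each bin and its upper neighbour
    lgd[:, :-1] = unwrapped[:, :-1] - unwrapped[:, 1:]
    # the highest bin has no upper neighbour and stays 0
    return lgd


# a unit impulse delayed by 3 samples has linear phase, so its
# local group delay is constant across frequency
n, delay = 16, 3
x = np.zeros(n)
x[delay] = 1.0
phase = np.angle(np.fft.rfft(x)[:-1])[np.newaxis, :]
lgd = local_group_delay_sketch(phase)
```

For the delayed impulse, every bin except the last one has the value 2π · delay / n, which is the delay expressed as a phase slope per bin.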

madmom.audio.stft.lgd(phase)¶

Returns the local group delay of the phase of a signal.

Parameters:	phase : numpy array, shape (num_frames, frame_size) Phase of the STFT of a signal.
Returns:	lgd : numpy array Local group delay of the phase.

class madmom.audio.stft.ShortTimeFourierTransform(frames, window=<function hanning>, fft_size=None, circular_shift=False, include_nyquist=False, fft_window=None, fftw=None, **kwargs)[source]¶

ShortTimeFourierTransform class.

Parameters:

frames : audio.signal.FramedSignal instance: Framed signal.
window : numpy ufunc or numpy array, optional: Window (function); if a function (e.g. np.hanning) is given, a window with the frame size of frames and the given shape is created.
fft_size : int, optional: FFT size (should be a power of 2); if ‘None’, the frame_size given by frames is used; if the given fft_size is greater than the frame_size, the frames are zero-padded accordingly.
circular_shift : bool, optional: Circular shift the individual frames before performing the FFT; needed for correct phase.
include_nyquist : bool, optional: Include the Nyquist frequency bin (sample rate / 2).
fftw : pyfftw.FFTW instance, optional: If a pyfftw.FFTW object is given it is used to compute the STFT with the FFTW library. If ‘None’, a new pyfftw.FFTW object is built. Requires ‘pyfftw’.
kwargs : dict, optional: If no audio.signal.FramedSignal instance was given, one is instantiated with these additional keyword arguments.

Notes

If the Signal (wrapped in the FramedSignal) has an integer dtype, the window is automatically scaled as if the signal had a float dtype with values in the range [-1, 1]. The resulting STFT therefore has the same values regardless of the dtype of the signal. This also avoids extra memory consumption, since the data type of the signal does not need to be converted (and if no decoding is needed, the audio signal can be memory-mapped).
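Scaling the window instead of the signal works because the FFT is linear. The following NumPy sketch uses synthetic data and assumes the scaling factor is the maximum value of the integer type; the exact factor used by madmom is not stated here:

```python
import numpy as np

rng = np.random.default_rng(0)
# one frame of 16-bit integer samples (synthetic data)
frame_int = rng.integers(-32768, 32767, size=2048).astype(np.int16)
window = np.hanning(2048)

# assumed scaling factor: the largest value of the integer type
max_range = float(np.iinfo(frame_int.dtype).max)

# option 1: convert the signal to float (roughly [-1, 1]), then window it
spec_float = np.fft.rfft((frame_int / max_range) * window)

# option 2: leave the integer samples untouched and scale the window
spec_scaled_window = np.fft.rfft(frame_int * (window / max_range))
```

Both spectra agree up to floating-point rounding, but the second option never creates a float copy of the whole signal.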

Examples

Create a ShortTimeFourierTransform from a Signal or FramedSignal:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft  
ShortTimeFourierTransform([[-3.15249+0.j     ,  2.62216-3.02425j, ...,
                            -0.03634-0.00005j,  0.0367 +0.00029j],
                           [-4.28429+0.j     ,  2.02009+2.01264j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j     ,  4.09839-9.42525j, ...,
                             0.0055 -0.00257j,  0.00137+0.00577j],
                           [-9.22709+0.j     ,  8.76929+4.0005j , ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

A ShortTimeFourierTransform can be instantiated directly from a file name:

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> stft  
ShortTimeFourierTransform([[...]], dtype=complex64)

Doing the same with a Signal of float data type results in an STFT with the same value range (up to rounding errors):

>>> sig = Signal('tests/data/audio/sample.wav', dtype=float)
>>> sig  
Signal([-0.07611, -0.0766 , ...,  0.01999,  0.0195 ])
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft  
ShortTimeFourierTransform([[-3.1524 +0.j     ,  2.62208-3.02415j, ...,
                            -0.03633-0.00005j,  0.0367 +0.00029j],
                           [-4.28416+0.j     ,  2.02003+2.01257j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92259+0.j     ,  4.09827-9.42496j, ...,
                             0.0055 -0.00257j,  0.00137+0.00577j],
                           [-9.22681+0.j     ,  8.76902+4.00038j, ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

Additional arguments are passed to FramedSignal and Signal respectively:

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav', frame_size=2048, fps=100, sample_rate=22050)
>>> stft.frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft.frames.frame_size
2048
>>> stft.frames.hop_size
220.5
>>> stft.frames.signal.sample_rate
22050
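The fractional hop size in this example follows from the sample rate and the frame rate. A tiny sketch of the arithmetic, assuming the hop size is derived as sample_rate / fps (which matches the 220.5 shown above):

```python
sample_rate = 22050   # samples per second, as in the example above
fps = 100             # frames per second

# assumed relation: consecutive frames are sample_rate / fps samples apart
hop_size = sample_rate / fps

# offsets (in samples) of the first four frames relative to the first one
frame_offsets = [i * hop_size for i in range(4)]
```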

bin_frequencies¶: Bin frequencies.

spec(**kwargs)[source]¶

Returns the magnitude spectrogram of the STFT.

Parameters:	kwargs : dict, optional Keyword arguments passed to `audio.spectrogram.Spectrogram`.
Returns:	spec : `audio.spectrogram.Spectrogram` `audio.spectrogram.Spectrogram` instance.
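As an illustration of what a magnitude spectrogram contains, the sketch below takes the absolute value of each complex bin using NumPy. The values are made up, and any further processing done by `audio.spectrogram.Spectrogram` is not covered:

```python
import numpy as np

# two frames with two complex STFT bins each (made-up values)
stft = np.array([[3 + 4j, 0 - 2j],
                 [-1 + 0j, 0 + 0j]], dtype=np.complex64)

# magnitude = absolute value of each complex bin; the phase is discarded
magnitude = np.abs(stft)
```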

phase(**kwargs)[source]¶

Returns the phase of the STFT.

Parameters:	kwargs : dict, optional Keyword arguments passed to `Phase`.
Returns:	phase : `Phase` `Phase` instance.

madmom.audio.stft.STFT¶: alias of madmom.audio.stft.ShortTimeFourierTransform

class madmom.audio.stft.ShortTimeFourierTransformProcessor(window=<function hanning>, fft_size=None, circular_shift=False, include_nyquist=False, **kwargs)[source]¶

ShortTimeFourierTransformProcessor class.

Parameters:

window : numpy ufunc, optional: Window function.
fft_size : int, optional: FFT size (should be a power of 2); if ‘None’, it is determined by the size of the frames; if it is greater than the frame size, the frames are zero-padded accordingly.
circular_shift : bool, optional: Circular shift the individual frames before performing the FFT; needed for correct phase.
include_nyquist : bool, optional: Include the Nyquist frequency bin (sample rate / 2).

Examples

Create a ShortTimeFourierTransformProcessor and call it with either a file name or the output of a (Framed-)SignalProcessor to obtain a ShortTimeFourierTransform instance.

>>> proc = ShortTimeFourierTransformProcessor()
>>> stft = proc('tests/data/audio/sample.wav')
>>> stft  
ShortTimeFourierTransform([[-3.15249+0.j     ,  2.62216-3.02425j, ...,
                            -0.03634-0.00005j,  0.0367 +0.00029j],
                           [-4.28429+0.j     ,  2.02009+2.01264j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j     ,  4.09839-9.42525j, ...,
                             0.0055 -0.00257j,  0.00137+0.00577j],
                           [-9.22709+0.j     ,  8.76929+4.0005j , ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

process(data, **kwargs)[source]¶

Perform FFT on a framed signal and return the STFT.

Parameters:

data : numpy array: Data to be processed.
kwargs : dict, optional: Keyword arguments passed to `ShortTimeFourierTransform`.

Returns:

stft : `ShortTimeFourierTransform`: `ShortTimeFourierTransform` instance.

static add_arguments(parser, window=None, fft_size=None)[source]¶

Add STFT related arguments to an existing parser.

Parameters:

parser : argparse parser instance: Existing argparse parser.
window : numpy ufunc, optional: Window function.
fft_size : int, optional: Use this size for FFT (should be a power of 2).

Returns:

argparse argument group: STFT argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.
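The pattern of adding an option only when a default is supplied can be sketched with the standard argparse module. The helper `add_stft_arguments`, the group title and the option names below are hypothetical and chosen for illustration; they are not madmom's actual flags:

```python
import argparse


def add_stft_arguments(parser, window=None, fft_size=None):
    """Add an STFT argument group to `parser` (hypothetical helper;
    option names are illustrative only)."""
    group = parser.add_argument_group('short-time Fourier transform arguments')
    # only expose an option if a default value was supplied for it
    if window is not None:
        group.add_argument('--window', default=window,
                           help='window function [default=%(default)s]')
    if fft_size is not None:
        group.add_argument('--fft_size', type=int, default=fft_size,
                           help='FFT size [default=%(default)s]')
    return group


parser = argparse.ArgumentParser()
add_stft_arguments(parser, fft_size=2048)   # window is None: no --window
args = parser.parse_args(['--fft_size', '4096'])
```

Here the parser accepts only the FFT size option, because no window default was passed to the helper.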

madmom.audio.stft.STFTProcessor¶: alias of madmom.audio.stft.ShortTimeFourierTransformProcessor

class madmom.audio.stft.Phase(stft, **kwargs)[source]¶

Phase class.

Parameters:

stft : `ShortTimeFourierTransform` instance: `ShortTimeFourierTransform` instance.
kwargs : dict, optional: If no `ShortTimeFourierTransform` instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a Phase from a ShortTimeFourierTransform (or anything it can be instantiated from):

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> phase = Phase(stft)
>>> phase  
Phase([[ 3.14159, -0.85649, ..., -3.14016,  0.00779],
       [ 3.14159,  0.78355, ..., -2.70136,  1.81393],
       ...,
       [ 3.14159, -1.16063, ..., -0.4373 ,  1.33774],
       [ 3.14159,  0.42799, ..., -0.0142 ,  3.13592]], dtype=float32)

bin_frequencies¶: Bin frequencies.

local_group_delay(**kwargs)[source]¶

Returns the local group delay of the phase.

Parameters:	kwargs : dict, optional Keyword arguments passed to `LocalGroupDelay`.
Returns:	lgd : `LocalGroupDelay` instance `LocalGroupDelay` instance.

lgd(**kwargs)¶

Returns the local group delay of the phase.

Parameters:	kwargs : dict, optional Keyword arguments passed to `LocalGroupDelay`.
Returns:	lgd : `LocalGroupDelay` instance `LocalGroupDelay` instance.

class madmom.audio.stft.LocalGroupDelay(phase, **kwargs)[source]¶

Local Group Delay class.

Parameters:

phase : `Phase` instance: `Phase` instance.
kwargs : dict, optional: If no `Phase` instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a LocalGroupDelay from a ShortTimeFourierTransform (or anything it can be instantiated from):

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> lgd = LocalGroupDelay(stft)
>>> lgd  
LocalGroupDelay([[-2.2851 , -2.25605, ...,  3.13525,  0. ],
                 [ 2.35804,  2.53786, ...,  1.76788,  0. ],
                 ...,
                 [-1.98..., -2.93039, ..., -1.77505,  0. ],
                 [ 2.7136 ,  2.60925, ...,  3.13318,  0. ]])

bin_frequencies¶: Bin frequencies.

madmom.audio.stft.LGD¶: alias of madmom.audio.stft.LocalGroupDelay