madmom.audio.stft

This module contains Short-Time Fourier Transform (STFT) related functionality.

madmom.audio.stft.fft_frequencies(num_fft_bins, sample_rate)

Frequencies of the FFT bins.

Parameters:

num_fft_bins : int

Number of FFT bins (i.e. half the FFT length).

sample_rate : float

Sample rate of the signal.

Returns:

fft_frequencies : numpy array

Frequencies of the FFT bins [Hz].
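As an illustration of the mapping from bin index to frequency, here is a minimal NumPy-only sketch. The helper name `fft_frequencies_sketch` is hypothetical, and the sketch assumes the full FFT length is exactly twice `num_fft_bins`, as the parameter description suggests; it is not taken from the madmom source.

```python
import numpy as np


def fft_frequencies_sketch(num_fft_bins, sample_rate):
    """Hypothetical helper: frequencies [Hz] of the lower half of the FFT bins."""
    # assumption: the full FFT length is twice the number of bins
    fft_size = 2 * num_fft_bins
    # fftfreq lists the non-negative frequencies first; keep only those
    return np.fft.fftfreq(fft_size, 1.0 / sample_rate)[:num_fft_bins]


freqs = fft_frequencies_sketch(num_fft_bins=1024, sample_rate=44100)
bin_spacing = freqs[1] - freqs[0]  # equals sample_rate / fft_size
```

Under these assumptions the bins are evenly spaced by `sample_rate / fft_size`, start at 0 Hz, and stay below the Nyquist frequency.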

madmom.audio.stft.stft(frames, window, fft_size=None, circular_shift=False)

Calculates the complex Short-Time Fourier Transform (STFT) of the given framed signal.

Parameters:

frames : numpy array or iterable, shape (num_frames, frame_size)

Framed signal (e.g. a FramedSignal instance).

window : numpy array, shape (frame_size,)

Window (function).

fft_size : int, optional

FFT size (should be a power of 2); if ‘None’, the ‘frame_size’ given by frames is used; if the given fft_size is greater than the ‘frame_size’, the frames are zero-padded accordingly.

circular_shift : bool, optional

Circularly shift the individual frames before performing the FFT; this is needed to obtain a correct phase.

Returns:

stft : numpy array, shape (num_frames, frame_size)

The complex STFT of the framed signal.
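The steps described by the parameters (windowing, optional zero-padding, optional circular shift, FFT) can be sketched with plain NumPy. This is an illustrative sketch, not madmom's implementation: the name `stft_sketch` is hypothetical, and keeping only the first `fft_size // 2` bins is an assumption made here to match the "half the FFT length" convention of `fft_frequencies`.

```python
import numpy as np


def stft_sketch(frames, window, fft_size=None, circular_shift=False):
    """Hypothetical STFT of an array of shape (num_frames, frame_size)."""
    frames = np.asarray(frames, dtype=float)
    num_frames, frame_size = frames.shape
    if fft_size is None:
        fft_size = frame_size
    # assumption: keep only the lower half of the FFT bins
    num_bins = fft_size // 2
    half = frame_size // 2
    out = np.empty((num_frames, num_bins), dtype=np.complex64)
    for i, frame in enumerate(frames):
        windowed = frame * window
        # zero-padded buffer of length fft_size
        buf = np.zeros(fft_size)
        if circular_shift:
            # second half of the frame goes to the start of the buffer,
            # first half to the end, so the frame centre sits at index 0
            buf[:frame_size - half] = windowed[half:]
            buf[fft_size - half:] = windowed[:half]
        else:
            buf[:frame_size] = windowed
        out[i] = np.fft.fft(buf)[:num_bins]
    return out


# three identical frames holding a sinusoid that falls exactly on bin 4
frame_size = 64
n = np.arange(frame_size)
frames = np.tile(np.sin(2 * np.pi * 4 * n / frame_size), (3, 1))
spec = stft_sketch(frames, np.hanning(frame_size))
padded = stft_sketch(frames, np.hanning(frame_size), fft_size=128)
shifted = stft_sketch(frames, np.hanning(frame_size), circular_shift=True)
```

In this sketch, zero-padding to twice the frame size doubles the number of bins, and the circular shift leaves the magnitudes unchanged while altering only the phase.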

madmom.audio.stft.phase(stft)

Returns the phase of the complex STFT of a signal.

Parameters:

stft : numpy array, shape (num_frames, frame_size)

The complex STFT of a signal.

Returns:

phase : numpy array

Phase of the STFT.
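For orientation, the phase of a complex spectrum is the angle of each complex value. A minimal sketch using `np.angle` on a hand-made array (the input values are made up for illustration, and this is not necessarily how madmom computes it internally):

```python
import numpy as np

# one frame with four bins whose angles are easy to read off
stft = np.array([[1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]])
# angle in radians, in the range (-pi, pi]
phase = np.angle(stft)
```

A negative real value such as the third bin yields a phase of pi, which is consistent with the 3.14159 entries in the first column of the Phase example further down.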

madmom.audio.stft.local_group_delay(phase)

Returns the local group delay of the phase of a signal.

Parameters:

phase : numpy array, shape (num_frames, frame_size)

Phase of the STFT of a signal.

Returns:

lgd : numpy array

Local group delay of the phase.
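A rough sketch of one way to compute a local group delay: unwrap the phase along the frequency axis and take the difference between neighbouring bins. The helper name, the sign convention (bin k minus bin k + 1) and the zero in the last bin are assumptions of this sketch; the zero matches the last column of the LocalGroupDelay example output in this document, but the exact definition used by madmom should be checked against its source.

```python
import numpy as np


def local_group_delay_sketch(phase):
    """Hypothetical LGD: negative first difference of the unwrapped phase."""
    # remove 2*pi jumps along the frequency axis
    unwrapped = np.unwrap(np.asarray(phase, dtype=float), axis=-1)
    lgd = np.zeros_like(unwrapped)
    # assumption: difference between each bin and its upper neighbour
    lgd[:, :-1] = unwrapped[:, :-1] - unwrapped[:, 1:]
    # the highest bin has no upper neighbour and stays 0
    return lgd


# wrapped linear phase with a slope of -0.5 rad per bin
bins = np.arange(8)
phase = np.angle(np.exp(-0.5j * bins))[np.newaxis, :]
lgd = local_group_delay_sketch(phase)
```

For a linear phase the result is constant (0.5 in every bin except the last), which is the expected behaviour of a group delay.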

madmom.audio.stft.lgd(phase)

Returns the local group delay of the phase of a signal.

Parameters:

phase : numpy array, shape (num_frames, frame_size)

Phase of the STFT of a signal.

Returns:

lgd : numpy array

Local group delay of the phase.

class madmom.audio.stft.ShortTimeFourierTransform(frames, window=<function hanning>, fft_size=None, circular_shift=False, **kwargs)

ShortTimeFourierTransform class.

Parameters:

frames : audio.signal.FramedSignal instance

Framed signal.

window : numpy ufunc or numpy array, optional

Window (function); if a function (e.g. np.hanning) is given, a window with the frame size of frames and the given shape is created.

fft_size : int, optional

FFT size (should be a power of 2); if ‘None’, the frame_size given by frames is used; if the given fft_size is greater than the frame_size, the frames are zero-padded accordingly.

circular_shift : bool, optional

Circularly shift the individual frames before performing the FFT; this is needed to obtain a correct phase.

kwargs : dict, optional

If no audio.signal.FramedSignal instance was given, one is instantiated with these additional keyword arguments.
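The window parameter accepts either a function or a ready-made array. A minimal sketch of how such an argument could be resolved is shown below; `resolve_window_sketch` is a hypothetical helper written for illustration and is not part of madmom.

```python
import numpy as np


def resolve_window_sketch(window, frame_size):
    """Hypothetical helper: turn a window function or array into an array."""
    if callable(window):
        # e.g. np.hanning: create a window with the size of the frames
        return window(frame_size)
    window = np.asarray(window, dtype=float)
    if window.shape != (frame_size,):
        raise ValueError('window must have shape (frame_size,)')
    return window


from_function = resolve_window_sketch(np.hanning, 2048)
from_array = resolve_window_sketch(np.ones(2048), 2048)
```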

Notes

If the Signal (wrapped in the FramedSignal) has an integer dtype, the window is automatically scaled as if the signal had a float dtype with values in the range [-1, 1]. The resulting STFT therefore has the same values regardless of the dtype of the signal. This also avoids extra memory consumption, since the signal does not need to be converted to another data type (and, if no decoding is needed, the audio signal can be memory-mapped).
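The idea of scaling the window instead of the signal can be illustrated with NumPy alone. In this sketch the scaling factor is assumed to be the largest value of the integer type (`np.iinfo(np.int16).max`), and the frame contents are random numbers used purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# a made-up int16 frame standing in for real audio samples
frame_int = rng.integers(-2000, 2000, size=64).astype(np.int16)
window = np.hanning(64)
# assumption: integer samples map to [-1, 1] via the type's maximum value
scale = float(np.iinfo(np.int16).max)

# variant 1: keep the integer signal, scale the window once
spec_scaled_window = np.fft.fft(frame_int * (window / scale))
# variant 2: convert the signal to float first, then apply the window
spec_float_signal = np.fft.fft((frame_int / scale) * window)
```

Both variants give the same spectrum up to rounding, but the first one only touches the small window array rather than the whole signal.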

Examples

Create a ShortTimeFourierTransform from a Signal or FramedSignal:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft  
ShortTimeFourierTransform([[-3.15249+0.j     ,  2.62216-3.02425j, ...,
                            -0.03634-0.00005j,  0.03670+0.00029j],
                           [-4.28429+0.j     ,  2.02009+2.01264j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j     ,  4.09839-9.42525j, ...,
                             0.00550-0.00257j,  0.00137+0.00577j],
                           [-9.22709+0.j     ,  8.76929+4.0005j , ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

A ShortTimeFourierTransform can be instantiated directly from a file name:

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> stft  
ShortTimeFourierTransform([[...]], dtype=complex64)

Doing the same with a Signal of float data type results in an STFT with the same value range (apart from small rounding errors):

>>> sig = Signal('tests/data/audio/sample.wav', dtype=float)
>>> sig  
Signal([-0.07611, -0.0766 , ...,  0.01999,  0.0195 ])
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft  
ShortTimeFourierTransform([[-3.15240+0.j     ,  2.62208-3.02415j, ...,
                            -0.03633-0.00005j,  0.03670+0.00029j],
                           [-4.28416+0.j     ,  2.02003+2.01257j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92259+0.j     ,  4.09827-9.42496j, ...,
                             0.00550-0.00257j,  0.00137+0.00577j],
                           [-9.22681+0.j     ,  8.76902+4.00038j, ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

Additional arguments are passed to FramedSignal and Signal respectively:

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav', frame_size=2048, fps=100, sample_rate=22050)
>>> stft.frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft.frames.frame_size
2048
>>> stft.frames.hop_size
220.5
>>> stft.frames.signal.sample_rate
22050
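The fractional hop size in the example above follows from the requested frame rate. A small worked calculation, assuming the hop size is simply the sample rate divided by `fps` (how frames are positioned on a fractional hop is not covered here):

```python
sample_rate = 22050  # Hz, as passed in the example above
fps = 100            # frames per second

# assumption: hop size in samples is sample_rate / fps
hop_size = sample_rate / float(fps)
```

With these values `hop_size` is 220.5 samples, which matches the value reported by `stft.frames.hop_size`.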

spec(**kwargs)

Returns the magnitude spectrogram of the STFT.

Parameters:

kwargs : dict, optional

Keyword arguments passed to audio.spectrogram.Spectrogram.

Returns:

spec : audio.spectrogram.Spectrogram

audio.spectrogram.Spectrogram instance.
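A magnitude spectrogram is the absolute value of the complex STFT. The following NumPy sketch uses a small hand-made array; it shows the underlying operation only and does not create an `audio.spectrogram.Spectrogram` instance.

```python
import numpy as np

# two frames with two bins each, values chosen for easy checking
stft = np.array([[3 + 4j, 0 - 2j],
                 [1 + 0j, -1 + 1j]], dtype=np.complex64)
# magnitude of each complex bin
magnitude = np.abs(stft)
```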

phase(**kwargs)

Returns the phase of the STFT.

Parameters:

kwargs : dict, optional

Keyword arguments passed to Phase.

Returns:

phase : Phase

Phase instance.

madmom.audio.stft.STFT: alias of ShortTimeFourierTransform

class madmom.audio.stft.ShortTimeFourierTransformProcessor(window=<function hanning>, fft_size=None, circular_shift=False, **kwargs)

ShortTimeFourierTransformProcessor class.

Parameters:

window : numpy ufunc, optional

Window function.

fft_size : int, optional

FFT size (should be a power of 2); if ‘None’, it is determined by the size of the frames; if it is greater than the frame size, the frames are zero-padded accordingly.

circular_shift : bool, optional

Circularly shift the individual frames before performing the FFT; this is needed to obtain a correct phase.

Examples

Create a ShortTimeFourierTransformProcessor and call it with either a file name or the output of a (Framed-)SignalProcessor to obtain a ShortTimeFourierTransform instance:

>>> proc = ShortTimeFourierTransformProcessor()
>>> stft = proc('tests/data/audio/sample.wav')
>>> stft  
ShortTimeFourierTransform([[-3.15249+0.j     ,  2.62216-3.02425j, ...,
                            -0.03634-0.00005j,  0.03670+0.00029j],
                           [-4.28429+0.j     ,  2.02009+2.01264j, ...,
                            -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j     ,  4.09839-9.42525j, ...,
                             0.00550-0.00257j,  0.00137+0.00577j],
                           [-9.22709+0.j     ,  8.76929+4.0005j , ...,
                             0.00981-0.00014j, -0.00984+0.00006j]],
                          dtype=complex64)

process(data, **kwargs)

Perform FFT on a framed signal and return the STFT.

Parameters:

data : numpy array

Data to be processed.

kwargs : dict, optional

Keyword arguments passed to ShortTimeFourierTransform.

Returns:

stft : ShortTimeFourierTransform

ShortTimeFourierTransform instance.

static add_arguments(parser, window=None, fft_size=None)

Add STFT related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser.

window : numpy ufunc, optional

Window function.

fft_size : int, optional

Use this size for FFT (should be a power of 2).

Returns:

argparse argument group

STFT argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.
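The "only if not None" behaviour can be sketched with the standard argparse module. The function name, the option names (`--window`, `--fft_size`), the group title and the use of a string for the window are all hypothetical choices for this illustration and are not taken from madmom.

```python
import argparse


def add_stft_arguments_sketch(parser, window=None, fft_size=None):
    """Hypothetical helper: add STFT options whose default is not None."""
    group = parser.add_argument_group('STFT arguments')
    if window is not None:
        # assumption: the window is selected by name on the command line
        group.add_argument('--window', default=window,
                           help='window function name [default=%(default)s]')
    if fft_size is not None:
        group.add_argument('--fft_size', type=int, default=fft_size,
                           help='FFT size [default=%(default)i]')
    return group


parser = argparse.ArgumentParser()
# only fft_size is given, so no --window option is registered
add_stft_arguments_sketch(parser, fft_size=2048)
defaults = parser.parse_args([])
args = parser.parse_args(['--fft_size', '4096'])
```

Passing `None` for a parameter keeps the corresponding option out of the parser entirely, so programs can expose only the settings that make sense for them.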

madmom.audio.stft.STFTProcessor: alias of ShortTimeFourierTransformProcessor

class madmom.audio.stft.Phase(stft, **kwargs)

Phase class.

Parameters:

stft : ShortTimeFourierTransform instance

ShortTimeFourierTransform instance.

kwargs : dict, optional

If no ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a Phase from a ShortTimeFourierTransform (or anything it can be instantiated from):

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> phase = Phase(stft)
>>> phase  
Phase([[ 3.14159, -0.85649, ..., -3.14016,  0.00779],
       [ 3.14159,  0.78355, ..., -2.70136,  1.81393],
       ...,
       [ 3.14159, -1.16063, ..., -0.4373 ,  1.33774],
       [ 3.14159,  0.42799, ..., -0.0142 ,  3.13592]], dtype=float32)

local_group_delay(**kwargs)

Returns the local group delay of the phase.

Parameters:

kwargs : dict, optional

Keyword arguments passed to LocalGroupDelay.

Returns:

lgd : LocalGroupDelay instance

LocalGroupDelay instance.

lgd(**kwargs)

Returns the local group delay of the phase.

Parameters:

kwargs : dict, optional

Keyword arguments passed to LocalGroupDelay.

Returns:

lgd : LocalGroupDelay instance

LocalGroupDelay instance.

class madmom.audio.stft.LocalGroupDelay(phase, **kwargs)

Local Group Delay class.

Parameters:

phase : Phase instance

Phase instance.

kwargs : dict, optional

If no Phase instance was given, one is instantiated with these additional keyword arguments.

Examples

Create a LocalGroupDelay from a ShortTimeFourierTransform (or anything it can be instantiated from):

>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> lgd = LocalGroupDelay(stft)
>>> lgd  
LocalGroupDelay([[-2.2851 , -2.25605, ...,  3.13525,  0. ],
                 [ 2.35804,  2.53786, ...,  1.76788,  0. ],
                 ...,
                 [-1.98..., -2.93039, ..., -1.77505,  0. ],
                 [ 2.7136 ,  2.60925, ...,  3.13318,  0. ]])

madmom.audio.stft.LGD: alias of LocalGroupDelay