madmom.audio.stft
This module contains Short-Time Fourier Transform (STFT) related functionality.
-
madmom.audio.stft.fft_frequencies(num_fft_bins, sample_rate)
Frequencies of the FFT bins.
Parameters: - num_fft_bins : int
Number of FFT bins (i.e. half the FFT length).
- sample_rate : float
Sample rate of the signal.
Returns: - fft_frequencies : numpy array
Frequencies of the FFT bins [Hz].
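The mapping from bin index to frequency can be sketched with plain numpy. This is only an illustration, assuming (as stated above) that the number of bins is half the FFT length; `fft_frequencies_sketch` is a hypothetical name and not part of madmom.

```python
import numpy as np


def fft_frequencies_sketch(num_fft_bins, sample_rate):
    """Return the centre frequency [Hz] of each FFT bin (illustrative only)."""
    # Assumption: the full FFT length is twice the number of bins.
    fft_size = 2 * num_fft_bins
    # Bin k is centred at k * sample_rate / fft_size Hz.
    return np.arange(num_fft_bins) * sample_rate / fft_size


freqs = fft_frequencies_sketch(1024, 44100.0)
print(freqs[:3])
```

With 1024 bins at 44.1 kHz, neighbouring bins are spaced 44100 / 2048 Hz apart and every bin lies below the Nyquist frequency.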
-
madmom.audio.stft.stft(frames, window, fft_size=None, circular_shift=False, include_nyquist=False, fftw=None)
Calculates the complex Short-Time Fourier Transform (STFT) of the given framed signal.
Parameters: - frames : numpy array or iterable, shape (num_frames, frame_size)
Framed signal (e.g. FramedSignal instance).
- window : numpy array, shape (frame_size,)
Window (function).
- fft_size : int, optional
FFT size (should be a power of 2); if ‘None’, the ‘frame_size’ given by frames is used; if the given fft_size is greater than the ‘frame_size’, the frames are zero-padded; if it is smaller, they are truncated.
- circular_shift : bool, optional
Circular shift the individual frames before performing the FFT; needed for correct phase.
- include_nyquist : bool, optional
Include the Nyquist frequency bin (sample rate / 2) in returned STFT.
- fftw : pyfftw.FFTW instance, optional
If a pyfftw.FFTW object is given, it is used to compute the STFT with the FFTW library. Requires ‘pyfftw’.
Returns: - stft : numpy array, shape (num_frames, frame_size)
The complex STFT of the framed signal.
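The roles of the parameters above can be illustrated with a numpy-only sketch. It is not madmom's implementation: `stft_sketch` is a hypothetical helper, it assumes an even fft_size, and it may treat the combination of circular shift and zero-padding differently from the library.

```python
import numpy as np


def stft_sketch(frames, window, fft_size=None, circular_shift=False,
                include_nyquist=False):
    """Complex STFT of already framed data (illustrative only)."""
    frames = np.asarray(frames, dtype=float)
    frame_size = frames.shape[1]
    if fft_size is None:
        fft_size = frame_size
    # Apply the window to every frame.
    windowed = frames * window
    if circular_shift:
        # Rotate each frame so that its centre sample moves to index 0.
        windowed = np.roll(windowed, -(frame_size // 2), axis=1)
    # rfft zero-pads if fft_size > frame_size and truncates if it is smaller.
    spectrum = np.fft.rfft(windowed, n=fft_size, axis=1)
    # For an even fft_size, rfft returns fft_size // 2 + 1 bins;
    # the last one is the Nyquist bin, which is dropped unless requested.
    num_bins = fft_size // 2 + (1 if include_nyquist else 0)
    return spectrum[:, :num_bins]


# One frame holding exactly two cycles of a cosine, rectangular window.
n = np.arange(8)
frames = np.cos(2 * np.pi * 2 * n / 8)[np.newaxis, :]
spec = stft_sketch(frames, np.ones(8))
with_nyquist = stft_sketch(frames, np.ones(8), include_nyquist=True)
padded = stft_sketch(frames, np.ones(8), fft_size=16)
```

In this toy input the energy lands in bin 2, and the number of returned bins changes with include_nyquist and fft_size.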
-
madmom.audio.stft.phase(stft)
Returns the phase of the complex STFT of a signal.
Parameters: - stft : numpy array, shape (num_frames, frame_size)
The complex STFT of a signal.
Returns: - phase : numpy array
Phase of the STFT.
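The phase of a complex STFT is the angle of each complex bin. A minimal numpy illustration on made-up values (not output of madmom):

```python
import numpy as np

# Made-up complex STFT values: one frame, three bins.
stft_values = np.array([[-3.0 + 0.0j, 1.0 + 1.0j, 0.0 - 2.0j]])

# The phase is the angle of each complex value, in radians in (-pi, pi].
phase_values = np.angle(stft_values)
print(phase_values)
```

A negative real value such as the first bin has a phase of pi, which is why negative real entries in the first column of an STFT correspond to phase values of 3.14159.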
-
madmom.audio.stft.local_group_delay(phase)
Returns the local group delay of the phase of a signal.
Parameters: - phase : numpy array, shape (num_frames, frame_size)
Phase of the STFT of a signal.
Returns: - lgd : numpy array
Local group delay of the phase.
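A local group delay can be sketched as the negative derivative of the unwrapped phase over frequency. The version below is an assumption about the convention, not madmom's code: it takes the difference between each bin and its upper neighbour and sets the highest bin to 0. `local_group_delay_sketch` is a hypothetical name.

```python
import numpy as np


def local_group_delay_sketch(phase):
    """Negative phase difference over frequency (illustrative only)."""
    phase = np.asarray(phase, dtype=float)
    # Remove 2*pi jumps along the frequency axis.
    unwrapped = np.unwrap(phase, axis=1)
    delay = np.zeros_like(unwrapped)
    # Difference between each bin and its upper neighbour;
    # the highest bin has no upper neighbour and stays 0.
    delay[:, :-1] = unwrapped[:, :-1] - unwrapped[:, 1:]
    return delay


# A linear phase with a slope of -0.5 rad per bin has a constant delay.
phase_values = -0.5 * np.arange(6)[np.newaxis, :]
delay = local_group_delay_sketch(phase_values)
```

For the linear-phase input every bin except the last has a delay of 0.5.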
-
madmom.audio.stft.lgd(phase)
Returns the local group delay of the phase of a signal.
Parameters: - phase : numpy array, shape (num_frames, frame_size)
Phase of the STFT of a signal.
Returns: - lgd : numpy array
Local group delay of the phase.
-
class madmom.audio.stft.ShortTimeFourierTransform(frames, window=<function hanning>, fft_size=None, circular_shift=False, include_nyquist=False, fft_window=None, fftw=None, **kwargs)
ShortTimeFourierTransform class.
Parameters: - frames : audio.signal.FramedSignal instance
Framed signal.
- window : numpy ufunc or numpy array, optional
Window (function); if a function (e.g. np.hanning) is given, a window with the frame size of frames and the given shape is created.
- fft_size : int, optional
FFT size (should be a power of 2); if ‘None’, the frame_size given by frames is used; if the given fft_size is greater than the frame_size, the frames are zero-padded accordingly.
- circular_shift : bool, optional
Circular shift the individual frames before performing the FFT; needed for correct phase.
- include_nyquist : bool, optional
Include the Nyquist frequency bin (sample rate / 2).
- fftw : pyfftw.FFTW instance, optional
If a pyfftw.FFTW object is given, it is used to compute the STFT with the FFTW library. If ‘None’, a new pyfftw.FFTW object is built. Requires ‘pyfftw’.
- kwargs : dict, optional
If no audio.signal.FramedSignal instance was given, one is instantiated with these additional keyword arguments.
Notes
If the Signal (wrapped in the FramedSignal) has an integer dtype, the window is automatically scaled as if the signal had a float dtype with values in the range [-1, 1]. The resulting STFT therefore has the same values regardless of the dtype of the signal. This also avoids extra memory consumption, since the data type of the signal does not need to be converted (and if no decoding is needed, the audio signal can be memory-mapped).
Examples
Create a ShortTimeFourierTransform from a Signal or FramedSignal:
>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ..., 655, 639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft
ShortTimeFourierTransform([[-3.15249+0.j , 2.62216-3.02425j, ..., -0.03634-0.00005j, 0.0367 +0.00029j],
                           [-4.28429+0.j , 2.02009+2.01264j, ..., -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j , 4.09839-9.42525j, ..., 0.0055 -0.00257j, 0.00137+0.00577j],
                           [-9.22709+0.j , 8.76929+4.0005j , ..., 0.00981-0.00014j, -0.00984+0.00006j]], dtype=complex64)
A ShortTimeFourierTransform can be instantiated directly from a file name:
>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> stft
ShortTimeFourierTransform([[...]], dtype=complex64)
Doing the same with a Signal of float dtype results in an STFT with the same value range (up to rounding errors):
>>> sig = Signal('tests/data/audio/sample.wav', dtype=float)
>>> sig
Signal([-0.07611, -0.0766 , ..., 0.01999, 0.0195 ])
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft = ShortTimeFourierTransform(frames)
>>> stft
ShortTimeFourierTransform([[-3.1524 +0.j , 2.62208-3.02415j, ..., -0.03633-0.00005j, 0.0367 +0.00029j],
                           [-4.28416+0.j , 2.02003+2.01257j, ..., -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92259+0.j , 4.09827-9.42496j, ..., 0.0055 -0.00257j, 0.00137+0.00577j],
                           [-9.22681+0.j , 8.76902+4.00038j, ..., 0.00981-0.00014j, -0.00984+0.00006j]], dtype=complex64)
Additional arguments are passed to FramedSignal and Signal respectively:
>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav', frame_size=2048, fps=100, sample_rate=22050)
>>> stft.frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> stft.frames.frame_size
2048
>>> stft.frames.hop_size
220.5
>>> stft.frames.signal.sample_rate
22050
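The dtype independence described in the Notes follows from the linearity of the FFT, which can be checked with numpy alone. The sketch below uses random int16 data rather than a real audio file, and it assumes the scaling factor is the maximum value of the integer dtype; the exact factor used by madmom is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 16-bit samples standing in for one frame of an integer signal.
int_frame = rng.integers(-2**15, 2**15, size=2048).astype(np.int16)
window = np.hanning(2048)

# Assumption: scale by the maximum value of the integer dtype.
max_range = float(np.iinfo(int_frame.dtype).max)

# Option A: keep the integer samples and scale the window instead.
stft_int = np.fft.rfft(int_frame * (window / max_range))

# Option B: convert the samples to float in roughly [-1, 1] first.
float_frame = int_frame / max_range
stft_float = np.fft.rfft(float_frame * window)

same = np.allclose(stft_int, stft_float)
print(same)
```

Option A never creates a float copy of the whole signal; only the window is rescaled, which is the memory argument made in the Notes.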
-
bin_frequencies
Bin frequencies.
-
spec(**kwargs)
Returns the magnitude spectrogram of the STFT.
Parameters: - kwargs : dict, optional
Keyword arguments passed to audio.spectrogram.Spectrogram.
Returns: - spec : audio.spectrogram.Spectrogram
audio.spectrogram.Spectrogram instance.
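A magnitude spectrogram is the absolute value of the complex STFT. The numpy snippet below shows only that step on made-up values; the Spectrogram class may do more, depending on the keyword arguments it receives.

```python
import numpy as np

# Made-up complex STFT values: two frames, two bins.
stft_values = np.array([[3.0 + 4.0j, 0.0 - 2.0j],
                        [-1.0 + 0.0j, 0.0 + 0.0j]])

# Magnitude of each complex bin; the phase information is discarded.
magnitude = np.abs(stft_values)
print(magnitude)
```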
-
madmom.audio.stft.STFT
alias of madmom.audio.stft.ShortTimeFourierTransform
-
class madmom.audio.stft.ShortTimeFourierTransformProcessor(window=<function hanning>, fft_size=None, circular_shift=False, include_nyquist=False, **kwargs)
ShortTimeFourierTransformProcessor class.
Parameters: - window : numpy ufunc, optional
Window function.
- fft_size : int, optional
FFT size (should be a power of 2); if ‘None’, it is determined by the size of the frames; if it is greater than the frame size, the frames are zero-padded accordingly.
- circular_shift : bool, optional
Circular shift the individual frames before performing the FFT; needed for correct phase.
- include_nyquist : bool, optional
Include the Nyquist frequency bin (sample rate / 2).
Examples
Create a ShortTimeFourierTransformProcessor and call it with either a file name or the output of a (Framed-)SignalProcessor to obtain a ShortTimeFourierTransform instance.
>>> proc = ShortTimeFourierTransformProcessor()
>>> stft = proc('tests/data/audio/sample.wav')
>>> stft
ShortTimeFourierTransform([[-3.15249+0.j , 2.62216-3.02425j, ..., -0.03634-0.00005j, 0.0367 +0.00029j],
                           [-4.28429+0.j , 2.02009+2.01264j, ..., -0.01981-0.00933j, -0.00536+0.02162j],
                           ...,
                           [-4.92274+0.j , 4.09839-9.42525j, ..., 0.0055 -0.00257j, 0.00137+0.00577j],
                           [-9.22709+0.j , 8.76929+4.0005j , ..., 0.00981-0.00014j, -0.00984+0.00006j]], dtype=complex64)
-
process(data, **kwargs)
Perform FFT on a framed signal and return the STFT.
Parameters: - data : numpy array
Data to be processed.
- kwargs : dict, optional
Keyword arguments passed to ShortTimeFourierTransform.
Returns: - stft : ShortTimeFourierTransform
ShortTimeFourierTransform instance.
-
static add_arguments(parser, window=None, fft_size=None)
Add STFT related arguments to an existing parser.
Parameters: - parser : argparse parser instance
Existing argparse parser.
- window : numpy ufunc, optional
Window function.
- fft_size : int, optional
Use this size for FFT (should be a power of 2).
Returns: - argparse argument group
STFT argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’.
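The pattern described in the Notes, adding an option to the group only when a default is supplied, can be sketched with argparse from the standard library. The function name, flag names and help texts below are hypothetical and are not taken from madmom.

```python
import argparse


def add_stft_arguments_sketch(parser, window=None, fft_size=None):
    """Add an STFT argument group to an existing parser (illustrative only)."""
    group = parser.add_argument_group('STFT arguments')
    # Each option is added only if a default value was supplied.
    if window is not None:
        # Hypothetical flag name.
        group.add_argument('--window', default=window,
                           help='window function to use')
    if fft_size is not None:
        # Hypothetical flag name.
        group.add_argument('--fft_size', type=int, default=fft_size,
                           help='FFT size (should be a power of 2)')
    return group


parser = argparse.ArgumentParser()
add_stft_arguments_sketch(parser, fft_size=2048)
args = parser.parse_args(['--fft_size', '4096'])
defaults = parser.parse_args([])
```

Because window was left as None, the parser exposes no window option, and programs that do not need it keep a shorter command line.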
-
madmom.audio.stft.STFTProcessor
alias of madmom.audio.stft.ShortTimeFourierTransformProcessor
-
class madmom.audio.stft.Phase(stft, **kwargs)
Phase class.
Parameters: - stft : ShortTimeFourierTransform instance
ShortTimeFourierTransform instance.
- kwargs : dict, optional
If no ShortTimeFourierTransform instance was given, one is instantiated with these additional keyword arguments.
Examples
Create a Phase from a ShortTimeFourierTransform (or anything it can be instantiated from):
>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> phase = Phase(stft)
>>> phase
Phase([[ 3.14159, -0.85649, ..., -3.14016, 0.00779],
       [ 3.14159, 0.78355, ..., -2.70136, 1.81393],
       ...,
       [ 3.14159, -1.16063, ..., -0.4373 , 1.33774],
       [ 3.14159, 0.42799, ..., -0.0142 , 3.13592]], dtype=float32)
-
bin_frequencies
Bin frequencies.
-
local_group_delay(**kwargs)
Returns the local group delay of the phase.
Parameters: - kwargs : dict, optional
Keyword arguments passed to LocalGroupDelay.
Returns: - lgd : LocalGroupDelay instance
LocalGroupDelay instance.
-
lgd(**kwargs)
Returns the local group delay of the phase.
Parameters: - kwargs : dict, optional
Keyword arguments passed to LocalGroupDelay.
Returns: - lgd : LocalGroupDelay instance
LocalGroupDelay instance.
-
class madmom.audio.stft.LocalGroupDelay(phase, **kwargs)
Local Group Delay class.
Parameters:
Examples
Create a LocalGroupDelay from a ShortTimeFourierTransform (or anything it can be instantiated from):
>>> stft = ShortTimeFourierTransform('tests/data/audio/sample.wav')
>>> lgd = LocalGroupDelay(stft)
>>> lgd
LocalGroupDelay([[-2.2851 , -2.25605, ..., 3.13525, 0. ],
                 [ 2.35804, 2.53786, ..., 1.76788, 0. ],
                 ...,
                 [-1.98..., -2.93039, ..., -1.77505, 0. ],
                 [ 2.7136 , 2.60925, ..., 3.13318, 0. ]])
-
bin_frequencies
Bin frequencies.
-
-
madmom.audio.stft.LGD
alias of madmom.audio.stft.LocalGroupDelay