madmom.audio.signal

This module contains basic signal processing functionality.

madmom.audio.signal.smooth(signal, kernel)[source]

Smooth the signal along its first axis.

Parameters:
signal : numpy array

Signal to be smoothed.

kernel : numpy array or int

Smoothing kernel (size).

Returns:
numpy array

Smoothed signal.

Notes

If kernel is an integer, a Hamming window of that length will be used as a smoothing kernel.
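The note above can be sketched in a few lines of NumPy; `smooth_sketch` is a hypothetical stand-in for illustration, not madmom's actual implementation:

```python
import numpy as np

def smooth_sketch(signal, kernel):
    # An integer kernel is interpreted as the length of a Hamming window
    # (as described in the notes above).
    if isinstance(kernel, int):
        kernel = np.hamming(kernel)
    # Convolve along the first axis, keeping the original signal length.
    return np.convolve(signal, kernel, mode='same')

sig = np.array([0., 1., 0., 1., 0.])
smoothed = smooth_sketch(sig, 3)  # smoothed with a 3-sample Hamming window
```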

madmom.audio.signal.adjust_gain(signal, gain)[source]

Adjust the gain of the signal.

Parameters:
signal : numpy array

Signal to be adjusted.

gain : float

Gain adjustment level [dB].

Returns:
numpy array

Signal with adjusted gain.

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

gain values > 0 amplify the signal and are only supported for signals with float dtype to prevent clipping and integer overflows.
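The dB value translates to a linear amplitude factor of 10^(gain/20). A minimal sketch (hypothetical `adjust_gain_sketch`, shown for a float signal only):

```python
import numpy as np

def adjust_gain_sketch(signal, gain):
    # convert the gain from dB to a linear amplitude factor
    factor = 10. ** (gain / 20.)
    # keep the original dtype; rounding may occur for integer dtypes
    return np.asarray(signal * factor, dtype=signal.dtype)

sig = np.array([0.5, -0.25], dtype=np.float32)
quieter = adjust_gain_sketch(sig, -6.)  # -6 dB roughly halves the amplitude
```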

madmom.audio.signal.attenuate(signal, attenuation)[source]

Attenuate the signal.

Parameters:
signal : numpy array

Signal to be attenuated.

attenuation : float

Attenuation level [dB].

Returns:
numpy array

Attenuated signal (same dtype as signal).

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
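Attenuation by X dB is equivalent to a gain adjustment of -X dB; as a sketch (hypothetical `attenuate_sketch`):

```python
import numpy as np

def attenuate_sketch(signal, attenuation):
    # positive attenuation in dB -> linear scaling factor < 1
    factor = 10. ** (-attenuation / 20.)
    return np.asarray(signal * factor, dtype=signal.dtype)

softer = attenuate_sketch(np.array([0.8, -0.4]), 20.)  # 20 dB ~ factor 0.1
```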

madmom.audio.signal.normalize(signal)[source]

Normalize the signal to have maximum amplitude.

Parameters:
signal : numpy array

Signal to be normalized.

Returns:
numpy array

Normalized signal.

Notes

Signals with float dtypes cover the range [-1, +1], signals with integer dtypes will cover the maximally possible range, e.g. [-32768, 32767] for np.int16.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
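The two notes above can be combined into a sketch (hypothetical `normalize_sketch`, not madmom's actual implementation):

```python
import numpy as np

def normalize_sketch(signal):
    # floats are scaled to [-1, +1], integers to their maximal value range
    if np.issubdtype(signal.dtype, np.integer):
        scale = np.iinfo(signal.dtype).max / float(np.max(np.abs(signal)))
        return np.asarray(signal * scale, dtype=signal.dtype)
    return signal / np.max(np.abs(signal))

normed = normalize_sketch(np.array([0.25, -0.5], dtype=np.float32))
# the maximum absolute value now reaches 1: [0.5, -1.0]
```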

madmom.audio.signal.remix(signal, num_channels)[source]

Remix the signal to have the desired number of channels.

Parameters:
signal : numpy array

Signal to be remixed.

num_channels : int

Number of channels.

Returns:
numpy array

Remixed signal (same dtype as signal).

Notes

This function does not support arbitrary channel number conversions. Only down-mixing to and up-mixing from mono signals is supported.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

If the signal should be down-mixed to mono and has an integer dtype, it will be converted to float internally and then back to the original dtype to prevent clipping of the signal. To avoid this double conversion, convert the dtype first.
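The supported conversions can be sketched as follows (hypothetical `remix_sketch`, illustrating the notes above):

```python
import numpy as np

def remix_sketch(signal, num_channels):
    if signal.ndim == 2 and num_channels == 1:
        # down-mix by averaging; computed as float to prevent clipping,
        # then converted back to the original dtype
        return np.mean(signal, axis=-1, dtype=float).astype(signal.dtype)
    if signal.ndim == 1 and num_channels > 1:
        # up-mix from mono by duplicating the single channel
        return np.tile(signal[:, np.newaxis], num_channels)
    if signal.ndim == 1 and num_channels == 1:
        return signal
    if signal.ndim == 2 and signal.shape[1] == num_channels:
        return signal
    raise NotImplementedError('arbitrary channel conversions not supported')

stereo = np.array([[0, 2], [2, 4]], dtype=np.int16)
mono = remix_sketch(stereo, 1)  # -> [1, 3]
```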

madmom.audio.signal.resample(signal, sample_rate, **kwargs)[source]

Resample the signal.

Parameters:
signal : numpy array or Signal

Signal to be resampled.

sample_rate : int

Sample rate of the signal.

kwargs : dict, optional

Keyword arguments passed to load_ffmpeg_file().

Returns:
numpy array or Signal

Resampled signal.

Notes

This function uses ffmpeg to resample the signal.

madmom.audio.signal.rescale(signal, dtype=numpy.float32)[source]

Rescale the signal to range [-1, 1] and return as float dtype.

Parameters:
signal : numpy array

Signal to be rescaled.

dtype : numpy dtype

Data type of the signal.

Returns:
numpy array

Signal rescaled to range [-1, 1].
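One plausible way to implement the described rescaling; the exact scaling constant used by madmom may differ from this sketch:

```python
import numpy as np

def rescale_sketch(signal, dtype=np.float32):
    # integer signals are scaled by their dtype's value range;
    # the exact scaling constant is an assumption of this sketch
    if np.issubdtype(signal.dtype, np.integer):
        return (signal / abs(float(np.iinfo(signal.dtype).min))).astype(dtype)
    return signal.astype(dtype)

scaled = rescale_sketch(np.array([-32768, 16384], dtype=np.int16))
# -> [-1.0, 0.5] as float32
```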

madmom.audio.signal.trim(signal, where='fb')[source]

Trim leading and trailing zeros of the signal.

Parameters:
signal : numpy array

Signal to be trimmed.

where : str, optional

A string with ‘f’ representing trim from front and ‘b’ to trim from back. Default is ‘fb’, trim zeros from both ends of the signal.

Returns:
numpy array

Trimmed signal.
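For one-dimensional signals, the described behaviour corresponds to NumPy's np.trim_zeros, which uses the same 'f'/'b'/'fb' convention:

```python
import numpy as np

sig = np.array([0, 0, 1, 2, 0])
trimmed = np.trim_zeros(sig, 'fb')    # both ends -> [1, 2]
front_only = np.trim_zeros(sig, 'f')  # front only -> [1, 2, 0]
```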

madmom.audio.signal.energy(signal)[source]

Compute the energy of a (framed) signal.

Parameters:
signal : numpy array

Signal.

Returns:
energy : float

Energy of the signal.

Notes

If signal is a FramedSignal, the energy is computed for each frame individually.
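For a plain numpy array, the energy is the sum of the squared amplitudes (hypothetical `energy_sketch`):

```python
import numpy as np

def energy_sketch(signal):
    # energy = sum of the squared amplitudes
    return np.sum(np.asarray(signal, dtype=float) ** 2)

e = energy_sketch(np.array([1., -2., 2.]))  # 1 + 4 + 4 = 9.0
```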

madmom.audio.signal.root_mean_square(signal)[source]

Compute the root mean square of a (framed) signal. This can be used as a measurement of power.

Parameters:
signal : numpy array

Signal.

Returns:
rms : float

Root mean square of the signal.

Notes

If signal is a FramedSignal, the root mean square is computed for each frame individually.
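For a plain numpy array, this reduces to the textbook formula (hypothetical `rms_sketch`):

```python
import numpy as np

def rms_sketch(signal):
    # root mean square = sqrt(mean of the squared amplitudes)
    return np.sqrt(np.mean(np.asarray(signal, dtype=float) ** 2))

r = rms_sketch(np.array([3., -3., 3., -3.]))  # -> 3.0
```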

madmom.audio.signal.sound_pressure_level(signal, p_ref=None)[source]

Compute the sound pressure level of a (framed) signal.

Parameters:
signal : numpy array

Signal.

p_ref : float, optional

Reference sound pressure level; if ‘None’, use the maximum amplitude value of the data type; if the data type is float, amplitudes are assumed to be in the range [-1, +1].

Returns:
spl : float

Sound pressure level of the signal [dB].

Notes

From http://en.wikipedia.org/wiki/Sound_pressure: Sound pressure level (SPL) or sound level is a logarithmic measure of the effective sound pressure of a sound relative to a reference value. It is measured in decibels (dB) above a standard reference level.

If signal is a FramedSignal, the sound pressure level is computed for each frame individually.
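The logarithmic measure can be sketched as follows (hypothetical `spl_sketch`; `p_ref=1.0` is an assumption matching the float amplitude convention above):

```python
import numpy as np

def spl_sketch(signal, p_ref=1.0):
    # SPL [dB] = 20 * log10(rms / p_ref)
    rms = np.sqrt(np.mean(np.asarray(signal, dtype=float) ** 2))
    return 20. * np.log10(rms / p_ref)

spl = spl_sketch(np.array([1., -1., 1., -1.]))  # full-scale square wave: 0 dB
```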

exception madmom.audio.signal.LoadAudioFileError(value=None)[source]

Deprecated as of version 0.16. Please use madmom.io.audio.LoadAudioFileError instead. Will be removed in version 0.18.

madmom.audio.signal.load_wave_file(*args, **kwargs)[source]

Deprecated as of version 0.16. Please use madmom.io.audio.load_wave_file instead. Will be removed in version 0.18.

madmom.audio.signal.write_wave_file(*args, **kwargs)[source]

Deprecated as of version 0.16. Please use madmom.io.audio.write_wave_file instead. Will be removed in version 0.18.

madmom.audio.signal.load_audio_file(*args, **kwargs)[source]

Deprecated as of version 0.16. Please use madmom.io.audio.load_audio_file instead. Will be removed in version 0.18.

class madmom.audio.signal.Signal(data, sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, dtype=None, **kwargs)[source]

The Signal class represents a signal as a (memory-mapped) numpy array and enhances it with a number of attributes.

Parameters:
data : numpy array, str or file handle

Signal data or file name or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the maximum range of the data type.

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Notes

sample_rate or num_channels can be used to set the desired sample rate and number of channels if the audio is read from file. If set to ‘None’ the audio signal is used as is, i.e. the sample rate and number of channels are determined directly from the audio file.

If the data is a numpy array, the sample_rate is set to the given value and num_channels is set to the number of columns of the array.

The gain can be used to adjust the level of the signal.

If both norm and gain are set, the signal is first normalized and then the gain is applied afterwards.

If norm or gain is set, the selected part of the signal is loaded into memory completely, i.e. .wav files are not memory-mapped any more.

Examples

Load a mono audio file:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> sig.sample_rate
44100

Load a stereo audio file, down-mix it to mono:

>>> sig = Signal('tests/data/audio/stereo_sample.flac', num_channels=1)
>>> sig
Signal([ 36,  36, ..., 524, 495], dtype=int16)
>>> sig.num_channels
1

Load and re-sample an audio file:

>>> sig = Signal('tests/data/audio/sample.wav', sample_rate=22050)
>>> sig
Signal([-2470, -2553, ...,   517,   677], dtype=int16)
>>> sig.sample_rate
22050

Load an audio file with float32 data type (i.e. rescale it to [-1, 1]):

>>> sig = Signal('tests/data/audio/sample.wav', dtype=np.float32)
>>> sig
Signal([-0.07611, -0.0766 , ...,  0.01999,  0.0195 ], dtype=float32)
>>> sig.dtype
dtype('float32')
num_samples

Number of samples.

num_channels

Number of channels.

length

Length of signal in seconds.

write(filename)[source]

Write the signal to disk as a .wav file.

Parameters:
filename : str

Name of the file.

Returns:
filename : str

Name of the written file.

energy()[source]

Energy of signal.

root_mean_square()[source]

Root mean square of signal.

rms()

Root mean square of signal.

sound_pressure_level()[source]

Sound pressure level of signal.

spl()

Sound pressure level of signal.

class madmom.audio.signal.SignalProcessor(sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, dtype=None, **kwargs)[source]

The SignalProcessor class is a basic signal processor.

Parameters:
sample_rate : int, optional

Sample rate of the signal [Hz]; if set the signal will be re-sampled to that sample rate; if ‘None’ the sample rate of the audio file will be used.

num_channels : int, optional

Number of channels of the signal; if set, the signal will be reduced to that number of channels; if ‘None’ as many channels as present in the audio file are returned.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Examples

Processor for loading the first two seconds of an audio file, re-sampling it to 22.05 kHz and down-mixing it to mono:

>>> proc = SignalProcessor(sample_rate=22050, num_channels=1, stop=2)
>>> sig = proc('tests/data/audio/sample.wav')
>>> sig
Signal([-2470, -2553, ...,  -173,  -265], dtype=int16)
>>> sig.sample_rate
22050
>>> sig.num_channels
1
>>> sig.length
2.0
process(data, **kwargs)[source]

Process the given audio file or signal data.

Parameters:
data : numpy array, str or file handle

Data to be processed.

kwargs : dict, optional

Keyword arguments passed to Signal.

Returns:
signal : Signal instance

Signal instance.

static add_arguments(parser, sample_rate=None, mono=None, start=None, stop=None, norm=None, gain=None)[source]

Add signal processing related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

sample_rate : int, optional

Re-sample the signal to this sample rate [Hz].

mono : bool, optional

Down-mix the signal to mono.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

Returns:
argparse argument group

Signal processing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’. To include start and stop arguments with a default value of ‘None’, i.e. do not set any start or stop time, they can be set to ‘True’.
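The not-‘None’ convention can be illustrated with a plain argparse sketch (hypothetical helper, not the actual implementation):

```python
import argparse

def add_signal_args(parser, sample_rate=None, gain=None):
    # arguments are added to the group only for parameters that are not None
    group = parser.add_argument_group('signal arguments')
    if sample_rate is not None:
        group.add_argument('--sample_rate', type=int, default=sample_rate)
    if gain is not None:
        group.add_argument('--gain', type=float, default=gain)
    return group

parser = argparse.ArgumentParser()
add_signal_args(parser, sample_rate=44100)  # gain=None -> no --gain argument
args = parser.parse_args([])
```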

madmom.audio.signal.signal_frame(signal, index, frame_size, hop_size, origin=0)[source]

Return the frame at the given index of the signal.

Parameters:
signal : numpy array

Signal.

index : int

Index of the frame to return.

frame_size : int

Size of each frame in samples.

hop_size : float

Hop size in samples between adjacent frames.

origin : int

Location of the window center relative to the signal position.

Returns:
frame : numpy array

Requested frame of the signal.

Notes

The reference sample of the first frame (index == 0) refers to the first sample of the signal, and each following frame is placed hop_size samples after the previous one.

The window is always centered around this reference sample. Its location relative to the reference sample can be set with the origin parameter. Arbitrary integer values can be given:

  • zero centers the window on its reference sample
  • negative values shift the window to the right
  • positive values shift the window to the left

An origin of half the size of the frame_size results in windows located to the left of the reference sample, i.e. the first frame starts at the first sample of the signal.

The part of the frame which is not covered by the signal is padded with zeros.

This function is independent of the length of the signal. Thus, contrary to common indexing, the index ‘-1’ does NOT refer to the last frame of the signal; instead, the frame left of the first frame is returned.
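A minimal sketch of the framing logic described above (hypothetical `signal_frame_sketch`; the exact centering convention is an assumption of this sketch):

```python
import numpy as np

def signal_frame_sketch(signal, index, frame_size, hop_size, origin=0):
    # reference sample of the requested frame
    ref = int(index * hop_size)
    # center the window on the reference sample, shifted by -origin
    # (negative origin -> right, positive origin -> left, see above)
    start = ref - frame_size // 2 - origin
    stop = start + frame_size
    frame = np.zeros(frame_size, dtype=signal.dtype)
    # copy the part covered by the signal; the rest stays zero-padded
    sig_start, sig_stop = max(start, 0), min(stop, len(signal))
    if sig_start < sig_stop:
        frame[sig_start - start:sig_stop - start] = signal[sig_start:sig_stop]
    return frame

sig = np.arange(1, 9)                      # [1, 2, ..., 8]
first = signal_frame_sketch(sig, 0, 4, 2)  # zero-padded: [0, 0, 1, 2]
second = signal_frame_sketch(sig, 1, 4, 2) # -> [1, 2, 3, 4]
```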

class madmom.audio.signal.FramedSignal(signal, frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]

The FramedSignal splits a Signal into frames and makes it iterable and indexable.

Parameters:
signal : Signal instance

Signal to be split into frames.

frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

origin : int, optional

Location of the window relative to the reference sample of a frame.

end : int or str, optional

End of signal handling (see notes below).

num_frames : int, optional

Number of frames to return.

kwargs : dict, optional

If no Signal instance was given, one is instantiated with these additional keyword arguments.

Notes

The FramedSignal class is implemented as an iterator. It splits the given signal automatically into frames of frame_size length with hop_size samples (can be float, normal rounding applies) between the frames. The reference sample of the first frame refers to the first sample of the signal.

The location of the window relative to the reference sample of a frame can be set with the origin parameter (with the same behaviour as used by scipy.ndimage filters). Arbitrary integer values can be given:

  • zero centers the window on its reference sample,
  • negative values shift the window to the right,
  • positive values shift the window to the left.

Additionally, it can have the following literal values:

  • ‘center’, ‘offline’: the window is centered on its reference sample,
  • ‘left’, ‘past’, ‘online’: the window is located to the left of its reference sample (including the reference sample),
  • ‘right’, ‘future’, ‘stream’: the window is located to the right of its reference sample.

The end parameter is used to handle the end of signal behaviour and can have these values:

  • ‘normal’: stop as soon as the whole signal got covered by at least one frame (i.e. pad maximally one frame),
  • ‘extend’: frames are returned as long as part of the frame overlaps with the signal to cover the whole signal.

Alternatively, num_frames can be used to retrieve a fixed number of frames.

In order to be able to stack multiple frames obtained with different frame sizes, the number of frames to be returned must be independent of the set frame_size. It is not guaranteed that every sample of the signal is returned in a frame unless the origin is either ‘right’ or ‘future’.

If used in online real-time mode, the parameters origin and num_frames should be set to ‘stream’ and 1, respectively.

Examples

To chop a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048 with adjacent frames being 441 samples apart:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ...,  -253,   671], dtype=int16)
>>> frames.fps
100.0

Instead of passing a Signal instance as the first argument, anything a Signal can be instantiated from (e.g. a file name) can be used. We can also set the frames per second (fps) instead, they get converted to hop_size based on the sample_rate of the signal:

>>> frames = FramedSignal('tests/data/audio/sample.wav', fps=100)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames.frame_size, frames.hop_size
(2048, 441.0)

When trying to access an out of range frame, an IndexError is raised. Thus the FramedSignal can be used the same way as a numpy array or any other iterable.

>>> frames = FramedSignal('tests/data/audio/sample.wav')
>>> frames.num_frames
281
>>> frames[281]
Traceback (most recent call last):
IndexError: end of signal reached
>>> frames.shape
(281, 2048)

Slices are FramedSignal instances themselves:

>>> frames[:4]  
<madmom.audio.signal.FramedSignal object at 0x...>

To obtain a numpy array from a FramedSignal, simply use np.array() on the full FramedSignal or a slice of it. Please note that this requires a full memory copy.

>>> np.array(frames[2:4])
array([[    0,     0, ..., -5316, -5405],
       [ 2215,  2281, ...,   561,   653]], dtype=int16)
frame_rate

Frame rate (same as fps).

fps

Frames per second.

overlap_factor

Overlapping factor of two adjacent frames.

shape

Shape of the FramedSignal (num_frames, frame_size[, num_channels]).

ndim

Dimensionality of the FramedSignal.

energy()[source]

Energy of the individual frames.

root_mean_square()[source]

Root mean square of the individual frames.

rms()

Root mean square of the individual frames.

sound_pressure_level()[source]

Sound pressure level of the individual frames.

spl()

Sound pressure level of the individual frames.

class madmom.audio.signal.FramedSignalProcessor(frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]

Slice a Signal into frames.

Parameters:
frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

origin : int, optional

Location of the window relative to the reference sample of a frame.

end : int or str, optional

End of signal handling (see FramedSignal).

num_frames : int, optional

Number of frames to return.

Notes

When operating on live audio signals, origin must be set to ‘stream’ in order to always retrieve the last frame_size samples.

Examples

Processor for chopping a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048, and a frame rate of 100 frames per second:

>>> proc = FramedSignalProcessor(frame_size=2048, fps=100)
>>> frames = proc('tests/data/audio/sample.wav')
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ...,  -253,   671], dtype=int16)
>>> frames.hop_size
441.0
process(data, **kwargs)[source]

Slice the signal into (overlapping) frames.

Parameters:
data : Signal instance

Signal to be sliced into frames.

kwargs : dict, optional

Keyword arguments passed to FramedSignal.

Returns:
frames : FramedSignal instance

FramedSignal instance

static add_arguments(parser, frame_size=2048, fps=None, online=None)[source]

Add signal framing related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

frame_size : int, optional

Size of one frame in samples.

fps : float, optional

Frames per second.

online : bool, optional

Online mode (use only past signal information, i.e. align the window to the left of the reference sample).

Returns:
argparse argument group

Signal framing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.

class madmom.audio.signal.Stream(sample_rate=None, num_channels=None, dtype=numpy.float32, frame_size=2048, hop_size=441.0, fps=None, **kwargs)[source]

A Stream handles live (i.e. online, real-time) audio input via PyAudio.

Parameters:
sample_rate : int

Sample rate of the signal.

num_channels : int, optional

Number of channels.

dtype : numpy dtype, optional

Data type for the signal.

frame_size : int, optional

Size of one frame [samples].

hop_size : int, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value (the resulting hop_size must be an integer).

queue_size : int

Size of the FIFO (first in first out) queue. If the queue is full and new audio samples arrive, the oldest item in the queue will be dropped.

Notes

Stream is implemented as an iterable which blocks until enough new data is available.

shape

Shape of the Stream (None, frame_size[, num_channels]).