madmom.audio.signal

This module contains basic signal processing functionality.

madmom.audio.signal.smooth(signal, kernel)[source]

Smooth the signal along its first axis.

Parameters:

signal : numpy array

Signal to be smoothed.

kernel : numpy array or int

Smoothing kernel (size).

Returns:

numpy array

Smoothed signal.

Notes

If kernel is an integer, a Hamming window of that length will be used as a smoothing kernel.
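The note above can be illustrated with a small numpy sketch. It is not madmom's implementation: the name smooth_sketch, the use of np.convolve with mode="same", and the decision to leave the kernel unnormalised are assumptions made for illustration.

```python
import numpy as np


def smooth_sketch(signal, kernel):
    """Smooth `signal` along its first axis (illustrative, not madmom's code)."""
    signal = np.asarray(signal, dtype=float)
    # An integer kernel is interpreted as the length of a Hamming window.
    if isinstance(kernel, int):
        kernel = np.hamming(kernel)
    kernel = np.asarray(kernel, dtype=float)
    if signal.ndim == 1:
        return np.convolve(signal, kernel, mode="same")
    # For 2-D input, filter every column independently along the first axis.
    return np.apply_along_axis(
        lambda column: np.convolve(column, kernel, mode="same"), 0, signal
    )


# Smoothing a unit impulse reproduces the window itself.
impulse = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
smoothed = smooth_sketch(impulse, 3)
```

Because the kernel is not normalised in this sketch, the overall level of the output depends on the sum of the window coefficients.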

madmom.audio.signal.adjust_gain(signal, gain)[source]

Adjust the gain of the signal.

Parameters:

signal : numpy array

Signal to be adjusted.

gain : float

Gain adjustment level [dB].

Returns:

numpy array

Signal with adjusted gain.

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

Gain values > 0 amplify the signal and are only supported for signals with a float dtype, to prevent clipping and integer overflows.
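A minimal sketch of the behaviour described in these notes follows. It assumes the gain is an amplitude gain, so that a value in dB maps to the linear factor 10^(gain/20); the function name adjust_gain_sketch and the error message are hypothetical.

```python
import numpy as np


def adjust_gain_sketch(signal, gain_db):
    """Scale `signal` by `gain_db` decibels, keeping its dtype (illustrative)."""
    signal = np.asarray(signal)
    # Amplifying integer data could overflow, so refuse it.
    if gain_db > 0 and np.issubdtype(signal.dtype, np.integer):
        raise ValueError("positive gain requires a float dtype")
    # Assumption: dB refers to amplitude, hence the divisor of 20.
    factor = 10.0 ** (gain_db / 20.0)
    # Casting back to an integer dtype truncates, i.e. rounding errors occur.
    return (signal * factor).astype(signal.dtype)


quiet = adjust_gain_sketch(np.array([1000, -1000], dtype=np.int16), -20.0)
loud = adjust_gain_sketch(np.array([0.1, -0.1], dtype=np.float32), 6.0)
```

Converting an integer signal to a float dtype first avoids both the truncation and the restriction on positive gains.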

madmom.audio.signal.attenuate(signal, attenuation)[source]

Attenuate the signal.

Parameters:

signal : numpy array

Signal to be attenuated.

attenuation : float

Attenuation level [dB].

Returns:

numpy array

Attenuated signal (same dtype as signal).

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

madmom.audio.signal.normalize(signal)[source]

Normalize the signal to have maximum amplitude.

Parameters:

signal : numpy array

Signal to be normalized.

Returns:

numpy array

Normalized signal.

Notes

Signals with float dtypes cover the range [-1, +1], signals with integer dtypes will cover the maximally possible range, e.g. [-32768, 32767] for np.int16.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
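The target ranges described above can be sketched as follows. This is an illustration under stated assumptions, not the library code: normalize_sketch is a hypothetical name, the peak is mapped to the largest positive value of an integer dtype, and an all-zero signal is returned unchanged.

```python
import numpy as np


def normalize_sketch(signal):
    """Scale `signal` so that its peak reaches the dtype's full range."""
    signal = np.asarray(signal)
    # Work in float so that abs() of the most negative integer cannot overflow.
    as_float = signal.astype(float)
    peak = np.max(np.abs(as_float))
    if peak == 0:
        return signal.copy()  # nothing to scale
    target = 1.0  # float dtypes: [-1, +1]
    if np.issubdtype(signal.dtype, np.integer):
        target = float(np.iinfo(signal.dtype).max)  # e.g. 32767 for int16
    # Casting back truncates for integer dtypes (rounding errors).
    return (as_float / peak * target).astype(signal.dtype)


norm_float = normalize_sketch(np.array([0.25, -0.5]))
norm_int = normalize_sketch(np.array([100, -200], dtype=np.int16))
```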

madmom.audio.signal.remix(signal, num_channels)[source]

Remix the signal to have the desired number of channels.

Parameters:

signal : numpy array

Signal to be remixed.

num_channels : int

Number of channels.

Returns:

numpy array

Remixed signal (same dtype as signal).

Notes

This function does not support arbitrary channel number conversions. Only down-mixing to and up-mixing from mono signals is supported.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

If the signal should be down-mixed to mono and has an integer dtype, it will be converted to float internally and then back to the original dtype to prevent clipping of the signal. To avoid this double conversion, convert the dtype first.
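The two supported conversions can be sketched like this. The sketch assumes channels are stored as columns, that down-mixing averages the channels, and that up-mixing copies the mono channel; remix_sketch is a hypothetical name and none of this is confirmed as the library's exact method.

```python
import numpy as np


def remix_sketch(signal, num_channels):
    """Down-mix to mono or up-mix from mono (illustrative only)."""
    signal = np.asarray(signal)
    current = 1 if signal.ndim == 1 else signal.shape[1]
    if num_channels == current:
        return signal
    if num_channels == 1:
        # Average in float to avoid clipping, then cast back to the dtype.
        return np.mean(signal.astype(float), axis=1).astype(signal.dtype)
    if current == 1:
        # Repeat the single channel `num_channels` times as columns.
        mono = signal.reshape(len(signal), 1)
        return np.tile(mono, (1, num_channels))
    raise NotImplementedError("only conversions to or from mono are supported")


stereo = np.array([[1, 3], [2, 4]], dtype=np.int16)
mono = remix_sketch(stereo, 1)
upmixed = remix_sketch(np.array([1, 2], dtype=np.int16), 2)
```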

madmom.audio.signal.resample(signal, sample_rate, **kwargs)[source]

Resample the signal.

Parameters:

signal : numpy array or Signal

Signal to be resampled.

sample_rate : int

Desired sample rate of the resampled signal [Hz].

kwargs : dict, optional

Keyword arguments passed to load_ffmpeg_file().

Returns:

numpy array or Signal

Resampled signal.

Notes

This function uses ffmpeg to resample the signal.

madmom.audio.signal.rescale(signal, dtype=numpy.float32)[source]

Rescale the signal to range [-1, 1] and return as float dtype.

Parameters:

signal : numpy array

Signal to be rescaled.

dtype : numpy dtype

Float data type of the returned signal.

Returns:

numpy array

Signal rescaled to range [-1, 1].
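As a rough illustration of rescaling integer samples to floats, the sketch below divides by the largest positive value of the integer dtype. That divisor is an assumption (the text above only states the target range), and rescale_sketch is a hypothetical name.

```python
import numpy as np


def rescale_sketch(signal, dtype=np.float32):
    """Return `signal` as a float array scaled to roughly [-1, 1]."""
    if not np.issubdtype(np.dtype(dtype), np.floating):
        raise ValueError("only float dtypes are supported")
    signal = np.asarray(signal)
    if np.issubdtype(signal.dtype, np.integer):
        # Assumption: divide by the dtype's maximum, e.g. 32767 for int16.
        return signal.astype(dtype) / np.iinfo(signal.dtype).max
    # Float input is assumed to be in range already; only the dtype changes.
    return signal.astype(dtype)


rescaled = rescale_sketch(np.array([32767, 0, -16384], dtype=np.int16))
```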

madmom.audio.signal.trim(signal, where='fb')[source]

Trim leading and trailing zeros of the signal.

Parameters:

signal : numpy array

Signal to be trimmed.

where : str, optional

A string with ‘f’ representing trim from front and ‘b’ to trim from back. Default is ‘fb’, trim zeros from both ends of the signal.

Returns:

numpy array

Trimmed signal.
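For one-dimensional arrays, numpy's own np.trim_zeros accepts the same 'f' and 'b' flags and can serve as an illustration of the where parameter. Whether this function delegates to it is not stated here, so the snippet only demonstrates the semantics.

```python
import numpy as np

# Zeros inside the signal are kept; only leading/trailing zeros are removed.
padded = np.array([0, 0, 3, 0, 5, 0], dtype=np.int16)

both = np.trim_zeros(padded, trim="fb")   # trim front and back
front = np.trim_zeros(padded, trim="f")   # trim front only
back = np.trim_zeros(padded, trim="b")    # trim back only
```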

madmom.audio.signal.energy(signal)[source]

Compute the energy of a (framed) signal.

Parameters:

signal : numpy array

Signal.

Returns:

energy : float

Energy of the signal.

Notes

If signal is a FramedSignal, the energy is computed for each frame individually.

madmom.audio.signal.root_mean_square(signal)[source]

Compute the root mean square of a (framed) signal. This can be used as a measurement of power.

Parameters:

signal : numpy array

Signal.

Returns:

rms : float

Root mean square of the signal.

Notes

If signal is a FramedSignal, the root mean square is computed for each frame individually.
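The following sketch shows how energy and root mean square relate, and what "for each frame individually" means for a two-dimensional array of mono frames. The function names are hypothetical and the definitions (sum of squares, square root of the mean square) are the standard ones rather than a quote of the library code.

```python
import numpy as np


def energy_sketch(frames):
    """Sum of squared samples, computed along the last axis."""
    frames = np.asarray(frames, dtype=float)
    return np.sum(frames ** 2, axis=-1)


def rms_sketch(frames):
    """Square root of the mean squared sample, along the last axis."""
    frames = np.asarray(frames, dtype=float)
    return np.sqrt(np.mean(frames ** 2, axis=-1))


single = np.array([3.0, 4.0])
framed = np.array([[3.0, 4.0], [0.0, 0.0]])  # two frames of two samples

single_energy = energy_sketch(single)   # one value for the whole signal
frame_energies = energy_sketch(framed)  # one value per frame
frame_rms = rms_sketch(framed)
```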

madmom.audio.signal.sound_pressure_level(signal, p_ref=None)[source]

Compute the sound pressure level of a (framed) signal.

Parameters:

signal : numpy array

Signal.

p_ref : float, optional

Reference sound pressure level; if ‘None’, the maximum amplitude value of the signal’s data type is used. For float data types, amplitudes are assumed to lie between -1 and +1.

Returns:

spl : float

Sound pressure level of the signal [dB].

Notes

From http://en.wikipedia.org/wiki/Sound_pressure: Sound pressure level (SPL) or sound level is a logarithmic measure of the effective sound pressure of a sound relative to a reference value. It is measured in decibels (dB) above a standard reference level.

If signal is a FramedSignal, the sound pressure level is computed for each frame individually.
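A hedged sketch of this computation, assuming the level is 20·log10(rms / p_ref) and that p_ref defaults as described for the p_ref parameter. The name spl_sketch is hypothetical, and returning -inf for silence is a choice of this sketch, not a documented behaviour.

```python
import numpy as np


def spl_sketch(signal, p_ref=None):
    """Level of `signal` in dB relative to `p_ref` (illustrative only)."""
    signal = np.asarray(signal)
    if p_ref is None:
        if np.issubdtype(signal.dtype, np.integer):
            p_ref = float(np.iinfo(signal.dtype).max)  # full scale of the dtype
        else:
            p_ref = 1.0  # floats are assumed to lie in [-1, +1]
    rms = np.sqrt(np.mean(signal.astype(float) ** 2, axis=-1))
    # Silence has rms == 0; suppress the warning and let log10 return -inf.
    with np.errstate(divide="ignore"):
        return 20.0 * np.log10(rms / p_ref)


full_scale = spl_sketch(np.ones(8))
tenth_scale = spl_sketch(np.full(8, 0.1))
full_scale_int = spl_sketch(np.full(8, 32767, dtype=np.int16))
```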

exception madmom.audio.signal.LoadAudioFileError(value=None)[source]

Exception to be raised whenever an audio file could not be loaded.

madmom.audio.signal.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)[source]

Load the audio data from the given file and return it as a numpy array.

Only supports wave files, does not support re-sampling or arbitrary channel number conversions. Reads the data as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.

Parameters:

filename : str

Name of the file.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].

Notes

The start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop can be concatenated to obtain the original signal without gaps or overlaps.
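The gap-free property follows from using half-open sample ranges. The sketch below works on an in-memory array instead of a file; segment_sketch is a hypothetical helper, and Python's round() is used as the rounding rule, which is an assumption.

```python
import numpy as np


def segment_sketch(samples, sample_rate, start=None, stop=None):
    """Return samples in [start, stop) with positions given in seconds."""
    first = 0 if start is None else int(round(start * sample_rate))
    last = len(samples) if stop is None else int(round(stop * sample_rate))
    # Keep both positions inside the signal.
    first = min(max(first, 0), len(samples))
    last = min(max(last, 0), len(samples))
    # The sample at `last` itself is excluded.
    return samples[first:last]


sample_rate = 8000
samples = np.arange(2 * sample_rate)
parts = [
    segment_sketch(samples, sample_rate, 0.0, 0.3337),
    segment_sketch(samples, sample_rate, 0.3337, 1.25),
    segment_sketch(samples, sample_rate, 1.25, None),
]
rejoined = np.concatenate(parts)
```

Each stop value is reused as the next start value, so both round to the same sample index and the segments join without gaps or overlaps.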

madmom.audio.signal.write_wave_file(signal, filename, sample_rate=None)[source]

Write the signal to disk as a .wav file.

Parameters:

signal : numpy array or Signal

The signal to be written to file.

filename : str

Name of the file.

sample_rate : int, optional

Sample rate of the signal [Hz].

Returns:

filename : str

Name of the file.

Notes

sample_rate can be ‘None’ if signal is a Signal instance. If set, the given sample_rate is used instead of the signal’s sample rate. Must be given if signal is a ndarray.

madmom.audio.signal.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)[source]

Load the audio data from the given file and return it as a numpy array. This first tries load_wave_file() and then load_ffmpeg_file() (for ffmpeg and avconv).

Parameters:

filename : str or file handle

Name of the file or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].

Notes

For wave files, the start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop can be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.

class madmom.audio.signal.Signal(data, sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, dtype=None, **kwargs)[source]

The Signal class represents a signal as a (memory-mapped) numpy array and enhances it with a number of attributes.

Parameters:

data : numpy array, str or file handle

Signal data or file name or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to maximum range of the data type.

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Notes

sample_rate or num_channels can be used to set the desired sample rate and number of channels if the audio is read from file. If set to ‘None’ the audio signal is used as is, i.e. the sample rate and number of channels are determined directly from the audio file.

If the data is a numpy array, the sample_rate is set to the given value and num_channels is set to the number of columns of the array.

The gain can be used to adjust the level of the signal.

If both norm and gain are set, the signal is first normalized and then the gain is applied afterwards.

If norm or gain is set, the selected part of the signal is loaded into memory completely, i.e. .wav files are not memory-mapped any more.

Examples

Load a mono audio file:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> sig.sample_rate
44100

Load a stereo audio file, down-mix it to mono:

>>> sig = Signal('tests/data/audio/stereo_sample.flac', num_channels=1)
>>> sig
Signal([ 36,  36, ..., 524, 495], dtype=int16)
>>> sig.num_channels
1

Load and re-sample an audio file:

>>> sig = Signal('tests/data/audio/sample.wav', sample_rate=22050)
>>> sig
Signal([-2470, -2553, ...,   517,   677], dtype=int16)
>>> sig.sample_rate
22050

Load an audio file with float32 data type (i.e. rescale it to [-1, 1]):

>>> sig = Signal('tests/data/audio/sample.wav', dtype=np.float32)
>>> sig
Signal([-0.07611, -0.0766 , ...,  0.01999,  0.0195 ], dtype=float32)
>>> sig.dtype
dtype('float32')

num_samples

Number of samples.

num_channels

Number of channels.

length

Length of signal in seconds.

write(filename)[source]

Write the signal to disk as a .wav file.

Parameters:

filename : str

Name of the file.

Returns:

filename : str

Name of the written file.

energy()[source]

Energy of signal.

root_mean_square()[source]

Root mean square of signal.

rms()

Root mean square of signal.

sound_pressure_level()[source]

Sound pressure level of signal.

spl()

Sound pressure level of signal.

class madmom.audio.signal.SignalProcessor(sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, **kwargs)[source]

The SignalProcessor class is a basic signal processor.

Parameters:

sample_rate : int, optional

Sample rate of the signal [Hz]; if set the signal will be re-sampled to that sample rate; if ‘None’ the sample rate of the audio file will be used.

num_channels : int, optional

Number of channels of the signal; if set, the signal will be reduced to that number of channels; if ‘None’ as many channels as present in the audio file are returned.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Examples

Processor for loading the first two seconds of an audio file, re-sampling it to 22.05 kHz and down-mixing it to mono:

>>> proc = SignalProcessor(sample_rate=22050, num_channels=1, stop=2)
>>> sig = proc('tests/data/audio/sample.wav')
>>> sig
Signal([-2470, -2553, ...,  -173,  -265], dtype=int16)
>>> sig.sample_rate
22050
>>> sig.num_channels
1
>>> sig.length
2.0

process(data, **kwargs)[source]

Processes the given audio file.

Parameters:

data : numpy array, str or file handle

Data to be processed.

kwargs : dict, optional

Keyword arguments passed to Signal.

Returns:

signal : Signal instance

Signal instance.

static add_arguments(parser, sample_rate=None, mono=None, start=None, stop=None, norm=None, gain=None)[source]

Add signal processing related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

sample_rate : int, optional

Re-sample the signal to this sample rate [Hz].

mono : bool, optional

Down-mix the signal to mono.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

Returns:

argparse argument group

Signal processing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’. To include start and stop arguments with a default value of ‘None’, i.e. do not set any start or stop time, they can be set to ‘True’.
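The convention described in this note can be sketched with the standard argparse module. The option names, help texts, and the reduced parameter list below are illustrative assumptions, and add_signal_arguments_sketch is a hypothetical stand-in for the real static method.

```python
import argparse


def add_signal_arguments_sketch(parser, sample_rate=None, gain=None, start=None):
    """Add only those options whose default is not None (illustrative)."""
    group = parser.add_argument_group("signal processing arguments")
    if sample_rate is not None:
        group.add_argument("--sample_rate", type=int, default=sample_rate,
                           help="re-sample the signal to this rate [Hz]")
    if gain is not None:
        group.add_argument("--gain", type=float, default=gain,
                           help="adjust the gain of the signal [dB]")
    if start is not None:
        # Passing True means: include the option, but default to None.
        default = None if start is True else start
        group.add_argument("--start", type=float, default=default,
                           help="start position [seconds]")
    return group


parser = argparse.ArgumentParser()
add_signal_arguments_sketch(parser, sample_rate=44100, start=True)
args = parser.parse_args(["--start", "1.5"])
defaults = parser.parse_args([])
```

Since gain was left at None, the parser in this example has no --gain option at all.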

madmom.audio.signal.signal_frame(signal, index, frame_size, hop_size, origin=0)[source]

Return the frame at the given index of the signal.

Parameters:

signal : numpy array

Signal.

index : int

Index of the frame to return.

frame_size : int

Size of each frame in samples.

hop_size : float

Hop size in samples between adjacent frames.

origin : int

Location of the window center relative to the signal position.

Returns:

frame : numpy array

Requested frame of the signal.

Notes

The reference sample of the first frame (index == 0) refers to the first sample of the signal, and each following frame is placed hop_size samples after the previous one.

By default, the window is centered around this reference sample. Its location relative to the reference sample can be set with the origin parameter. Arbitrary integer values can be given:

  • zero centers the window on its reference sample
  • negative values shift the window to the right
  • positive values shift the window to the left

An origin of half the size of the frame_size results in windows located to the left of the reference sample, i.e. the first frame starts at the first sample of the signal.

The part of the frame which is not covered by the signal is padded with zeros.

This function is totally independent of the length of the signal. Thus, contrary to common indexing, the index ‘-1’ refers NOT to the last frame of the signal, but instead the frame left of the first frame is returned.
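The placement rules in these notes can be made concrete with a small sketch for one-dimensional signals. The start formula (reference sample minus half the frame size minus origin) is an assumption chosen to match the bullet list above, and signal_frame_sketch is a hypothetical name.

```python
import numpy as np


def signal_frame_sketch(signal, index, frame_size, hop_size, origin=0):
    """Return the zero-padded frame at `index` of a 1-D signal."""
    signal = np.asarray(signal)
    ref_sample = int(index * hop_size)
    # Centre the window on the reference sample, then shift it by `origin`.
    start = ref_sample - frame_size // 2 - int(origin)
    stop = start + frame_size
    frame = np.zeros(frame_size, dtype=signal.dtype)
    # Copy only the part of the window that overlaps the signal.
    src_start = max(start, 0)
    src_stop = min(stop, len(signal))
    if src_start < src_stop:
        frame[src_start - start:src_stop - start] = signal[src_start:src_stop]
    return frame


signal = np.arange(1, 11)  # samples 1..10
first = signal_frame_sketch(signal, 0, frame_size=4, hop_size=2)
before_first = signal_frame_sketch(signal, -1, frame_size=4, hop_size=2)
shifted_right = signal_frame_sketch(signal, 0, frame_size=4, hop_size=2, origin=-2)
past_end = signal_frame_sketch(signal, 5, frame_size=4, hop_size=2)
```

In this sketch, index -1 yields a frame consisting only of padding, because it lies entirely to the left of the signal.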

class madmom.audio.signal.FramedSignal(signal, frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]

The FramedSignal splits a Signal into frames and makes it iterable and indexable.

Parameters:

signal : Signal instance

Signal to be split into frames.

frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

origin : int, optional

Location of the window relative to the reference sample of a frame.

end : int or str, optional

End of signal handling (see notes below).

num_frames : int, optional

Number of frames to return.

kwargs : dict, optional

If no Signal instance was given, one is instantiated with these additional keyword arguments.

Notes

The FramedSignal class is implemented as an iterator. It splits the given signal automatically into frames of frame_size length with hop_size samples (can be float, normal rounding applies) between the frames. The reference sample of the first frame refers to the first sample of the signal.

The location of the window relative to the reference sample of a frame can be set with the origin parameter (with the same behaviour as used by scipy.ndimage filters). Arbitrary integer values can be given:

  • zero centers the window on its reference sample,
  • negative values shift the window to the right,
  • positive values shift the window to the left.

Additionally, it can have the following literal values:

  • ‘center’, ‘offline’: the window is centered on its reference sample,
  • ‘left’, ‘past’, ‘online’: the window is located to the left of its reference sample (including the reference sample),
  • ‘right’, ‘future’, ‘stream’: the window is located to the right of its reference sample.

The end parameter is used to handle the end of signal behaviour and can have these values:

  • ‘normal’: stop as soon as the whole signal is covered by at least one frame (i.e. pad maximally one frame),
  • ‘extend’: frames are returned as long as part of the frame overlaps with the signal to cover the whole signal.

Alternatively, num_frames can be used to retrieve a fixed number of frames.

In order to be able to stack multiple frames obtained with different frame sizes, the number of frames to be returned must be independent of the set frame_size. It is not guaranteed that every sample of the signal is returned in a frame unless the origin is either ‘right’ or ‘future’.

If used in online real-time mode the parameters origin and num_frames should be set to ‘stream’ and 1, respectively.
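Two of the relationships above lend themselves to a short worked example: deriving hop_size from fps, and counting frames without reference to frame_size. The frame-count formulas below are one reading of the ‘normal’ and ‘extend’ descriptions and should be treated as assumptions; both function names are hypothetical.

```python
import math


def hop_size_from_fps(sample_rate, fps):
    """Hop size in samples (may be fractional) for a given frame rate."""
    return sample_rate / float(fps)


def num_frames_sketch(num_samples, hop_size, end="normal"):
    """Number of frames, independent of the frame size (assumed formulas)."""
    if end == "normal":
        # Stop once every hop-sized step of the signal has a frame.
        return int(math.ceil(num_samples / float(hop_size)))
    if end == "extend":
        # Also return the frame whose reference sample follows the last hop.
        return int(math.floor(num_samples / float(hop_size) + 1))
    raise ValueError("end must be 'normal' or 'extend'")


hop = hop_size_from_fps(44100, 100)
normal_count = num_frames_sketch(882, hop, end="normal")
extend_count = num_frames_sketch(882, hop, end="extend")
```

With a signal of exactly two hops, the ‘extend’ reading returns one more frame than ‘normal’; for lengths that are not a multiple of the hop size, both formulas give the same count.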

Examples

To chop a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048 with adjacent frames being 441 samples apart:

>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ...,   655,   639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ...,  -253,   671], dtype=int16)
>>> frames.fps
100.0

Instead of passing a Signal instance as the first argument, anything a Signal can be instantiated from (e.g. a file name) can be used. We can also set the frames per second (fps) instead; this value is converted to hop_size based on the sample_rate of the signal:

>>> frames = FramedSignal('tests/data/audio/sample.wav', fps=100)
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames.frame_size, frames.hop_size
(2048, 441.0)

When trying to access an out-of-range frame, an IndexError is raised. Thus the FramedSignal can be used the same way as a numpy array or any other iterable.

>>> frames = FramedSignal('tests/data/audio/sample.wav')
>>> frames.num_frames
281
>>> frames[281]
Traceback (most recent call last):
IndexError: end of signal reached
>>> frames.shape
(281, 2048)

Slices are themselves FramedSignals:

>>> frames[:4]  
<madmom.audio.signal.FramedSignal object at 0x...>

To obtain a numpy array from a FramedSignal, simply use np.array() on the full FramedSignal or a slice of it. Note that this requires a full memory copy.

>>> np.array(frames[2:4])
array([[    0,     0, ..., -5316, -5405],
       [ 2215,  2281, ...,   561,   653]], dtype=int16)

frame_rate

Frame rate (same as fps).

fps

Frames per second.

overlap_factor

Overlapping factor of two adjacent frames.

shape

Shape of the FramedSignal (num_frames, frame_size[, num_channels]).

ndim

Dimensionality of the FramedSignal.

energy()[source]

Energy of the individual frames.

root_mean_square()[source]

Root mean square of the individual frames.

rms()

Root mean square of the individual frames.

sound_pressure_level()[source]

Sound pressure level of the individual frames.

spl()

Sound pressure level of the individual frames.

class madmom.audio.signal.FramedSignalProcessor(frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]

Slice a Signal into frames.

Parameters:

frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

origin : int, optional

Location of the window relative to the reference sample of a frame.

end : int or str, optional

End of signal handling (see FramedSignal).

num_frames : int, optional

Number of frames to return.

kwargs : dict, optional

If no Signal instance was given, one is instantiated with these additional keyword arguments.

Notes

When operating on live audio signals, origin must be set to ‘stream’ in order to always retrieve the last frame_size samples.

Examples

Processor for chopping a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048, and a frame rate of 100 frames per second:

>>> proc = FramedSignalProcessor(frame_size=2048, fps=100)
>>> frames = proc('tests/data/audio/sample.wav')
>>> frames  
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([    0,     0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ...,  -253,   671], dtype=int16)
>>> frames.hop_size
441.0

process(data, **kwargs)[source]

Slice the signal into (overlapping) frames.

Parameters:

data : Signal instance

Signal to be sliced into frames.

kwargs : dict, optional

Keyword arguments passed to FramedSignal.

Returns:

frames : FramedSignal instance

FramedSignal instance.

static add_arguments(parser, frame_size=2048, fps=None, online=None)[source]

Add signal framing related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

frame_size : int, optional

Size of one frame in samples.

fps : float, optional

Frames per second.

online : bool, optional

Online mode (use only past signal information, i.e. align the window to the left of the reference sample).

Returns:

argparse argument group

Signal framing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.

class madmom.audio.signal.Stream(sample_rate=None, num_channels=None, dtype=numpy.float32, frame_size=2048, hop_size=441.0, fps=None, **kwargs)[source]

A Stream handles live (i.e. online, real-time) audio input via PyAudio.

Parameters:

sample_rate : int

Sample rate of the signal.

num_channels : int, optional

Number of channels.

dtype : numpy dtype, optional

Data type for the signal.

frame_size : int, optional

Size of one frame [samples].

hop_size : int, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value (the resulting hop_size must be an integer).

queue_size : int

Size of the FIFO (first in first out) queue. If the queue is full and new audio samples arrive, the oldest item in the queue will be dropped.

Notes

Stream is implemented as an iterable which blocks until enough new data is available.
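The drop-oldest behaviour described for queue_size matches what a bounded collections.deque does. Whether Stream uses a deque internally is not stated here; the snippet only illustrates the FIFO semantics.

```python
from collections import deque

queue_size = 3
# A deque with maxlen silently discards the oldest item once it is full.
fifo = deque(maxlen=queue_size)

for frame_index in range(5):
    fifo.append(frame_index)

remaining = list(fifo)
oldest = fifo.popleft()  # the consumer reads the oldest surviving item first
```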

shape

Shape of the Stream (None, frame_size[, num_channels]).