madmom.audio.signal

This module contains basic signal processing functionality.

madmom.audio.signal.smooth(signal, kernel)[source]

Smooth the signal along its first axis.

Parameters:

signal : numpy array

Signal to be smoothed.

kernel : numpy array or int

Smoothing kernel (size).

Returns:

numpy array

Smoothed signal.

Notes

If kernel is an integer, a Hamming window of that length will be used as a smoothing kernel.
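The note above can be illustrated with a minimal NumPy sketch. It does not call madmom; the helper name smooth_sketch is hypothetical, it handles only 1-D signals, and it leaves the kernel unnormalized, which is an assumption since the text does not say whether the kernel is scaled.

```python
import numpy as np


def smooth_sketch(signal, kernel):
    """Hypothetical helper: smooth a 1-D signal with a kernel or a Hamming window."""
    signal = np.asarray(signal, dtype=float)
    if isinstance(kernel, (int, np.integer)):
        # an integer is interpreted as the length of a Hamming window
        kernel = np.hamming(kernel)
    kernel = np.asarray(kernel, dtype=float)
    # mode='same' keeps the output as long as the input signal
    return np.convolve(signal, kernel, mode='same')


# smoothing a unit impulse reproduces the kernel around the impulse position
impulse = np.zeros(9)
impulse[4] = 1.0
smoothed = smooth_sketch(impulse, 5)
```

Smoothing an impulse is a convenient way to inspect which kernel is actually applied.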

madmom.audio.signal.adjust_gain(signal, gain)[source]

Adjust the gain of the signal.

Parameters:

signal : numpy array

Signal to be adjusted.

gain : float

Gain adjustment level [dB].

Returns:

numpy array

Signal with adjusted gain.

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

gain values > 0 amplify the signal and are only supported for signals with float dtype to prevent clipping and integer overflows.
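As a rough illustration of the two notes above, the following sketch converts a gain in dB to a linear amplitude factor and casts the result back to the input dtype. It is not madmom's code; the helper name adjust_gain_sketch and the choice to raise ValueError for positive gain on integer signals are assumptions.

```python
import numpy as np


def adjust_gain_sketch(signal, gain_db):
    """Hypothetical helper: scale a signal by a gain given in dB."""
    signal = np.asarray(signal)
    if gain_db > 0 and not np.issubdtype(signal.dtype, np.floating):
        # amplifying integer signals could clip or overflow
        raise ValueError('positive gain requires a float signal')
    # a gain in dB corresponds to the amplitude factor 10 ** (dB / 20)
    factor = 10.0 ** (gain_db / 20.0)
    # casting back keeps the dtype; for integers the fraction is truncated
    return (signal * factor).astype(signal.dtype)


quiet = adjust_gain_sketch(np.array([1000, -2000], dtype=np.int16), -6.0)
loud = adjust_gain_sketch(np.array([0.1, -0.2], dtype=np.float32), 6.0)
```

A gain of -6 dB roughly halves the amplitude, and the integer result loses its fractional part, which is the rounding error the note refers to.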

madmom.audio.signal.attenuate(signal, attenuation)[source]

Attenuate the signal.

Parameters:

signal : numpy array

Signal to be attenuated.

attenuation : float

Attenuation level [dB].

Returns:

numpy array

Attenuated signal (same dtype as signal).

Notes

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

madmom.audio.signal.normalize(signal)[source]

Normalize the signal to have maximum amplitude.

Parameters:

signal : numpy array

Signal to be normalized.

Returns:

numpy array

Normalized signal.

Notes

Signals with float dtypes cover the range [-1, +1]; signals with integer dtypes cover the maximum possible range, e.g. [-32768, 32767] for np.int16.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
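A minimal sketch of peak normalization as described in the notes above. The helper name normalize_sketch is hypothetical, and scaling integer signals so that the peak reaches the largest positive value of the dtype is an assumption about how the "maximum possible range" is reached.

```python
import numpy as np


def normalize_sketch(signal):
    """Hypothetical helper: scale a signal so its peak uses the full range."""
    signal = np.asarray(signal)
    # work in float so that abs() cannot overflow for e.g. -32768 in int16
    as_float = signal.astype(float)
    peak = np.max(np.abs(as_float))
    if peak == 0:
        # a silent signal cannot be normalized
        return signal.copy()
    if np.issubdtype(signal.dtype, np.integer):
        target = float(np.iinfo(signal.dtype).max)
        # round before casting back to the integer dtype
        return np.rint(as_float * target / peak).astype(signal.dtype)
    return (as_float / peak).astype(signal.dtype)


floats = normalize_sketch(np.array([0.25, -0.5], dtype=np.float32))
ints = normalize_sketch(np.array([100, -200], dtype=np.int16))
```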

madmom.audio.signal.remix(signal, num_channels)[source]

Remix the signal to have the desired number of channels.

Parameters:

signal : numpy array

Signal to be remixed.

num_channels : int

Number of channels.

Returns:

numpy array

Remixed signal (same dtype as signal).

Notes

This function does not support arbitrary channel number conversions. Only down-mixing to and up-mixing from mono signals is supported.

The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.

If the signal should be down-mixed to mono and has an integer dtype, it will be converted to float internally and then back to the original dtype to prevent clipping of the signal. To avoid this double conversion, convert the dtype first.
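The supported conversions can be sketched as follows. This is an illustration under assumptions rather than madmom's implementation: the helper name remix_sketch is hypothetical, down-mixing is done by averaging the channels, and up-mixing copies the mono signal into every channel.

```python
import numpy as np


def remix_sketch(signal, num_channels):
    """Hypothetical helper: down-mix to mono or up-mix from mono."""
    signal = np.asarray(signal)
    current = 1 if signal.ndim == 1 else signal.shape[1]
    if current == num_channels:
        return signal
    if num_channels == 1:
        # np.mean computes in float, so summing the channels cannot overflow
        return np.mean(signal, axis=1).astype(signal.dtype)
    if signal.ndim == 1:
        # copy the mono signal into each of the requested channels
        return np.tile(signal[:, np.newaxis], num_channels)
    raise NotImplementedError('only mono <-> multi-channel is sketched')


stereo_in = np.array([[30000, 30000], [-100, 100]], dtype=np.int16)
mono = remix_sketch(stereo_in, 1)
stereo_out = remix_sketch(mono, 2)
```

Adding the two int16 channels directly would exceed the int16 range for the first sample, which is why the averaging happens in float before the result is cast back.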

madmom.audio.signal.rescale(signal, dtype=numpy.float32)[source]

Rescale the signal to range [-1, 1] and return as float dtype.

Parameters:

signal : numpy array

Signal to be rescaled.

dtype : numpy dtype

Data type of the signal.

Returns:

numpy array

Signal rescaled to range [-1, 1].
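A small sketch of rescaling an integer signal to a float range. The helper name rescale_sketch is hypothetical, and dividing by the largest positive value of the integer dtype is an assumption; with that choice the most negative integer maps to a value marginally below -1.

```python
import numpy as np


def rescale_sketch(signal, dtype=np.float32):
    """Hypothetical helper: convert a signal to float and scale it to about [-1, 1]."""
    signal = np.asarray(signal)
    if not np.issubdtype(dtype, np.floating):
        raise ValueError('only float dtypes are supported')
    if np.issubdtype(signal.dtype, np.integer):
        # divide by the largest positive value of the integer dtype
        return signal.astype(dtype) / np.iinfo(signal.dtype).max
    # float signals are assumed to be in range already
    return signal.astype(dtype)


rescaled = rescale_sketch(np.array([32767, 0, -16384], dtype=np.int16))
```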

madmom.audio.signal.trim(signal, where='fb')[source]

Trim leading and trailing zeros of the signal.

Parameters:

signal : numpy array

Signal to be trimmed.

where : str, optional

A string with ‘f’ representing trim from front and ‘b’ to trim from back. Default is ‘fb’, trim zeros from both ends of the signal.

Returns:

numpy array

Trimmed signal.
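The where argument follows the same convention as the trim argument of NumPy's np.trim_zeros. The sketch below uses np.trim_zeros on a 1-D array to show the three cases; whether madmom delegates to that function is not stated here and is left open.

```python
import numpy as np

signal = np.array([0, 0, 3, 0, 5, 0])

# 'fb' trims both ends, 'f' only the front, 'b' only the back;
# zeros inside the signal are never removed
both = np.trim_zeros(signal, 'fb')
front = np.trim_zeros(signal, 'f')
back = np.trim_zeros(signal, 'b')
```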

madmom.audio.signal.root_mean_square(signal)[source]

Computes the root mean square of the signal. This can be used as a measurement of power.

Parameters:

signal : numpy array

Signal.

Returns:

rms : float

Root mean square of the signal.
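The root mean square is the square root of the mean of the squared samples. A minimal sketch, with the hypothetical helper name rms_sketch:

```python
import numpy as np


def rms_sketch(signal):
    """Hypothetical helper: root mean square of a signal."""
    # convert to float first so squaring integer samples cannot overflow
    signal = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(signal ** 2)))


square_rms = rms_sketch([3, -3, 3, -3])
# one full period of a sine with amplitude 1 has an RMS of 1 / sqrt(2)
sine_rms = rms_sketch(np.sin(2 * np.pi * np.arange(1000) / 1000.0))
```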

madmom.audio.signal.sound_pressure_level(signal, p_ref=None)[source]

Computes the sound pressure level of a signal.

Parameters:

signal : numpy array

Signal.

p_ref : float, optional

Reference sound pressure level. If ‘None’, the maximum amplitude value of the signal's data type is used; for float data types, amplitudes are assumed to lie between -1 and +1.

Returns:

spl : float

Sound pressure level of the signal [dB].

Notes

From http://en.wikipedia.org/wiki/Sound_pressure: Sound pressure level (SPL) or sound level is a logarithmic measure of the effective sound pressure of a sound relative to a reference value. It is measured in decibels (dB) above a standard reference level.
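In terms of the quantities above, the level in dB is 20 * log10(rms / p_ref). The following sketch applies that formula with the default reference described for p_ref. The helper name spl_sketch is hypothetical, and returning negative infinity for a silent signal is a choice made for this sketch only.

```python
import numpy as np


def spl_sketch(signal, p_ref=None):
    """Hypothetical helper: level of a signal in dB relative to p_ref."""
    signal = np.asarray(signal)
    rms = np.sqrt(np.mean(signal.astype(float) ** 2))
    if rms == 0:
        # the logarithm is undefined for silence (choice made for this sketch)
        return float('-inf')
    if p_ref is None:
        if np.issubdtype(signal.dtype, np.integer):
            # full scale of the integer dtype
            p_ref = float(np.iinfo(signal.dtype).max)
        else:
            # float signals are assumed to lie between -1 and +1
            p_ref = 1.0
    return float(20.0 * np.log10(rms / p_ref))


half_scale = spl_sketch(np.full(100, 0.5))
full_scale = spl_sketch(np.full(100, 32767, dtype=np.int16))
```

A constant signal at half of full scale sits about 6 dB below the reference, and a full-scale signal sits at 0 dB.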

madmom.audio.signal.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)[source]

Load the audio data from the given file and return it as a numpy array.

Only wave files are supported; re-sampling and arbitrary channel number conversions are not. The data is read as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.

Parameters:

filename : string

Name of the file.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].

Notes

The start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop can be concatenated to obtain the original signal without gaps or overlaps.
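The half-open segment convention described in this note can be sketched without reading any file. The helper name segment_bounds is hypothetical, and the use of Python's round() for "closest sample" is an assumption.

```python
import numpy as np


def segment_bounds(start, stop, sample_rate):
    """Hypothetical helper: convert [start, stop) in seconds to sample indices."""
    first = int(round(start * sample_rate))
    # the sample at 'last' itself is excluded by the slice below
    last = int(round(stop * sample_rate))
    return first, last


sample_rate = 44100
signal = np.arange(sample_rate)  # one second of dummy samples

a_start, a_stop = segment_bounds(0.0, 0.3333, sample_rate)
b_start, b_stop = segment_bounds(0.3333, 1.0, sample_rate)
joined = np.concatenate([signal[a_start:a_stop], signal[b_start:b_stop]])
```

Because the stop of the first segment and the start of the second round to the same sample index, the two slices join without a gap or an overlap.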

exception madmom.audio.signal.LoadAudioFileError(value=None)[source]

Exception to be raised whenever an audio file could not be loaded.

madmom.audio.signal.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)[source]

Load the audio data from the given file and return it as a numpy array. This tries load_wave_file() first and then load_ffmpeg_file() (for ffmpeg and avconv).

Parameters:

filename : str or file handle

Name of the file or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].

Notes

For wave files, the start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop can be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.

class madmom.audio.signal.Signal(data, sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0, dtype=None)[source]

The Signal class represents a signal as a (memory-mapped) numpy array and enhances it with a number of attributes.

Parameters:

data : numpy array, str or file handle

Signal data or file name or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Notes

sample_rate or num_channels can be used to set the desired sample rate and number of channels if the audio is read from file. If set to ‘None’ the audio signal is used as is, i.e. the sample rate and number of channels are determined directly from the audio file.

If the data is a numpy array, the sample_rate is set to the given value and num_channels is set to the number of columns of the array.

The gain can be used to adjust the level of the signal.

If both norm and gain are set, the signal is first normalized and then the gain is applied afterwards.

If norm or gain is set, the selected part of the signal is loaded into memory completely, i.e. .wav files are not memory-mapped any more.

num_samples

Number of samples.

num_channels

Number of channels.

length

Length of signal in seconds.

class madmom.audio.signal.SignalProcessor(sample_rate=None, num_channels=None, start=None, stop=None, norm=False, att=None, gain=0.0, **kwargs)[source]

The SignalProcessor class is a basic signal processor.

Parameters:

sample_rate : int, optional

Sample rate of the signal [Hz]; if set the signal will be re-sampled to that sample rate; if ‘None’ the sample rate of the audio file will be used.

num_channels : int, optional

Number of channels of the signal; if set, the signal will be reduced to that number of channels; if ‘None’ as many channels as present in the audio file are returned.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

att : float, optional

Deprecated in version 0.13, use gain instead.

gain : float, optional

Adjust the gain of the signal [dB].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

att

Attenuation of the signal [dB].

process(data, start=None, stop=None, **kwargs)[source]

Processes the given audio file.

Parameters:

data : numpy array, str or file handle

Data to be processed.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

Returns:

signal : Signal instance

Signal instance.

static add_arguments(parser, sample_rate=None, mono=None, start=None, stop=None, norm=None, gain=None)[source]

Add signal processing related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

sample_rate : int, optional

Re-sample the signal to this sample rate [Hz].

mono : bool, optional

Down-mix the signal to mono.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

norm : bool, optional

Normalize the signal to the range [-1, +1].

gain : float, optional

Adjust the gain of the signal [dB].

Returns:

argparse argument group

Signal processing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’. To include start and stop arguments with a default value of ‘None’, i.e. do not set any start or stop time, they can be set to ‘True’.
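The rule in this note can be sketched with the standard argparse module. The helper name add_signal_arguments and the option names --sample_rate, --start and --stop are hypothetical and chosen for illustration only; they are not taken from madmom.

```python
import argparse


def add_signal_arguments(parser, sample_rate=None, start=None, stop=None):
    """Hypothetical helper: add only the arguments that are not None."""
    group = parser.add_argument_group('signal processing arguments')
    if sample_rate is not None:
        group.add_argument('--sample_rate', type=int, default=sample_rate)
    for name, value in (('start', start), ('stop', stop)):
        if value is not None:
            # True means: offer the option, but keep None as its default
            default = None if value is True else value
            group.add_argument('--' + name, type=float, default=default)
    return group


parser = argparse.ArgumentParser()
add_signal_arguments(parser, sample_rate=44100, start=True)
defaults = parser.parse_args([])
args = parser.parse_args(['--start', '1.5'])
```

Here stop is left at None, so no --stop option exists at all, while --start is offered with a default of None.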

madmom.audio.signal.signal_frame(signal, index, frame_size, hop_size, origin=0)[source]

Return the frame at the given index of the signal.

Parameters:

signal : numpy array

Signal.

index : int

Index of the frame to return.

frame_size : int

Size of each frame in samples.

hop_size : float

Hop size in samples between adjacent frames.

origin : int

Location of the window center relative to the signal position.

Returns:

frame : numpy array

Requested frame of the signal.

Notes

The reference sample of the first frame (index == 0) refers to the first sample of the signal, and each following frame is placed hop_size samples after the previous one.

By default, the window is centered around this reference sample. Its location relative to the reference sample can be set with the origin parameter. Arbitrary integer values can be given:

  • zero centers the window on its reference sample
  • negative values shift the window to the right
  • positive values shift the window to the left

An origin of half the size of the frame_size results in windows located to the left of the reference sample, i.e. the first frame ends at the start of the signal.

The part of the frame which is not covered by the signal is padded with zeros.

This function is totally independent of the length of the signal. Thus, contrary to common indexing, the index ‘-1’ does NOT refer to the last frame of the signal; instead, the frame to the left of the first frame is returned.
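The notes above can be turned into a short sketch for 1-D signals. The helper name signal_frame_sketch is hypothetical, and the exact window placement (start = reference sample - frame_size // 2 - origin) is an assumption chosen to be consistent with the description, not a copy of madmom's code.

```python
import numpy as np


def signal_frame_sketch(signal, index, frame_size, hop_size, origin=0):
    """Hypothetical helper: return one zero-padded frame of a 1-D signal."""
    signal = np.asarray(signal)
    num_samples = len(signal)
    # reference sample of the requested frame
    ref_sample = int(index * hop_size)
    # centered window; a positive origin moves it left, a negative one right
    start = ref_sample - frame_size // 2 - int(origin)
    stop = start + frame_size
    # start with zeros and copy only the part that overlaps the signal
    frame = np.zeros(frame_size, dtype=signal.dtype)
    lo, hi = max(start, 0), min(stop, num_samples)
    if lo < hi:
        frame[lo - start:hi - start] = signal[lo:hi]
    return frame


signal = np.arange(1, 11)  # samples 1..10
first = signal_frame_sketch(signal, 0, frame_size=4, hop_size=2)
before = signal_frame_sketch(signal, -1, frame_size=4, hop_size=2)
shifted = signal_frame_sketch(signal, 0, frame_size=4, hop_size=2, origin=-2)
```

The frame at index -1 lies entirely before the signal and therefore consists of padding only, which illustrates why negative indices do not wrap around to the end.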

class madmom.audio.signal.FramedSignal(signal, frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]

The FramedSignal splits a Signal into frames and makes it iterable and indexable.

Parameters:

signal : Signal instance

Signal to be split into frames.

frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

origin : int, optional

Location of the window relative to the reference sample of a frame.

end : int or str, optional

End of signal handling (see notes below).

num_frames : int, optional

Number of frames to return.

kwargs : dict, optional

If no Signal instance was given, one is instantiated with these additional keyword arguments.

Notes

The FramedSignal class is implemented as an iterator. It splits the given signal automatically into frames of frame_size length with hop_size samples (can be float, normal rounding applies) between the frames. The reference sample of the first frame refers to the first sample of the signal.

The location of the window relative to the reference sample of a frame can be set with the origin parameter (with the same behaviour as used by scipy.ndimage filters). Arbitrary integer values can be given:

  • zero centers the window on its reference sample,
  • negative values shift the window to the right,
  • positive values shift the window to the left.

Additionally, it can have the following literal values:

  • ‘center’, ‘offline’: the window is centered on its reference sample,
  • ‘left’, ‘past’, ‘online’: the window is located to the left of its reference sample (including the reference sample),
  • ‘right’, ‘future’: the window is located to the right of its reference sample.
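One plausible way to map the literal values onto integer origins is sketched below. The helper names resolve_origin and window_bounds are hypothetical, and both the integer offsets and the window arithmetic are assumptions chosen so that the results match the descriptions in the list above.

```python
def resolve_origin(origin, frame_size):
    """Hypothetical helper: translate a literal origin into an integer offset."""
    if origin in ('center', 'offline'):
        return 0
    if origin in ('left', 'past', 'online'):
        # shift left, but keep the reference sample inside the window
        return (frame_size - 1) // 2
    if origin in ('right', 'future'):
        # shift right so that the window starts at the reference sample
        return -(frame_size // 2)
    return int(origin)


def window_bounds(ref_sample, frame_size, origin):
    """Hypothetical helper: half-open sample range [start, stop) of a window."""
    start = ref_sample - frame_size // 2 - resolve_origin(origin, frame_size)
    return start, start + frame_size


centered = window_bounds(100, 8, 'center')
online = window_bounds(100, 8, 'online')
future = window_bounds(100, 8, 'future')
```

With a frame size of 8 and reference sample 100, the online window covers samples 93 to 100 inclusive, so it uses only past information plus the reference sample itself.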

The end parameter is used to handle the end of signal behaviour and can have these values:

  • ‘normal’: stop as soon as the whole signal is covered by at least one frame (i.e. pad maximally one frame),
  • ‘extend’: frames are returned as long as part of the frame overlaps with the signal to cover the whole signal.

Alternatively, num_frames can be used to retrieve a fixed number of frames.

In order to be able to stack multiple frames obtained with different frame sizes, the number of frames to be returned must be independent from the set frame_size. It is not guaranteed that every sample of the signal is returned in a frame unless the origin is either ‘right’ or ‘future’.
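The relation between fps, hop_size and the number of frames can be sketched as follows. The helper names are hypothetical and the two frame-count formulas are assumptions that match the descriptions of ‘normal’ and ‘extend’ above; note that neither of them depends on frame_size.

```python
import math


def hop_size_from_fps(sample_rate, fps):
    """Hypothetical helper: hop size in samples for a given frame rate."""
    return sample_rate / float(fps)


def num_frames_sketch(num_samples, hop_size, end='normal'):
    """Hypothetical helper: number of frames, independent of frame_size."""
    hops = num_samples / float(hop_size)
    if end == 'normal':
        # number of hops needed to step across the signal, rounded up
        return int(math.ceil(hops))
    if end == 'extend':
        # one more frame whenever the signal length is a multiple of the hop
        return int(math.floor(hops + 1))
    raise ValueError("end must be 'normal' or 'extend'")


hop = hop_size_from_fps(44100, 100)
normal_frames = num_frames_sketch(44100, hop, 'normal')
extended_frames = num_frames_sketch(44100, hop, 'extend')
```

At a sample rate of 44100 Hz, 100 frames per second correspond to a hop size of 441.0 samples, which is the default hop_size in the signature above.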

frame_rate

Frame rate (same as fps).

fps

Frames per second.

overlap_factor

Overlapping factor of two adjacent frames.

shape

Shape of the FramedSignal (frames x samples).

ndim

Dimensionality of the FramedSignal.

class madmom.audio.signal.FramedSignalProcessor(frame_size=2048, hop_size=441.0, fps=None, online=False, end='normal', **kwargs)[source]

Slice a Signal into frames.

Parameters:

frame_size : int, optional

Size of one frame [samples].

hop_size : float, optional

Progress hop_size samples between adjacent frames.

fps : float, optional

Use given frames per second; if set, this computes and overwrites the given hop_size value.

online : bool, optional

Operate in online mode (see notes below).

end : int or str, optional

End of signal handling (see FramedSignal).

num_frames : int, optional

Number of frames to return.

kwargs : dict, optional

If no Signal instance was given, one is instantiated with these additional keyword arguments.

Notes

The location of the window relative to its reference sample can be set with the online parameter:

  • ‘False’: the window is centered on its reference sample,
  • ‘True’: the window is located to the left of its reference sample (including the reference sample), i.e. only past information is used.

process(data, **kwargs)[source]

Slice the signal into (overlapping) frames.

Parameters:

data : Signal instance

Signal to be sliced into frames.

kwargs : dict

Keyword arguments passed to FramedSignal to instantiate the returned object.

Returns:

frames : FramedSignal instance

FramedSignal instance

static add_arguments(parser, frame_size=2048, fps=100.0, online=None)[source]

Add signal framing related arguments to an existing parser.

Parameters:

parser : argparse parser instance

Existing argparse parser object.

frame_size : int, optional

Size of one frame in samples.

fps : float, optional

Frames per second.

online : bool, optional

Online mode (use only past signal information, i.e. align the window to the left of the reference sample).

Returns:

argparse argument group

Signal framing argument parser group.

Notes

Parameters are included in the group only if they are not ‘None’.