madmom.audio.signal
This module contains basic signal processing functionality.
-
madmom.audio.signal.smooth(signal, kernel)
Smooth the signal along its first axis.
Parameters: signal : numpy array
Signal to be smoothed.
kernel : numpy array or int
Smoothing kernel (size).
Returns: numpy array
Smoothed signal.
Notes
If kernel is an integer, a Hamming window of that length will be used as a smoothing kernel.
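Here is a minimal sketch of this behaviour using plain NumPy convolution rather than madmom's own code. Each column is convolved with the kernel in 'same' mode, which keeps the length, and an integer kernel becomes numpy.hamming(kernel). The name smooth_sketch is hypothetical. The kernel is deliberately left unnormalized because the text does not say whether madmom normalizes it.

```python
import numpy as np


def smooth_sketch(signal, kernel):
    """Smooth `signal` along its first axis (illustrative sketch only)."""
    signal = np.asarray(signal)
    if isinstance(kernel, (int, np.integer)):
        # An integer is interpreted as the length of a Hamming window.
        kernel = np.hamming(kernel)
    kernel = np.asarray(kernel, dtype=float)
    if kernel.size == 0:
        # Nothing to convolve with: return the signal unchanged.
        return signal
    # Convolve every column (channel) with the kernel; 'same' keeps the length
    # as long as the kernel is not longer than the signal.
    return np.apply_along_axis(np.convolve, 0, signal, kernel, mode='same')
```

Because the kernel is not normalized, the level of the smoothed signal depends on the kernel's sum. Divide the kernel by kernel.sum() if the level should be preserved.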
-
madmom.audio.signal.adjust_gain(signal, gain)
Adjust the gain of the signal.
Parameters: signal : numpy array
Signal to be adjusted.
gain : float
Gain adjustment level [dB].
Returns: numpy array
Signal with adjusted gain.
Notes
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
gain values > 0 amplify the signal and are only supported for signals with float dtype to prevent clipping and integer overflows.
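A dB value corresponds to an amplitude factor of 10 ** (gain / 20). The hypothetical adjust_gain_sketch below illustrates this under that assumption and is not madmom's implementation. It keeps the input dtype and rejects positive gains for non-float dtypes, as the notes describe. Raising ValueError in that case is a choice made for this sketch.

```python
import numpy as np


def adjust_gain_sketch(signal, gain):
    """Scale `signal` by `gain` dB, keeping its dtype (illustrative sketch)."""
    signal = np.asarray(signal)
    if gain > 0 and not np.issubdtype(signal.dtype, np.floating):
        # Mirrors the note above: amplification is only allowed for floats.
        raise ValueError('positive gain requires a float dtype')
    # Convert decibels to an amplitude factor: +20 dB means x10, -20 dB x0.1.
    factor = 10.0 ** (gain / 20.0)
    # Cast back to the original dtype; for integers this truncates toward zero.
    return (signal * factor).astype(signal.dtype)
```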
-
madmom.audio.signal.attenuate(signal, attenuation)
Attenuate the signal.
Parameters: signal : numpy array
Signal to be attenuated.
attenuation : float
Attenuation level [dB].
Returns: numpy array
Attenuated signal (same dtype as signal).
Notes
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
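The small NumPy calculation below makes the rounding note concrete. It is an illustration, not madmom's code, and it assumes that attenuating by A dB scales amplitudes by 10 ** (-A / 20). With a 6 dB attenuation of int16 samples, the fractional part is lost when the result is cast back.

```python
import numpy as np

samples = np.array([3, 7, 32767], dtype=np.int16)
attenuation = 6.0  # dB

# Attenuating by A dB scales the amplitude by 10 ** (-A / 20), about 0.501 here.
factor = 10.0 ** (-attenuation / 20.0)
exact = samples * factor                   # float result before any cast
attenuated = exact.astype(samples.dtype)   # back to int16: fractions are dropped
error = exact - attenuated                 # for positive samples, in [0, 1)

print(exact)
print(attenuated)
```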
-
madmom.audio.signal.normalize(signal)
Normalize the signal to have maximum amplitude.
Parameters: signal : numpy array
Signal to be normalized.
Returns: numpy array
Normalized signal.
Notes
Signals with float dtypes cover the range [-1, +1], signals with integer dtypes will cover the maximally possible range, e.g. [-32768, 32767] for np.int16.
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
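The sketch below covers the two ranges described above and assumes normalization by the absolute peak. The name normalize_sketch is hypothetical. Rounding to the nearest integer is a choice of this sketch and not necessarily what madmom does.

```python
import numpy as np


def normalize_sketch(signal):
    """Scale `signal` to its dtype's full range, keeping the dtype (sketch)."""
    signal = np.asarray(signal)
    # Cast before np.abs: abs(-32768) overflows in int16.
    magnitude = np.abs(signal.astype(np.float64))
    peak = float(magnitude.max()) if signal.size else 0.0
    if peak == 0.0:
        # Silence (or empty input) cannot be normalized; return a copy.
        return signal.copy()
    if np.issubdtype(signal.dtype, np.integer):
        # Integers: map the peak to the largest positive value of the dtype.
        scaled = signal / peak * np.iinfo(signal.dtype).max
        # Rounding to the nearest integer is a choice of this sketch.
        return np.rint(scaled).astype(signal.dtype)
    # Floats: map the peak to 1.0 so the signal spans [-1, +1].
    return (signal / peak).astype(signal.dtype)
```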
-
madmom.audio.signal.remix(signal, num_channels)
Remix the signal to have the desired number of channels.
Parameters: signal : numpy array
Signal to be remixed.
num_channels : int
Number of channels.
Returns: numpy array
Remixed signal (same dtype as signal).
Notes
This function does not support arbitrary channel number conversions. Only down-mixing to and up-mixing from mono signals is supported.
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
If the signal should be down-mixed to mono and has an integer dtype, it will be converted to float internally and then back to the original dtype to prevent clipping of the signal. To avoid this double conversion, convert the dtype first.
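The supported conversions can be sketched with NumPy as follows. remix_sketch is hypothetical, and averaging the channels for the down-mix is an assumption. For integer input, np.mean accumulates in float64, which mirrors the internal float conversion mentioned above.

```python
import numpy as np


def remix_sketch(signal, num_channels):
    """Down-mix to or up-mix from mono only (illustrative sketch)."""
    signal = np.asarray(signal)
    current = 1 if signal.ndim == 1 else signal.shape[1]
    if num_channels is None or num_channels == current:
        # Nothing to do: keep the signal as is.
        return signal
    if num_channels == 1:
        # Down-mix by averaging; np.mean accumulates integers in float64, so
        # nothing clips before the result is cast back to the original dtype.
        return np.mean(signal, axis=1).astype(signal.dtype)
    if current == 1:
        # Up-mix: copy the single channel into every output channel.
        return np.tile(signal.reshape(-1, 1), (1, num_channels))
    raise NotImplementedError('only down-mixing to and up-mixing from mono')
```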
-
madmom.audio.signal.rescale(signal, dtype=numpy.float32)
Rescale the signal to range [-1, 1] and return as float dtype.
Parameters: signal : numpy array
Signal to be rescaled.
dtype : numpy dtype
Data type of the rescaled signal (a float dtype).
Returns: numpy array
Signal rescaled to range [-1, 1].
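In this sketch (rescale_sketch is hypothetical), integer input is divided by the largest positive value of its dtype. Float input is assumed to be in range already and is only cast.

```python
import numpy as np


def rescale_sketch(signal, dtype=np.float32):
    """Return `signal` as a float dtype scaled to about [-1, 1] (sketch)."""
    if np.dtype(dtype).kind != 'f':
        raise ValueError('only float dtypes are supported')
    signal = np.asarray(signal)
    if np.issubdtype(signal.dtype, np.floating):
        # Float input is assumed to be in [-1, 1] already; only cast it.
        return signal.astype(dtype)
    # Integer input: divide by the largest positive value of its dtype.
    scaled = signal.astype(np.float64) / np.iinfo(signal.dtype).max
    return scaled.astype(dtype)
```

Dividing by the positive maximum maps the most negative integer slightly below -1 (about -1.00003 for int16). Dividing by the magnitude of the minimum instead would just move the same issue to the positive end.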
-
madmom.audio.signal.trim(signal, where='fb')
Trim leading and trailing zeros of the signal.
Parameters: signal : numpy array
Signal to be trimmed.
where : str, optional
A string with ‘f’ to trim from the front and ‘b’ to trim from the back. Default is ‘fb’, i.e. trim zeros from both ends of the signal.
Returns: numpy array
Trimmed signal.
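The sketch below shows the ‘f’/‘b’ semantics for mono and multi-channel input. It assumes that a sample counts as zero only when all of its channels are zero. trim_sketch is hypothetical. For 1-D input, numpy.trim_zeros(signal, trim='fb') gives the same result.

```python
import numpy as np


def trim_sketch(signal, where='fb'):
    """Strip leading ('f') and/or trailing ('b') zeros (illustrative sketch)."""
    signal = np.asarray(signal)
    # A sample (row) counts as non-zero if any of its channels is non-zero.
    nonzero = signal != 0
    if signal.ndim > 1:
        nonzero = nonzero.any(axis=1)
    indices = np.flatnonzero(nonzero)
    if indices.size == 0:
        # All zeros: nothing survives trimming.
        return signal[:0]
    where = where.lower()
    first = indices[0] if 'f' in where else 0
    last = indices[-1] + 1 if 'b' in where else len(signal)
    return signal[first:last]
```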
-
madmom.audio.signal.root_mean_square(signal)
Computes the root mean square of the signal. This can be used as a measurement of power.
Parameters: signal : numpy array
Signal.
Returns: rms : float
Root mean square of the signal.
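For reference, the RMS is sqrt(mean(x**2)) taken over all samples. The hypothetical function below computes it in float64 so that squaring integer samples cannot overflow. A full-period sine of unit amplitude yields 1/sqrt(2), about 0.707.

```python
import numpy as np


def root_mean_square_sketch(signal):
    """Root mean square of all samples (illustrative sketch)."""
    # Work in float64 so squaring integer samples cannot overflow.
    samples = np.asarray(signal, dtype=np.float64)
    return float(np.sqrt(np.mean(samples ** 2)))
```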
-
madmom.audio.signal.sound_pressure_level(signal, p_ref=None)
Computes the sound pressure level of a signal.
Parameters: signal : numpy array
Signal.
p_ref : float, optional
Reference sound pressure level; if ‘None’, the maximum amplitude value of the data type is used (for float data types, amplitudes are assumed to lie between -1 and +1).
Returns: spl : float
Sound pressure level of the signal [dB].
Notes
From http://en.wikipedia.org/wiki/Sound_pressure: Sound pressure level (SPL) or sound level is a logarithmic measure of the effective sound pressure of a sound relative to a reference value. It is measured in decibels (dB) above a standard reference level.
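Combining this definition with the default reference gives SPL = 20 * log10(rms / p_ref). The sketch below is an assumption-based illustration. The function name and the -inf result for silence are choices of this sketch, not documented madmom behaviour.

```python
import numpy as np


def sound_pressure_level_sketch(signal, p_ref=None):
    """SPL in dB relative to p_ref (illustrative sketch)."""
    signal = np.asarray(signal)
    if p_ref is None:
        # Default reference as described above: the dtype's maximum amplitude,
        # or 1.0 for float signals assumed to lie in [-1, +1].
        if np.issubdtype(signal.dtype, np.integer):
            p_ref = float(np.iinfo(signal.dtype).max)
        else:
            p_ref = 1.0
    rms = np.sqrt(np.mean(signal.astype(np.float64) ** 2))
    if rms == 0:
        # Silence: the logarithm diverges; returning -inf is this sketch's choice.
        return -np.inf
    # SPL = 20 * log10(p_rms / p_ref), in decibels.
    return float(20.0 * np.log10(rms / p_ref))
```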
-
madmom.audio.signal.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)
Load the audio data from the given file and return it as a numpy array.
Only supports wave files; does not support re-sampling or arbitrary channel number conversions. Reads the data as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.
Parameters: filename : string
Name of the file.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Returns: signal : numpy array
Audio signal.
sample_rate : int
Sample rate of the signal [Hz].
Notes
The start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned. Thus, consecutive segments, each starting at the previous segment's stop position, can be concatenated to obtain the original signal without gaps or overlaps.
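Segments concatenate without gaps because both positions are rounded to sample indices and the slice is half-open. cut_segment is a hypothetical helper that illustrates only this rounding rule. It does not read files or memory-map anything.

```python
import numpy as np


def cut_segment(signal, sample_rate, start=None, stop=None):
    """Hypothetical helper: slice [start, stop) seconds, rounded to samples."""
    # Round each position (in seconds) to the closest sample index.
    first = None if start is None else int(np.round(start * sample_rate))
    last = None if stop is None else int(np.round(stop * sample_rate))
    # Python slicing excludes `last`, so the stop sample is not returned.
    return signal[first:last]
```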
-
exception madmom.audio.signal.LoadAudioFileError(value=None)
Exception to be raised whenever an audio file could not be loaded.
-
madmom.audio.signal.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)
Load the audio data from the given file and return it as a numpy array. This tries load_wave_file() first and then load_ffmpeg_file() (for ffmpeg and avconv).
Parameters: filename : str or file handle
Name of the file or file handle.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Returns: signal : numpy array
Audio signal.
sample_rate : int
Sample rate of the signal [Hz].
Notes
For wave files, the start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned. Thus, consecutive segments, each starting at the previous segment's stop position, can be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.
-
class madmom.audio.signal.Signal(data, sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0, dtype=None)
The Signal class represents a signal as a (memory-mapped) numpy array and enhances it with a number of attributes.
Parameters: data : numpy array, str or file handle
Signal data or file name or file handle.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to the range [-1, +1].
gain : float, optional
Adjust the gain of the signal [dB].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Notes
sample_rate or num_channels can be used to set the desired sample rate and number of channels if the audio is read from file. If set to ‘None’, the audio signal is used as is, i.e. the sample rate and number of channels are determined directly from the audio file.
If the data is a numpy array, the sample_rate is set to the given value and num_channels is set to the number of columns of the array.
The gain can be used to adjust the level of the signal.
If both norm and gain are set, the signal is first normalized and then the gain is applied afterwards.
If norm or gain is set, the selected part of the signal is loaded into memory completely, i.e. .wav files are not memory-mapped any more.
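The hypothetical helper norm_then_gain illustrates this order of operations. For simplicity it works in float64, unlike Signal, which keeps the dtype. Because normalization happens first, the final peak level depends only on gain and not on the input level.

```python
import numpy as np


def norm_then_gain(signal, norm=False, gain=0.0):
    """Hypothetical helper applying the order described above: norm, then gain."""
    out = np.asarray(signal, dtype=np.float64)
    if norm:
        peak = np.max(np.abs(out))
        if peak > 0:
            out = out / peak                 # peak is now exactly 1.0
    if gain:
        out = out * 10.0 ** (gain / 20.0)    # gain in dB -> amplitude factor
    return out
```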
-
num_samples
Number of samples.
-
num_channels
Number of channels.
-
length
Length of signal in seconds.
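A small numpy.ndarray subclass is one way to attach such attributes to array data. MiniSignal below is a hypothetical, simplified stand-in and not madmom's Signal class. It assumes that mono signals are 1-D and that length equals num_samples / sample_rate.

```python
import numpy as np


class MiniSignal(np.ndarray):
    """Hypothetical, simplified stand-in: an ndarray carrying a sample rate."""

    def __new__(cls, data, sample_rate=None):
        obj = np.asarray(data).view(cls)
        obj.sample_rate = sample_rate
        return obj

    def __array_finalize__(self, obj):
        # Propagate the sample rate to views and slices.
        if obj is None:
            return
        self.sample_rate = getattr(obj, 'sample_rate', None)

    @property
    def num_samples(self):
        return len(self)

    @property
    def num_channels(self):
        # 1-D arrays are mono; otherwise the channels are the columns.
        return 1 if self.ndim == 1 else self.shape[1]

    @property
    def length(self):
        # Duration in seconds; undefined without a sample rate.
        if self.sample_rate is None:
            return None
        return self.num_samples / float(self.sample_rate)
```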
-
class madmom.audio.signal.SignalProcessor(sample_rate=None, num_channels=None, start=None, stop=None, norm=False, att=None, gain=0.0, **kwargs)
The SignalProcessor class is a basic signal processor.
Parameters: sample_rate : int, optional
Sample rate of the signal [Hz]; if set, the signal will be re-sampled to that sample rate; if ‘None’, the sample rate of the audio file will be used.
num_channels : int, optional
Number of channels of the signal; if set, the signal will be reduced to that number of channels; if ‘None’, as many channels as present in the audio file are returned.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to the range [-1, +1].
att : float, optional
Deprecated in version 0.13, use gain instead.
gain : float, optional
Adjust the gain of the signal [dB].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
-
att
Attenuation of the signal [dB].
-
process(data, start=None, stop=None, **kwargs)
Processes the given audio file.
Parameters: data : numpy array, str or file handle
Data to be processed.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
Returns: signal : Signal instance
Signal instance.
-
static add_arguments(parser, sample_rate=None, mono=None, start=None, stop=None, norm=None, gain=None)
Add signal processing related arguments to an existing parser.
Parameters: parser : argparse parser instance
Existing argparse parser object.
sample_rate : int, optional
Re-sample the signal to this sample rate [Hz].
mono : bool, optional
Down-mix the signal to mono.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to the range [-1, +1].
gain : float, optional
Adjust the gain of the signal [dB].
Returns: argparse argument group
Signal processing argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’. To include the start and stop arguments with a default value of ‘None’ (i.e. no start or stop time is set), set them to ‘True’.
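The "only if not None" rule, including the True sentinel for start and stop, can be sketched with argparse. The function name and option strings below are hypothetical and are not madmom's actual command line flags.

```python
import argparse


def add_signal_arguments(parser, sample_rate=None, start=None, stop=None):
    """Hypothetical sketch of the 'include only if not None' rule.

    The option names are illustrative, not madmom's actual flags.
    """
    group = parser.add_argument_group('signal processing arguments')
    if sample_rate is not None:
        group.add_argument('--sample_rate', type=int, default=sample_rate)
    for name, value in (('start', start), ('stop', stop)):
        if value is not None:
            # True means: offer the option, but default to "not set" (None).
            default = None if value is True else value
            group.add_argument('--' + name, type=float, default=default)
    return group
```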
-
madmom.audio.signal.signal_frame(signal, index, frame_size, hop_size, origin=0)
Return the frame at the given index of the signal.
Parameters: signal : numpy array
Signal.
index : int
Index of the frame to return.
frame_size : int
Size of each frame in samples.
hop_size : float
Hop size in samples between adjacent frames.
origin : int
Location of the window relative to the reference sample of the frame (see notes below).
Returns: frame : numpy array
Requested frame of the signal.
Notes
The reference sample of the first frame (index == 0) refers to the first sample of the signal, and each following frame is placed hop_size samples after the previous one.
By default (origin=0), the window is centered around this reference sample. Its location relative to the reference sample can be changed with the origin parameter. Arbitrary integer values can be given:
- zero centers the window on its reference sample
- negative values shift the window to the right
- positive values shift the window to the left
An origin of half the frame_size results in windows located to the left of the reference sample; conversely, an origin of minus half the frame_size results in windows located to the right of it, i.e. the first frame starts at the first sample of the signal.
The part of the frame which is not covered by the signal is padded with zeros.
This function is independent of the length of the signal. Thus, contrary to common indexing, the index ‘-1’ does NOT refer to the last frame of the signal; instead, the frame to the left of the first frame is returned.
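The sketch below follows these indexing rules. It assumes that the window starts at ref_sample - frame_size // 2 - origin, which matches the sign conventions listed (positive origins shift left, negative ones right). signal_frame_sketch is hypothetical, and madmom's exact rounding of the reference sample may differ.

```python
import numpy as np


def signal_frame_sketch(signal, index, frame_size, hop_size, origin=0):
    """Return frame `index` of `signal`, zero-padded (illustrative sketch)."""
    signal = np.asarray(signal)
    frame_size = int(frame_size)
    # Reference sample of this frame; rounding to nearest is this sketch's choice.
    ref_sample = int(np.floor(index * hop_size + 0.5))
    # Centered window, shifted left by positive and right by negative origins.
    start = ref_sample - frame_size // 2 - int(origin)
    stop = start + frame_size
    frame = np.zeros((frame_size,) + signal.shape[1:], dtype=signal.dtype)
    # Copy only the part of the window that overlaps the signal.
    lo, hi = max(start, 0), min(stop, len(signal))
    if lo < hi:
        frame[lo - start:hi - start] = signal[lo:hi]
    return frame
```

With this convention, index -1 yields a frame lying entirely before the signal (all zeros) rather than the last frame. That is the length-independent behaviour described above.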
-
class madmom.audio.signal.FramedSignal(signal, frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)
The FramedSignal splits a Signal into frames and makes it iterable and indexable.
Parameters: signal : Signal instance
Signal to be split into frames.
frame_size : int, optional
Size of one frame [samples].
hop_size : float, optional
Advance hop_size samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the given hop_size value.
origin : int, optional
Location of the window relative to the reference sample of a frame.
end : int or str, optional
End of signal handling (see notes below).
num_frames : int, optional
Number of frames to return.
kwargs : dict, optional
If no Signal instance was given, one is instantiated with these additional keyword arguments.
Notes
The FramedSignal class is implemented as an iterator. It automatically splits the given signal into frames of frame_size length with hop_size samples (can be float, normal rounding applies) between the frames. The reference sample of the first frame refers to the first sample of the signal.
The location of the window relative to the reference sample of a frame can be set with the origin parameter (with the same behaviour as used by scipy.ndimage filters). Arbitrary integer values can be given:
- zero centers the window on its reference sample,
- negative values shift the window to the right,
- positive values shift the window to the left.
Additionally, it can have the following literal values:
- ‘center’, ‘offline’: the window is centered on its reference sample,
- ‘left’, ‘past’, ‘online’: the window is located to the left of its reference sample (including the reference sample),
- ‘right’, ‘future’: the window is located to the right of its reference sample.
The end parameter is used to handle the end of signal behaviour and can have these values:
- ‘normal’: stop as soon as the whole signal has been covered by at least one frame (i.e. pad at most one frame),
- ‘extend’: frames are returned as long as part of the frame overlaps with the signal to cover the whole signal.
Alternatively, num_frames can be used to retrieve a fixed number of frames.
In order to be able to stack multiple frames obtained with different frame sizes, the number of frames to be returned must be independent of the frame_size. It is not guaranteed that every sample of the signal is returned in a frame unless the origin is either ‘right’ or ‘future’.
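One frame_size-independent way to count frames is sketched below as an assumption for illustration. 'normal' counts the frames whose reference sample k * hop_size lies inside the signal. 'extend' additionally counts a frame whose reference sample falls exactly on the end of the signal. num_frames_sketch is hypothetical.

```python
import math


def num_frames_sketch(num_samples, hop_size, end='normal'):
    """Number of frames for a signal (illustrative sketch, frame_size-free)."""
    if end == 'normal':
        # Frames whose reference sample satisfies k * hop_size < num_samples.
        return int(math.ceil(num_samples / float(hop_size)))
    if end == 'extend':
        # Frames whose reference sample satisfies k * hop_size <= num_samples.
        return int(math.floor(num_samples / float(hop_size))) + 1
    raise ValueError("end must be 'normal' or 'extend'")
```

Neither rule involves frame_size, so frames computed with different frame sizes but the same hop_size stack cleanly.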
-
frame_rate
Frame rate (same as fps).
-
fps
Frames per second.
-
overlap_factor
Overlapping factor of two adjacent frames.
-
shape
Shape of the FramedSignal (frames x samples).
-
ndim
Dimensionality of the FramedSignal.
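The sketch below shows how the frame-rate and overlap attributes relate to the framing parameters. It assumes fps = sample_rate / hop_size and overlap_factor = 1 - hop_size / frame_size, and it treats the shape as frames x samples for a mono signal. framing_properties is a hypothetical helper. For a 44.1 kHz signal, the defaults frame_size=2048 and hop_size=441.0 give 100 frames per second and an overlap factor of about 0.785.

```python
def framing_properties(sample_rate, frame_size, hop_size, num_frames):
    """Hypothetical helper relating the FramedSignal attributes (sketch)."""
    return {
        # Frames per second: how many hops fit into one second of audio.
        'fps': sample_rate / float(hop_size),
        # Fraction of a frame shared with the next frame.
        'overlap_factor': 1.0 - hop_size / float(frame_size),
        # Mono signal: one row per frame, one column per sample.
        'shape': (num_frames, frame_size),
    }
```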
-
class madmom.audio.signal.FramedSignalProcessor(frame_size=2048, hop_size=441.0, fps=None, online=False, end='normal', **kwargs)
Slice a Signal into frames.
Parameters: frame_size : int, optional
Size of one frame [samples].
hop_size : float, optional
Advance hop_size samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the given hop_size value.
online : bool, optional
Operate in online mode (see notes below).
end : int or str, optional
End of signal handling (see FramedSignal).
num_frames : int, optional
Number of frames to return.
kwargs : dict, optional
If no Signal instance was given, one is instantiated with these additional keyword arguments.
Notes
The location of the window relative to its reference sample can be set with the online parameter:
- ‘False’: the window is centered on its reference sample,
- ‘True’: the window is located to the left of its reference sample (including the reference sample), i.e. only past information is used.
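Assuming the window starts at ref_sample - frame_size // 2 - origin, the online flag can be mapped to an integer origin as shown below. Both helpers are hypothetical. An origin of (frame_size - 1) // 2 makes the reference sample the last sample of the window for both even and odd frame sizes, so only past information is used.

```python
def online_origin(frame_size, online):
    """Hypothetical mapping from the online flag to an integer origin."""
    # Offline: centered window (origin 0). Online: shift the window left so
    # that its last sample is the reference sample.
    return (frame_size - 1) // 2 if online else 0


def window_bounds(ref_sample, frame_size, origin):
    """First and last sample index covered by the window (inclusive)."""
    start = ref_sample - frame_size // 2 - origin
    return start, start + frame_size - 1
```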
-
process(data, **kwargs)
Slice the signal into (overlapping) frames.
Parameters: data : Signal instance
Signal to be sliced into frames.
kwargs : dict
Keyword arguments passed to FramedSignal to instantiate the returned object.
Returns: frames : FramedSignal instance
FramedSignal instance.
-
static add_arguments(parser, frame_size=2048, fps=100.0, online=None)
Add signal framing related arguments to an existing parser.
Parameters: parser : argparse parser instance
Existing argparse parser object.
frame_size : int, optional
Size of one frame in samples.
fps : float, optional
Frames per second.
online : bool, optional
Online mode (use only past signal information, i.e. align the window to the left of the reference sample).
Returns: argparse argument group
Signal framing argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’.