madmom.audio.signal
This module contains basic signal processing functionality.
-
madmom.audio.signal.smooth(signal, kernel)
Smooth the signal along its first axis.
Parameters: signal : numpy array
Signal to be smoothed.
kernel : numpy array or int
Smoothing kernel (size).
Returns: numpy array
Smoothed signal.
Notes
If kernel is an integer, a Hamming window of that length will be used as a smoothing kernel.
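For illustration, a minimal sketch (sample values chosen arbitrarily) of smoothing a signal by passing an integer kernel, i.e. a Hamming window of that length:
>>> import numpy as np
>>> from madmom.audio.signal import smooth
>>> sig = np.array([0., 1., 0., 1., 0., 1., 0., 1.])
>>> smoothed = smooth(sig, 5)  # int kernel -> Hamming window of length 5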
-
madmom.audio.signal.adjust_gain(signal, gain)
Adjust the gain of the signal.
Parameters: signal : numpy array
Signal to be adjusted.
gain : float
Gain adjustment level [dB].
Returns: numpy array
Signal with adjusted gain.
Notes
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
gain values > 0 amplify the signal and are only supported for signals with float dtype to prevent clipping and integer overflows.
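As a sketch (values illustrative), amplifying and attenuating a float signal by 6 dB:
>>> import numpy as np
>>> from madmom.audio.signal import adjust_gain
>>> sig = 0.1 * np.ones(4, dtype=np.float32)
>>> louder = adjust_gain(sig, 6)    # roughly doubles the amplitude
>>> quieter = adjust_gain(sig, -6)  # roughly halves the amplitude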
-
madmom.audio.signal.attenuate(signal, attenuation)
Attenuate the signal.
Parameters: signal : numpy array
Signal to be attenuated.
attenuation : float
Attenuation level [dB].
Returns: numpy array
Attenuated signal (same dtype as signal).
Notes
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
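A minimal sketch; a positive attenuation value lowers the level (values illustrative):
>>> import numpy as np
>>> from madmom.audio.signal import attenuate
>>> sig = np.ones(4, dtype=np.float32)
>>> quieter = attenuate(sig, 10)  # reduce the level by 10 dB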
-
madmom.audio.signal.normalize(signal)
Normalize the signal to have maximum amplitude.
Parameters: signal : numpy array
Signal to be normalized.
Returns: numpy array
Normalized signal.
Notes
Signals with float dtypes cover the range [-1, +1], signals with integer dtypes will cover the maximally possible range, e.g. [-32768, 32767] for np.int16.
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
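A quick sketch with an arbitrary float signal; after normalization the maximum absolute value is expected to be 1:
>>> import numpy as np
>>> from madmom.audio.signal import normalize
>>> sig = np.array([0.1, -0.25, 0.05], dtype=np.float32)
>>> normalized = normalize(sig)  # scaled so that max(abs(normalized)) == 1.0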
-
madmom.audio.signal.remix(signal, num_channels)
Remix the signal to have the desired number of channels.
Parameters: signal : numpy array
Signal to be remixed.
num_channels : int
Number of channels.
Returns: numpy array
Remixed signal (same dtype as signal).
Notes
This function does not support arbitrary channel number conversions. Only down-mixing to and up-mixing from mono signals is supported.
The signal is returned with the same dtype, thus rounding errors may occur with integer dtypes.
If the signal should be down-mixed to mono and has an integer dtype, it will be converted to float internally and then back to the original dtype to prevent clipping of the signal. To avoid this double conversion, convert the dtype first.
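A sketch of both supported directions (values illustrative): down-mixing a two-channel signal to mono and up-mixing a mono signal to two channels:
>>> import numpy as np
>>> from madmom.audio.signal import remix
>>> stereo = np.array([[100, 200], [300, 400]], dtype=np.int16)
>>> mono = remix(stereo, 1)       # average the two channels
>>> two_channel = remix(mono, 2)  # duplicate the mono channel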
-
madmom.audio.signal.resample(signal, sample_rate, **kwargs)
Resample the signal.
Parameters: signal : numpy array or Signal
Signal to be resampled.
sample_rate : int
Sample rate of the signal.
kwargs : dict, optional
Keyword arguments passed to load_ffmpeg_file().
Returns: numpy array or Signal
Resampled signal.
Notes
This function uses ffmpeg to resample the signal.
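As a rough sketch (requires ffmpeg or avconv to be installed), resampling a Signal to 22050 Hz:
>>> from madmom.audio.signal import Signal, resample
>>> sig = Signal('tests/data/audio/sample.wav')
>>> resampled = resample(sig, 22050)  # decoding/resampling is delegated to ffmpeg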
-
madmom.audio.signal.rescale(signal, dtype=np.float32)
Rescale the signal to range [-1, 1] and return as float dtype.
Parameters: signal : numpy array
Signal to be rescaled.
dtype : numpy dtype
Data type of the signal.
Returns: numpy array
Signal rescaled to range [-1, 1].
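For illustration, rescaling an int16 signal to float32 values in [-1, 1] (values chosen arbitrarily):
>>> import numpy as np
>>> from madmom.audio.signal import rescale
>>> sig = np.array([0, 16384, -32768], dtype=np.int16)
>>> rescaled = rescale(sig)  # float32 array with values in [-1, 1]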
-
madmom.audio.signal.trim(signal, where='fb')
Trim leading and trailing zeros of the signal.
Parameters: signal : numpy array
Signal to be trimmed.
where : str, optional
A string with ‘f’ representing trim from front and ‘b’ to trim from back. Default is ‘fb’, trim zeros from both ends of the signal.
Returns: numpy array
Trimmed signal.
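A small sketch (values illustrative) trimming zeros from both ends and from the front only:
>>> import numpy as np
>>> from madmom.audio.signal import trim
>>> sig = np.array([0, 0, 1, 2, 0], dtype=np.int16)
>>> trimmed = trim(sig)             # zeros removed from both ends
>>> front_trimmed = trim(sig, 'f')  # zeros removed from the front only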
-
madmom.audio.signal.energy(signal)
Compute the energy of a (framed) signal.
Parameters: signal : numpy array
Signal.
Returns: energy : float
Energy of the signal.
Notes
If signal is a FramedSignal, the energy is computed for each frame individually.
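The energy is commonly defined as the sum of the squared sample values; assuming that definition, a minimal sketch:
>>> import numpy as np
>>> from madmom.audio.signal import energy
>>> e = energy(np.array([1., -2., 2.]))  # sum of squared samples, i.e. 1 + 4 + 4 under this definition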
-
madmom.audio.signal.root_mean_square(signal)
Compute the root mean square of a (framed) signal. This can be used as a measurement of power.
Parameters: signal : numpy array
Signal.
Returns: rms : float
Root mean square of the signal.
Notes
If signal is a FramedSignal, the root mean square is computed for each frame individually.
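The root mean square is the square root of the mean of the squared samples; a minimal sketch (values illustrative):
>>> import numpy as np
>>> from madmom.audio.signal import root_mean_square
>>> r = root_mean_square(np.array([1., -1., 1., -1.]))  # sqrt(mean of squares) = 1.0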
-
madmom.audio.signal.sound_pressure_level(signal, p_ref=None)
Compute the sound pressure level of a (framed) signal.
Parameters: signal : numpy array
Signal.
p_ref : float, optional
Reference sound pressure level; if ‘None’, the maximum amplitude value of the data type is used as reference (for float data types, amplitudes are assumed to be between -1 and +1).
Returns: spl : float
Sound pressure level of the signal [dB].
Notes
From http://en.wikipedia.org/wiki/Sound_pressure: Sound pressure level (SPL) or sound level is a logarithmic measure of the effective sound pressure of a sound relative to a reference value. It is measured in decibels (dB) above a standard reference level.
If signal is a FramedSignal, the sound pressure level is computed for each frame individually.
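The SPL is commonly computed as 20 * log10(rms / p_ref); assuming that relation, a minimal sketch with a full-scale float signal:
>>> import numpy as np
>>> from madmom.audio.signal import sound_pressure_level
>>> spl = sound_pressure_level(np.ones(100, dtype=np.float32))  # full-scale signal -> approx. 0 dB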
-
exception madmom.audio.signal.LoadAudioFileError(value=None)
Exception to be raised whenever an audio file could not be loaded.
-
madmom.audio.signal.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)
Load the audio data from the given file and return it as a numpy array.
Only supports wave files, does not support re-sampling or arbitrary channel number conversions. Reads the data as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.
Parameters: filename : str
Name of the file.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Returns: signal : numpy array
Audio signal.
sample_rate : int
Sample rate of the signal [Hz].
Notes
The start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop position can be concatenated to obtain the original signal without gaps or overlaps.
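A brief usage sketch, loading a complete wave file and only its first second:
>>> from madmom.audio.signal import load_wave_file
>>> signal, sample_rate = load_wave_file('tests/data/audio/sample.wav')
>>> first_second, sample_rate = load_wave_file('tests/data/audio/sample.wav', stop=1.)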
-
madmom.audio.signal.write_wave_file(signal, filename, sample_rate=None)
Write the signal to disk as a .wav file.
Parameters: signal : numpy array or Signal
The signal to be written to file.
filename : str
Name of the file.
sample_rate : int, optional
Sample rate of the signal [Hz].
Returns: filename : str
Name of the file.
Notes
sample_rate can be ‘None’ if signal is a Signal instance. If set, the given sample_rate is used instead of the signal’s sample rate. Must be given if signal is a ndarray.
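A brief sketch (output file names are arbitrary), writing a Signal and a plain numpy array to disk:
>>> import numpy as np
>>> from madmom.audio.signal import Signal, write_wave_file
>>> sig = Signal('tests/data/audio/sample.wav')
>>> write_wave_file(sig, 'copy.wav')  # sample rate taken from the Signal
>>> write_wave_file(np.zeros(44100, dtype=np.int16), 'silence.wav', sample_rate=44100)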
-
madmom.audio.signal.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)
Load the audio data from the given file and return it as a numpy array. This tries load_wave_file() first and falls back to load_ffmpeg_file() (which uses ffmpeg or avconv).
Parameters: filename : str or file handle
Name of the file or file handle.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Returns: signal : numpy array
Audio signal.
sample_rate : int
Sample rate of the signal [Hz].
Notes
For wave files, the start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned, thus consecutive segments starting with the previous stop position can be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.
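A brief sketch, loading a compressed file down-mixed to mono and resampled to 22050 Hz (the decoding is delegated to ffmpeg/avconv since the file is not a wave file):
>>> from madmom.audio.signal import load_audio_file
>>> signal, sample_rate = load_audio_file('tests/data/audio/stereo_sample.flac',
...                                       num_channels=1, sample_rate=22050)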
-
class madmom.audio.signal.Signal(data, sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, dtype=None, **kwargs)
The Signal class represents a signal as a (memory-mapped) numpy array and enhances it with a number of attributes.
Parameters: data : numpy array, str or file handle
Signal data or file name or file handle.
sample_rate : int, optional
Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional
Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to maximum range of the data type.
gain : float, optional
Adjust the gain of the signal [dB].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Notes
sample_rate or num_channels can be used to set the desired sample rate and number of channels if the audio is read from file. If set to ‘None’ the audio signal is used as is, i.e. the sample rate and number of channels are determined directly from the audio file.
If the data is a numpy array, the sample_rate is set to the given value and num_channels is set to the number of columns of the array.
The gain can be used to adjust the level of the signal.
If both norm and gain are set, the signal is first normalized and then the gain is applied afterwards.
If norm or gain is set, the selected part of the signal is loaded into memory completely, i.e. .wav files are not memory-mapped any more.
Examples
Load a mono audio file:
>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ..., 655, 639], dtype=int16)
>>> sig.sample_rate
44100
Load a stereo audio file, down-mix it to mono:
>>> sig = Signal('tests/data/audio/stereo_sample.flac', num_channels=1)
>>> sig
Signal([ 36, 36, ..., 524, 495], dtype=int16)
>>> sig.num_channels
1
Load and re-sample an audio file:
>>> sig = Signal('tests/data/audio/sample.wav', sample_rate=22050)
>>> sig
Signal([-2470, -2553, ..., 517, 677], dtype=int16)
>>> sig.sample_rate
22050
Load an audio file with float32 data type (i.e. rescale it to [-1, 1]):
>>> sig = Signal('tests/data/audio/sample.wav', dtype=np.float32)
>>> sig
Signal([-0.07611, -0.0766 , ..., 0.01999, 0.0195 ], dtype=float32)
>>> sig.dtype
dtype('float32')
-
num_samples
Number of samples.
-
num_channels
Number of channels.
-
length
Length of signal in seconds.
-
write(filename)
Write the signal to disk as a .wav file.
Parameters: filename : str
Name of the file.
Returns: filename : str
Name of the written file.
-
rms()
Root mean square of signal.
-
spl()
Sound pressure level of signal.
-
-
class madmom.audio.signal.SignalProcessor(sample_rate=None, num_channels=None, start=None, stop=None, norm=False, gain=0.0, **kwargs)
The SignalProcessor class is a basic signal processor.
Parameters: sample_rate : int, optional
Sample rate of the signal [Hz]; if set the signal will be re-sampled to that sample rate; if ‘None’ the sample rate of the audio file will be used.
num_channels : int, optional
Number of channels of the signal; if set, the signal will be reduced to that number of channels; if ‘None’ as many channels as present in the audio file are returned.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to the range [-1, +1].
gain : float, optional
Adjust the gain of the signal [dB].
dtype : numpy data type, optional
The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].
Examples
Processor for loading the first two seconds of an audio file, re-sampling it to 22.05 kHz and down-mixing it to mono:
>>> proc = SignalProcessor(sample_rate=22050, num_channels=1, stop=2)
>>> sig = proc('tests/data/audio/sample.wav')
>>> sig
Signal([-2470, -2553, ..., -173, -265], dtype=int16)
>>> sig.sample_rate
22050
>>> sig.num_channels
1
>>> sig.length
2.0
-
process(data, **kwargs)
Processes the given audio file.
Parameters: data : numpy array, str or file handle
Data to be processed.
kwargs : dict, optional
Keyword arguments passed to Signal.
Returns: signal : Signal instance
Signal instance.
-
static add_arguments(parser, sample_rate=None, mono=None, start=None, stop=None, norm=None, gain=None)
Add signal processing related arguments to an existing parser.
Parameters: parser : argparse parser instance
Existing argparse parser object.
sample_rate : int, optional
Re-sample the signal to this sample rate [Hz].
mono : bool, optional
Down-mix the signal to mono.
start : float, optional
Start position [seconds].
stop : float, optional
Stop position [seconds].
norm : bool, optional
Normalize the signal to the range [-1, +1].
gain : float, optional
Adjust the gain of the signal [dB].
Returns: argparse argument group
Signal processing argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’. To include start and stop arguments with a default value of ‘None’, i.e. do not set any start or stop time, they can be set to ‘True’.
-
-
madmom.audio.signal.signal_frame(signal, index, frame_size, hop_size, origin=0)
Return the frame at the given index of the signal.
Parameters: signal : numpy array
Signal.
index : int
Index of the frame to return.
frame_size : int
Size of each frame in samples.
hop_size : float
Hop size in samples between adjacent frames.
origin : int
Location of the window center relative to the signal position.
Returns: frame : numpy array
Requested frame of the signal.
Notes
The reference sample of the first frame (index == 0) refers to the first sample of the signal, and each following frame is placed hop_size samples after the previous one.
The window is always centered around this reference sample. Its location relative to the reference sample can be set with the origin parameter. Arbitrary integer values can be given:
- zero centers the window on its reference sample
- negative values shift the window to the right
- positive values shift the window to the left
An origin of half the size of the frame_size results in windows located to the left of the reference sample, i.e. the first frame starts at the first sample of the signal.
The part of the frame which is not covered by the signal is padded with zeros.
This function is totally independent of the length of the signal. Thus, contrary to common indexing, the index ‘-1’ does NOT refer to the last frame of the signal; instead, the frame to the left of the first frame is returned.
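A small sketch illustrating the zero-padding and the centering on the reference sample (array values chosen for readability):
>>> import numpy as np
>>> from madmom.audio.signal import signal_frame
>>> sig = np.arange(10)
>>> first = signal_frame(sig, 0, frame_size=4, hop_size=2)   # centered on sample 0, zero-padded at the front
>>> second = signal_frame(sig, 1, frame_size=4, hop_size=2)  # centered on sample 2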
-
class madmom.audio.signal.FramedSignal(signal, frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)
The FramedSignal splits a Signal into frames and makes it iterable and indexable.
Parameters: signal : Signal instance
Signal to be split into frames.
frame_size : int, optional
Size of one frame [samples].
hop_size : float, optional
Progress hop_size samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the given hop_size value.
origin : int, optional
Location of the window relative to the reference sample of a frame.
end : int or str, optional
End of signal handling (see notes below).
num_frames : int, optional
Number of frames to return.
kwargs : dict, optional
If no Signal instance was given, one is instantiated with these additional keyword arguments.
Notes
The FramedSignal class is implemented as an iterator. It splits the given signal automatically into frames of frame_size length with hop_size samples (can be float, normal rounding applies) between the frames. The reference sample of the first frame refers to the first sample of the signal.
The location of the window relative to the reference sample of a frame can be set with the origin parameter (with the same behaviour as used by scipy.ndimage filters). Arbitrary integer values can be given:
- zero centers the window on its reference sample,
- negative values shift the window to the right,
- positive values shift the window to the left.
Additionally, it can have the following literal values:
- ‘center’, ‘offline’: the window is centered on its reference sample,
- ‘left’, ‘past’, ‘online’: the window is located to the left of its reference sample (including the reference sample),
- ‘right’, ‘future’, ‘stream’: the window is located to the right of its reference sample.
The end parameter is used to handle the end of signal behaviour and can have these values:
- ‘normal’: stop as soon as the whole signal got covered by at least one frame (i.e. pad maximally one frame),
- ‘extend’: frames are returned as long as part of the frame overlaps with the signal to cover the whole signal.
Alternatively, num_frames can be used to retrieve a fixed number of frames.
In order to be able to stack multiple frames obtained with different frame sizes, the number of frames to be returned must be independent of the set frame_size. It is not guaranteed that every sample of the signal is returned in a frame unless the origin is either ‘right’ or ‘future’.
If used in online real-time mode the parameters origin and num_frames should be set to ‘stream’ and 1, respectively.
Examples
To chop a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048 with adjacent frames being 441 samples apart:
>>> sig = Signal('tests/data/audio/sample.wav')
>>> sig
Signal([-2494, -2510, ..., 655, 639], dtype=int16)
>>> frames = FramedSignal(sig, frame_size=2048, hop_size=441)
>>> frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([ 0, 0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ..., -253, 671], dtype=int16)
>>> frames.fps
100.0
Instead of passing a Signal instance as the first argument, anything a Signal can be instantiated from (e.g. a file name) can be used. We can also set the frames per second (fps) instead; they get converted to hop_size based on the sample_rate of the signal:
>>> frames = FramedSignal('tests/data/audio/sample.wav', fps=100)
>>> frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([ 0, 0, ..., -4666, -4589], dtype=int16)
>>> frames.frame_size, frames.hop_size
(2048, 441.0)
When trying to access an out of range frame, an IndexError is raised. Thus the FramedSignal can be used the same way as a numpy array or any other iterable.
>>> frames = FramedSignal('tests/data/audio/sample.wav')
>>> frames.num_frames
281
>>> frames[281]
Traceback (most recent call last):
IndexError: end of signal reached
>>> frames.shape
(281, 2048)
Slices are FramedSignals themselves:
>>> frames[:4]
<madmom.audio.signal.FramedSignal object at 0x...>
To obtain a numpy array from a FramedSignal, simply use np.array() on the full FramedSignal or a slice of it. Please note that this requires a full memory copy.
>>> np.array(frames[2:4])
array([[ 0, 0, ..., -5316, -5405],
       [ 2215, 2281, ..., 561, 653]], dtype=int16)
-
frame_rate
Frame rate (same as fps).
-
fps
Frames per second.
-
overlap_factor
Overlapping factor of two adjacent frames.
-
shape
Shape of the FramedSignal (num_frames, frame_size[, num_channels]).
-
ndim
Dimensionality of the FramedSignal.
-
rms()
Root mean square of the individual frames.
-
spl()
Sound pressure level of the individual frames.
-
class
madmom.audio.signal.
FramedSignalProcessor
(frame_size=2048, hop_size=441.0, fps=None, origin=0, end='normal', num_frames=None, **kwargs)[source]¶ Slice a Signal into frames.
Parameters: frame_size : int, optional
Size of one frame [samples].
hop_size : float, optional
Progress hop_size samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the given hop_size value.
origin : int, optional
Location of the window relative to the reference sample of a frame.
end : int or str, optional
End of signal handling (see FramedSignal).
num_frames : int, optional
Number of frames to return.
kwargs : dict, optional
If no Signal instance was given, one is instantiated with these additional keyword arguments.
Notes
When operating on live audio signals, origin must be set to ‘stream’ in order to retrieve always the last frame_size samples.
Examples
Processor for chopping a Signal (or anything a Signal can be instantiated from) into overlapping frames of size 2048, with a frame rate of 100 frames per second:
>>> proc = FramedSignalProcessor(frame_size=2048, fps=100)
>>> frames = proc('tests/data/audio/sample.wav')
>>> frames
<madmom.audio.signal.FramedSignal object at 0x...>
>>> frames[0]
Signal([ 0, 0, ..., -4666, -4589], dtype=int16)
>>> frames[10]
Signal([-6156, -5645, ..., -253, 671], dtype=int16)
>>> frames.hop_size
441.0
-
process(data, **kwargs)
Slice the signal into (overlapping) frames.
Parameters: data : Signal instance
Signal to be sliced into frames.
kwargs : dict, optional
Keyword arguments passed to FramedSignal.
Returns: frames : FramedSignal instance
FramedSignal instance.
-
static add_arguments(parser, frame_size=2048, fps=None, online=None)
Add signal framing related arguments to an existing parser.
Parameters: parser : argparse parser instance
Existing argparse parser object.
frame_size : int, optional
Size of one frame in samples.
fps : float, optional
Frames per second.
online : bool, optional
Online mode (use only past signal information, i.e. align the window to the left of the reference sample).
Returns: argparse argument group
Signal framing argument parser group.
Notes
Parameters are included in the group only if they are not ‘None’.
-
-
class madmom.audio.signal.Stream(sample_rate=None, num_channels=None, dtype=np.float32, frame_size=2048, hop_size=441.0, fps=None, **kwargs)
A Stream handles live (i.e. online, real-time) audio input via PyAudio.
Parameters: sample_rate : int
Sample rate of the signal.
num_channels : int, optional
Number of channels.
dtype : numpy dtype, optional
Data type for the signal.
frame_size : int, optional
Size of one frame [samples].
hop_size : int, optional
Progress hop_size samples between adjacent frames.
fps : float, optional
Use given frames per second; if set, this computes and overwrites the given hop_size value (the resulting hop_size must be an integer).
queue_size : int
Size of the FIFO (first in first out) queue. If the queue is full and new audio samples arrive, the oldest item in the queue will be dropped.
Notes
Stream is implemented as an iterable which blocks until enough new data is available.
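A rough usage sketch (requires PyAudio and a working audio input device; parameters are illustrative):
>>> from madmom.audio.signal import Stream
>>> stream = Stream(sample_rate=44100, num_channels=1, frame_size=2048, fps=100)
>>> for frame in stream:  # blocks until enough new samples are available
...     break             # the frame holds the most recent frame_size samples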
-
shape
Shape of the Stream (None, frame_size[, num_channels]).
-