madmom.io.audio

This module contains audio input/output functionality.

exception madmom.io.audio.LoadAudioFileError(value=None): Exception to be raised whenever an audio file could not be loaded.

madmom.io.audio.decode_to_disk(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, outfile=None, tmp_dir=None, tmp_suffix=None, cmd='ffmpeg')

Decode the given audio file to another file.

Parameters:

infile : str: Name of the audio sound file to decode.
fmt : {‘f32le’, ‘s16le’}, optional: Format of the samples: ‘f32le’ for float32, little-endian; ‘s16le’ for signed 16-bit int, little-endian.
sample_rate : int, optional: Sample rate to re-sample the signal to (if set) [Hz].
num_channels : int, optional: Number of channels to reduce the signal to.
skip : float, optional: Number of seconds to skip at beginning of file.
max_len : float, optional: Maximum length in seconds to decode.
outfile : str, optional: The file to decode the sound file to; if not given, a temporary file will be created.
tmp_dir : str, optional: The directory to create the temporary file in (if no outfile is given).
tmp_suffix : str, optional: The file suffix for the temporary file if no outfile is given; e.g. “.pcm” (including the dot).
cmd : {‘ffmpeg’, ‘avconv’}, optional: Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:

outfile : str: The output file name.
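The size of the decoded file can be estimated from the parameters above. The following is a minimal sketch, assuming the output consists of raw samples without any header; `expected_pcm_bytes` and `decode_and_compare` are hypothetical helpers and not part of madmom.

```python
# Bytes per sample for the two formats listed for `fmt`.
BYTES_PER_SAMPLE = {"f32le": 4, "s16le": 2}


def expected_pcm_bytes(seconds, sample_rate, num_channels=1, fmt="f32le"):
    """Estimate the size of raw PCM output (hypothetical helper).

    Assumes headerless data: frames * channels * bytes per sample.
    """
    num_frames = int(round(seconds * sample_rate))
    return num_frames * num_channels * BYTES_PER_SAMPLE[fmt]


def decode_and_compare(infile, seconds=2.0, sample_rate=44100):
    """Decode an excerpt to a temporary file and return (actual, estimated) sizes.

    Requires madmom and ffmpeg; the actual size can be smaller if the
    input is shorter than `seconds`.
    """
    import os
    from madmom.io.audio import decode_to_disk

    outfile = decode_to_disk(infile, fmt="f32le", sample_rate=sample_rate,
                             num_channels=1, max_len=seconds)
    return os.path.getsize(outfile), expected_pcm_bytes(seconds, sample_rate)
```

Two seconds of mono ‘f32le’ audio at 44100 Hz come to 352800 bytes under this assumption.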

madmom.io.audio.decode_to_pipe(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, buf_size=-1, cmd='ffmpeg')

Decode the given audio and return a file-like object for reading the samples, as well as a process object.

Parameters:

infile : str: Name of the audio sound file to decode.
fmt : {‘f32le’, ‘s16le’}, optional: Format of the samples: ‘f32le’ for float32, little-endian; ‘s16le’ for signed 16-bit int, little-endian.
sample_rate : int, optional: Sample rate to re-sample the signal to (if set) [Hz].
num_channels : int, optional: Number of channels to reduce the signal to.
skip : float, optional: Number of seconds to skip at beginning of file.
max_len : float, optional: Maximum length in seconds to decode.
buf_size : int, optional: Size of the buffer for the file-like object: ‘-1’ means OS default (the default), ‘0’ means unbuffered, ‘1’ means line-buffered; any other value is the buffer size in bytes.
cmd : {‘ffmpeg’, ‘avconv’}, optional: Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:

pipe : file-like object: File-like object for reading the decoded samples.
proc : process object: Process object for the decoding process.

Notes

To stop decoding the file, call close() on the returned file-like object, then call wait() on the returned process object.
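The close-then-wait order described in the note can be wrapped in a small helper. This is only a sketch: `read_head` and `decode_head` are hypothetical names, and `read_head` works with any file-like object and any process-like object that offers wait().

```python
def read_head(pipe, proc, num_bytes):
    """Read up to `num_bytes` from `pipe`, then stop the decoder.

    The pipe is closed first and the process is waited for afterwards,
    even if reading raises an exception.
    """
    try:
        return pipe.read(num_bytes)
    finally:
        pipe.close()  # no more samples are wanted
        proc.wait()   # wait for the decoding process to exit


def decode_head(infile, seconds=1.0, sample_rate=44100):
    """Return the first `seconds` of `infile` as raw mono s16le bytes.

    Requires madmom and ffmpeg.
    """
    from madmom.io.audio import decode_to_pipe

    pipe, proc = decode_to_pipe(infile, fmt="s16le",
                                sample_rate=sample_rate, num_channels=1)
    num_bytes = int(seconds * sample_rate) * 2  # 2 bytes per s16le sample
    return read_head(pipe, proc, num_bytes)
```

Using try/finally keeps the decoding process from being left running when the caller only needs the beginning of a long file.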

madmom.io.audio.decode_to_memory(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, cmd='ffmpeg')

Decode the given audio and return it as a binary string representation.

Parameters:

infile : str: Name of the audio sound file to decode.
fmt : {‘f32le’, ‘s16le’}, optional: Format of the samples: ‘f32le’ for float32, little-endian; ‘s16le’ for signed 16-bit int, little-endian.
sample_rate : int, optional: Sample rate to re-sample the signal to (if set) [Hz].
num_channels : int, optional: Number of channels to reduce the signal to.
skip : float, optional: Number of seconds to skip at beginning of file.
max_len : float, optional: Maximum length in seconds to decode.
cmd : {‘ffmpeg’, ‘avconv’}, optional: Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:

samples : str: Binary string representation of the audio samples.
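The returned binary string still has to be interpreted according to `fmt`. A minimal sketch using only the standard library follows; it assumes the data is a bytes-like object of raw little-endian samples, and `unpack_samples` is a hypothetical helper, not a madmom function.

```python
import struct

# struct type code and width in bytes for each documented format.
_FORMATS = {"f32le": ("f", 4), "s16le": ("h", 2)}


def unpack_samples(data, fmt="f32le"):
    """Convert raw little-endian sample bytes into a list of numbers."""
    code, width = _FORMATS[fmt]
    count = len(data) // width  # ignore a trailing partial sample
    return list(struct.unpack("<%d%s" % (count, code), data[:count * width]))
```

With madmom and ffmpeg installed, the input could come from `decode_to_memory(infile, fmt='s16le')`. For larger signals, interpreting the same bytes with `numpy.frombuffer` and a matching little-endian dtype avoids building a Python list.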

madmom.io.audio.get_file_info(infile, cmd='ffprobe')

Extract and return information about audio files.

Parameters:

infile : str: Name of the audio file.
cmd : {‘ffprobe’, ‘avprobe’}, optional: Probing command (defaults to ffprobe, alternatively supports avprobe).

Returns:

dict: Audio file information.

madmom.io.audio.load_ffmpeg_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None, cmd_decode='ffmpeg', cmd_probe='ffprobe')

Load the audio data from the given file and return it as a numpy array.

This uses ffmpeg (or avconv) and thus supports a lot of different file formats, resampling and channel conversions. The file will be fully decoded into memory if no start and stop positions are given.

Parameters:

filename : str: Name of the audio sound file to load.
sample_rate : int, optional: Sample rate to re-sample the signal to [Hz]; ‘None’ returns the signal in its original rate.
num_channels : int, optional: Reduce or expand the signal to num_channels channels; ‘None’ returns the signal with its original channels.
start : float, optional: Start position [seconds].
stop : float, optional: Stop position [seconds].
dtype : numpy dtype, optional: Numpy dtype to return the signal in (supports signed and unsigned 8/16/32-bit integers, and single and double precision floats, each in little or big endian). If ‘None’, np.int16 is used.
cmd_decode : {‘ffmpeg’, ‘avconv’}, optional: Decoding command (defaults to ffmpeg, alternatively supports avconv).
cmd_probe : {‘ffprobe’, ‘avprobe’}, optional: Probing command (defaults to ffprobe, alternatively supports avprobe).

Returns:

signal : numpy array: Audio samples.
sample_rate : int: Sample rate of the audio samples.

madmom.io.audio.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)

Load the audio data from the given file and return it as a numpy array.

Only supports wave files, does not support re-sampling or arbitrary channel number conversions. Reads the data as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.

Parameters:

filename : str: Name of the file.
sample_rate : int, optional: Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional: Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional: Start position [seconds].
stop : float, optional: Stop position [seconds].
dtype : numpy data type, optional: The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array: Audio signal.
sample_rate : int: Sample rate of the signal [Hz].

Notes

The start and stop positions are rounded to the closest sample. The sample corresponding to the stop value is not returned, so consecutive segments, each starting at the previous stop, can be concatenated to obtain the original signal without gaps or overlaps.
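The half-open segment convention can be illustrated without any audio file. The sketch below is not madmom's implementation; `segment_bounds` and `cut` are hypothetical helpers, and ties are resolved by Python's round(), which may differ from the library's behaviour.

```python
def segment_bounds(start, stop, sample_rate, num_samples):
    """Convert start/stop in seconds to a half-open sample range [first, last)."""
    first = 0 if start is None else int(round(start * sample_rate))
    last = num_samples if stop is None else int(round(stop * sample_rate))
    # Clip both indices to the valid range.
    first = max(0, min(first, num_samples))
    last = max(first, min(last, num_samples))
    return first, last


def cut(samples, start, stop, sample_rate):
    """Return the samples in [start, stop); the stop sample is excluded."""
    first, last = segment_bounds(start, stop, sample_rate, len(samples))
    return samples[first:last]
```

Because each segment ends one sample before the next one begins, `cut(x, None, 2.5, sr) + cut(x, 2.5, 7.04, sr) + cut(x, 7.04, None, sr)` reproduces `x` exactly.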

madmom.io.audio.write_wave_file(signal, filename, sample_rate=None)

Write the signal to disk as a .wav file.

Parameters:

signal : numpy array or Signal: The signal to be written to file.
filename : str: Name of the file.
sample_rate : int, optional: Sample rate of the signal [Hz].

Returns:

filename : str: Name of the file.

Notes

sample_rate can be ‘None’ if signal is a Signal instance, in which case the signal’s own sample rate is used. If set, the given sample_rate is used instead of the signal’s sample rate. It must be given if signal is an ndarray.
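The precedence rule in this note can be written down as a small function. This is a sketch under the assumption that a Signal instance exposes its rate as a `sample_rate` attribute; `resolve_sample_rate` is a hypothetical helper and does not reproduce madmom's actual code.

```python
def resolve_sample_rate(signal, sample_rate=None):
    """Pick the sample rate to write with.

    An explicit `sample_rate` wins; otherwise the signal's own
    `sample_rate` attribute (assumed to exist on Signal objects) is used.
    """
    if sample_rate is not None:
        return sample_rate
    inferred = getattr(signal, "sample_rate", None)
    if inferred is None:
        raise ValueError("sample_rate must be given for plain arrays")
    return inferred
```

A plain array has no such attribute, so omitting sample_rate raises an error in this sketch instead of guessing a rate.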

madmom.io.audio.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)

Load the audio data from the given file and return it as a numpy array. This tries load_wave_file() first and then load_ffmpeg_file() (for ffmpeg and avconv).

Parameters:

filename : str or file handle: Name of the file or file handle.
sample_rate : int, optional: Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.
num_channels : int, optional: Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.
start : float, optional: Start position [seconds].
stop : float, optional: Stop position [seconds].
dtype : numpy data type, optional: The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:

signal : numpy array: Audio signal.
sample_rate : int: Sample rate of the signal [Hz].
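The rescaling described for `dtype` can be illustrated for 16-bit audio. The scale factors below follow one common convention and are an assumption; the exact factors used by the library are not stated here, and `int_to_float` and `float_to_int` are hypothetical helpers.

```python
def int_to_float(samples, bits=16):
    """Map signed integers of the given width to floats within [-1, +1]."""
    scale = float(2 ** (bits - 1))  # 32768.0 for 16-bit (assumed convention)
    return [s / scale for s in samples]


def float_to_int(samples, bits=16):
    """Map floats in [-1, +1] to signed integers, clipping out-of-range values."""
    high = 2 ** (bits - 1) - 1  # 32767 for 16-bit
    low = -(2 ** (bits - 1))    # -32768 for 16-bit
    out = []
    for x in samples:
        value = int(round(x * high))
        out.append(max(low, min(high, value)))
    return out
```

Other conventions divide by the largest positive integer instead, which changes the result only in the last bit.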

Notes

For wave files, the start and stop positions are rounded to the closest sample. The sample corresponding to the stop value is not returned, so consecutive segments, each starting at the previous stop, can be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.
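The try-one-loader-then-the-next behaviour of this function can be sketched generically. The code below is an illustration, not madmom's implementation: `load_with_fallback` and `LoadError` are hypothetical names (the latter standing in for LoadAudioFileError), and catching every exception is a simplifying assumption.

```python
class LoadError(Exception):
    """Stand-in for madmom.io.audio.LoadAudioFileError."""


def load_with_fallback(filename, loaders, **kwargs):
    """Return the result of the first loader that succeeds.

    Raises LoadError if every loader fails.
    """
    errors = []
    for loader in loaders:
        try:
            return loader(filename, **kwargs)
        except Exception as exc:  # assumption: any failure tries the next loader
            name = getattr(loader, "__name__", repr(loader))
            errors.append("%s: %s" % (name, exc))
    raise LoadError("could not load %r (%s)" % (filename, "; ".join(errors)))
```

With madmom installed, `loaders` could be `[load_wave_file, load_ffmpeg_file]`, since both accept the sample_rate, num_channels, start, stop and dtype keywords documented above.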