madmom.io.audio

This module contains audio input/output functionality.

exception madmom.io.audio.LoadAudioFileError(value=None)

Exception to be raised whenever an audio file could not be loaded.

madmom.io.audio.decode_to_disk(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, outfile=None, tmp_dir=None, tmp_suffix=None, cmd='ffmpeg')

Decode the given audio file to another file.

Parameters:
infile : str

Name of the audio file to decode.

fmt : {‘f32le’, ‘s16le’}, optional

Format of the samples:

- ‘f32le’ for float32, little-endian,
- ‘s16le’ for signed 16-bit int, little-endian.

sample_rate : int, optional

Sample rate to re-sample the signal to (if set) [Hz].

num_channels : int, optional

Number of channels to reduce the signal to.

skip : float, optional

Number of seconds to skip at beginning of file.

max_len : float, optional

Maximum length in seconds to decode.

outfile : str, optional

The file to decode the sound file to; if not given, a temporary file will be created.

tmp_dir : str, optional

The directory to create the temporary file in (if no outfile is given).

tmp_suffix : str, optional

The file suffix for the temporary file if no outfile is given; e.g. “.pcm” (including the dot).

cmd : {‘ffmpeg’, ‘avconv’}, optional

Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:
outfile : str

The output file name.
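
As a rough illustration of how the parameters above could translate into a decoder invocation, the sketch below assembles an ffmpeg argument list. The helper name build_decode_command and the option order are assumptions made for this example; madmom's own command line may differ. Only standard ffmpeg options (-ss, -t, -ac, -ar, -f, -y, -loglevel) are used.

```python
def build_decode_command(infile, outfile, fmt="f32le", sample_rate=None,
                         num_channels=1, skip=None, max_len=None,
                         cmd="ffmpeg"):
    """Assemble an argument list for decoding `infile` to raw samples.

    Hypothetical helper: it mirrors the documented parameters, not
    madmom's actual implementation.
    """
    call = [cmd, "-loglevel", "quiet", "-y"]
    if skip:
        # seek to the requested offset before the input is opened
        call += ["-ss", str(float(skip))]
    call += ["-i", infile, "-f", fmt]
    if max_len:
        # limit the decoded duration
        call += ["-t", str(float(max_len))]
    if num_channels:
        call += ["-ac", str(int(num_channels))]
    if sample_rate:
        call += ["-ar", str(int(sample_rate))]
    call.append(outfile)
    return call


command = build_decode_command("song.mp3", "song.pcm", sample_rate=44100,
                               skip=1.5, max_len=10)
```

The resulting list can be passed to subprocess.check_call(); building a list instead of a shell string avoids quoting problems with file names that contain spaces.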

madmom.io.audio.decode_to_pipe(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, buf_size=-1, cmd='ffmpeg')

Decode the given audio and return a file-like object for reading the samples, as well as a process object.

Parameters:
infile : str

Name of the audio file to decode.

fmt : {‘f32le’, ‘s16le’}, optional

Format of the samples:

- ‘f32le’ for float32, little-endian,
- ‘s16le’ for signed 16-bit int, little-endian.

sample_rate : int, optional

Sample rate to re-sample the signal to (if set) [Hz].

num_channels : int, optional

Number of channels to reduce the signal to.

skip : float, optional

Number of seconds to skip at beginning of file.

max_len : float, optional

Maximum length in seconds to decode.

buf_size : int, optional

Size of buffer for the file-like object:

- ‘-1’ means OS default (default),
- ‘0’ means unbuffered,
- ‘1’ means line-buffered,
- any other value is the buffer size in bytes.

cmd : {‘ffmpeg’, ‘avconv’}, optional

Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:
pipe : file-like object

File-like object for reading the decoded samples.

proc : process object

Process object for the decoding process.

Notes

To stop decoding the file, call close() on the returned file-like object, then call wait() on the returned process object.
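
The close-then-wait order described above can be sketched with the standard library alone. To keep the example runnable without ffmpeg, a small Python child process stands in for the decoder and streams 'f32le' samples; with madmom, pipe and proc would be the two values returned by decode_to_pipe(). The PRODUCER script is purely illustrative.

```python
import struct
import subprocess
import sys

# Stand-in for the decoder: a child process that streams float32 ('f32le')
# samples to its stdout.
PRODUCER = r"""
import struct, sys
try:
    for i in range(1000000):
        sys.stdout.buffer.write(struct.pack('<f', i * 0.5))
except OSError:
    pass  # the reader closed the pipe early
"""

proc = subprocess.Popen([sys.executable, "-c", PRODUCER],
                        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                        bufsize=-1)  # -1 selects the default buffering
pipe = proc.stdout

# Read only the first four samples (4 bytes each for 'f32le').
first = struct.unpack("<4f", pipe.read(4 * 4))

# Stop early: close the pipe first, then wait for the process to exit.
pipe.close()
proc.wait()
```

Closing the pipe first makes the producer's next write fail, so it terminates on its own; wait() then collects the exit status and avoids leaving a zombie process behind.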

madmom.io.audio.decode_to_memory(infile, fmt='f32le', sample_rate=None, num_channels=1, skip=None, max_len=None, cmd='ffmpeg')

Decode the given audio and return it as a binary string representation.

Parameters:
infile : str

Name of the audio file to decode.

fmt : {‘f32le’, ‘s16le’}, optional

Format of the samples:

- ‘f32le’ for float32, little-endian,
- ‘s16le’ for signed 16-bit int, little-endian.

sample_rate : int, optional

Sample rate to re-sample the signal to (if set) [Hz].

num_channels : int, optional

Number of channels to reduce the signal to.

skip : float, optional

Number of seconds to skip at beginning of file.

max_len : float, optional

Maximum length in seconds to decode.

cmd : {‘ffmpeg’, ‘avconv’}, optional

Decoding command (defaults to ffmpeg, alternatively supports avconv).

Returns:
samples : str

Binary string representation of the audio samples.
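
The returned bytes still have to be interpreted according to fmt and num_channels. The following minimal sketch does this with the struct module, assuming the samples are interleaved by channel as is usual for raw PCM. The helper unpack_samples and the test bytes are made up for illustration and are not part of madmom.

```python
import struct

# struct type codes and sample widths (in bytes) for the documented formats
_FORMATS = {"f32le": ("f", 4), "s16le": ("h", 2)}


def unpack_samples(raw, fmt="f32le", num_channels=1):
    """Turn interleaved raw bytes into a list of per-frame tuples."""
    code, width = _FORMATS[fmt]
    count = len(raw) // width
    # '<' forces little-endian byte order regardless of the host machine
    flat = struct.unpack("<%d%s" % (count, code), raw[:count * width])
    # group interleaved samples into frames of `num_channels` values,
    # dropping an incomplete trailing frame if there is one
    usable = count - count % num_channels
    return [flat[i:i + num_channels] for i in range(0, usable, num_channels)]


# Two stereo frames of signed 16-bit samples, packed by hand for the demo.
raw = struct.pack("<4h", 100, -100, 200, -200)
frames = unpack_samples(raw, fmt="s16le", num_channels=2)
```

When NumPy is available, numpy.frombuffer(raw, dtype='<f4') (or '<i2' for 's16le') followed by a reshape to (-1, num_channels) achieves the same result without a Python-level loop.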

madmom.io.audio.get_file_info(infile, cmd='ffprobe')

Extract and return information about audio files.

Parameters:
infile : str

Name of the audio file.

cmd : {‘ffprobe’, ‘avprobe’}, optional

Probing command (defaults to ffprobe, alternatively supports avprobe).

Returns:
dict

Audio file information.

madmom.io.audio.load_ffmpeg_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None, cmd_decode='ffmpeg', cmd_probe='ffprobe')

Load the audio data from the given file and return it as a numpy array.

This uses ffmpeg (or avconv) and thus supports a lot of different file formats, resampling and channel conversions. The file will be fully decoded into memory if no start and stop positions are given.

Parameters:
filename : str

Name of the audio file to load.

sample_rate : int, optional

Sample rate to re-sample the signal to [Hz]; ‘None’ returns the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels; ‘None’ returns the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy dtype, optional

Numpy dtype to return the signal in (supports signed and unsigned 8/16/32-bit integers, and single and double precision floats, each in little or big endian). If ‘None’, np.int16 is used.

cmd_decode : {‘ffmpeg’, ‘avconv’}, optional

Decoding command (defaults to ffmpeg, alternatively supports avconv).

cmd_probe : {‘ffprobe’, ‘avprobe’}, optional

Probing command (defaults to ffprobe, alternatively supports avprobe).

Returns:
signal : numpy array

Audio samples.

sample_rate : int

Sample rate of the audio samples.
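
Because the whole file is decoded into memory when no start and stop are given, it can help to estimate the size of the result beforehand. The sketch below is simple arithmetic; the helper name decoded_size_bytes is hypothetical, and the default of 2 bytes per sample corresponds to the np.int16 default mentioned above.

```python
def decoded_size_bytes(duration, sample_rate, num_channels,
                       bytes_per_sample=2):
    """Approximate size in bytes of a fully decoded signal.

    `duration` is given in seconds; `bytes_per_sample` is 2 for int16,
    4 for float32 and 8 for float64.
    """
    num_samples = int(round(duration * sample_rate))
    return num_samples * num_channels * bytes_per_sample


# Three minutes of 44.1 kHz stereo audio as int16.
size = decoded_size_bytes(180, 44100, 2)
```

For this example the estimate is 31,752,000 bytes, or roughly 32 MB; passing start and stop limits the decoded excerpt accordingly.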

madmom.io.audio.load_wave_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)

Load the audio data from the given file and return it as a numpy array.

Only wave files are supported; re-sampling and arbitrary channel number conversions are not. The data is read as a memory-mapped file with copy-on-write semantics to defer I/O costs until needed.

Parameters:
filename : str

Name of the file.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:
signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].
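
The dtype rescaling described for the dtype parameter can be pictured with a small pure-Python sketch. It follows one common convention (divide by 2**15 when going to float, multiply by 32767 and clip when going back to int16); this convention is an assumption for illustration, and the exact scaling factor used by madmom is not stated here.

```python
def int16_to_float(samples):
    """Map int16 values onto [-1, +1) by dividing by 2**15."""
    return [s / 32768.0 for s in samples]


def float_to_int16(samples):
    """Map floats in [-1, +1] onto the int16 range, clipping at the limits."""
    scaled = (int(round(s * 32767.0)) for s in samples)
    return [max(-32768, min(32767, v)) for v in scaled]


as_float = int16_to_float([-32768, 0, 16384])
as_int = float_to_int16([1.0, -1.0, 0.0, 2.0])
```

The final value 2.0 lies outside [-1, +1] and is clipped to the largest int16 value instead of wrapping around.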

Notes

The start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned. Consecutive segments, each starting at the previous stop, can therefore be concatenated to obtain the original signal without gaps or overlaps.
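
The half-open [start, stop) behaviour can be checked with a short sketch that works on a plain list. The segment helper is hypothetical and uses Python's built-in round() to pick the closest sample; it only illustrates why consecutive excerpts line up.

```python
def segment(signal, sample_rate, start=None, stop=None):
    """Return signal[start:stop] with positions given in seconds.

    Positions are rounded to the closest sample; the sample at `stop`
    itself is excluded.
    """
    first = 0 if start is None else int(round(start * sample_rate))
    last = len(signal) if stop is None else int(round(stop * sample_rate))
    return signal[first:last]


sample_rate = 10                # Hz, deliberately tiny for readability
signal = list(range(100))       # ten seconds of dummy samples

parts = [segment(signal, sample_rate, None, 2.5),
         segment(signal, sample_rate, 2.5, 7.04),
         segment(signal, sample_rate, 7.04, None)]
rejoined = parts[0] + parts[1] + parts[2]
```

Since each stop index is reused as the next start index, rejoined is identical to signal even though 7.04 s does not fall exactly on a sample.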

madmom.io.audio.write_wave_file(signal, filename, sample_rate=None)

Write the signal to disk as a .wav file.

Parameters:
signal : numpy array or Signal

The signal to be written to file.

filename : str

Name of the file.

sample_rate : int, optional

Sample rate of the signal [Hz].

Returns:
filename : str

Name of the file.

Notes

sample_rate can be ‘None’ if signal is a Signal instance. If set, the given sample_rate is used instead of the signal’s sample rate. It must be given if signal is an ndarray.
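
The precedence rule in this note can be written down as a few lines of Python. The sketch below is not madmom's code: resolve_sample_rate and FakeSignal are hypothetical names, and raising ValueError for a missing rate is an assumption.

```python
def resolve_sample_rate(signal, sample_rate=None):
    """Pick the sample rate to write with.

    An explicit `sample_rate` wins; otherwise fall back to the signal's
    own `sample_rate` attribute, if it has one.
    """
    if sample_rate is not None:
        return sample_rate
    rate = getattr(signal, "sample_rate", None)
    if rate is None:
        raise ValueError("sample_rate must be given for plain arrays")
    return rate


class FakeSignal(list):
    """Stand-in for a Signal instance carrying its own sample rate."""
    sample_rate = 44100


from_signal = resolve_sample_rate(FakeSignal())
overridden = resolve_sample_rate(FakeSignal(), sample_rate=22050)
```

Here from_signal takes the rate stored on the signal, while overridden uses the explicitly passed value.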

madmom.io.audio.load_audio_file(filename, sample_rate=None, num_channels=None, start=None, stop=None, dtype=None)

Load the audio data from the given file and return it as a numpy array. This tries load_wave_file() first and then load_ffmpeg_file() (for ffmpeg and avconv).

Parameters:
filename : str or file handle

Name of the file or file handle.

sample_rate : int, optional

Desired sample rate of the signal [Hz], or ‘None’ to return the signal in its original rate.

num_channels : int, optional

Reduce or expand the signal to num_channels channels, or ‘None’ to return the signal with its original channels.

start : float, optional

Start position [seconds].

stop : float, optional

Stop position [seconds].

dtype : numpy data type, optional

The data is returned with the given dtype. If ‘None’, it is returned with its original dtype, otherwise the signal gets rescaled. Integer dtypes use the complete value range, float dtypes the range [-1, +1].

Returns:
signal : numpy array

Audio signal.

sample_rate : int

Sample rate of the signal [Hz].

Notes

For wave files, the start and stop positions are rounded to the closest sample; the sample corresponding to the stop value is not returned. Consecutive segments, each starting at the previous stop, can therefore be concatenated to obtain the original signal without gaps or overlaps. For all other audio files, this cannot be guaranteed.
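
Trying several loaders in turn follows a generic try-then-fall-back pattern, sketched below with stand-in loaders so that it runs without madmom or ffmpeg. All names (LoadError, load_with_fallback, the fake loaders) are hypothetical, and which exception types trigger the fallback is an assumption.

```python
class LoadError(Exception):
    """Stand-in for LoadAudioFileError."""


def load_with_fallback(filename, loaders, **kwargs):
    """Return the result of the first loader that succeeds."""
    for loader in loaders:
        try:
            return loader(filename, **kwargs)
        except (ValueError, OSError):
            continue  # this loader cannot handle the file, try the next
    raise LoadError("All attempts to load audio file %r failed." % filename)


def fake_wave_loader(filename, **kwargs):
    # pretends to understand only .wav files
    if not filename.endswith(".wav"):
        raise ValueError("not a wave file")
    return [0, 1, 2], 44100


def fake_ffmpeg_loader(filename, **kwargs):
    # pretends to decode everything except .txt files
    if filename.endswith(".txt"):
        raise OSError("cannot decode")
    return [0.0, 0.5], 48000


loaders = [fake_wave_loader, fake_ffmpeg_loader]
result = load_with_fallback("song.mp3", loaders)
```

The wave loader rejects "song.mp3", so the call falls through to the second loader; a file that neither loader accepts ends in a LoadError.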