madmom.features.downbeats

This module contains downbeat and bar tracking related functionality.

class madmom.features.downbeats.RNNDownBeatProcessor(**kwargs)[source]

Processor to get a joint beat and downbeat activation function from multiple RNNs.

References

[1] Sebastian Böck, Florian Krebs and Gerhard Widmer, “Joint Beat and Downbeat Tracking with Recurrent Neural Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

Examples

Create an RNNDownBeatProcessor and pass a file through the processor. The returned 2d array represents the probabilities at each frame, sampled at 100 frames per second. The columns represent the ‘beat’ and ‘downbeat’ probabilities, respectively.

>>> proc = RNNDownBeatProcessor()
>>> proc  
<madmom.features.downbeats.RNNDownBeatProcessor object at 0x...>
>>> proc('tests/data/audio/sample.wav')
array([[0.00011, 0.00037],
       [0.00008, 0.00043],
       ...,
       [0.00791, 0.00169],
       [0.03425, 0.00494]], dtype=float32)
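For a quick sanity check, the activation array can be inspected with plain NumPy before any decoding. A minimal sketch on synthetic data; the threshold value and naive peak picking are illustrative assumptions, not how madmom decodes the activations (that is the job of DBNDownBeatTrackingProcessor below):

```python
import numpy as np

# Synthetic stand-in for the (num_frames, 2) array returned by
# RNNDownBeatProcessor: column 0 = beat probability,
# column 1 = downbeat probability, sampled at 100 frames per second.
act = np.array([[0.01, 0.00],
                [0.90, 0.10],
                [0.02, 0.01],
                [0.30, 0.85],
                [0.01, 0.00]], dtype=np.float32)

fps = 100
# Naive thresholding for illustration only: keep frames where either
# column exceeds 0.5 and convert frame indices to seconds.
frames = np.nonzero(act.max(axis=1) > 0.5)[0]
times = frames / fps
print(times)  # frames 1 and 3 -> 0.01 s and 0.03 s
```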
class madmom.features.downbeats.DBNDownBeatTrackingProcessor(beats_per_bar, min_bpm=55.0, max_bpm=215.0, num_tempi=60, transition_lambda=100, observation_lambda=16, threshold=0.05, correct=True, fps=None, **kwargs)[source]

Downbeat tracking with RNNs and a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).

Parameters:
beats_per_bar : int or list

Number of beats per bar to be modeled. Can be either a single number or a list or array with bar lengths (in beats).

min_bpm : float or list, optional

Minimum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.

max_bpm : float or list, optional

Maximum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.

num_tempi : int or list, optional

Number of tempi to model; if set, limit the number of tempi and use a log spacing, otherwise a linear spacing. If a list is given, each item corresponds to the number of beats per bar at the same position.

transition_lambda : float or list, optional

Lambda for the exponential tempo change distribution (higher values prefer a constant tempo from one beat to the next one). If a list is given, each item corresponds to the number of beats per bar at the same position.

observation_lambda : int, optional

Split one (down-)beat period into observation_lambda parts, the first representing (down-)beat states and the remaining non-beat states.

threshold : float, optional

Threshold the RNN (down-)beat activations before Viterbi decoding.

correct : bool, optional

Correct the beats (i.e. align them to the nearest peak of the (down-)beat activation function).

fps : float, optional

Frames per second.

References

[1] Sebastian Böck, Florian Krebs and Gerhard Widmer, “Joint Beat and Downbeat Tracking with Recurrent Neural Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

Examples

Create a DBNDownBeatTrackingProcessor. The returned array represents the positions of the beats and their position inside the bar. The positions are given in seconds, therefore the frame rate (fps) of the beat activation function must be given. The position inside the bar follows the natural counting and starts at 1.

The number of beats per bar to be modeled must be given; all other parameters (e.g. tempo range) are optional, but must have the same length as beats_per_bar, i.e. they must be given for each bar length.

>>> proc = DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)
>>> proc  
<madmom.features.downbeats.DBNDownBeatTrackingProcessor object at 0x...>

Call this DBNDownBeatTrackingProcessor with the beat activation function returned by RNNDownBeatProcessor to obtain the beat positions.

>>> act = RNNDownBeatProcessor()('tests/data/audio/sample.wav')
>>> proc(act)  
array([[0.09, 1. ],
       [0.45, 2. ],
       ...,
       [2.14, 3. ],
       [2.49, 4. ]])
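Since downbeats are simply the beats counted as 1 inside the bar, they can be extracted from the returned array with a boolean mask. A small sketch on an array shaped like the output above:

```python
import numpy as np

# Array shaped like DBNDownBeatTrackingProcessor's output: beat times in
# seconds (first column) and position inside the bar (second column).
beats = np.array([[0.09, 1.], [0.45, 2.], [0.80, 3.],
                  [1.12, 1.], [1.48, 2.], [1.80, 3.]])

# Keep only the beats whose bar position is 1, i.e. the downbeats.
downbeats = beats[beats[:, 1] == 1, 0]
print(downbeats)  # [0.09 1.12]
```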
process(activations, **kwargs)[source]

Detect the (down-)beats in the given activation function.

Parameters:
activations : numpy array, shape (num_frames, 2)

Activation function with probabilities corresponding to beats and downbeats given in the first and second column, respectively.

Returns:
beats : numpy array, shape (num_beats, 2)

Detected (down-)beat positions [seconds] and beat numbers.

static add_arguments(parser, beats_per_bar, min_bpm=55.0, max_bpm=215.0, num_tempi=60, transition_lambda=100, observation_lambda=16, threshold=0.05, correct=True)[source]

Add DBN downbeat tracking related arguments to an existing parser object.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

beats_per_bar : int or list, optional

Number of beats per bar to be modeled. Can be either a single number or a list with bar lengths (in beats).

min_bpm : float or list, optional

Minimum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.

max_bpm : float or list, optional

Maximum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.

num_tempi : int or list, optional

Number of tempi to model; if set, limit the number of tempi and use a log spacing, otherwise a linear spacing. If a list is given, each item corresponds to the number of beats per bar at the same position.

transition_lambda : float or list, optional

Lambda for the exponential tempo change distribution (higher values prefer a constant tempo over a tempo change from one beat to the next one). If a list is given, each item corresponds to the number of beats per bar at the same position.

observation_lambda : float, optional

Split one (down-)beat period into observation_lambda parts, the first representing (down-)beat states and the remaining non-beat states.

threshold : float, optional

Threshold the RNN (down-)beat activations before Viterbi decoding.

correct : bool, optional

Correct the beats (i.e. align them to the nearest peak of the (down-)beat activation function).

Returns:
parser_group : argparse argument group

DBN downbeat tracking argument parser group

class madmom.features.downbeats.PatternTrackingProcessor(pattern_files, min_bpm=(55, 60), max_bpm=(205, 225), num_tempi=None, transition_lambda=100, fps=None, **kwargs)[source]

Pattern tracking with a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).

Parameters:
pattern_files : list

List of files with the patterns (including the fitted GMMs and information about the number of beats).

min_bpm : list, optional

Minimum tempi used for pattern tracking [bpm].

max_bpm : list, optional

Maximum tempi used for pattern tracking [bpm].

num_tempi : int or list, optional

Number of tempi to model; if set, limit the number of tempi and use log spacings, otherwise linear spacings.

transition_lambda : float or list, optional

Lambdas for the exponential tempo change distributions (higher values prefer constant tempi from one beat to the next one).

fps : float, optional

Frames per second.

Notes

min_bpm, max_bpm, num_tempi, and transition_lambda must contain as many items as there are rhythmic patterns modeled (i.e. the length of pattern_files). If a single value is given for num_tempi or transition_lambda, this value is used for all rhythmic patterns.

Instead of the originally proposed state space and transition model for the DBN [1], the more efficient version proposed in [2] is used.

References

[1] Florian Krebs, Sebastian Böck and Gerhard Widmer, “Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio”, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.
[2] Florian Krebs, Sebastian Böck and Gerhard Widmer, “An Efficient State Space Model for Joint Tempo and Meter Tracking”, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.

Examples

Create a PatternTrackingProcessor from the given pattern files. These pattern files include fitted GMMs for the observation model of the HMM. The returned array represents the positions of the beats and their position inside the bar. The positions are given in seconds, therefore the frame rate (fps) of the features must be given. The position inside the bar follows the natural counting and starts at 1.

>>> from madmom.models import PATTERNS_BALLROOM
>>> proc = PatternTrackingProcessor(PATTERNS_BALLROOM, fps=50)
>>> proc  
<madmom.features.downbeats.PatternTrackingProcessor object at 0x...>

Call this PatternTrackingProcessor with a multi-band spectrogram to obtain the beat and downbeat positions. The parameters of the spectrogram have to correspond to those used to fit the GMMs.

>>> from madmom.audio.spectrogram import LogarithmicSpectrogramProcessor, SpectrogramDifferenceProcessor, MultiBandSpectrogramProcessor
>>> from madmom.processors import SequentialProcessor
>>> log = LogarithmicSpectrogramProcessor()
>>> diff = SpectrogramDifferenceProcessor(positive_diffs=True)
>>> mb = MultiBandSpectrogramProcessor(crossover_frequencies=[270])
>>> pre_proc = SequentialProcessor([log, diff, mb])
>>> act = pre_proc('tests/data/audio/sample.wav')
>>> proc(act)  
array([[0.82, 4.  ],
       [1.78, 1.  ],
       ...,
       [3.7 , 3.  ],
       [4.66, 4.  ]])
process(features, **kwargs)[source]

Detect the (down-)beats given the features.

Parameters:
features : numpy array

Multi-band spectral features.

Returns:
beats : numpy array, shape (num_beats, 2)

Detected (down-)beat positions [seconds] and beat numbers.

static add_arguments(parser, pattern_files=None, min_bpm=(55, 60), max_bpm=(205, 225), num_tempi=None, transition_lambda=100)[source]

Add DBN related arguments for pattern tracking to an existing parser object.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

pattern_files : list

Load the patterns from these files.

min_bpm : list, optional

Minimum tempi used for beat tracking [bpm].

max_bpm : list, optional

Maximum tempi used for beat tracking [bpm].

num_tempi : int or list, optional

Number of tempi to model; if set, limit the number of states and use log spacings, otherwise linear spacings.

transition_lambda : float or list, optional

Lambdas for the exponential tempo change distribution (higher values prefer constant tempi from one beat to the next one).

Returns:
parser_group : argparse argument group

Pattern tracking argument parser group

Notes

pattern_files, min_bpm, max_bpm, num_tempi, and transition_lambda must have the same number of items.

class madmom.features.downbeats.LoadBeatsProcessor(beats, files=None, beats_suffix=None, **kwargs)[source]

Load beat times from file or handle.

process(data=None, **kwargs)[source]

Load the beats from file (handle) or read them from STDIN.

process_single()[source]

Load the beats in bulk-mode (i.e. all at once) from the input stream or file.

Returns:
beats : numpy array

Beat positions [seconds].

process_batch(filename)[source]

Load beat times from file.

First match the given input filename to the beat filenames, then load the beats.

Parameters:
filename : str

Input file name.

Returns:
beats : numpy array

Beat positions [seconds].
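In batch mode the beat file is located by matching file names; a hypothetical sketch of such matching (the helper name and the exact matching rule are assumptions for illustration, not madmom's implementation):

```python
import os

def beat_filename(audio_file, beats_suffix='.beats.txt'):
    # Hypothetical matching rule: strip the audio extension from the
    # input file name and append the beat-file suffix.
    base, _ = os.path.splitext(os.path.basename(audio_file))
    return base + beats_suffix

print(beat_filename('tests/data/audio/sample.wav'))  # sample.beats.txt
```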

Notes

Both the file names to search for the beats as well as the suffix to determine the beat files must be given at instantiation time.

static add_arguments(parser, beats=<open file '<stdin>', mode 'r'>, beats_suffix='.beats.txt')[source]

Add beat loading related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

beats : FileType, optional

Where to read the beats from (‘single’ mode).

beats_suffix : str, optional

Suffix of beat files (‘batch’ mode).

Returns:
argparse argument group

Beat loading argument parser group.

class madmom.features.downbeats.SyncronizeFeaturesProcessor(beat_subdivisions, fps, **kwargs)[source]

Synchronize features to beats.

First, divide a beat interval into beat_subdivisions subdivisions. Then summarize all features that fall into one subdivision. If no feature value is found for a subdivision, it is set to 0.

Parameters:
beat_subdivisions : int

Number of subdivisions a beat is divided into.

fps : float

Frames per second.

process(data, **kwargs)[source]

Synchronize features to beats.

Average all feature values that fall into a window of beat duration / beat subdivisions, centered on the beat positions or interpolated subdivisions, starting with the first beat.

Parameters:
data : tuple (features, beats)

Tuple of two numpy arrays, the first containing the features to be synchronized and the second the beat times.

Returns:
numpy array (num beats - 1, beat subdivisions, features dim.)

Beat synchronous features.
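The synchronization scheme can be sketched in plain NumPy. This is a simplified version that averages frames falling between successive subdivision boundaries (rather than in windows centered on them, as described above), so it is an approximation for illustration only:

```python
import numpy as np

fps = 10
features = np.arange(40, dtype=float).reshape(40, 1)  # 4 s of 1-dim features
beats = np.array([1.0, 2.0, 3.0])                     # beat times [seconds]
beat_subdivisions = 2

num_beats = len(beats) - 1
sync = np.zeros((num_beats, beat_subdivisions, features.shape[1]))
for b in range(num_beats):
    # Interpolate subdivision boundaries between successive beats.
    edges = np.linspace(beats[b], beats[b + 1], beat_subdivisions + 1)
    for s in range(beat_subdivisions):
        lo = int(round(edges[s] * fps))
        hi = int(round(edges[s + 1] * fps))
        if hi > lo:  # empty subdivisions stay 0
            sync[b, s] = features[lo:hi].mean(axis=0)

print(sync[:, :, 0])  # [[12. 17.] [22. 27.]]
```

Note the resulting shape (num_beats - 1, beat_subdivisions, feature dim.): one entry per beat interval, matching the return shape documented above.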

class madmom.features.downbeats.RNNBarProcessor(beat_subdivisions=(4, 2), fps=100, **kwargs)[source]

Retrieve a downbeat activation function from a signal and pre-determined beat positions by obtaining beat-synchronous harmonic and percussive features which are processed with a GRU-RNN.

Parameters:
beat_subdivisions : tuple, optional

Number of beat subdivisions for the percussive and harmonic feature.

References

[1] Florian Krebs, Sebastian Böck and Gerhard Widmer, “Downbeat Tracking Using Beat-Synchronous Features and Recurrent Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.

Examples

Create an RNNBarProcessor and pass an audio file and pre-determined (or given) beat positions through the processor. The returned array contains the beat positions and the probability of each beat being a downbeat.

>>> proc = RNNBarProcessor()
>>> proc  
<madmom.features.downbeats.RNNBarProcessor object at 0x...>
>>> beats = np.loadtxt('tests/data/detections/sample.dbn_beat_tracker.txt')
>>> downbeat_prob = proc(('tests/data/audio/sample.wav', beats))
>>> np.around(downbeat_prob, decimals=3)
array([[0.1  , 0.378],
       [0.45 , 0.19 ],
       [0.8  , 0.112],
       [1.12 , 0.328],
       [1.48 , 0.27 ],
       [1.8  , 0.181],
       [2.15 , 0.162],
       [2.49 ,   nan]])
process(data, **kwargs)[source]

Retrieve a downbeat activation function from a signal and beat positions.

Parameters:
data : tuple

Tuple containing a signal or file (handle) and the corresponding beat times [seconds].

Returns:
numpy array, shape (num_beats, 2)

Array containing the beat positions (first column) and the corresponding downbeat activations, i.e. the probability that a beat is a downbeat (second column).

Notes

Since the features are synchronized to the beats and the probability of being a downbeat depends on a whole beat duration, only num_beats - 1 activations can be computed; the last value is filled with NaN.

class madmom.features.downbeats.DBNBarTrackingProcessor(beats_per_bar=(3, 4), observation_weight=100, meter_change_prob=1e-07, **kwargs)[source]

Bar tracking with a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).

Parameters:
beats_per_bar : int or list

Number of beats per bar to be modeled. Can be either a single number or a list or array with bar lengths (in beats).

observation_weight : int, optional

Weight for the downbeat activations.

meter_change_prob : float, optional

Probability to change meter at bar boundaries.

Examples

Create a DBNBarTrackingProcessor. The returned array represents the positions of the beats and their position inside the bar. The position inside the bar follows the natural counting and starts at 1.

The number of beats per bar to be modeled must be given; all other parameters (e.g. the probability to change the meter at bar boundaries) are optional, but must have the same length as beats_per_bar.

>>> proc = DBNBarTrackingProcessor(beats_per_bar=[3, 4])
>>> proc  
<madmom.features.downbeats.DBNBarTrackingProcessor object at 0x...>

Call this DBNBarTrackingProcessor with the beat positions and downbeat activation function returned by RNNBarProcessor to obtain the beat positions inside the bar.

>>> beats = np.loadtxt('tests/data/detections/sample.dbn_beat_tracker.txt')
>>> act = RNNBarProcessor()(('tests/data/audio/sample.wav', beats))
>>> proc(act)  
array([[0.1 , 1. ],
       [0.45, 2. ],
       [0.8 , 3. ],
       [1.12, 1. ],
       [1.48, 2. ],
       [1.8 , 3. ],
       [2.15, 1. ],
       [2.49, 2. ]])
process(data, **kwargs)[source]

Detect downbeats from the given beats and activation function with Viterbi decoding.

Parameters:
data : numpy array, shape (num_beats, 2)

Array containing beat positions (first column) and corresponding downbeat activations (second column).

Returns:
numpy array, shape (num_beats, 2)

Decoded (down-)beat positions and beat numbers.

Notes

The position of the last beat is not decoded, but rather extrapolated based on the position and meter of the second to last beat.
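The extrapolation of the last beat's number can be sketched as follows; the exact wrap-around rule shown here is an assumption for illustration, not necessarily madmom's implementation:

```python
import numpy as np

beats_per_bar = 3
# Beat numbers decoded for all beats except the last one.
decoded = np.array([1, 2, 3, 1, 2])

# Hypothetical extrapolation: continue the count from the second-to-last
# beat, wrapping around at the bar length.
last = decoded[-1] % beats_per_bar + 1
print(last)  # 3
```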

classmethod add_arguments(parser, beats_per_bar, observation_weight=100, meter_change_prob=1e-07)[source]

Add DBN related arguments to an existing parser.

Parameters:
parser : argparse parser instance

Existing argparse parser object.

beats_per_bar : int or list, optional

Number of beats per bar to be modeled. Can be either a single number or a list with bar lengths (in beats).

observation_weight : float, optional

Weight for the activations at downbeat times.

meter_change_prob : float, optional

Probability to change meter at bar boundaries.

Returns:
parser_group : argparse argument group

DBN bar tracking argument parser group