madmom.features.downbeats
This module contains downbeat and bar tracking related functionality.
class madmom.features.downbeats.RNNDownBeatProcessor(**kwargs)
Processor to get a joint beat and downbeat activation function from multiple RNNs.
References
[1] Sebastian Böck, Florian Krebs and Gerhard Widmer, “Joint Beat and Downbeat Tracking with Recurrent Neural Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.
Examples
Create a RNNDownBeatProcessor and pass a file through the processor. The returned 2d array represents the probabilities at each frame, sampled at 100 frames per second. The columns represent ‘beat’ and ‘downbeat’.
>>> proc = RNNDownBeatProcessor()
>>> proc
<madmom.features.downbeats.RNNDownBeatProcessor object at 0x...>
>>> proc('tests/data/audio/sample.wav')
...
array([[0.00011, 0.00037],
       [0.00008, 0.00043],
       ...,
       [0.00791, 0.00169],
       [0.03425, 0.00494]], dtype=float32)
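Because the activations are sampled at 100 frames per second, a frame index maps to a time in seconds by dividing by the frame rate. The following is a minimal sketch using a small hand-made list in place of the real processor output; `frame_to_seconds` and `strongest_frame` are illustrative helper names and not part of madmom.

```python
FPS = 100  # frame rate stated in the text above


def frame_to_seconds(frame, fps=FPS):
    """Convert a frame index into a time in seconds."""
    return frame / fps


def strongest_frame(activations, column):
    """Return the index of the frame with the highest value in `column`."""
    return max(range(len(activations)), key=lambda i: activations[i][column])


# Hand-made stand-in for the (num_frames, 2) array; columns: beat, downbeat.
acts = [[0.01, 0.00], [0.10, 0.02], [0.05, 0.60], [0.70, 0.03]]
beat_time = frame_to_seconds(strongest_frame(acts, 0))
downbeat_time = frame_to_seconds(strongest_frame(acts, 1))
```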
class madmom.features.downbeats.DBNDownBeatTrackingProcessor(beats_per_bar, min_bpm=55.0, max_bpm=215.0, num_tempi=60, transition_lambda=100, observation_lambda=16, threshold=0.05, correct=True, fps=None, **kwargs)
Downbeat tracking with RNNs and a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).
Parameters: - beats_per_bar : int or list
Number of beats per bar to be modeled. Can be either a single number or a list or array with bar lengths (in beats).
- min_bpm : float or list, optional
Minimum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.
- max_bpm : float or list, optional
Maximum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.
- num_tempi : int or list, optional
Number of tempi to model; if set, limit the number of tempi and use a log spacing, otherwise a linear spacing. If a list is given, each item corresponds to the number of beats per bar at the same position.
- transition_lambda : float or list, optional
Lambda for the exponential tempo change distribution (higher values prefer a constant tempo from one beat to the next one). If a list is given, each item corresponds to the number of beats per bar at the same position.
- observation_lambda : int, optional
Split one (down-)beat period into observation_lambda parts, the first representing (down-)beat states and the remaining non-beat states.
- threshold : float, optional
Threshold the RNN (down-)beat activations before Viterbi decoding.
- correct : bool, optional
Correct the beats (i.e. align them to the nearest peak of the (down-)beat activation function).
- fps : float, optional
Frames per second.
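The effect of num_tempi on the tempo grid can be sketched as follows. This is only an illustration under stated assumptions: tempi are represented as beat intervals in frames (60 * fps / bpm), "linear spacing" means every integer interval in the range, and "log spacing" means a constant ratio between neighbouring intervals. The helper `tempo_intervals` is hypothetical and does not claim to reproduce madmom's internals.

```python
def tempo_intervals(min_bpm, max_bpm, fps, num_tempi=None):
    """Return candidate beat intervals (in frames) between two tempi."""
    min_interval = 60.0 * fps / max_bpm  # fastest tempo, shortest interval
    max_interval = 60.0 * fps / min_bpm  # slowest tempo, longest interval
    if num_tempi is None:
        # Linear spacing: every integer interval in the range.
        return list(range(round(min_interval), round(max_interval) + 1))
    # Log spacing: constant ratio between neighbouring intervals.
    ratio = (max_interval / min_interval) ** (1.0 / (num_tempi - 1))
    spaced = {round(min_interval * ratio ** i) for i in range(num_tempi)}
    return sorted(spaced)  # rounding may merge neighbours


linear = tempo_intervals(55.0, 215.0, fps=100)
log = tempo_intervals(55.0, 215.0, fps=100, num_tempi=60)
```

With the default tempo range, the linear grid contains more intervals than the log-spaced one, which is the reduction the parameter description refers to.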
References
[1] Sebastian Böck, Florian Krebs and Gerhard Widmer, “Joint Beat and Downbeat Tracking with Recurrent Neural Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.
Examples
Create a DBNDownBeatTrackingProcessor. The returned array represents the positions of the beats and their position inside the bar. The position is given in seconds, thus the expected sampling rate is needed. The position inside the bar follows the natural counting and starts at 1.
The number of beats per bar to be modelled must be given. All other parameters (e.g. tempo range) are optional; if given as lists, they must have the same length as beats_per_bar, i.e. one value for each bar length.
>>> proc = DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)
>>> proc
<madmom.features.downbeats.DBNDownBeatTrackingProcessor object at 0x...>
Call this DBNDownBeatTrackingProcessor with the beat activation function returned by RNNDownBeatProcessor to obtain the beat positions.
>>> act = RNNDownBeatProcessor()('tests/data/audio/sample.wav')
>>> proc(act)
array([[0.09, 1. ],
       [0.45, 2. ],
       ...,
       [2.14, 3. ],
       [2.49, 4. ]])
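Since the second column holds the position inside the bar and counting starts at 1, the downbeats are the rows whose position equals 1. A minimal sketch, using made-up values in place of the real output:

```python
# Made-up stand-in for the (num_beats, 2) output:
# each row is [time in seconds, position inside the bar].
beats = [[0.5, 1.0], [1.0, 2.0], [1.5, 3.0], [2.0, 1.0], [2.5, 2.0]]

# Downbeats are the beats counted as "1".
downbeats = [time for time, position in beats if position == 1]
```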
process(activations, **kwargs)
Detect the (down-)beats in the given activation function.
Parameters: - activations : numpy array, shape (num_frames, 2)
Activation function with probabilities corresponding to beats and downbeats given in the first and second column, respectively.
Returns: - beats : numpy array, shape (num_beats, 2)
Detected (down-)beat positions [seconds] and beat numbers.
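This page does not spell out how the threshold parameter is applied before decoding. One plausible reading, taken here purely as an assumption, is that decoding is restricted to the segment between the first and last frame whose activation reaches the threshold. The helper `active_segment` is hypothetical.

```python
def active_segment(activations, threshold):
    """Return [first, last) frame bounds of the region reaching `threshold`."""
    above = [i for i, frame in enumerate(activations)
             if max(frame) >= threshold]
    if not above:
        return 0, 0  # nothing reaches the threshold
    return above[0], above[-1] + 1


# Hand-made activations; columns: beat, downbeat.
acts = [[0.01, 0.00], [0.02, 0.01], [0.40, 0.30],
        [0.03, 0.01], [0.60, 0.02], [0.01, 0.01]]
first, last = active_segment(acts, threshold=0.05)
cropped = acts[first:last]  # frames that would be passed on to decoding
```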
static add_arguments(parser, beats_per_bar, min_bpm=55.0, max_bpm=215.0, num_tempi=60, transition_lambda=100, observation_lambda=16, threshold=0.05, correct=True)
Add DBN downbeat tracking related arguments to an existing parser object.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- beats_per_bar : int or list, optional
Number of beats per bar to be modeled. Can be either a single number or a list with bar lengths (in beats).
- min_bpm : float or list, optional
Minimum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.
- max_bpm : float or list, optional
Maximum tempo used for beat tracking [bpm]. If a list is given, each item corresponds to the number of beats per bar at the same position.
- num_tempi : int or list, optional
Number of tempi to model; if set, limit the number of tempi and use a log spacing, otherwise a linear spacing. If a list is given, each item corresponds to the number of beats per bar at the same position.
- transition_lambda : float or list, optional
Lambda for the exponential tempo change distribution (higher values prefer a constant tempo over a tempo change from one beat to the next one). If a list is given, each item corresponds to the number of beats per bar at the same position.
- observation_lambda : int, optional
Split one (down-)beat period into observation_lambda parts, the first representing (down-)beat states and the remaining non-beat states.
- threshold : float, optional
Threshold the RNN (down-)beat activations before Viterbi decoding.
- correct : bool, optional
Correct the beats (i.e. align them to the nearest peak of the (down-)beat activation function).
Returns: - parser_group : argparse argument group
DBN downbeat tracking argument parser group
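The role of transition_lambda in the "exponential tempo change distribution" can be illustrated with a small sketch. It assumes that the probability of moving from one beat interval to another decays as exp(-lambda * |ratio - 1|), where ratio is the new interval divided by the old one, and that the result is normalised. The exact form used by madmom is not stated on this page, and `tempo_change_probs` is a hypothetical helper.

```python
import math


def tempo_change_probs(from_interval, to_intervals, transition_lambda):
    """Normalised probabilities of changing to each candidate interval."""
    weights = [math.exp(-transition_lambda * abs(to / from_interval - 1.0))
               for to in to_intervals]
    total = sum(weights)
    return [w / total for w in weights]


# Candidate beat intervals (in frames) around a current interval of 50.
probs_soft = tempo_change_probs(50, [48, 50, 52], transition_lambda=10)
probs_hard = tempo_change_probs(50, [48, 50, 52], transition_lambda=100)
```

Under this assumption, the larger lambda concentrates more probability on keeping the interval unchanged, matching the statement that higher values prefer a constant tempo.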
class madmom.features.downbeats.PatternTrackingProcessor(pattern_files, min_bpm=(55, 60), max_bpm=(205, 225), num_tempi=None, transition_lambda=100, fps=None, **kwargs)
Pattern tracking with a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).
Parameters: - pattern_files : list
List of files with the patterns (including the fitted GMMs and information about the number of beats).
- min_bpm : list, optional
Minimum tempi used for pattern tracking [bpm].
- max_bpm : list, optional
Maximum tempi used for pattern tracking [bpm].
- num_tempi : int or list, optional
Number of tempi to model; if set, limit the number of tempi and use log spacing, otherwise linear spacing.
- transition_lambda : float or list, optional
Lambdas for the exponential tempo change distributions (higher values prefer constant tempi from one beat to the next one).
- fps : float, optional
Frames per second.
Notes
min_bpm, max_bpm, num_tempi, and transition_lambda must contain as many items as rhythmic patterns are modeled (i.e. the length of pattern_files). If a single value is given for num_tempi or transition_lambda, this value is used for all rhythmic patterns.
Instead of the originally proposed state space and transition model for the DBN [1], the more efficient version proposed in [2] is used.
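The length rule in the notes above (one item per rhythmic pattern, with a single value reused for all patterns) can be sketched as a small validation helper. `per_pattern` is a hypothetical name used for illustration only; it is not part of madmom.

```python
def per_pattern(value, num_patterns):
    """Expand a single value to one item per pattern, or validate a list."""
    if isinstance(value, (list, tuple)):
        if len(value) != num_patterns:
            raise ValueError("expected %d items, got %d"
                             % (num_patterns, len(value)))
        return list(value)
    # A single value is used for all rhythmic patterns.
    return [value] * num_patterns


num_patterns = 2  # e.g. two pattern files
min_bpm = per_pattern((55, 60), num_patterns)
transition_lambda = per_pattern(100, num_patterns)
```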
References
[1] Florian Krebs, Sebastian Böck and Gerhard Widmer, “Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio”, Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.
[2] Florian Krebs, Sebastian Böck and Gerhard Widmer, “An Efficient State Space Model for Joint Tempo and Meter Tracking”, Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), 2015.
Examples
Create a PatternTrackingProcessor from the given pattern files. These pattern files include fitted GMMs for the observation model of the HMM. The returned array represents the positions of the beats and their position inside the bar. The position is given in seconds, thus the expected sampling rate is needed. The position inside the bar follows the natural counting and starts at 1.
>>> from madmom.models import PATTERNS_BALLROOM
>>> proc = PatternTrackingProcessor(PATTERNS_BALLROOM, fps=50)
>>> proc
<madmom.features.downbeats.PatternTrackingProcessor object at 0x...>
Call this PatternTrackingProcessor with a multi-band spectrogram to obtain the beat and downbeat positions. The parameters of the spectrogram have to correspond to those used to fit the GMMs.
>>> from madmom.audio.spectrogram import (
...     LogarithmicSpectrogramProcessor, SpectrogramDifferenceProcessor,
...     MultiBandSpectrogramProcessor)
>>> from madmom.processors import SequentialProcessor
>>> log = LogarithmicSpectrogramProcessor()
>>> diff = SpectrogramDifferenceProcessor(positive_diffs=True)
>>> mb = MultiBandSpectrogramProcessor(crossover_frequencies=[270])
>>> pre_proc = SequentialProcessor([log, diff, mb])
>>> act = pre_proc('tests/data/audio/sample.wav')
>>> proc(act)
array([[0.82, 4. ],
       [1.78, 1. ],
       ...,
       [3.7 , 3. ],
       [4.66, 4. ]])
process(features, **kwargs)
Detect the (down-)beats given the features.
Parameters: - features : numpy array
Multi-band spectral features.
Returns: - beats : numpy array, shape (num_beats, 2)
Detected (down-)beat positions [seconds] and beat numbers.
static add_arguments(parser, pattern_files=None, min_bpm=(55, 60), max_bpm=(205, 225), num_tempi=None, transition_lambda=100)
Add DBN related arguments for pattern tracking to an existing parser object.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- pattern_files : list
Load the patterns from these files.
- min_bpm : list, optional
Minimum tempi used for beat tracking [bpm].
- max_bpm : list, optional
Maximum tempi used for beat tracking [bpm].
- num_tempi : int or list, optional
Number of tempi to model; if set, limit the number of states and use log spacing, otherwise linear spacing.
- transition_lambda : float or list, optional
Lambdas for the exponential tempo change distribution (higher values prefer constant tempi from one beat to the next one).
Returns: - parser_group : argparse argument group
Pattern tracking argument parser group
Notes
pattern_files, min_bpm, max_bpm, num_tempi, and transition_lambda must have the same number of items.
class madmom.features.downbeats.LoadBeatsProcessor(beats, files=None, beats_suffix=None, **kwargs)
Load beat times from file or handle.
process_single()
Load the beats in bulk-mode (i.e. all at once) from the input stream or file.
Returns: - beats : numpy array
Beat positions [seconds].
process_batch(filename)
Load beat times from file.
First match the given input filename to the beat filenames, then load the beats.
Parameters: - filename : str
Input file name.
Returns: - beats : numpy array
Beat positions [seconds].
Notes
Both the file names to search for the beats as well as the suffix to determine the beat files must be given at instantiation time.
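The matching step described above (first match the input filename to the beat filenames, then load the beats) might look roughly like the following sketch. It assumes that a beat file matches when its base name equals the input file's base name without extension plus the suffix. The helper `match_beat_file` and all file names are hypothetical.

```python
import os


def match_beat_file(filename, beat_files, beats_suffix=".beats.txt"):
    """Return the beat file whose base name matches `filename`, or None."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    for candidate in beat_files:
        if os.path.basename(candidate) == stem + beats_suffix:
            return candidate
    return None


# Hypothetical list of beat files given at instantiation time.
files = ["beats/a.beats.txt", "beats/sample.beats.txt"]
match = match_beat_file("audio/sample.wav", files)
missing = match_beat_file("audio/other.wav", files)
```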
static add_arguments(parser, beats=<open file '<stdin>', mode 'r'>, beats_suffix='.beats.txt')
Add beat loading related arguments to an existing parser.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- beats : FileType, optional
Where to read the beats from (‘single’ mode).
- beats_suffix : str, optional
Suffix of beat files (‘batch’ mode).
Returns: - argparse argument group
Beat loading argument parser group.
class madmom.features.downbeats.SyncronizeFeaturesProcessor(beat_subdivisions, fps, **kwargs)
Synchronize features to beats.
First, divide a beat interval into beat_subdivisions subdivisions. Then summarise all features that fall into one subdivision. If no feature value is found for a subdivision, it is set to 0.
Parameters: - beat_subdivisions : int
Number of subdivisions a beat is divided into.
- fps : float
Frames per second.
process(data, **kwargs)
Synchronize features to beats.
Average all feature values that fall into a window of beat duration / beat subdivisions, centered on the beat positions or interpolated subdivisions, starting with the first beat.
Parameters: - data : tuple (features, beats)
Tuple of two numpy arrays, the first containing the features to be synchronized and the second the beat times.
Returns: - numpy array (num beats - 1, beat subdivisions, features dim.)
Beat synchronous features.
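The synchronization described above can be sketched in plain Python. To keep the example short, it simplifies the features to one value per frame (the real features have an additional feature dimension) and uses a hypothetical helper `sync_to_beats`. Each window has a length of beat duration / beat subdivisions, is centered on the beat or the interpolated subdivision, and yields 0 if no frame falls inside it.

```python
def sync_to_beats(features, beats, beat_subdivisions, fps):
    """Average per-frame feature values inside each subdivision window."""
    synced = []
    for start, end in zip(beats[:-1], beats[1:]):
        width = (end - start) / beat_subdivisions  # window length [seconds]
        row = []
        for k in range(beat_subdivisions):
            centre = start + k * width
            lo, hi = centre - width / 2, centre + width / 2
            inside = [v for i, v in enumerate(features) if lo <= i / fps < hi]
            # No frame in the window: the value is set to 0.
            row.append(sum(inside) / len(inside) if inside else 0.0)
        synced.append(row)
    return synced  # shape: (num_beats - 1, beat_subdivisions)


features = [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 7.0]  # one value per frame
synced = sync_to_beats(features, beats=[0.15, 0.55],
                       beat_subdivisions=2, fps=10)
```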
class madmom.features.downbeats.RNNBarProcessor(beat_subdivisions=(4, 2), fps=100, **kwargs)
Retrieve a downbeat activation function from a signal and pre-determined beat positions by obtaining beat-synchronous harmonic and percussive features which are processed with a GRU-RNN.
Parameters: - beat_subdivisions : tuple, optional
Number of beat subdivisions for the percussive and harmonic feature.
References
[1] Florian Krebs, Sebastian Böck and Gerhard Widmer, “Downbeat Tracking Using Beat-Synchronous Features and Recurrent Networks”, Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), 2016.
Examples
Create an RNNBarProcessor and pass an audio file and pre-determined (or given) beat positions through the processor. The returned array contains the beat positions and, for each beat, the probability of it being a downbeat.
>>> import numpy as np
>>> proc = RNNBarProcessor()
>>> proc
<madmom.features.downbeats.RNNBarProcessor object at 0x...>
>>> beats = np.loadtxt('tests/data/detections/sample.dbn_beat_tracker.txt')
>>> downbeat_prob = proc(('tests/data/audio/sample.wav', beats))
>>> np.around(downbeat_prob, decimals=3)
...
array([[0.1 , 0.378],
       [0.45 , 0.19 ],
       [0.8 , 0.112],
       [1.12 , 0.328],
       [1.48 , 0.27 ],
       [1.8 , 0.181],
       [2.15 , 0.162],
       [2.49 , nan]])
process(data, **kwargs)
Retrieve a downbeat activation function from a signal and beat positions.
Parameters: - data : tuple
Tuple containing a signal or file (handle) and corresponding beat times [seconds].
Returns: - numpy array, shape (num_beats, 2)
Array containing the beat positions (first column) and the corresponding downbeat activations, i.e. the probability that a beat is a downbeat (second column).
Notes
Since features are synchronized to the beats, and the probability of being a downbeat depends on a whole beat duration, only num_beats-1 activations can be computed and the last value is filled with ‘NaN’.
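Because the last activation is NaN, code that consumes this output directly may want to skip that row. A minimal sketch with made-up values standing in for the real array:

```python
import math

# Made-up stand-in for the (num_beats, 2) output:
# each row is [beat time in seconds, downbeat probability].
downbeat_prob = [[0.5, 0.40], [1.0, 0.10], [1.5, 0.20], [2.0, float("nan")]]

# Keep only the rows with a computed activation (drops the trailing NaN).
valid = [row for row in downbeat_prob if not math.isnan(row[1])]

# Beat with the highest downbeat probability among the valid rows.
best_time, best_prob = max(valid, key=lambda row: row[1])
```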
class madmom.features.downbeats.DBNBarTrackingProcessor(beats_per_bar=(3, 4), observation_weight=100, meter_change_prob=1e-07, **kwargs)
Bar tracking with a dynamic Bayesian network (DBN) approximated by a Hidden Markov Model (HMM).
Parameters: - beats_per_bar : int or list
Number of beats per bar to be modeled. Can be either a single number or a list or array with bar lengths (in beats).
- observation_weight : int, optional
Weight for the downbeat activations.
- meter_change_prob : float, optional
Probability to change meter at bar boundaries.
Examples
Create a DBNBarTrackingProcessor. The returned array represents the positions of the beats and their position inside the bar. The position inside the bar follows the natural counting and starts at 1.
The number of beats per bar to be modelled must be given. All other parameters (e.g. probability to change the meter at bar boundaries) are optional but must have the same length as beats_per_bar.
>>> proc = DBNBarTrackingProcessor(beats_per_bar=[3, 4]) >>> proc <madmom.features.downbeats.DBNBarTrackingProcessor object at 0x...>
Call this DBNBarTrackingProcessor with the beat positions and downbeat activation function returned by RNNBarProcessor to obtain the positions.
>>> import numpy as np
>>> beats = np.loadtxt('tests/data/detections/sample.dbn_beat_tracker.txt')
>>> act = RNNBarProcessor()(('tests/data/audio/sample.wav', beats))
>>> proc(act)
array([[0.1 , 1. ],
       [0.45, 2. ],
       [0.8 , 3. ],
       [1.12, 1. ],
       [1.48, 2. ],
       [1.8 , 3. ],
       [2.15, 1. ],
       [2.49, 2. ]])
process(data, **kwargs)
Detect downbeats from the given beats and activation function with Viterbi decoding.
Parameters: - data : numpy array, shape (num_beats, 2)
Array containing beat positions (first column) and corresponding downbeat activations (second column).
Returns: - numpy array, shape (num_beats, 2)
Decoded (down-)beat positions and beat numbers.
Notes
The position of the last beat is not decoded, but rather extrapolated based on the position and meter of the second to last beat.
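The extrapolation mentioned in the note can be illustrated as follows. This sketch assumes that the last beat simply continues the count of the second to last beat within its meter, wrapping back to 1 after the final beat of a bar. `extrapolate_last` is a hypothetical helper, not madmom's implementation.

```python
def extrapolate_last(beat_numbers, beats_per_bar):
    """Append the beat number that follows the last decoded one."""
    last = beat_numbers[-1]
    # Count upwards inside the bar; wrap to 1 after the last beat of a bar.
    return beat_numbers + [last % beats_per_bar + 1]


decoded = [1, 2, 3, 1, 2, 3, 1]  # beat numbers decoded in a 3-beat meter
full = extrapolate_last(decoded, beats_per_bar=3)
```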
classmethod add_arguments(parser, beats_per_bar, observation_weight=100, meter_change_prob=1e-07)
Add DBN related arguments to an existing parser.
Parameters: - parser : argparse parser instance
Existing argparse parser object.
- beats_per_bar : int or list, optional
Number of beats per bar to be modeled. Can be either a single number or a list with bar lengths (in beats).
- observation_weight : float, optional
Weight for the activations at downbeat times.
- meter_change_prob : float, optional
Probability to change meter at bar boundaries.
Returns: - parser_group : argparse argument group
DBN bar tracking argument parser group