Music Information Retrieval (MIR)
Marc Groenewegen, HKU Music & Technology, 2025
MIR is a research field that analyses audio so that music can be described in terms of measurable features.
Levels of features
Physical and mathematical features
- frequency
- pitch and pitch map
- waveform
- spectral flatness
- transients: onset/offset & note duration
- tempo and tempo map
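Two of the features above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation: the function names, the A4 = 440 Hz tuning and the test signals are assumptions made for the example.

```python
import numpy as np

def freq_to_midi(freq_hz, a4_hz=440.0):
    """Map a frequency in Hz to a (fractional) MIDI note number, with A4 = 69."""
    return 69.0 + 12.0 * np.log2(freq_hz / a4_hz)

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Values near 1 indicate a noise-like (flat) spectrum,
    values near 0 indicate a tonal (peaky) spectrum.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean

sr = 44100                                   # assumed sample rate
t = np.arange(2048) / sr
sine = np.sin(2 * np.pi * 440.0 * t)         # tonal test signal
noise = np.random.default_rng(0).standard_normal(2048)  # noisy test signal

print(freq_to_midi(440.0))   # 69.0 (A4)
print(freq_to_midi(880.0))   # 81.0 (one octave up)
print(spectral_flatness(sine), spectral_flatness(noise))
```

The sine gives a flatness close to 0 and the noise a much higher value, which is why spectral flatness is often used to separate tonal from noisy material.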
High-level/second order features
- melody
- key
- (dis)harmony
- rhythm
- timbre
- genre
- style
- composer
- artist
Techniques for feature extraction
- time-domain (zero crossing, autocorrelation)
- frequency-domain: FFT / Wavelet / Cepstrum
- Mel-frequency cepstral coefficients (MFCC): a cepstrum computed on a mel-scaled spectrum, giving a compact description of the spectral envelope (timbre)
- transient detection
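The two time-domain techniques mentioned above can be sketched as follows. This is a minimal example assuming a mono frame held in a NumPy array; the function names, the lag search range and the test tone are choices made for this illustration.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

def autocorr_pitch(frame, sr, fmin=50.0, fmax=1000.0):
    """Estimate f0 from the highest autocorrelation peak within the allowed lags."""
    frame = frame - np.mean(frame)                       # remove DC offset
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)                             # shortest period considered
    lag_max = int(sr / fmin)                             # longest period considered
    lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return sr / lag

sr = 8000                                                # assumed sample rate
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 200.0 * t)                     # 200 Hz test tone

print(zero_crossing_rate(tone))    # roughly 2 * 200 / 8000 = 0.05
print(autocorr_pitch(tone, sr))    # close to 200 Hz
```

The zero crossing rate is cheap but only meaningful as a pitch estimate for very simple signals. Autocorrelation is more robust, but it can lock onto a multiple of the true period (an octave error).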
YIN
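- difference function = the sum of squared differences between a signal and a delayed copy of itself, evaluated for a range of delays (lags)
- find the first clear minimum; its lag is the period, so the fundamental frequency is the sample rate divided by that lag
- minimise octave errors by normalising the difference function and applying a threshold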
https://www.youtube.com/watch?v=W585xR3bjLM
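The YIN steps listed above can be sketched as follows. This is a simplified illustration, not the full published algorithm: it omits parabolic interpolation and the final local refinement, and the function name, threshold value and frequency range are assumptions for this example.

```python
import numpy as np

def yin_pitch(frame, sr, fmin=60.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN: returns an f0 estimate in Hz, or None if no clear dip is found."""
    tau_max = int(sr / fmin)
    tau_min = int(sr / fmax)
    if len(frame) <= tau_max:
        raise ValueError("frame too short for the requested fmin")
    w = len(frame) - tau_max                  # integration window length

    # Step 1: difference function d(tau)
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = frame[:w] - frame[tau:tau + w]
        d[tau] = np.sum(diff ** 2)

    # Step 2: cumulative mean normalised difference, which starts at 1
    # and suppresses the spurious dip at tau = 0
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(d[1:])
    cmnd[1:] = d[1:] * np.arange(1, tau_max + 1) / np.maximum(running_sum, 1e-12)

    # Step 3: take the first dip below the threshold and walk to its local minimum;
    # choosing the first dip rather than the global minimum reduces octave errors
    tau = tau_min
    while tau < tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau
        tau += 1
    return None

sr = 8000                                     # assumed sample rate
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 200.0 * t) + 0.5 * np.sin(2 * np.pi * 400.0 * t)
print(yin_pitch(frame, sr))                   # close to 200 Hz
```

The loop over lags is written for readability; for real-time use it would normally be vectorised or replaced by an FFT-based computation.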
YINFFT
- use the power spectrum as a more efficient way to compute the autocorrelation: multiply in the frequency domain instead of convolving in the time domain
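The core idea, computing the autocorrelation through the power spectrum, can be checked numerically. The sketch below only demonstrates that equivalence; it is not the complete YINFFT algorithm, and the function names are chosen for this example.

```python
import numpy as np

def autocorr_fft(x):
    """Autocorrelation via the power spectrum: IFFT(|FFT(x)|^2)."""
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()     # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft(x, nfft)
    power = spectrum.real ** 2 + spectrum.imag ** 2
    return np.fft.irfft(power, nfft)[:n]     # keep non-negative lags only

def autocorr_direct(x):
    """Reference implementation in the time domain."""
    return np.correlate(x, x, mode="full")[len(x) - 1:]

x = np.random.default_rng(0).standard_normal(512)
print(np.allclose(autocorr_fft(x), autocorr_direct(x)))
```

The direct version costs on the order of N^2 operations, the FFT version on the order of N log N, which matters for long analysis windows and real-time use.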
Cepstrum
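Inverse FFT (or just FFT with different scaling) of the log magnitude spectrum. The highest peak (excluding the region around zero quefrency) indicates the fundamental period; the fundamental frequency is the sample rate divided by the position of that peak.
power cepstrum = | IFFT( log( |FFT(f(t))|^2 ) ) |^2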
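A minimal sketch of pitch estimation with the cepstrum, assuming a harmonically rich input (a pure sine has no periodic structure in its spectrum, so the cepstrum gives no useful peak). The function name, the Hann window, the search range and the test signal are assumptions for this example.

```python
import numpy as np

def cepstrum_pitch(frame, sr, fmin=60.0, fmax=500.0):
    """Estimate f0 from the highest cepstral peak within the allowed quefrency range."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed)
    log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)   # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_power)                   # real cepstrum (unsquared)
    q_min = int(sr / fmax)                               # skip the low-quefrency region
    q_max = int(sr / fmin)
    peak = q_min + np.argmax(cepstrum[q_min:q_max + 1])
    return sr / peak

sr = 8000                       # assumed sample rate
frame = np.zeros(2048)
frame[::40] = 1.0               # impulse train with a period of 40 samples (200 Hz)
print(cepstrum_pitch(frame, sr))   # should land near 200 Hz for this test signal
```

The sketch searches the cepstrum before squaring; squaring to obtain the power cepstrum does not move this peak as long as it is also the largest value in absolute terms.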
AI
Basic Pitch (Spotify's audio-to-MIDI converter)
- https://basicpitch.spotify.com
- https://github.com/spotify/basic-pitch
Applications
- 'Shazam'
- source separation
- score following
- automatic categorisation / labeling & recommendations
- DRM
- real time control
- time stretching, pitch shifting, tempo adaptation
- voice commands
- music education tools, e.g. Guitar Hero, PlayGuru
- auto-tune
- auto-sync
- guitar tuner
- noise reduction
- mixing assistance (trick: noise mixing)
Libraries & tools
Paper: "An Evaluation of Audio Feature Extraction Toolboxes" --> tables on pages 4 and 5 compare feature extraction tools
- build -> onset, pitch
- aubioonset -> (MIDI) fluidsynth
- aubionotes -> (MIDI) fluidsynth
- tempo_plot.py
- tapthebeat.py
- onset_plot.py
- FFTW - versatile FFT library
- Python DSP scipy.fftpack -> fft, ifft
- librosa - a Python package for music and audio analysis
Demos
- AudioPlot (linear and log spectrum)
- Pitch vs frequency
- Pitch detection with Aubio
- Onset detection with Aubio
- Sonic Visualiser - spectral analysis and much more
- [TEMPORARILY OUT OF ORDER] Source separation using Spleeter
- [TEMPORARILY OUT OF ORDER] DAW-based: Ardour onset detection, time stretching
- [TEMPORARILY OUT OF ORDER] PlugData within Ardour
Books
Meinard Mueller: "Fundamentals of Music Processing" (available in Mediatheek IBB-laan)
Research assignment
Using sources like websites, books and papers, see what MIR can mean for one of your projects. Can you use it as a controller? Try not to discard anything at first, but keep your mind open for new insights. When you are ready to use a technique, that is the time to do a reality check and see whether your ideas are feasible.
Some starters:
- what information can you get from live audio signals?
- what information can you get from pre-recorded audio signals?
- think of new applications of MIR
- which features can you apply to controlling elements of an installation, performance or application?
- study options for mapping: can you use the feature for fast real-time control or is it interesting for longer time spans?
- find out which frameworks, libraries etc. you can use for feature extraction
- when you're ready to test or use a certain technique, observe factors like latency and processing power
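For the latency check in the last point, a quick calculation already tells you a lot. The sketch below assumes that the analysis window must be filled before a feature can be computed. The rule of thumb that a pitch detector needs about two periods of the lowest expected note is also an assumption; the exact requirement depends on the algorithm.

```python
def buffer_latency_ms(buffer_size, sample_rate):
    """Time needed to fill one analysis buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate

def min_window_ms(fmin_hz, periods=2):
    """Duration of `periods` cycles of the lowest frequency to detect, in milliseconds."""
    return 1000.0 * periods / fmin_hz

# 1024 samples at 44.1 kHz
print(round(buffer_latency_ms(1024, 44100), 2))   # 23.22

# low E string of a guitar, about 82.4 Hz, assuming two periods are needed
print(round(min_window_ms(82.4), 2))              # 24.27
```

Audio driver buffering and processing time come on top of these numbers. Low notes therefore put a lower bound on how fast pitch-based control can respond, while onsets can usually be detected with much shorter windows.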