General model
(Yeh 2008)
- Imperfect signals
- Inharmonicity
- Resonance
- Surrounding noise
- \(x(t) = \tilde{x}(t) + z(t)\), where \(z(t)\) is the residual (noise) component
- \(\tilde{x}(t)\) is quasi-periodic
- Analysis is performed on short time segments, called frames, using a sliding window function
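As a rough illustration of frame-wise analysis, the sketch below splits a sampled signal into overlapping frames and applies a window to each. The function name `frame_signal`, the choice of a Hann window, and the frame and hop sizes are assumptions made for the example; the notes do not fix any of them.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hann window to each.

    frame_len and hop are given in samples (illustrative parameters).
    """
    # Symmetric Hann window (an assumed choice of window function).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Slide the window along the signal, keeping only complete frames.
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + n] * window[n] for n in range(frame_len)])
    return frames

# Example: 100 samples of a sinusoid, 32-sample frames with 50% overlap.
x = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
frames = frame_signal(x, frame_len=32, hop=16)
print(len(frames), len(frames[0]))
```

Each frame is then analysed independently by one of the estimators described in the following sections.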
Classification
- Sound: Monophonic (single pitch) vs. Polyphonic (multiple pitches)
- Analysis: Time domain vs. Spectral domain
Single pitch estimation
\[\tilde{x}(t)=\sum_{h=1}^{\infty} A_h\cos(2\pi h f_0 t + \varphi_h)
\approx\sum_{h=1}^{H} A_h\cos(2\pi h f_0 t + \varphi_h)\]
Task: find the fundamental frequency \(f_0\)
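To make the truncated harmonic model concrete, the sketch below synthesises a sampled signal from \(H\) partials, assuming the \(h\)-th partial lies at frequency \(h f_0\). The function name `harmonic_signal`, the sampling rate and the amplitude and phase values are illustrative assumptions.

```python
import math

def harmonic_signal(f0, amps, phases, fs, n_samples):
    """Sum of H = len(amps) cosine partials at integer multiples of f0.

    amps[h-1] and phases[h-1] play the roles of A_h and phi_h.
    """
    out = []
    for n in range(n_samples):
        t = n / fs  # time of sample n in seconds
        out.append(sum(a * math.cos(2 * math.pi * h * f0 * t + p)
                       for h, (a, p) in enumerate(zip(amps, phases), start=1)))
    return out

# Two partials at 100 Hz and 200 Hz, sampled at 8 kHz.
x = harmonic_signal(f0=100.0, amps=[1.0, 0.5], phases=[0.0, 0.0],
                    fs=8000, n_samples=160)
```

With these values the signal repeats every \(f_s / f_0 = 80\) samples, which is the period a pitch estimator should recover.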
Time domain
- Analyse signal \(x(t)\) directly with respect to time.
- Compare signal \(x(t)\) with a delayed version of itself \(x(t+\tau)\)
- Similarity/dissimilarity functions
Autocorrelation Function (ACF)
\[r[\tau] = \sum_{t=1}^{N-\tau} x[t]x[t+\tau]\]
- Attains local maxima at lags \(\tau\approx mT\), where \(T\) is the signal period and \(m\) is a positive integer
- Sensitive to structures in signals
- (+): useful for speech detection
- (-): misled by resonance structures in music signals
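A minimal sketch of ACF-based period estimation on a single frame follows. The helper names (`tone`, `acf`, `acf_period`), the lag search range and the test signal are assumptions for illustration. Restricting the search to lags of at least `min_lag` avoids the trivial maximum at \(\tau = 0\).

```python
import math

def tone(f0, fs, n_samples):
    """Test tone: fundamental plus a weaker second harmonic (illustrative)."""
    return [math.sin(2 * math.pi * f0 * n / fs)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
            for n in range(n_samples)]

def acf(x, max_lag):
    """r[tau] = sum over t of x[t] * x[t + tau], for tau = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + tau] for t in range(n - tau))
            for tau in range(max_lag + 1)]

def acf_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the ACF is largest."""
    r = acf(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda tau: r[tau])

fs = 8000
x = tone(f0=200.0, fs=fs, n_samples=400)
period = acf_period(x, min_lag=20, max_lag=60)
print(period, fs / period)  # estimated period in samples and f0 in Hz
```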
Average Magnitude Difference Function (AMDF)
\[d_{\text{AM}}[\tau] = \frac{1}{N}
\sum_{t=1}^{N-\tau} \left\lvert x[t]-x[t+\tau]\right\rvert\] (Ross et al. 1974)
- Attains local minimum for \(\tau\approx mT\)
- Better suited to music signals
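The AMDF can be sketched in the same way; here the period estimate is the lag that minimises the function rather than maximises it. The helper names, the lag range and the test tone are again assumptions for illustration.

```python
import math

def tone(f0, fs, n_samples):
    """Test tone: fundamental plus a weaker second harmonic (illustrative)."""
    return [math.sin(2 * math.pi * f0 * n / fs)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
            for n in range(n_samples)]

def amdf(x, max_lag):
    """d_AM[tau] = (1/N) * sum over t of |x[t] - x[t + tau]|."""
    n = len(x)
    return [sum(abs(x[t] - x[t + tau]) for t in range(n - tau)) / n
            for tau in range(max_lag + 1)]

def amdf_period(x, min_lag, max_lag):
    """Lag in [min_lag, max_lag] at which the AMDF is smallest."""
    d = amdf(x, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda tau: d[tau])

fs = 8000
x = tone(f0=200.0, fs=fs, n_samples=400)
period = amdf_period(x, min_lag=20, max_lag=60)
print(period, fs / period)  # estimated period in samples and f0 in Hz
```

The upper lag bound is kept below twice the expected period here, because the AMDF is also close to zero at every multiple of the period.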
Squared difference function (SDF)
\[d[\tau] = \sum_{t=1}^{N-\tau}(x[t]-x[t+\tau])^2\]
- Attains local minimum for \(\tau\approx mT\)
- Accentuates dips at corresponding periods
- Clearer local minima
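Expanding the square relates the SDF to the ACF \(r[\tau]\) defined earlier, which is one way to see why its minima fall at the same lags as the ACF maxima:
\[d[\tau] = \sum_{t=1}^{N-\tau} x[t]^2 + \sum_{t=1}^{N-\tau} x[t+\tau]^2 - 2\,r[\tau]\]
The first two sums are energies of the frame and of its shifted copy. When they vary only slowly with \(\tau\), \(d[\tau]\) behaves roughly like a constant minus \(2\,r[\tau]\).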
YIN algorithm (Cheveigné and Kawahara 2002)
Cumulative mean normalized difference function: \[d[\tau] = \sum_{t=1}^{N-\tau}(x[t]-x[t+\tau])^2\] \[d_{\text{YIN}}[\tau] = \begin{cases}
1 &\text{if}~\tau = 0\\
d[\tau] \Big/ \frac{1}{\tau}\sum\limits_{j=1}^{\tau} d[j]
&\text{otherwise}
\end{cases}\]
- Starts at 1 rather than 0
- Divides SDF by its average over shorter lags
- Tends to stay large at short lags
- Drops when the SDF falls below its average
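The normalisation can be sketched as follows. The period-picking step (taking the first lag whose normalised value falls below a threshold, then moving on to the nearest local minimum) is part of the published YIN algorithm but is not covered in these notes. Treat `yin_period`, the threshold of 0.1 and the helper names as assumptions made for illustration, not as a complete YIN implementation.

```python
import math

def tone(f0, fs, n_samples):
    """Test tone: fundamental plus a weaker second harmonic (illustrative)."""
    return [math.sin(2 * math.pi * f0 * n / fs)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
            for n in range(n_samples)]

def sdf(x, max_lag):
    """d[tau] = sum over t of (x[t] - x[t + tau])**2."""
    n = len(x)
    return [sum((x[t] - x[t + tau]) ** 2 for t in range(n - tau))
            for tau in range(max_lag + 1)]

def cmndf(d):
    """Cumulative mean normalized difference: d[tau] / mean(d[1..tau])."""
    out = [1.0]  # defined as 1 at tau = 0
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out

def yin_period(x, max_lag, threshold=0.1):
    """First dip below threshold, refined to the nearest local minimum."""
    dn = cmndf(sdf(x, max_lag))
    for tau in range(1, max_lag):
        if dn[tau] < threshold:
            while tau + 1 <= max_lag and dn[tau + 1] < dn[tau]:
                tau += 1
            return tau
    # Fallback (assumption): global minimum if nothing crosses the threshold.
    return min(range(1, max_lag + 1), key=lambda tau: dn[tau])

fs = 8000
x = tone(f0=200.0, fs=fs, n_samples=400)
period = yin_period(x, max_lag=100)
print(period, fs / period)  # estimated period in samples and f0 in Hz
```

Because the normalised function stays near or above 1 at short lags, no lower bound on the lag is needed here, unlike in the plain ACF and AMDF searches.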
Spectral domain
- Analyse the Fourier transform \(X(f)\) of the signal
- The spectrum of a signal is the magnitude of its Fourier transform: \(S(f)=\left\lvert X(f)\right\rvert\)
- Local maxima of the spectrum correspond to frequency components present in the signal
- Analyse spectrum patterns with adapted similarity/dissimilarity functions
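As a small illustration of the spectral view, the sketch below computes the magnitude spectrum of one frame with a direct discrete Fourier transform and lists its prominent local maxima. The function names, the relative threshold and the test tone are assumptions for the example. The direct \(O(N^2)\) DFT is used only to keep the sketch dependency-free; an FFT routine would normally take its place.

```python
import cmath
import math

def tone(f0, fs, n_samples):
    """Test tone: fundamental plus a weaker second harmonic (illustrative)."""
    return [math.sin(2 * math.pi * f0 * n / fs)
            + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
            for n in range(n_samples)]

def magnitude_spectrum(x):
    """S[k] = |X[k]| for bins k = 0..N/2, using a direct DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def spectral_peaks(s, rel_threshold=0.1):
    """Bins that are local maxima and exceed rel_threshold * max(s)."""
    floor = rel_threshold * max(s)
    return [k for k in range(1, len(s) - 1)
            if s[k] > floor and s[k] > s[k - 1] and s[k] >= s[k + 1]]

fs, n = 8000, 400
x = tone(f0=200.0, fs=fs, n_samples=n)
peaks = spectral_peaks(magnitude_spectrum(x))
print([k * fs / n for k in peaks])  # peak frequencies in Hz
```

Bin \(k\) corresponds to frequency \(k f_s / N\). The test tone is chosen so that both partials fall exactly on a bin; a real frame would generally need a window function and some interpolation between bins.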