Automatic transcription and separation of the main melody in polyphonic music signals

Table of contents

0.1 Introduction
0.1.1 Automatic music signal processing
0.1.2 Automatic extraction of the main melody
0.1.3 Separation of the leading instrument and the accompaniment
0.1.4 Contributions
0.2 Signal models
0.2.1 Gaussian model for the Fourier transform of the signals
0.2.2 Source/filter Gaussian Scaled Mixture Model
0.2.3 Instantaneous mixture model
0.2.4 Model for the temporal evolution
0.3 Estimation of the parameters and hidden sequences
0.3.1 Description of the proposed systems
0.3.2 Multiplicative gradient method for the (S)IMM
0.3.3 GEM algorithm for the (S)GSMM
0.3.4 Sequence decoding
0.3.4.1 Viterbi algorithm
0.3.4.2 Beam search algorithm
0.4 Applications: main melody extraction
0.5 Applications: leading instrument separation
0.6 Conclusions and perspectives
Notations
1 Introduction
1.1 Automatic music signal processing
1.2 Main melody estimation
1.3 De-soloing: leading instrument separation
1.4 Contributions
1.5 Organization
2 State of the art
2.1 What is the “main melody”?
2.1.1 A definition for the main melody
2.1.2 Main melody: counter-examples
2.1.3 Scope of this work
2.2 Main melody estimation
2.2.1 Main melody extraction: historical objectives and applications
2.2.2 Frame-wise fundamental frequency estimation of the main melody
2.2.2.1 Existing approaches
2.2.2.2 Discussion and position of the thesis work
2.2.3 Note-wise approaches
2.3 Source separation, leading instrument separation
2.3.1 Source separation
2.3.2 Audio and music source separation
2.3.2.1 Existing systems
2.3.2.2 Position of the thesis work
3 Signal Model
3.1 Modelling the spectrum of the audio signals
3.2 Gaussian Signals
3.3 Primary model for a “voice plus music” polyphonic signal
3.3.1 Graphical generative model
3.3.2 Frame level generative models
3.3.2.1 Source/filter model for the singing voice
3.3.2.2 Instantaneous mixture for the accompaniment
3.3.2.3 Frame level model for the mixture: summary
3.3.3 Physical state layer: constraining the fundamental frequency evolution of the singing voice
3.3.4 “Musicological” state layer to model note level duration
3.4 From the GSMM to the Instantaneous Mixture Model (IMM): links and differences
3.4.1 IMM: formulation and interpretations
3.4.2 Adaptation of the temporal constraint for the evolution of the sequence ZF0
3.4.3 Constraints in SIMM to approximate the monophonic assumption
3.5 Signal Model Summary
3.5.1 Source/Filter (S)GSMM
3.5.2 Source/Filter (S)IMM
4 Probabilistic Non-negative Matrix Factorisation (NMF)
4.1 Non-negative Matrix Factorisation
4.2 Statistical interpretation of Itakura-Saito-NMF (IS-NMF)
4.3 Properties of the Itakura-Saito (IS) divergence
5 Parameter and sequence estimation
5.1 Transcription and separation as statistical estimation
5.1.1 Estimation by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) principle
5.1.2 Predominant fundamental frequency estimation
5.1.3 Musical (notewise) transcription of the main melody
5.1.4 Leading instrument / accompaniment separation
5.1.5 Systems summary
5.2 IMM and SIMM: Multiplicative gradient algorithm
5.2.1 Maximum A Posteriori (MAP) Criterion for the IMM/SIMM
5.2.2 IMM/SIMM updating rules
5.2.3 Approximations and constraints within the IMM/SIMM
5.3 GSMM/SGSMM: Expectation-Maximisation (EM) algorithm
5.3.1 Maximum Likelihood (ML) Criterion for the (S)GSMM
5.3.2 (S)GSMM updating rules and GEM algorithm
5.3.3 Including constraints: Hidden Markov-GSMM (HM-GSMM) algorithm
5.4 Temporal evolution of the states and sequence estimation
5.4.1 Viterbi algorithm to address the HMM of the physical layer for ZΦ and ZF0
5.4.2 Beam search pruning strategy for the musical note layer E
6 Applications
6.1 F0 estimation and musical transcription of the main melody
6.1.1 Frame-wise F0 estimation of the melody
6.1.1.1 Task definition
6.1.1.2 Proposed methods
6.1.1.3 Performance measures
6.1.1.4 Datasets for evaluation
6.1.1.5 Practical choices for the model parameters
6.1.1.6 Convergence
6.1.1.7 Comparison between the proposed models (S)GSMM and (S)IMM
6.1.1.8 MIREX 2008: Main Melody Estimation Results
6.1.1.9 MIREX 2009: comparison with MIREX 2008 on development sets
6.1.1.10 MIREX 2009: results on test set
6.1.1.11 Preliminary results for system F-III
6.1.2 Notewise transcription of the melody
6.1.2.1 Task definition
6.1.2.2 Performance measures
6.1.2.3 Results on a synthetic database (ISMIR 2009)
6.1.2.4 Results for the Quaero evaluation campaign
6.2 Audio separation of the main instrument and the accompaniment
6.2.1 Task definition
6.2.2 Wiener filters
6.2.3 Performance measures
6.2.4 Proposed source separation systems
6.2.4.1 System SEP-I for mono music audio signals
6.2.4.2 Extension to stereo signals
6.2.4.3 Parameter estimation for stereo signals
6.2.5 Experiments and results
6.2.5.1 Datasets
6.2.5.2 Melody Tracking Performance
6.2.5.3 Source Separation with the True Pitch Contour
6.2.5.4 Source Separation with Estimated Melody
6.2.5.5 Multitrack example
6.2.5.6 Stereo signal + unvoiced extension
6.2.5.7 Smooth filters and unvoicing model
6.2.5.8 Stereophonic vs. monophonic algorithm
6.2.5.9 SiSEC campaign results
6.2.5.10 Evaluation on the Quaero Source Separation Database
6.2.5.11 Note on the front-end melody estimation systems: F-I, F-II or F-III?
7 Conclusion
7.1 Conclusions
7.2 Potential improvements
7.2.1 Even more “Musicological” model for note duration
7.2.2 A more complex physical layer
7.2.3 Accompaniment model: towards more supervision?
7.2.4 Decidedly perfectible models.
Glossary
A Probability density function definitions
A.1 Complex proper Gaussian distribution Nc
A.1.1 Complex proper Gaussian distribution definition
A.1.2 Complex proper Gaussian distribution properties
A.2 Gamma distribution G
B Derivation of the algorithms
B.1 (S)IMM multiplicative algorithm derivations
B.1.1 Multiplicative gradient principle
B.1.2 IMM and Itakura-Saito multiplicative rules
B.2 (S)GSMM: Expectation-Maximisation algorithm derivations
B.2.1 E step: Computing the posterior p(k, u|xn; (ΘGSMM) (i−1))
B.2.2 M step: amplitude coefficients B
B.2.3 M step: wΦfk
B.2.4 M step: hΓpk (SGSMM)
B.2.5 M step: hMrn
B.2.6 M step: wMfr
B.2.7 M step: Derivations for the a priori probabilities π
B.2.8 Temporal constraint with HMM during the estimation: adaptation of E-step
B.3 Multiplicative algorithm behaviour
C KLGLOTT88: a glottal source model
D Databases
D.1 MIREX AME databases
D.2 Quaero Main Melody Database
D.3 Leading instrument / accompaniment separation mono database
D.4 Quaero Source Separation Database
Bibliography
