Combination of Deep Speaker Embeddings for Diarisation and Discriminative Neural Clustering for Speaker Diarisation
If you have a question about this talk, please contact Dr Kate Knill.

Seminar on Zoom.

Combination of Deep Speaker Embeddings for Diarisation
Speaker: Brian Sun

Abstract: Recently, significant progress has been made in speaker diarisation after the introduction of d-vectors as speaker embeddings extracted from neural network (NN) speaker classifiers for clustering speech segments. To extract better-performing and more robust speaker embeddings, this paper proposes a c-vector method by combining multiple sets of complementary d-vectors derived from systems with different NN components. Three structures are used to implement the c-vectors, namely 2D self-attentive, gated additive, and bilinear pooling structures, relying on attention mechanisms, a gating mechanism, and a low-rank bilinear pooling mechanism respectively. Furthermore, a neural-based single-pass speaker diarisation pipeline is also proposed in this paper, which uses NNs to achieve voice activity detection, speaker change point detection, and speaker embedding extraction. Experiments and detailed analyses are conducted on the challenging AMI and NIST RT05 datasets, which consist of real meetings with 4–10 speakers and a wide range of acoustic conditions. Consistent improvements are obtained by using c-vectors instead of d-vectors, and similar relative improvements in diarisation error rates are observed on both AMI and RT05, which shows the robustness of the proposed methods.

Discriminative Neural Clustering for Speaker Diarisation
Speakers: Qiujia Li and Florian Kreyssig

Abstract: In this paper, we propose Discriminative Neural Clustering (DNC), which formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure.
An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of L2-normalised speaker embeddings. Experimental results on AMI show that DNC achieves a reduction in speaker error rate (SER) of 29.4% relative to spectral clustering.

This talk is from SLT 2021, where it was awarded Best Student Paper.

This talk is part of the CUED Speech Group Seminars series.
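The idea behind Diaconis augmentation, as described in the abstract, can be sketched in a few lines of numpy: apply one shared random rotation to every L2-normalised embedding in a sequence, which yields a new training sample while leaving norms and pairwise cosine similarities (and hence the cluster structure and reference labels) unchanged. This is an illustrative sketch, not the paper's implementation; the function names `random_rotation` and `diaconis_augment` are made up for this example.

```python
import numpy as np

def random_rotation(dim, rng):
    """Sample a random orthogonal matrix via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Sign correction so the columns are drawn uniformly over orthogonal matrices.
    q *= np.sign(np.diag(r))
    return q

def diaconis_augment(embeddings, rng):
    """Apply one shared rotation to every embedding in the sequence (shape: segments x dim)."""
    rot = random_rotation(embeddings.shape[1], rng)
    return embeddings @ rot.T

rng = np.random.default_rng(0)
# Toy sequence: 5 speech segments with 8-dimensional L2-normalised embeddings.
x = rng.standard_normal((5, 8))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y = diaconis_augment(x, rng)

# Rotation preserves norms and all pairwise inner products,
# so the augmented sequence has identical cluster geometry.
assert np.allclose(np.linalg.norm(y, axis=1), 1.0)
assert np.allclose(x @ x.T, y @ y.T)
```

Because the rotation is shared across the whole sequence, the relative positions of the embeddings (what the clustering model actually needs) are untouched; only their absolute orientation in the embedding space changes, which is what makes each rotated copy a valid new training sample.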