Symposium for Celebrating 40 Years of Bayesian Learning in Speech and Language Processing and Beyond

Taipei, ASRU Satellite Event, December 20th, 2023



Introduction

Since the first Bayesian learning paper was published at ICASSP 1983 [1], the following 20 years saw many studies extending Bayesian learning to maximum a posteriori (MAP) estimation of hidden Markov model (HMM) parameters [2-3]. Online adaptation of HMMs and correlated HMMs [4-5] followed. Next, the popular maximum likelihood linear regression (MLLR) adaptation approach was extended to MAPLR and to joint estimation of transformation and HMM parameters [9]. To handle unseen units, structural MAP (SMAP) was developed [8] and extended to SMAPLR [10]. Online adaptation is often referred to as temporal prior evolution, while tree-based SMAP is also known as spatial prior evolution. Going beyond MAP point estimates, variational Bayesian [11] and Bayesian predictive classification [6] approaches have also been developed toward full Bayesian estimation. A review of Bayesian learning for speech and language processing can be found in [7], and a book on variational Bayesian learning theory has also been published [15]. More recently, Bayesian learning has been extended to deep neural network (DNN) parameters [12-14]. We expect this direction to be studied extensively in the future, especially in the modern era of generative AI and large pre-trained models, in which transfer learning has become a viable tool for adapting general-purpose models to specific domains and applications.



Tentative Schedule (four and a half hours in total, 14:00-18:30 on December 20, after ASRU 2023)

  • Plenary Speaker: Chin-Hui Lee (30 minutes)
  • Six Invited Speakers (key contributions to Bayesian learning in speech and language processing over the last 40 years): Qiang Huo, Torbjørn Svendsen (or Olivier Siohan), Shinji Watanabe, Koichi Shinoda, Jen-Tzung Chien, Marco Siniscalchi (15 minutes each, 90 minutes in total)
  • Panel Discussion: all seven speakers as panelists (30 minutes)
  • Break and Social Discussions (30 minutes)
  • Poster Session: 12-15 posters (90 minutes)
  • Workshop Dinner: hosted by the Organizer right after the Symposium


14:00  Historical Perspective & Beyond (C.-H. Lee)
14:30  Online and Correlated HMMs (Q. Huo)
14:45  Joint MAP of LR and HMMs (T. K. Svendsen)
15:00  Variational Bayesian Learning (S. Watanabe)
15:15  Structural MAP for LR & HMMs (K. Shinoda)
15:30  MAP for N-grams and Beyond (J.-T. Chien)
15:45  MAP for DNN Parameters (S. M. Siniscalchi)
16:00  Panel Discussion (all 7 speakers)
16:30  Break
17:00  Poster Contributions (all participants)
18:30  Closing



Honorary Committee Chair

Chin-Hui Lee
Georgia Institute of Technology

Invited Speakers

Qiang Huo
Microsoft Research
Torbjørn Svendsen
Norwegian University of Science and Technology
Shinji Watanabe
Carnegie Mellon University
Koichi Shinoda
Tokyo Institute of Technology
Jen-Tzung Chien
National Chiao Tung University
Sabato Marco Siniscalchi
Kore University of Enna

Organizers

Jinyu Li
Microsoft
Chao-Han Huck Yang
Amazon Alexa Speech
Chao Zhang
Tsinghua University
Hsin-Min Wang
Academia Sinica
Yu Tsao
Academia Sinica

Publications

Each contribution to this Bayesian celebration workshop can be summarized in an abstract (limited to 200 words) to be published in the ASRU 2023 Workshop Proceedings. Twelve to fifteen poster contributions on topics relevant to Bayesian learning will be selected from submissions reviewed by the Organizers (a Call for Contributions will soon be sent to all potential participants and published on the ASRU website). Presentation materials, including one-page summaries or posters with references, will be published on a symposium page linked from the ASRU website.


Registration

Participants must register separately from the main ASRU Workshop. Registration for the Bayesian Symposium costs US$120 (covering the workshop, proceedings, and break). ASRU participants are welcome to join this celebration workshop for an additional US$100 (satellite workshop registration is handled separately from ASRU Workshop registration). Invited speakers are exempt from registration fees.


Selected Key References

  1. P. Brown, C.-H. Lee, and J. Spohrer, "Bayesian Adaptation in Speech Recognition," Proc. ICASSP, 1983.
  2. C.-H. Lee, C.-H. Lin and B.-H. Juang, "A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models," IEEE Trans. Acoustics, Speech and Signal Proc., Vol. ASSP-39, No. 4, pp. 806-814, April 1991.
  3. J.-L. Gauvain and C.-H. Lee, "Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains," IEEE Trans. on Speech and Audio Proc., Vol. 2, No. 2, pp. 291-298, April 1994.
  4. Q. Huo and C.-H. Lee, "On-line Adaptive Learning of the Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate," IEEE Trans. on Speech and Audio Proc., Vol. 5, No. 2, pp. 161-172, March 1997.
  5. Q. Huo and C.-H. Lee, "On-line Adaptive Learning of the Correlated Continuous Density Hidden Markov Model for Speech Recognition," IEEE Trans. on Speech and Audio Proc., Vol. 6, No. 4, pp. 386-397, July 1998.
  6. Q. Huo and C.-H. Lee, "A Bayesian Predictive Classification Approach to Robust Speech Recognition," IEEE Trans. on Speech and Audio Proc., Vol. 8, No. 2, pp. 200-204, March 2000.
  7. C.-H. Lee and Q. Huo, "On Adaptive Decision Rules and Decision Parameter Adaptation for Automatic Speech Recognition," Proceedings of the IEEE, Vol. 88, No. 8, pp. 1241-1269, August 2000.
  8. K. Shinoda and C.-H. Lee, "A Structural Bayes Approach to Speaker Adaptation," IEEE Trans. on Speech and Audio Proc., Vol. 9, No. 3, pp. 276-287, March 2001.
  9. O. Siohan, C. Chesta and C.-H. Lee, "Joint Maximum a Posteriori Adaptation of Transformation and HMM Parameters," IEEE Trans. on Speech and Audio Proc., Vol. 9, No. 4, pp. 417-428, May 2001.
  10. O. Siohan, T. A. Myrvoll and C.-H. Lee, "Structural Maximum A Posteriori Linear Regression for HMM Adaptation," Computer Speech and Language, Vol. 16, No. 1, pp. 5-24, Jan. 2002.
  11. S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, "Application of Variational Bayesian Approach to Speech Recognition," NIPS, 2002.
  12. Z. Huang, S. M. Siniscalchi and C.-H. Lee, "A Unified Approach to Transfer Learning of Deep Neural Networks with Applications to Speaker Adaptation in Automatic Speech Recognition," Neurocomputing, September 2016.
  13. Z. Huang, S. M. Siniscalchi, and C.-H. Lee, "Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition," IEEE/ACM Trans. Audio, Speech and Language Proc., Vol. 25, No. 1, pp. 60-71, January 2017.
  14. Z. Huang, S. M. Siniscalchi, and C.-H. Lee, "Hierarchical Bayesian Combinations of Plug-in Maximum A Posteriori Decoder in DNN-Based Speech Recognition and Speaker Adaptation," Pattern Recognition Letters, Vol. 98, No. 1, pp. 1-7, 2017.
  15. S. Nakajima, K. Watanabe, and M. Sugiyama, "Variational Bayesian Learning Theory," Cambridge University Press, 2019.
Website theme modified from, and inspired by, the VIGIL workshop series (Florian S. et al.).