Sei Ueno

Assistant Professor, Nagoya Institute of Technology, Department of Computer Science, Japan

sei.ueno[at]nitech.ac.jp

Education

April 2022 – current: Assistant Professor, Nagoya Institute of Technology, Department of Computer Science, Japan.
April 2019 – March 2022: Doctoral Program of Graduate School of Informatics, Kyoto University, Department of Intelligence Science and Technology, Japan. Supervisor: Tatsuya Kawahara
April 2017 – March 2019: Master Program of Graduate School of Informatics, Kyoto University, Department of Intelligence Science and Technology, Japan. Supervisor: Tatsuya Kawahara
April 2013 – March 2017: Doshisha University Faculty of Science and Engineering, Department of Information Systems Design, Japan.

Research

Speech Recognition
Speech Synthesis

Awards

March 2023: IPSJ Yamashita SIG Research Award
February 2019: Student Paper Award of The Information Processing Society of Japan, Special Interest Groups-Spoken Language Processing
September 2018: Student Presentation Award of The Acoustical Society of Japan
August 2018: Student Poster Award of The Institute of Electronics, Information and Communication Engineers, The Award of the Acoustical Society of Japan
September 2015: 1st Place, TOYOTA HSR Hackathon 2015

Journal papers (first author)

Sei Ueno, Akinobu Lee, Tatsuya Kawahara: Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol.32, pp.3924-3933, 2024.
Sei Ueno, Akinobu Lee: Multi-setting acoustic feature training for data augmentation of speech recognition. Acoustical Science and Technology, Vol.45, Issue 4, pp.195-203, 2024.
Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Synthesizing Waveform Sequence-to-sequence to Augment Training Data for Sequence-to-sequence Speech Recognition. Acoustical Science and Technology, Vol.42, Issue 6, pp.333-343, 2021.

International conference papers (first author)

Sei Ueno, Tatsuya Kawahara: Phone-Informed Refinement of Synthesized Mel Spectrogram for Data Augmentation in Speech Recognition. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.8572-8576, 2022.
Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Data Augmentation for ASR Using TTS Via a Discrete Representation. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp.68-75, 2021.
Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Multi-speaker Sequence-to-sequence Speech Synthesis for Data Augmentation in Acoustic-to-word Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.6161-6165, 2019.
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yoshikazu Yamaguchi, Yushi Aono, Tatsuya Kawahara: Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition. INTERSPEECH, pp.2424-2428, 2018.
Sei Ueno, Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara: Acoustic-to-word Attention-based Model Complemented with Character-level CTC-based Model. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.5804-5808, 2018.

International conference papers (co-author)

Keigo Ichikawa, Sei Ueno, Akinobu Lee: Data generation for speaker diarization by speaker transition information, APSIPA ASC, pp., 2024.
Iago Lourenço Correa, Sei Ueno, Akinobu Lee: Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning, APSIPA ASC, pp.1179-1186, 2023.
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Non-Autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM. INTERSPEECH, pp.3889–3893, 2022.
Han Feng, Sei Ueno, Tatsuya Kawahara: End-to-End Speech Emotion Recognition Combined with Acoustic-to-Word ASR Model. INTERSPEECH, pp.501–505, 2020.
Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Distilling the Knowledge of BERT for Sequence-to-Sequence ASR. INTERSPEECH, pp.3635–3639, 2020.
Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara: End-to-End Speech-to-Dialog-Act Recognition. INTERSPEECH, pp.3910–3914, 2020.
Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara: Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language. International Conference on Language Resources and Evaluation (LREC), pp.2622-2628, 2020.
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi, Yushi Aono: Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition. INTERSPEECH, pp.2399-2403, 2018.
Masato Mimura, Sei Ueno, Hirofumi Inaguma, Shinsuke Sakai, Tatsuya Kawahara: Leveraging sequence-to-sequence Speech Synthesis for Enhancing Acoustic-to-word Speech Recognition. Workshop on Spoken Language Technology (SLT), pp.477-484, 2018.

Grants / Fellowships

2019 – 2022 Japan Society for the Promotion of Science (JSPS) Research Fellowship for Young Scientists (DC1)