Original scientific paper
https://doi.org/10.7305/automatika.2016.07.1084
Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages
Tadej Justin; Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia
France Mihelič; Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia
Janez Žibert (orcid.org/0000-0003-2312-5431); Faculty of Health Sciences, University of Ljubljana, Zdravstvena pot 5, SI-1000 Ljubljana, Slovenia
Abstract
Nowadays, human-computer interaction (HCI) can also be achieved through voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data is then used in the development of VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy process of data acquisition is to design and develop automatic algorithms that can extract acoustics similar to those of the target language from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. This phoneme mapping is then used in the development of an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are assessed in a subjective evaluation and compared with an expert-knowledge cross-language method and with a baseline speech-synthesis system built from the under-resourced data alone. The results reveal that combining data from the well-resourced and under-resourced languages by means of the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.
Keywords
voice user interfaces; human language technologies; HMM-based speech synthesis; cross-language synthesis; under-resourced languages; UBM-MAP-GMM phoneme mapping
Hrčak ID:
165554
URI
Publication date:
1.9.2016.