Original scientific paper

https://doi.org/10.7305/automatika.2016.07.1084

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Tadej Justin ; Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia
France Mihelič ; Laboratory of Artificial Perception, Systems and Cybernetics (LUKS), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia
Janez Žibert (ORCID: orcid.org/0000-0003-2312-5431) ; Faculty of Health Sciences, University of Ljubljana, Zdravstvena pot 5, SI-1000 Ljubljana, Slovenia


Full text: English, pdf, 918 Kb

pp. 268-281

Abstract

Human-computer interaction (HCI) can now also be achieved with voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of target-language speech data. Such data is then used in the development of VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy process of data acquisition is to design and develop automatic algorithms that can extract acoustic data resembling the target language from speech databases of other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. This phoneme mapping is then used in the development of an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are assessed in a subjective evaluation and compared with an expert-knowledge cross-language method and with baseline speech synthesis built only from the under-resourced data. The results reveal that combining data from the well-resourced and under-resourced languages using the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.

Keywords

voice user interfaces; human language technologies; HMM-based speech synthesis; cross-language synthesis; under-resourced languages; UBM-MAP-GMM phoneme mapping

Hrčak ID:

165554

URI

https://hrcak.srce.hr/165554

Publication date:

1 September 2016
