Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis
by Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez
Abstract:
Expressive speech is one of the latest concerns of text-to-speech systems. Due to the subjectivity of expression and emotion realisation in speech, humans cannot objectively determine whether one system is more expressive than another. Most text-to-speech systems have a rather flat intonation and do not provide the option of changing the output speech. We therefore present an interactive intonation optimisation method based on pitch contour parameterisation and evolution strategies. The Discrete Cosine Transform (DCT) is applied to the phrase-level pitch contour. The genome is then encoded as a vector that contains the 7 most significant DCT coefficients. Based on this initial individual, new speech samples are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such as the initial standard deviation, the population size, the dynamic expansion of the pitch over the generations, and the naturalness and expressivity of the resulting individuals. The results have been evaluated on a Romanian parametric speech synthesiser and provide guidelines for the setup of an interactive optimisation system in which users can subjectively select the individual that best suits their expectations with a minimum amount of fatigue.
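The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the "7 most significant" coefficients are the 7 lowest-order coefficients of an orthonormal DCT-II, uses a synthetic F0 contour, and replaces the full CMA-ES loop with a single isotropic Gaussian sampling step. The names `encode_f0`, `decode_f0` and the value of `sigma` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; each row is one basis vector."""
    k = np.arange(n)[:, None]  # coefficient index
    t = np.arange(n)[None, :]  # frame index
    basis = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def encode_f0(f0, n_coeffs=7):
    """Keep the first n_coeffs DCT coefficients of an F0 contour (assumed genome)."""
    return (dct_matrix(len(f0)) @ f0)[:n_coeffs]


def decode_f0(genome, n_frames):
    """Rebuild an n_frames contour from a truncated coefficient vector."""
    coeffs = np.zeros(n_frames)
    coeffs[: len(genome)] = genome
    return dct_matrix(n_frames).T @ coeffs


# Synthetic phrase-level contour in Hz (illustrative data, not from the paper).
frames = np.linspace(0.0, 1.0, 100)
f0 = 120.0 + 20.0 * np.sin(2 * np.pi * frames) - 15.0 * frames

genome = encode_f0(f0)

# One sampling step around the initial individual. A real CMA-ES run would also
# adapt the step size and covariance matrix from the listener's selections.
rng = np.random.default_rng(0)
sigma = 5.0  # hypothetical initial standard deviation
offspring = genome + sigma * rng.standard_normal((4, genome.size))
contours = [decode_f0(g, len(f0)) for g in offspring]
```

Each entry of `contours` is a candidate pitch contour that a synthesiser could render for the listener to compare. Because the basis is orthonormal, the first coefficient carries the mean pitch level and the remaining ones describe progressively faster variation across the phrase.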
Reference:
Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez, "Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis", In Proceedings of the 5th Workshop on Nature Inspired Cooperative Strategies for Optimisation, Springer, vol. 387, pp. 57-71, 2011.
Bibtex Entry:
@inproceedings{NICSO2011,
  author = {Adriana Stan and Florin-Claudiu Pop and Marcel Cremene and 
                    Mircea Giurgiu and Denis Pallez},
  title ={{Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation 
                    of the F0 Contour for Speech Synthesis}},
  year = 2011,
  abstract = {Expressive speech is one of the latest concerns of text-to-speech systems. 
              Due to the subjectivity of expression and emotion realisation in speech, 
              humans cannot objectively determine whether one system is more expressive than 
              another. Most text-to-speech systems have a rather flat intonation 
              and do not provide the option of changing the output speech. We therefore 
              present an interactive intonation optimisation method based on pitch 
              contour parameterisation and evolution strategies. The Discrete Cosine 
              Transform (DCT) is applied to the phrase-level pitch contour. The 
              genome is then encoded as a vector that contains the 7 most significant DCT coefficients. 
              Based on this initial individual, new speech samples 
              are obtained using an interactive Covariance Matrix Adaptation Evolution 
              Strategy (CMA-ES) algorithm. We evaluate a series of parameters involved 
              in the process, such as the initial standard deviation, the population size, 
              the dynamic expansion of the pitch over the generations, and the naturalness 
              and expressivity of the resulting individuals. 
              The results have been evaluated on a Romanian parametric speech synthesiser 
              and provide guidelines for the setup of an interactive optimisation system 
              in which users can subjectively select the individual that best suits their 
              expectations with a minimum amount of fatigue.},
  booktitle = {Proceedings of the $5^{th}$ Workshop on Nature Inspired 
                    Cooperative Strategies for Optimisation},
  publisher = {Springer},
  pages = {57--71},
  
  volume = 387,
  series = {Studies in Computational Intelligence}
  }