The SPEECH PROCESSING GROUP is part of the Communications Department within the Technical University of Cluj-Napoca.


We do research in all areas related to speech technology: speech synthesis and recognition, speech coding, speech pre- and post-processing, emotion recognition from speech, unsupervised speech segmentation and analysis.
Other interests relate to natural language processing and multimedia analysis.

[Check out a poster showcasing our research interests]

About us...

We are a team made up of both experienced and young researchers who come together in a friendly academic environment.

OPPORTUNITY - Get a PhD!

We are constantly looking for witty, interested people to join our group. Doctoral studies, full-time or part-time, are offered by our faculty in the field of Electronics and Telecommunications Engineering. Funding can be obtained either from the Ministry of National Education or from research projects.

To do a PhD in Speech Processing, get in touch with us!

OPPORTUNITY - Get a Master's Degree!

The Technical University of Cluj-Napoca has several Master's programs that might interest you. More info here: Postgraduate studies.

To do a Master's in Electronics and Telecommunications, you can get more info here: Master Studies-ETTI*.

The Multimedia Technologies Master's includes two topics related to speech processing: 1) Speech Coding Techniques and 2) Speech Analysis, Synthesis and Recognition.

*So far, all the current Master's programs are taught in Romanian.

NEWS!


SINTERO Project Started (March, 2018)

The general objective of SINTERO is to create a Romanian text-to-speech synthesis system that allows prosody (intonation in speech) to be modelled and controlled in a way that is close to natural speech. Alongside this objective, the project aims to create as many synthesized Romanian voices as possible (at least 10 voices in this project), so that they can be used by a wide community, including in commercial applications. WEBPAGE.

SWARA Corpus Released (June, 2017)

The SWARA Corpus is a result of the SWARA Project, funded by the Romanian Ministry of Education, under the grant agreement PN-II-PT-PCCA-2013-4 No 6/2014. The corpus contains over 21 hours of high-quality recordings from 17 different speakers. The data is segmented into 19,279 utterances and includes their orthographic transcripts and semi-automatic phone-level alignments. WEBPAGE.

MaRePhor Lexicon Released (June, 2017)

An Open Access Machine-Readable Phonetic Dictionary for Romanian: The dictionary consists of 72,375 words and 591,570 letters. The dictionary entries are words from the Romanian Scrabble Association's official list of words and the entries from a 15,517-word dictionary, developed according to the SpeechDat specifications. The phonetic transcriptions are in SAMPA format. WEBPAGE.

ALISA Tool Released (June, 2017)

ALISA uses a two-step approach for the task of aligning speech with imperfect transcripts: 1) sentence-level speech segmentation and 2) sentence-level speech and text alignment. Both processes are fully automated and require as little as 10 minutes of manually labelled speech: inter-sentence silence segments for the segmentation, and orthographic transcripts of these sentences for the aligner. The tool can be applied to any language with an alphabetic writing system and can align up to 75% of the original data with a sentence error rate of less than 8% and a word error rate of less than 1%. WEBPAGE.
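
For readers unfamiliar with the two metrics quoted above, the following is a minimal Python sketch of how a word error rate and a sentence error rate are commonly computed from pairs of reference and aligned transcripts. It is not taken from the ALISA code; the function names edit_distance and error_rates are hypothetical, and the sketch assumes whitespace tokenisation and a plain Levenshtein distance over words.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (word error rate, sentence error rate).

    Each pair is (reference transcript, hypothesis transcript).
    Assumes at least one pair and at least one reference word.
    """
    word_errors = 0
    ref_words = 0
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        dist = edit_distance(ref, hyp)
        word_errors += dist
        ref_words += len(ref)
        wrong_sentences += int(dist > 0)  # a sentence counts as wrong if any word differs
    return word_errors / ref_words, wrong_sentences / len(pairs)


if __name__ == "__main__":
    # Made-up example sentences, for illustration only.
    example = [
        ("the cat sat", "the cat sat"),
        ("a dog barked loudly", "a dog barks loudly"),
    ]
    wer, ser = error_rates(example)
    print(f"WER = {wer:.3f}, SER = {ser:.3f}")
```

Under these definitions a single wrong word makes the whole sentence count as an error, which is why a sentence error rate is usually much higher than the word error rate measured on the same data.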

MARA Corpus Released (April, 2013)

Mr. Mihai Nae from Cartea Sonora has kindly released a complete professional audiobook recording for use in speech processing research for Romanian. You can download it from here: WEBPAGE.

Congratulations to dr. Mihai ORDEAN (November, 2012)

Mihai Ordean successfully defended his PhD thesis, entitled "Secure Authentication using One-Time Visual Passwords", in a public defense. WEBPAGE.

THE TEAM


prof. Mircea GIURGIU, PhD

Head of the group

[personal webpage]

Adriana STAN, PhD

[personal webpage]

Beáta LŐRINCZ

[personal webpage]

Maria NUȚU

[personal webpage]


Collaborators


prof. Aurel VLAICU, PhD

Professor, Communications Department, TUC-N

prof. Rodica POTOLEA, PhD

Professor, Computer Science Department, TUC-N

Bogdan ORZA, PhD

Associate Professor, Communications Department, TUC-N

Mihaela DÎNȘOREANU, PhD

Associate Professor, Computer Science Department, TUC-N

Magdalena CHIRILĂ, PhD

Associate Professor, "Iuliu Hațieganu" University of Medicine and Pharmacy

Camelia LEMNARU, PhD

Associate Professor, Computer Science Department, TUC-N

Șerban MEZA, PhD

Assistant Professor, Communications Department, TUC-N


Past Collaborators


dr. Cristian CONȚAN

Research Fellow, PhD, Simple4All Project

dr. József DOMOKOS

Research Fellow, PhD, Simple4All Project

Andrei HOMODI

Research Assistant, Simple4All Project

Dalia POPESCU

Intern, Simple4All Project (Sep 2013 - Feb 2014)

Ioana MUREȘAN

Research Assistant, Simple4All Project (2012 - 2013)

Andrei BĂRBOS

Research Assistant, Simple4All Project (2012 - 2013)

dr. Mihai ORDEAN

PhD student (2009-2012) and Network admin

Projects

SINTERO (2018-2020)
SWARA (2014-2017)
Simple4All (2011-2014)
Sound2Sense (2007-2011)

Publications

2020
Beáta Lőrincz, Maria Nutu, Adriana Stan, Mircea Giurgiu, "An Evaluation of Postfiltering for Deep Learning Based Speech Synthesis with Limited Data", IEEE 10th International Conference on Intelligent Systems (IS), Bulgaria, 2020 [pdf]
Beáta Lőrincz, "Concurrent phonetic transcription, lexical stress assignment and syllabification with deep neural networks", Proceedings of the 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems KES2020, 2020 [pdf]
Adriana Stan, "RECOApy: Data Recording, Pre-Processing and Phonetic Transcription for End-to-End Speech-Based Applications", In Proceedings of Interspeech, Shanghai, China, 2020 [pdf]
Kristen M Scott, Simone Ashby, Adriana Stan, "Designing a Synthesized Content Feed System for Community Radio", Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society, Estonia, 2020 [pdf]
2019
Adriana Stan, "Input Encoding for Sequence-to-Sequence Learning of Romanian Grapheme-to-Phoneme Conversion", In Proceedings of the 10th IEEE International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timisoara, Romania, 2019. [bib] [pdf]
Beata Lorincz, Maria Nutu, Adriana Stan, "Romanian Part of Speech Tagging using LSTM Networks", In Proceedings of the IEEE 15th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 2019. [bib] [pdf]
Maria Nutu, Beata Lorincz, Adriana Stan, "Deep Learning for Automatic Diacritics Restoration in Romanian", In Proceedings of the IEEE 15th International Conference on Intelligent Computer Communication and Processing, Cluj-Napoca, Romania, 2019. [bib] [pdf]
David A. Braude, Matthew P. Aylett, Caoimhin Laoide-Kemp, Simone Ashby, Kristen M. Scott, Brian O Raghallaigh, Anna Braudo, Alex Brouwer, Adriana Stan, "All Together Now: The Living Audio Dataset", In Proceedings of Interspeech, Graz, Austria, 2019. [bib] [pdf]
2018
Adriana Stan, Mircea Giurgiu, "A Comparison Between Traditional Machine Learning Approaches And Deep Neural Networks For Text Processing In Romanian", In Proceedings of the 13th International Conference on Linguistic Resources and Tools for Processing Romanian Language (ConsILR), Jassy, Romania, 2018. [bib] [pdf]
2017
Adriana Stan, Florina Dinescu, Cristina Tiple, Serban Meza, Bogdan Orza, Magdalena Chirila, Mircea Giurgiu, "The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset", In Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 2017. [bib] [pdf]
Stefan-Adrian Toma, Adriana Stan, Mihai-Lica Pura, Traian Barsan, "MaRePhoR - An Open Access Machine-Readable Phonetic Dictionary for Romanian", In Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 2017. [bib] [pdf]
2016
Adriana Stan, Cassia Valentini-Botinhao, Bogdan Orza, Mircea Giurgiu, "Blind Speech Segmentation using Spectrogram-image Based Features and Mel Cepstral Coefficients", In Proc. IEEE Workshop on Spoken Language Technology, San Diego, USA, 2016. [bib] [pdf]
Alexandru Moldovan, Adriana Stan, Mircea Giurgiu, "Improving Sentence-level Alignment of Speech with Imperfect Transcripts using Utterance Concatenation and VAD", In Proc. of IEEE ICCP, Cluj-Napoca, Romania, 2016. [bib] [pdf]
Adriana Stan, Yoshitaka Mamiya, Junichi Yamagishi, Peter Bell, Oliver Watts, Rob Clark, Simon King, "ALISA: An automatic lightly supervised speech segmentation and alignment tool", In Computer Speech and Language, vol. 35, pp. 116-133, 2016. [bib] [pdf] [doi]
2015
Adriana Stan, Cassia Valentini-Botinhao, Mircea Giurgiu, Simon King, "Phonetic Segmentation of Speech using STEP and t-SNE", In Proc. of the 8th International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 2015. [bib] [pdf]
2014
József Domokos, Adriana Stan, Mircea Giurgiu, "An Approach to Lexical Stress Detection from Transcribed Continuous Speech Using Acoustic Features", In Proc. 22nd Telecommunications Forum, Belgrade, Serbia, 2014. [bib] [pdf]
Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Jalle Palomaki, Mircea Giurgiu, Mikko Kurimo, "On the Role of Missing Data Imputation and NMF Feature Enhancement in Building Synthetic Voices Using Reverberant Speech", In Proc. Interspeech, Singapore, 2014. [bib]
O. Watts, S. Gangireddy, J. Yamagishi, S. King, S. Renals, A. Stan, M. Giurgiu, "Neural Net Word Representations for Phrase-Break Prediction Without a Part of Speech Tagger", In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, 2014. [bib] [pdf]
Tiberiu Boroș, Adriana Stan, Oliver Watts, Stefan Daniel Dumitrescu, "RSS-TOBI - A Prosodically Enhanced Romanian Speech Corpus", In Proc. The 9th edition of the Language Resources and Evaluation Conference, Reykjavik, Iceland, 2014. [bib] [pdf]
2013
Adriana Stan, Peter Bell, Junichi Yamagishi, Simon King, "Lightly Supervised Discriminative Training of Grapheme Models for Improved Sentence-level Alignment of Speech and Text Data", In Proc. Interspeech, 2013. [bib]
Y. Mamiya, A. Stan, J. Yamagishi, P. Bell, O. Watts, R.A.J. Clark, S. King, "Using Adaptation to Improve Speech Transcription Alignment in Noisy and Reverberant Environments", In Proc. SSW8, 2013. [bib]
O. Watts, A. Stan, R. Clark, Y. Mamiya, M. Giurgiu, J. Yamagishi, S. King, "Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from ‘found’ data: evaluation and analysis", In Proc. SSW8, 2013. [bib]
O. Watts, A. Stan, Y. Mamiya, A. Suni, M. Burgos, J.M. Montero, "The Simple4All entry to the Blizzard Challenge 2013", In Proc. Blizzard Challenge 2013, 2013. [bib]
A. Stan, O. Watts, Y. Mamiya, M. Giurgiu, R. A. J. Clark, J. Yamagishi, S. King, "TUNDRA: A Multilingual Corpus of Found Data for TTS Research Created with Light Supervision", In Proc. Interspeech, 2013. [bib]
Yoshitaka Mamiya, Junichi Yamagishi, Oliver Watts, Robert A.J. Clark, Simon King, Adriana Stan, "Lightly Supervised GMM VAD to use Audiobook for Speech Synthesiser", In Proc. ICASSP, 2013. [bib]
Ioana Muresan, Adriana Stan, Mircea Giurgiu, Rodica Potolea, "Evaluation of Sentiment Polarity Prediction using a Dimensional and a Categorical Approach", In Proc. SPED, 2013. [bib]
2012
Adriana Stan, Peter Bell, Simon King, "A Grapheme-based Method for Automatic Alignment of Speech and Text Data", In Proc. IEEE Workshop on Spoken Language Technology, Miami, Florida, USA, 2012. [bib]
M. Giurgiu, A. Kabir, "Automatic transcription and speech recognition of Romanian corpus RO-GRID", In Telecommunications and Signal Processing (TSP), 2012 35th Intl Conf on, pp. 465-468, 2012. [bib] [doi]
M. Ordean, M. Giurgiu, "Towards securing client-server connections against man-in-the-middle attacks", In Electronics and Telecommunications (ISETC), 2012 10th International Symposium on, pp. 127-130, 2012. [bib] [doi]
2011
M. Giurgiu, A. Kabir, "Improving automatic speech recognition in noise by energy normalization and signal resynthesis", In Intelligent Computer Communication and Processing (ICCP), 2011 IEEE Intl Conference on, pp. 311-314, 2011. [bib] [doi]
M. Giurgiu, A. Kabir, "Comparison of Vocal Tract Length Normalization technique applied for clean and noisy speech", In Telecommunications and Signal Processing (TSP), 2011 34th Intl Conference on, pp. 351-354, 2011. [bib] [doi]
Z.I. Kiss, Z.A. Polgar, M. Giurgiu, V. Dobrota, "Resource efficient network coding based congestion control for streaming applications", In Telecommunications and Signal Processing (TSP), 2011 34th Intl Conference on, pp. 85-90, 2011. [bib] [doi]
Adriana STAN, "Romanian HMM-based Text-to-Speech Synthesis with Interactive Intonation Optimisation", PhD thesis, Technical University of Cluj-Napoca, 2011. [bib]
Adriana Stan, Junichi Yamagishi, Simon King, Matthew Aylett, "The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate", In Speech Communication, vol. 53, no. 3, pp. 442-450, 2011. [bib] [pdf] [doi]
Adriana Stan, Mircea Giurgiu, "A Superpositional Model Applied to F0 Parametrisation using DCT for Text-to-Speech Synthesis", In Proceedings of the 6th Conference on Speech Technology and Human-Computer Dialogue, Brasov, Romania, 2011. [bib]
Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez, "Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis", In Proceedings of the 5th Workshop on Nature Inspired Cooperative Strategies for Optimisation, Springer, vol. 387, pp. 57-71, 2011. [bib]
2010
A. Kabir, M. Giurgiu, J. Barker, "Robust automatic transcription of English speech corpora", In Communications (COMM), 2010 8th International Conference on, pp. 79-82, 2010. [bib] [doi]
C.F.M. Veja, G. Hagedorn, G. Weber, M. Giurgiu, "Metadata repository management using the MediaWiki interoperability framework a case study: The KeyToNature project", In eChallenges, 2010, pp. 1-9, 2010. [bib]
C. Veja, M. Giurgiu, G. Hagedorn, G. Weber, "Semantic MediaWiki interoperability framework from a semantic social software perspective", In , pp. 403-406, 2010. [bib] [doi]
C. Veja, M. Giurgiu, G. Weber, G. Hagedorn, "MediaWiki interoperability framework for multimedia digital resources", In Intelligent Computer Communication and Processing (ICCP), 2010 IEEE International Conference on, pp. 329-335, 2010. [bib] [doi]
M. Ordean, M. Giurgiu, "Implementation of a security layer for the SSL/TLS protocol", In Electronics and Telecommunications (ISETC), 2010 9th International Symposium on, pp. 209-212, 2010. [bib] [doi]
Adriana Stan, Mircea Giurgiu, "Romanian language statistics and resources for text-to-speech systems", In Proceedings of the 9th Edition of the International Symposium on Electronics and Telecommunications, Timisoara, Romania, 2010. [bib]
2009
Adriana Stan, "Linear Interpolation of Spectrotemporal Excitation Pattern Representations for Automatic Speech Recognition in the Presence of Noise", In Proceedings of the 5th Conference on Speech Technology and Human-Computer Dialogue, Constanta, Romania, 2009. [bib]

Tools, Data & Demos

Tools & Data

ALISA
A Lightly Supervised Speech Segmentation and Alignment Tool

The SWARA Corpus
A Large Parallel Romanian Read Speech Dataset

The MaRePhor Lexicon
An Open Access Machine-Readable Phonetic Dictionary for Romanian

The TUNDRA Corpus
A Multilingual Corpus of Found Data for TTS Research Created with Light Supervision

The MARA Corpus
A Complete Romanian Professionally Read Audiobook

The Romanian Speech Synthesis (RSS) Database

Demos

RomanianTTS
an online demo of 17 voices in Romanian

Expressive Romanian TTS
samples of style adaptation

Contact

Speech Processing Group

26-28 George Barițiu Street, room S2.3

400027, Cluj-Napoca, România

+40-264-202452

http://speech.utcluj.ro