SINTERO Project Started

March, 2018 | by Admin

The overall objective of SINTERO is to create a Romanian text-to-speech synthesis system that allows prosody (intonation in speech) to be modelled and controlled in a way that approaches natural speech. Alongside this objective, the project aims to create as many synthetic Romanian voices as possible (at least 10 in this project), so that they can be used by a wide community, including in commercial applications. WEBPAGE.

SWARA Corpus Released

June, 2017 | by Admin

The SWARA Corpus is a result of the SWARA Project, funded by the Romanian Ministry of Education under grant agreement PN-II-PT-PCCA-2013-4 No 6/2014. The corpus contains over 21 hours of high-quality recordings from 17 different speakers. The data is segmented into 19,279 utterances and includes their orthographic transcripts and semi-automatic phone-level alignments. WEBPAGE.

MaRePhor Lexicon Released

June, 2017 | by Admin

An Open Access Machine-Readable Phonetic Dictionary for Romanian: The dictionary consists of 72,375 words and 591,570 letters. Its entries are drawn from the Romanian Scrabble Association's official word list and from a 15,517-word dictionary developed according to the SpeechDat specifications. The phonetic transcriptions are in SAMPA format. WEBPAGE.

ALISA Tool Released

January, 2016 | by Admin

ALISA uses a two-step approach to the task of aligning speech with imperfect transcripts: 1) sentence-level speech segmentation and 2) sentence-level speech and text alignment. Both processes are fully automated and require as little as 10 minutes of manually labelled speech: inter-sentence silence segments for the segmentation, and orthographic transcripts of these sentences for the aligner. The tool can be applied to any language with an alphabetic writing system and can align up to 75% of the original data with a sentence error rate of less than 8% and a word error rate of less than 1%. WEBPAGE.

Tundra Corpus Released

September, 2013 | by Admin

This is an ongoing project that aims to collect a large number of speech resources in multiple languages and make them freely available to the speech processing community. The first version of the Tundra corpus is a collection of 14 audiobooks in 14 languages: Bulgarian, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Romanian, Russian and Spanish. The sources for the speech and text data of each audiobook are listed below. WEBPAGE.

New research positions and internships available

July 03, 2013 | by Admin

We have 3 new research positions open on the Simple4All project.

For more information, please visit the OPPORTUNITIES PAGE.

Mara Corpus has been added

April 20, 2013 | by Admin

Mr. Mihai Nae from Cartea Sonora has kindly released a complete professional audiobook recording for use in speech processing research for Romanian. You can download it from here: http://speech.utcluj.ro/corpora/mara.html

Simple4All project review meeting in Luxembourg

January 24, 2013 | by Admin

On 25th January 2013, Simple4All is having its first review meeting in Luxembourg.

Simple4All end of year 1 meeting in Helsinki

November 14, 2012 | by Admin

The end-of-year-1 meeting of Simple4All is taking place in Helsinki on 15-16 November 2012.

Congratulations to Dr. Mihai Ordean

November 2, 2012 | by Admin

Mihai Ordean has successfully defended his PhD thesis, entitled "Secure Authentication using One-Time Visual Passwords", in a public defense.