About the project

SWARA (Mobile System for Rehabilitative Vocal Assistance of Surgical Aphonia) will use speech synthesis technology to assist patients in their daily communication, for example with family, with their doctor, or while shopping. The solution will be delivered as an interactive application offering the following facilities:

  • a personalized speech synthesis system which uses the patient's own voice or, where the voice can no longer be recorded, a similar one created from a multi-speaker speech database;
  • a fast text input service based on adaptive text prediction, so that there is no unnecessary delay in the natural dialogue;
  • a mobile application that provides access to the web-based speech synthesis service.

MOTIVATION

With the advances in voice-conserving surgery and radiotherapy techniques, most patients with cancer of the larynx can be cured. However, for those who do not respond to treatment, or who present with recurrent or advanced disease, total laryngectomy is the only curative approach that can be offered. The prognosis of laryngectomized patients has remained relatively favourable over the years, with five-year survival rates of 65-75%.

Voice is an important component of human identity. Although there are devices which enable people who have lost their voice to speak again, these devices offer only a limited number of "identities", which can have a negative psychological impact on the person. Also, when faced with mutilating surgery, especially in cases where there is a trade-off between the ability to speak and the best chance of curing the disease, people tend to favour keeping their own voice.

Immediately after a laryngectomy, patients are unable to speak. Most patients find this extremely distressing. The consequences of such a massive communication disorder are, in general, fear, depression, hopelessness and passivity. From a psychological point of view, one can expect that quickly regaining vocal communication generally facilitates the social and psychological rehabilitation of laryngectomized patients. However, one should bear in mind that many laryngectomized patients suffer great emotional distress while learning to speak with a voice prosthesis. At this point, the patients realize that normal speech articulation cannot be attained and that their new voice attracts social attention. Many of the social responses experienced by the patients are ambivalent or negative. Thus laryngectomized patients often experience communication failures and open or covert rejection (e.g. early termination of the conversation, interruption by others). Because they are so aware of the noticeable difference in their new voice, patients tend to devalue themselves and feel stigmatized. Frequently this results in social withdrawal and isolation. Restoring a patient's ability to communicate in all daily activities is therefore an essential goal in restoring the patient's complete physical and mental health.

OBJECTIVES

OBJECTIVE 1: PERSONALISED SPEECH SYNTHESIS SYSTEM

Speech synthesis systems are widely used as assistive technology for speech-impaired persons. However, these systems offer only a limited number of synthetic voices, which can make patients perceive the system as less representative of themselves. We will build upon current technology to enable the use of personalised synthetic voices in text-to-speech synthesis systems. This will be achieved by employing state-of-the-art speaker adaptation methods to create the synthetic voices.

OBJECTIVE 2: FAST AND EFFICIENT TYPING

The average speaking rate is around 150 words per minute, whereas typing reduces this to roughly 10-15 words per minute, which is ten to fifteen times slower. It is therefore important to develop text input methods which can make typing nearly as fast as the average speaking rate. The project will develop context-aware text prediction methods for the Romanian language, with a special focus on the needs and use-case scenarios of speech-impaired persons. We will also integrate a secondary, alternative text input method based on silent speech interfaces.
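As a rough illustration of context-aware text prediction, the sketch below suggests the next word from the previous word and the letters typed so far, using simple bigram counts. It is a minimal sketch and not the project's method: the function names, the toy Romanian corpus, and the choice of a bigram model are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(following, prev_word, prefix="", k=3):
    """Return up to k frequent successors of prev_word that start with prefix."""
    candidates = following.get(prev_word.lower(), Counter())
    ranked = [w for w, _ in candidates.most_common() if w.startswith(prefix.lower())]
    return ranked[:k]


# Hypothetical toy corpus, invented for this example.
corpus = ["vreau un ceai", "vreau un ceai cald", "vreau un suc"]
model = train_bigrams(corpus)

print(predict_next(model, "un"))       # suggestions after the word "un"
print(predict_next(model, "un", "s"))  # narrowed by the typed prefix "s"
```

Each accepted suggestion replaces several keystrokes with one selection, which is how prediction can narrow the gap between typing and speaking rates.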

OBJECTIVE 3: PORTABLE TECHNOLOGY

With the advances in the mobile hardware industry, smartphones and tablets now possess the computational resources to run numerous applications. SWARA aims to provide the personalised speech synthesis system, with its smart interface, as an Internet service accessible from a mobile application. This will involve optimising both the algorithms and the code of the entire system.

CHALLENGES

There are a number of challenges that need to be addressed. Most of these require substantial innovation and forward thinking. We underline the original key aspects of our proposal in the following list:

  • The system as a whole -- There are no assistive systems available which incorporate all the modules presented in this project, and certainly none for the Romanian language.
  • Audio-Video Databank -- This will be a valuable resource in its own right. It will keep a generic design, so that its use is not limited to the scope of the present project and future research can be carried out with it.
  • Predictive text input -- Aside from tuning specific to the Romanian language, this task will also take into account the typing habits of speech-impaired persons (e.g. vowel elision).
  • Speaker adaptation -- Although a number of adaptation techniques are already available, we will try to improve speaker similarity and speed of adaptation by proposing alternative or complementary techniques.
  • Application design -- The originality of the approach lies in the fact that the application will, from the beginning, have an architecture suited to today's computing landscape and will leverage existing open standards and protocols to adapt to future changes in technologies and devices. We also aim to provide the same user experience across devices and platforms.
  • Psychological and social impact -- All aspects of the application will be evaluated by a highly specialised team of psychologists and MDs: from the patient's needs and requirements, to the social impact of synthetic voices accompanied by gestures and human presence, and the improved quality of life as a result of using the system.
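To make the vowel elision example from the predictive text input item more concrete, the sketch below maps a vowel-less typed form back to candidate dictionary words by comparing consonant skeletons. This is only one possible reading of the idea: the function names and the small English vocabulary are hypothetical, and a Romanian version would also need to treat ă, â and î as vowels.

```python
from collections import defaultdict

# Assumption: plain Latin vowels only; Romanian would add "ă", "â", "î".
VOWELS = set("aeiou")


def skeleton(word):
    """Strip vowels so that 'water' and 'wtr' share the same key."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def build_index(vocabulary):
    """Group vocabulary words by their consonant skeleton."""
    index = defaultdict(list)
    for word in vocabulary:
        index[skeleton(word)].append(word)
    return index


def expand(index, typed):
    """Return the vocabulary words whose skeleton matches the typed form."""
    return index.get(skeleton(typed), [])


# Hypothetical vocabulary, invented for this example.
index = build_index(["water", "waiter", "bread"])
print(expand(index, "wtr"))
print(expand(index, "brd"))
```

When several words share a skeleton, as "water" and "waiter" do here, the candidates would still need to be ranked, for instance by context or frequency.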

ACTIVITIES

The activities are assigned to the main tasks of the project and designed to ensure effective parallel development of the final system, in terms of RTD and integration, as well as dissemination and management. We will validate the RTD work at various stages, via the milestones and quality control of the deliverables. The figure below presents the links between the work packages, and is followed by a brief description of each activity.

  1. High Quality Romanian Text-to-Speech Synthesis System

    This activity is the starting point of the speaker-adaptive speech synthesis engine. Its main objective is to deliver a high-quality Romanian parametric text-to-speech synthesis system. The results will constitute the input for the speaker adaptation work package and will also be used to test the relative naturalness of the synthetic voices.

  2. Evaluation of Synthetic Voices in Social Interaction

    The effects of synthetic voices used as an alternative means of communication have not yet been thoroughly studied. We therefore include this work package as a means of monitoring the requirements and effects of the developed system and methods.

  3. Audio and Video Databank

    The aim is to collect large amounts of speech and video data from multiple individuals over the entire duration of the project, to be used by the speaker adaptation methods. The resulting resources will be useful not only for the project but also for future developments, for example automatic speech recognition in Romanian, bimodal speech recognition, advanced lip reading methods, surface articulatory movements, etc.

  4. Smart Assistive Text Prediction

    This activity integrates two essential tasks for a smart text input interface: text prediction and baseline speech reading. The text prediction task will take into account a series of pre-defined scenarios, such as domestic needs or going to the market, for which the vocabulary is limited and the text can be accurately predicted above the word level.
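Prediction above the word level within a pre-defined scenario can be pictured as completing whole phrases from a small, scenario-specific list. The sketch below is a minimal illustration under that assumption; the scenario names, the phrases, and the `complete_phrase` function are hypothetical and not taken from the project.

```python
# Hypothetical scenario phrase lists, invented for this example.
SCENARIO_PHRASES = {
    "market": [
        "how much does this cost",
        "how much is a kilo",
        "a bag please",
    ],
    "home": [
        "i would like some water",
        "i am tired",
    ],
}


def complete_phrase(scenario, typed, k=3):
    """Return up to k phrases from the scenario that start with the typed text."""
    typed = typed.lower().strip()
    phrases = SCENARIO_PHRASES.get(scenario, [])
    return [p for p in phrases if p.startswith(typed)][:k]


print(complete_phrase("market", "how m"))
print(complete_phrase("home", "i w"))
```

Because each scenario has a limited vocabulary, a few typed characters can be enough to select a complete sentence rather than a single word.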

  5. Adaptive Speech Synthesis

    Speaker adaptation is one of the most innovative techniques in the field of speech synthesis. It allows the synthetic speech to be highly personalised to the user's desired voice characteristics, or even to the user's own voice. We will also develop new methods for faster adaptation and better speaker similarity.

  6. Speech Synthesis Web-Services

    The main objective is to develop web-based services for speech synthesis accessible from mobile devices. The activity also covers the design of the user interface, which must meet the user's expectations and requirements. This will provide the user with an interactive assistive technology.