PhD student position
May 31, 2007 at 05:49 PM
Subject: Multimodal Fusion for mobile terminals and interactive contents

Laboratory: France Télécom Orange Labs, Technologies Division in Lannion France
Period: Autumn 2007 to Autumn 2010.
CIFRE thesis. Contract: 36-month fixed-term contract (Contrat à Durée Déterminée - Formation par la Recherche)
Take-home monthly wages: around 1680 € net
Supervisors at France Telecom R&D: Jean Emmanuel VIALLET (TECH/IRIS) and Eric PETIT (TECH/IDEA)
University supervisor: to be determined

Thesis subject
Within the framework of interaction with content, a telecommunication provider can rely on a triptych involving the interactive contents it broadcasts, the communication means linking it to an end user (home hub, network), and a communicating interactive terminal such as a mobile phone. This triptych relies on a processing unit associated with the terminal, with a local PC, or reached through the network.
Mobile phones carry different sensors that allow interaction with content displayed either on the mobile itself or on a PC/TV. Some mobile phones offer automatic speech recognition. Others behave as IR remote controls or include accelerometers for gesture recognition (such as the Samsung S310). Image analysis techniques make it possible to control a cursor from the displacement of the built-in camera [Ballagas][Wang], or to recognize visual tags such as overlaid spot codes [Toye] in order to interact with an image. Sensitive surfaces can also be introduced, such as mini touchpads or capacitive surfaces allowing 2D or even 3D contact-less interaction [Wimmer]. Phones have screens, loudspeakers and buzzers to provide feedback on the interaction, and Bluetooth or Wi-Fi allows bidirectional communication with a fixed terminal (PC, home hub).
These different functionalities enable interaction richer than that obtained with keyboard interfaces (remote controls or vocal services through DTMF): they make it possible to recover standard functionalities of WIMP interfaces (such as pointing and selection) and to reach novel ones by combining different modalities, beyond the usual voice-and-pen multimodal interaction [adaptx, kirusa, Oviatt], towards more natural interaction, better suited to interactive content used by one or more users [Arthur, Carbini].
Multimodal fusion must take into account the characteristics of the signals (acquisition rate, duration, temporal overlap, and the confidence of the recognition results), the application context and its impact on the interpretation task, but also the suitability of the fusion technique for the task. From a formal point of view, the combination of such heterogeneous data from different sources shall be investigated in order to extract the most relevant information. A theoretical framework (Bayes classifiers, fuzzy logic, belief theory) shall be chosen taking into account the nature of the data (numerical, symbolic). Data fusion can also, depending on the task, involve a classification process.
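As a purely illustrative sketch (not part of the position description), one of the candidate frameworks mentioned above, a naive Bayes-style combination, could fuse the per-class scores of two modality recognizers as follows; all class names and numbers are invented for the example:

```python
# Hypothetical sketch: naive Bayes-style late fusion of two modality
# recognizers, assuming their evidence is conditionally independent
# given the class. All names and values are illustrative.

def fuse(posteriors_a, posteriors_b, priors):
    """Combine per-class posteriors from two modalities.

    posteriors_a / posteriors_b: dict class -> P(class | modality evidence)
    priors: dict class -> P(class)

    Under conditional independence, the fused score is proportional to
    posteriors_a(c) * posteriors_b(c) / P(c); scores are then normalized.
    """
    scores = {c: posteriors_a[c] * posteriors_b[c] / priors[c] for c in priors}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Example: a "select" vs "cancel" command, fusing gesture and speech scores.
gesture = {"select": 0.7, "cancel": 0.3}
speech = {"select": 0.6, "cancel": 0.4}
priors = {"select": 0.5, "cancel": 0.5}
fused = fuse(gesture, speech, priors)
```

In this sketch the two modalities reinforce each other, so the fused confidence for "select" exceeds either monomodal score; a real system would also weigh recognition confidences and temporal overlap, as the paragraph above notes.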
The goal of this thesis is to propose a multimodal fusion scheme allowing natural, intuitive and reactive interaction with interactive content. The work will consist in determining an adequate multimodal combination: adequate with regard to the monomodal technologies available on a mobile terminal and the processing power at hand, and adequate from a user point of view, as determined by evaluations. A continuous process such as pointing, obtained with a computer vision, capacitive or accelerometer technique, shall be associated with discrete, simple and short events (for example for selection) and with discrete, symbolic events (gesture or speech). Feedback through different channels, on the mobile terminal and on the end-user screen, shall also be investigated, as well as issues of sharing processing load and communication between local and remote units.
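The association of a continuous pointing stream with discrete events described above could be sketched, in a deliberately simplified and hypothetical form, as pairing each discrete event with the pointer sample closest to it in time:

```python
# Hypothetical sketch: pair a discrete selection event (e.g. a spoken
# "select" or a button press) with the pointing position current at the
# moment of the event, within a tolerance window. Function name and the
# 0.3 s default lag are invented for the example.

def pair_event_with_pointer(event_time, pointer_track, max_lag=0.3):
    """pointer_track: time-ordered list of (timestamp, (x, y)) samples.

    Returns the (x, y) position whose timestamp is closest to event_time,
    or None if no sample falls within max_lag seconds of the event.
    """
    best = None
    best_dt = max_lag
    for t, pos in pointer_track:
        dt = abs(t - event_time)
        if dt <= best_dt:
            best, best_dt = pos, dt
    return best
```

A real fusion engine would also account for the differing latencies of the modalities (speech recognition typically lags the gesture it accompanies), which is one reason the temporal-overlap characteristics mentioned earlier matter.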

This thesis will take place at the Orange Labs FTR&D/TECH/IRIS/VIA laboratory, located in Lannion, France.
The FTR&D/TECH/IRIS/VIA lab developed, during S. Carbini's thesis (2003-2006), a speech and gesture multimodal interface. This interface associates pointing and selection gestures, tracked with computer vision techniques, with automatic speech recognition.
The thesis work will be done in collaboration with the FTR&D/TECH/IDEA lab, located in Grenoble, France. This laboratory has developed the SYMBAL-LATIN technology for recognition of symbols and graphical gestures, and has experience with processing on mobile terminals.

The candidate will hold a Master's degree in Engineering, Computer Science or a related field, with thorough knowledge of signal processing and proven skills in developing applications for mobile devices.
A fair knowledge of French would be useful in a French-speaking laboratory.
Applicants should send a full CV, a letter of motivation, grades obtained for the BSc and MSc and, when applicable, previous technical reports and references from supervisors.

Jean Emmanuel VIALLET
France Télécom R&D TECH-IRIS-VIA, 2, Avenue Pierre Marzin - BP40, 22307 Lannion Cedex - France
Mail: ,
Web site: http://perso.rd.francetelecom.fr/VIALLET

France Télécom R&D TECH-IDEA-TIPS, 28 Chemin du Vieux Chêne BP 98, 38243 Meylan - France

References
  • http://www.adapx.com/research/default.aspx
  • A. M. Arthur, R. Lunsford, M. Wesson, S. Oviatt, Prototyping novel collaborative multimodal systems: simulation, data collection and analysis tools for the next decade, Proceedings of the 8th International Conference on Multimodal Interfaces (ICMI 2006), pp. 209-216, Banff, Alberta, Canada, 2006
  • R. Ballagas, M. Rohs, J. G. Sheridan, Sweep and point and shoot: phonecam-based interactions for large public displays, In CHI '05 Extended Abstracts on Human Factors in Computing Systems, 2005
  • S. Carbini, J. E. Viallet, O. Bernier, B. Bascle, "Tracking Body Parts of Multiple Persons for Multi-Person Multimodal Interface", IEEE International ICCV Workshop on Human-Computer Interaction, pp. 16-25, Beijing, China, October 21, 2005
  • http://www.kirusa.com/multimodality.html
  • S. Oviatt, R. Coulston, R. Lunsford. When Do We Interact Multimodally? Cognitive Load and Multimodal Communication Patterns. In Proceedings of the Sixth International Conference on Multimodal Interfaces (ICMI 2004), State College, Pennsylvania, USA, October 14-15, 2004.
  • E. Toye, R. Sharp, A. Madhavapeddy, D. Scott, E. Upton, A. Blackwell, Interacting with Mobile Services: An Evaluation of Camera-Phones and Visual Tags, In Personal and Ubiquitous Computing Journal, February 2006
  • J. Wang, S. Zhai, J. Canny, Camera Phone Based Motion Sensing: Interaction Techniques, Applications and Performance Study, In ACM UIST 2006, Montreux, Switzerland, October 15-18, 2006
  • R. Wimmer, M. Kranz, S. Boring, A. Schmidt, A Capacitive Sensing Toolkit for Pervasive Activity Detection and Recognition, In Proceedings of the Fifth Annual IEEE Conference on Pervasive Computing and Communications (PerCom), New York, NY, USA, March 2007
