PhD Position F/M Immersive Sound and Virtual Acoustics on Distributed FPGAs

The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Position: PhD student

About the research centre or functional department

The Inria research centre in Lyon is the 9th Inria research centre, formally created in January 2022. It brings together approximately 320 people in 19 research teams and research support services.

Its staff are distributed in Villeurbanne, Lyon Gerland, and Saint-Etienne.

The Lyon centre is active in the fields of software, distributed and high-performance computing, embedded systems, quantum computing and privacy in the digital world, but also in digital health and computational biology.

Context and advantages of the position

General Description

The main objectives of this PhD are to:

  • Design a distributed FPGA-based spatial audio system providing unparalleled computational capabilities and perfect synchronicity between all the speakers.
  • Adapt and optimize spatial audio, virtual acoustics, and soundscape rendering algorithms to FPGAs to fully take advantage of their computational power.
  • Explore the use of the Faust programming language (https://faust.grame.fr) to facilitate the deployment of this kind of application on a distributed FPGA-based system, automatically finding the best implementation.

The system developed as part of this PhD will be evaluated in a broad range of “real-world” scenarios. The first application to be considered is virtual reality with real-time, interactive soundscape rendering. One concrete case study for this immersive VR system will be the Chauvet cave in the south of France, through a collaboration with Chauvet’s scientific team. Another application to be considered is virtual acoustics in the context of live performances. For that, we will have to investigate the possibility of deploying the FPGA-based distributed immersive sound system at a larger scale in a concert hall. GRAME-CNCM in Lyon will act as a collaborator in this context.

Context

The successful candidate will join the Emeraude Inria/INSA-Lyon research team, which is physically based at the CITI Lab of INSA-Lyon (Villeurbanne, France). This PhD will be conducted under the supervision of Romain Michon (Inria) and Pierre Lecomte (École Centrale de Lyon). The Emeraude team gathers the strengths of Inria, INSA Lyon, and GRAME-CNCM. It specializes in embedded audio systems and their programming, as well as arithmetic. It develops various tools such as the Faust programming language (a DSL for real-time audio DSP), Syfala (a tool to facilitate the programming of FPGAs for real-time audio DSP), and FloPoCo (a generator of arithmetic cores for FPGAs). The team hosts 6 faculty, 6 PhD students, 3 postdocs, 2 engineers, and multiple research interns. Additional information can be found on the team website: https://team.inria.fr/emeraude.

Assignment

Full PhD Topic

Interest in immersive 3D audio has been booming in recent years [1]. A growing number of movie theaters, concert halls, Virtual Reality (VR) platforms in museums, attractions in amusement parks, cars, etc. are equipped with advanced immersive sound systems involving a large number of speakers that can be used to render soundscapes in 3D, modify the acoustics of a room, enhance sound quality and speech intelligibility, carry out active noise cancellation, etc. The need for advanced immersive sound systems recently culminated with the opening of the “Las Vegas Sphere,” which hosts the largest and densest speaker array in the world, involving a total of 167,000 amplified speakers [2].

Two kinds of configurations can be considered when dealing with immersive 3D audio: those involving only “playback” (e.g., the pre-recorded soundtrack of a movie) and those involving real-time computations and potentially reconfiguring the system for different applications (e.g., virtual acoustics, soundscape rendering in VR, noise canceling, speaker correction, sound quality enhancement, etc.). A broad range of unresolved scientific and technical challenges are associated with the second scenario, on which this PhD will therefore focus.

Managing a large number of individual audio channels in real time requires a tremendous amount of computational power and very high bandwidth, which prevents systems involving many speakers from being managed “collectively.” For example, in the case of the Las Vegas Sphere, speakers are organized in groups of acoustic panels that all receive the same source signal and implement spatial audio algorithms “locally.”
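To give a sense of the bandwidth involved, the following minimal sketch estimates the raw data rate of uncompressed PCM audio when every speaker receives its own stream. The 48 kHz sample rate and 24-bit sample depth are illustrative assumptions, not figures from this offer, and the function names are hypothetical.

```python
# Rough estimate of the raw (uncompressed PCM) data rate needed to feed
# one independent audio stream to each speaker.
# ASSUMPTIONS: 48 kHz sample rate and 24-bit samples are illustrative
# defaults; protocol overhead (headers, clocking) is ignored.

def raw_pcm_bitrate(channels: int, sample_rate_hz: int = 48_000,
                    bit_depth: int = 24) -> int:
    """Return the uncompressed PCM data rate in bits per second."""
    return channels * sample_rate_hz * bit_depth


def to_gbit_per_s(bits_per_second: int) -> float:
    """Convert bits per second to gigabits per second (10^9 bits)."""
    return bits_per_second / 1e9


if __name__ == "__main__":
    for channels in (32, 1_024, 167_000):
        rate = raw_pcm_bitrate(channels)
        print(f"{channels:>7} channels: {to_gbit_per_s(rate):.3f} Gbit/s")
```

Under these assumptions, a single channel needs about 1.15 Mbit/s, so driving 167,000 speakers individually would require roughly 192 Gbit/s of raw audio data before any protocol overhead.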

The norm for managing real-time immersive 3D audio systems [3, 4] is to rely on a centralized software-based approach: a single powerful computer connected to one or multiple audio interfaces providing a limited number of audio outputs. Thus, the computer’s throughput determines the maximum number of manageable parallel digital audio streams (and applied computations), creating a hardware-related bottleneck.

Contrary to this norm, we would like to rely on a distributed computing approach leveraging the computational power of large Field-Programmable Gate Arrays (FPGAs). In this approach, each FPGA is in charge of computing the sound of a limited set of speakers. This will allow a very large number of speakers to be managed simultaneously at a very low cost with unparalleled latency performance.
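The idea of giving each FPGA a limited set of speakers can be illustrated with a simple partitioning sketch. This is only an assumption-laden illustration: the contiguous assignment policy, the per-node speaker count, and the function name are hypothetical and do not describe the team's actual system.

```python
# Sketch: split a speaker array into contiguous groups, one group per
# FPGA node. ASSUMPTION: every node drives at most `speakers_per_node`
# outputs; the contiguous split is an illustrative policy only.

def partition_speakers(num_speakers: int, speakers_per_node: int) -> list:
    """Return one range of speaker indices per FPGA node."""
    if num_speakers < 0 or speakers_per_node <= 0:
        raise ValueError("num_speakers must be >= 0 and speakers_per_node > 0")
    return [
        range(start, min(start + speakers_per_node, num_speakers))
        for start in range(0, num_speakers, speakers_per_node)
    ]


if __name__ == "__main__":
    # Hypothetical setup: 70 speakers, 16 outputs per FPGA node.
    for node_id, group in enumerate(partition_speakers(70, 16)):
        print(f"node {node_id}: speakers {group.start}..{group.stop - 1}")
```

With such a split, the number of nodes grows linearly with the number of speakers while the per-node workload stays bounded, which is the scaling property the distributed approach aims for.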

To reach this goal, multiple challenges must be tackled, ranging from transmitting audio streams to many audio devices [5] with close to perfect synchronicity to running complex audio Digital Signal Processing (DSP) algorithms on FPGAs [6]. For that, the work carried out within the Emeraude team on this topic during the past five years will be leveraged [7, 8, 9]. We plan to rely heavily on High-Level Synthesis (HLS) for programming FPGAs, with the Syfala tool in particular [10]. Allowing an unparalleled number of speakers to be collectively managed in the context of immersive 3D audio will open the door to the deployment of a broad range of new immersive audio techniques.

The main objectives of this PhD are to:

  • Design a distributed FPGA-based spatial audio system providing unparalleled computational capabilities and perfect synchronicity between all the speakers.
  • Adapt and optimize spatial audio, virtual acoustics, and soundscape rendering algorithms to FPGAs to fully take advantage of their computational power.
  • Explore the use of the Faust programming language [11] to facilitate the deployment of this kind of application on a distributed FPGA-based system, automatically finding the best implementation.

The system developed as part of this PhD will be evaluated in a broad range of “real-world” scenarios. The first application to be considered is virtual reality with real-time, interactive soundscape rendering. One concrete case study for this immersive VR system will be the Chauvet cave in the south of France, through a collaboration with Chauvet’s scientific team. Another application to be considered is virtual acoustics in the context of live performances. For that, we will have to investigate the possibility of deploying the FPGA-based distributed immersive sound system at a larger scale in a concert hall. GRAME - Centre National de Création Musicale in Lyon will act as a collaborator in this context.

References

[1] Charles de Laubier. Le son immersif, nouveau défi de l’industrie musicale pour séduire spectateurs et auditeurs. Le Monde, May 2024.
[2] Las Vegas Sphere sound system description. https://www.sphereentertainmentco.com.
[3] Michael Gerzon. Multi-system ambisonic. Wireless World, 83:43–47, 1977.
[4] Lauri Savioja and Peter Svensson. Overview of geometrical room acoustic modeling techniques. The Journal of the Acoustical Society of America, 138(2):708–730, 2015.
[5] Pierre Cochard, Jurek Weber, Romain Michon, Tanguy Risset, and Stéphane Letz. Ethernet real-time audio transmission to FPGA. In Proceedings of the 5th IEEE International Symposium on the Internet of Sounds, Erlangen, Germany, 2024.
[6] Maxime Popoff. Compilation of Real-Time Audio DSP on FPGA. PhD thesis, INSA Lyon, Lyon, France, January 2025.
[7] Thomas Rushton, Romain Michon, and Tanguy Risset. All together now: A synchronous platform for distributed spatial audio. In Proceedings of the 2025 Sound and Music Computing Conference, Graz, Austria, 2025.
[8] Maxime Popoff, Romain Michon, and Tanguy Risset. Enabling affordable and scalable audio spatialization with multichannel audio expansion boards for FPGA. In Proceedings of the 2024 Sound and Music Computing Conference, Porto, Portugal, 2024.
[9] Thomas Albert Rushton, Romain Michon, and Stéphane Letz. A microcontroller-based network client towards distributed spatial audio. In Proceedings of the 2023 Sound and Music Computing Conference (SMC-23), Stockholm, Sweden, 2023.
[10] Maxime Popoff, Romain Michon, Tanguy Risset, Yann Orlarey, and Stéphane Letz. Towards an FPGA-based compilation flow for ultra-low latency audio signal processing. In Proceedings of the 2022 Sound and Music Computing Conference (SMC-22), Saint-Etienne, France, 2022.
[11] Yann Orlarey, Dominique Fober, and Stéphane Letz. Faust: an efficient functional approach to DSP programming. New Computational Paradigms for Computer Music, pages 65–96, 2009.

Main Activities

  • Design a distributed FPGA-based spatial audio system providing unparalleled computational capabilities and perfect synchronicity between all the speakers.
  • Adapt and optimize spatial audio, virtual acoustics, and soundscape rendering algorithms to FPGAs to fully take advantage of their computational power.
  • Explore the use of the Faust programming language (https://faust.grame.fr) to facilitate the deployment of this kind of application on a distributed FPGA-based system, automatically finding the best implementation.
  • Evaluate the tools and systems designed as part of this PhD by using them at the heart of concerts, interactive installations, VR experiences, etc.

Additional Activities

  • Write papers
  • Write a thesis
  • Fulfill requirements for graduating, etc.

Skills

Key Technical Skills

The "ideal" candidate should have the following skills (with some level of flexibility):

  • Master's degree in computer science (or equivalent)
  • Experience with spatial audio
  • Advanced C++ programming (High-Level Synthesis)
  • Network protocols
  • Faust programming basics
  • Low-level digital audio systems
  • Professional audio systems/sound engineering
  • FPGAs (VHDL)

Languages

  • Fluent written and spoken English (all communication within the lab happens in English and the PhD thesis will be written in English)

Relational skills

  • Independent
  • Teamwork

Benefits

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

1st and 2nd year: 2200 euros gross salary / month

3rd year: 2300 euros gross salary / month