Physics-informed machine learning for modelling the dynamics of plant-pathogen molecular interactions

Level of qualifications required: Master's degree or equivalent

Position: Research internship

About the research centre or Inria department

The Inria center at Université Côte d'Azur includes 42 research teams and 9 support services. The center’s staff (about 500 people) is made up of scientists of different nationalities, engineers, technicians and administrative staff. The teams are mainly located on the university campuses of Sophia Antipolis and Nice, as well as in Montpellier, and work in close collaboration with research and higher education laboratories and establishments (Université Côte d'Azur, CNRS, INRAE, INSERM, etc.) and with regional economic players.

With a presence in the fields of computational neuroscience and biology, data science and modeling, software engineering and certification, as well as collaborative robotics, the Inria Centre at Université Côte d'Azur is a major player in terms of scientific excellence through its results and collaborations at both European and international levels.

 

Context

The Acumes project-team is a joint team between Inria and the Jean-Alexandre Dieudonné mathematics laboratory (LJAD) at Université Côte d'Azur. Its research focuses on the analysis and optimisation of systems governed by partial differential equations, with multidisciplinary applications ranging from fluid and structural mechanics to the modelling of biological phenomena and of road and pedestrian traffic. The team is now also interested in deep learning methods that efficiently combine data and physical models.

Assignment

Plants live in a constantly changing environment that is often unfavourable or even hostile. Plant defense against biotic threats therefore requires multiple signaling processes responsible for surveillance, perception, and immune response activation, all of which are influenced by varying spatial and temporal factors. Moreover, these interactions rest on a molecular dialogue between the pathogen and its host that unfolds over different time scales, which together determine whether an infection succeeds or fails.

The flourishing of omics techniques has made it possible to study complex biological systems through the analysis of their content at the molecular level. Transcriptomics, which quantifies changes in gene expression, is by far the most widely used omics technique. Time-course transcriptomics data are usually analyzed either by treating each time point as independent or with profile-based approaches in which the temporal continuity of the data is not fully exploited. An alternative approach is to use mathematical models, including regression and spline models. However, these models generally fail to provide mechanistic interpretations. In addition, transcriptome analysis with high resolution in time is challenging, particularly with plant tissues, so longitudinal experiments usually comprise only a few time points. These are insufficient to robustly infer statistically significant changes over the course of the infection.

Although longitudinal experiments are often performed, currently available analysis methods do not explicitly account for the time dependency of successive observations.

Physics-informed neural networks (PINNs) are a novel class of deep learning algorithms, growing in popularity, that can integrate observational data with physical or mathematical understanding. They are particularly suitable for high-dimensional, noisy data and time series, although their application in this domain is very recent. However, these methods have not yet been used on omics data, most likely because the underlying physical system is generally unknown in this case.
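
To make the idea of combining data and a model concrete, here is a minimal sketch of the kind of composite loss minimised in physics-informed training: a data misfit at the few observed time points plus the residual of a governing equation at dense collocation points. Everything specific in it is an assumption made for illustration and is not part of the project: the toy production-degradation model dx/dt = k_s - k_d * x, the parameter values, the function name physics_informed_loss, and the use of finite differences on a candidate trajectory in place of automatic differentiation through a neural network.

```python
import numpy as np

def physics_informed_loss(t_obs, x_obs, t_col, x_model, k_s, k_d, weight=1.0):
    """Data misfit plus ODE residual for the assumed toy model dx/dt = k_s - k_d * x."""
    # Data term: mean squared error at the (few) observed time points.
    data_loss = np.mean((x_model(t_obs) - x_obs) ** 2)
    # Physics term: residual of the assumed ODE at dense collocation points,
    # with the time derivative approximated by finite differences.
    x_col = x_model(t_col)
    dxdt = np.gradient(x_col, t_col, edge_order=2)
    residual = dxdt - (k_s - k_d * x_col)
    physics_loss = np.mean(residual ** 2)
    return data_loss + weight * physics_loss

# Hypothetical synthesis and degradation rates.
k_s, k_d = 2.0, 0.5

def exact(t):
    # Closed-form solution of the toy ODE with x(0) = 0.
    return (k_s / k_d) * (1.0 - np.exp(-k_d * t))

# Four synthetic "measurements", mimicking a sparse time course.
t_obs = np.array([0.0, 2.0, 6.0, 12.0])
x_obs = exact(t_obs)
t_col = np.linspace(0.0, 12.0, 241)

def interpolant(t):
    # Piecewise-linear curve through the data: fits every observation exactly
    # but does not satisfy the ODE between observations.
    return np.interp(t, t_obs, x_obs)

loss_exact = physics_informed_loss(t_obs, x_obs, t_col, exact, k_s, k_d)
loss_interp = physics_informed_loss(t_obs, x_obs, t_col, interpolant, k_s, k_d)
print(f"exact solution: {loss_exact:.2e}, linear interpolant: {loss_interp:.2e}")
```

Both candidate trajectories match the four observations, yet only the one consistent with the assumed dynamics has a near-zero loss. This is how the physics term can compensate for having few time points. In an actual PINN the trajectory would be a neural network and the loss would be minimised over its weights, and possibly over unknown model parameters as well.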

Main activities

In this project we will investigate the potential of these methods and their suitability for analysing the information contained in longitudinal multi-transcriptomics data. This internship is at the crossroads of systems biology and scientific machine learning; the candidate will therefore work in both fields and leverage the interactions between observational data and models to try to overcome present barriers. The following steps are proposed:

Task 1. Identify possible mathematical models to describe typical omics time series and develop a suitable hybrid machine learning framework, based on both time series observations and modeling principles.

Task 2. Using POMOdORO, an internal database of omics data on tomato in interaction with several different pathogens, developed at the Institut Sophia Agrobiotech, the candidate will process the raw omics data to test the model developed.

Skills

The candidate should be a Master 2 student in applied mathematics, machine learning or bioinformatics.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Contribution to mutual insurance (subject to conditions)

Remuneration

Traineeship grant depending on attendance hours.