PhD Position F/M Study of the estimation and control principle for Markov decision processes (IDP 2024)

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position

Context

The Inria team Astral is a joint Inria-Naval Group project team, Naval Group being a French industrial group specializing in naval defense construction. With this thesis, we aim to carry out exploratory and preparatory theoretical studies that could have an impact on the work carried out with Naval Group, although direct applications in the short or medium term are not guaranteed.

Assignment

Markov decision processes are non-diffusive stochastic processes whose defining characteristics (jump rates, transition measures, flows) depend on a variable that the decision maker can act upon, with the aim of steering the process towards a given goal. In practice, these processes may also depend on parameters that are unknown a priori and that one may want to estimate. If the process is to be controlled in real time, the estimation must be carried out in real time as well, and decisions must adapt to the current estimate of the parameters. This is where the principle of estimation and control comes into play: it links the choice of estimators to the choice of strategies, and vice versa.

The principle of estimation and control for discrete-time Markov decision processes has been the subject of numerous studies. These studies show that minimum contrast estimators make it possible to estimate the parameters of the observed process while controlling it, through the construction of asymptotically optimal policies, at least for the total discounted reward criterion (and also when the time horizon is finite).
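
To make the idea concrete, here is a minimal sketch of an estimate-then-control loop on a toy two-state, two-action model. Everything in it is an assumption made for illustration and is not taken from the cited works: the kernel, the reward, the discount factor, the parameter grid, and the choice of the negative log-likelihood as contrast function. At each step the controller acts as if the current estimate were the true parameter, observes one transition, and re-estimates.

```python
import numpy as np

BETA = 0.9                                   # discount factor (assumed value)
THETA_GRID = np.linspace(0.01, 0.99, 99)     # candidate parameter values (assumed)
REWARD = np.array([[0.0, 0.0], [1.0, 1.0]])  # REWARD[x, a]: reward 1 in state 1


def kernel(theta):
    """Hypothetical kernel P[a, x, y] = P_theta(y | x, a), 2 states, 2 actions."""
    return np.array([
        [[1 - theta, theta], [1 - theta, theta]],  # action 0: reach state 1 w.p. theta
        [[theta, 1 - theta], [theta, 1 - theta]],  # action 1: reach state 1 w.p. 1 - theta
    ])


# Log-kernels precomputed on the grid, shape (len(THETA_GRID), 2, 2, 2).
LOG_KERNELS = np.log(np.stack([kernel(t) for t in THETA_GRID]))


def optimal_policy(theta, n_iter=100):
    """Value iteration for the discounted reward; returns the greedy action per state."""
    p = kernel(theta)
    v = np.zeros(2)
    for _ in range(n_iter):
        q = REWARD + BETA * (p @ v).T  # (p @ v)[a, x] = sum_y P(y | x, a) * v[y]
        v = q.max(axis=1)
    return q.argmax(axis=1)


def min_contrast_estimate(counts):
    """Minimise the negative log-likelihood contrast over THETA_GRID.

    counts[a, x, y] is the number of observed transitions x -> y under action a.
    """
    contrasts = -(LOG_KERNELS * counts).sum(axis=(1, 2, 3))
    return THETA_GRID[int(np.argmin(contrasts))]


def run_adaptive_control(theta_true, n_steps=1000, seed=0):
    """Alternate control and estimation along a single trajectory."""
    rng = np.random.default_rng(seed)
    p_true = kernel(theta_true)
    counts = np.zeros((2, 2, 2))
    x, theta_hat = 0, 0.5  # initial state and initial guess (both assumed)
    for _ in range(n_steps):
        a = optimal_policy(theta_hat)[x]           # act as if theta_hat were the truth
        y = rng.choice(2, p=p_true[a, x])          # observe the next state
        counts[a, x, y] += 1
        theta_hat = min_contrast_estimate(counts)  # update the estimate online
        x = y
    return theta_hat, optimal_policy(theta_hat)


if __name__ == "__main__":
    theta_hat, policy = run_adaptive_control(theta_true=0.7)
    print("estimate:", theta_hat, "policy:", policy)
```

In this toy model every transition is informative about the parameter whatever action is taken, so the estimate keeps improving while the process is being controlled. In general this is not automatic, which is part of what makes the interplay between estimators and strategies delicate.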

Depending on the candidate's profile, the emphasis may be placed on the theoretical or the practical aspects of this area of research.

Main activities

Here are a few lines of theoretical research that could be pursued:
- the assumptions made in [H12] on the characteristics of the process need to be weakened to cover a wider range of situations.
- the asymptotic properties of the proposed estimators need to be studied in greater depth.
- a central limit theorem has not yet been obtained for these estimators and thus deserves to be studied.
- the work initiated in [Maigret79] on the large deviations principle deserves to be explored further and extended to the context of Markov decision processes.

In this vein, we have recently obtained results that extend the principle of estimation and control to continuous-time Markov decision processes, see [CG23, CDG23]. The research program outlined above for discrete time should of course also be developed in this setting, which is technically more demanding, notably because of the presence of forced jumps at the boundary.

During this thesis, the practical aspect should not be neglected: numerical implementation of the estimators studied and of the resulting optimal policies will serve to illustrate their properties. This is an important point that will demonstrate the usefulness of the theoretical studies and of the methods developed. In this context, one can look at various classic problems related to target tracking, as explained in [Zhang17], which can be modelled using Markov decision processes with adaptation.

[CG23] Costa, O., & Dufour, F. (2023). Adaptive discounted control for piecewise deterministic Markov processes. Journal of Mathematical Analysis and Applications, 127517.
[CDG23] Costa, O., Dufour, F., & Génadot, A. (2023). Minimum Contrast Estimators for Piecewise Deterministic Markov Processes. Submitted.
[Maigret79] Maigret, N. (1979). Majorations de Chernoff et statistique séquentielle pour des chaînes de Markov récurrentes au sens de Doeblin. Astérisque, 68, 125-142.
[H12] Hernández-Lerma, O. (2012). Adaptive Markov control processes (Vol. 79). Springer Science & Business Media.
[Zhang17] Zhang, H., Dufour, F., Anselmi, J., Laneuville, D., & Nègre, A. (2017). Piecewise optimal trajectories of observer for bearings-only tracking by quantization. In 2017 20th International Conference on Information Fusion (Fusion) (pp. 1-7). IEEE.

Skills

The candidate should have a solid background in probability theory, in particular in the theory of Markov processes. Prior coursework in control theory (deterministic or stochastic) would be a plus. The ability to develop numerical examples is also expected.

Benefits package

Subsidized meals
Partial reimbursement of public transport costs
Possibility of teleworking and flexible organization of working hours
Professional equipment available (videoconferencing, loan of computer equipment, etc.)
Social, cultural and sports events and activities
Access to vocational training
Social security coverage

Remuneration

2100€ / month (before taxes) during the first 2 years,
2190€ / month (before taxes) during the third year.

General Information

Theme/Domain : Stochastic approaches
Statistics (Big data) (BAP E)
Town/city : Talence
Inria Center : Centre Inria de l'université de Bordeaux
Starting date : 2024-10-01
Duration of contract : 3 years
Deadline to apply : 2024-05-03

Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.

Instructions to apply

Please send:
- CV
- Cover letter
- Master's marks and ranking
- Letter(s) of support

Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter such an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.

Recruitment Policy :
As part of Inria's diversity policy, all positions are accessible to people with disabilities.

Contacts

Inria Team : ASTRAL
PhD Supervisor :
Genadot Alexandre / alexandre.genadot@inria.fr

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.