

Within a partnership: PEPR Brain Health Trajectories project

The main goal of this project is thus to develop a generative approach for covariance models that is consistent with the Riemannian framework for connectivity modeling in fMRI [8]. We will learn to generate such samples from large populations and to fine-tune models for smaller populations. We will then assess the quality of the generated samples using a variety of metrics (authenticity, coverage, recall, GAN-train, GAN-test). Finally, we will evaluate the utility of the generative approach on longitudinal and cross-sectional connectivity-based diagnostic problems.

Technical developments. The candidate will implement several generative models for the covariance
matrices that are used in functional connectivity analysis.
• The so-called R-CNN network embedded in score-based generative modeling can learn real images
at the pixel level, resulting in highly realistic covariance matrices [9].
• Graph neural networks can generate good representations of node relationships. Whether the
covariance matrices they generate appear realistic is still an open question.
• A well-known approach called Stable Diffusion [10] maps data to a latent space via an encoder and then
applies a diffusion model in that space. For specific neuroimaging tasks, many details remain to be
explored. This approach can be combined with Riemannian score-based generative modeling [11, 12] for
further experimentation.
The generative model has to be conditional: it should be conditioned on input information such as age, sex,
and characteristics of the target population (e.g. disease status).
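
To make the Riemannian setting concrete, the following is a minimal sketch of a toy baseline, not one of the models listed above. It assumes NumPy only, uses synthetic time series in place of fMRI data, and fits a plain Gaussian in the tangent space at the log-Euclidean mean. All function names are hypothetical. The point it illustrates is that sampling in the tangent space and mapping back yields symmetric positive definite matrices by construction.

```python
import numpy as np

def sym_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * fn(eigvals)) @ eigvecs.T

def tangent_embed(cov, ref):
    """Map an SPD matrix to the tangent space at ref: log(ref^-1/2 cov ref^-1/2)."""
    whitener = sym_apply(ref, lambda w: 1.0 / np.sqrt(w))
    return sym_apply(whitener @ cov @ whitener, np.log)

def tangent_unembed(tangent, ref):
    """Inverse map: ref^1/2 exp(tangent) ref^1/2, which is SPD by construction."""
    sqrt_ref = sym_apply(ref, np.sqrt)
    return sqrt_ref @ sym_apply(tangent, np.exp) @ sqrt_ref

def sym_to_vec(sym):
    """Keep the upper triangle (diagonal included) of a symmetric matrix."""
    return sym[np.triu_indices(sym.shape[0])]

def vec_to_sym(vec, n):
    """Rebuild the n x n symmetric matrix from its upper triangle."""
    upper = np.zeros((n, n))
    upper[np.triu_indices(n)] = vec
    return upper + np.triu(upper, 1).T

def sample_covariances(covs, n_samples, rng):
    """Toy generator (assumption): Gaussian fit in the tangent space."""
    n = covs[0].shape[0]
    # Log-Euclidean mean, used here as the reference point.
    ref = sym_apply(np.mean([sym_apply(c, np.log) for c in covs], axis=0), np.exp)
    vecs = np.array([sym_to_vec(tangent_embed(c, ref)) for c in covs])
    draws = rng.multivariate_normal(
        vecs.mean(axis=0), np.cov(vecs, rowvar=False), size=n_samples
    )
    return [tangent_unembed(vec_to_sym(v, n), ref) for v in draws]

# Synthetic stand-in for fMRI time series: 40 subjects, 200 time points, 4 regions.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.5, 0.0],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
covs = [np.cov(rng.normal(size=(200, 4)) @ mixing, rowvar=False) for _ in range(40)]
generated = sample_covariances(covs, n_samples=5, rng=rng)
```

A conditional variant could, for instance, regress the tangent vectors on covariates such as age and sex before sampling the residuals. This is only one possible design and is not prescribed by the project.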
In a second step, the candidate will systematically evaluate the quality of the generated connectivity
models using qualitative evaluation and the metrics described in [13] to assess coverage (i.e. do the
generated covariance matrices cover the entire distribution of the observed covariance matrices?), recall (i.e.
are all the generated samples close enough to the input distribution?), and fidelity (i.e. are the generated
samples distinct enough from the input samples?).
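
As an illustration of how sample-level questions of this kind can be turned into numbers, here is a minimal sketch based on k-nearest-neighbour balls. It is an assumption, not the definition used in [13]: metric names and exact formulas vary across papers, so the dictionary keys below are descriptive and hypothetical. Inputs are assumed to be feature vectors, for example vectorized tangent-space embeddings of the covariance matrices.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def knn_radius(x, k):
    """Distance from each row of x to its k-th nearest neighbour (self excluded)."""
    # After sorting, column 0 is the zero distance of each point to itself.
    return np.sort(pairwise_dist(x, x), axis=1)[:, k]

def knn_metrics(real, generated, k=3):
    """Three k-NN ball statistics comparing real and generated samples."""
    r_real = knn_radius(real, k)
    r_gen = knn_radius(generated, k)
    dist = pairwise_dist(real, generated)      # shape (n_real, n_generated)
    in_real_ball = dist <= r_real[:, None]     # generated j inside ball of real i
    in_gen_ball = dist <= r_gen[None, :]       # real i inside ball of generated j
    return {
        # Share of generated samples lying in at least one real ball.
        "generated_near_real": in_real_ball.any(axis=0).mean(),
        # Share of real samples lying in at least one generated ball.
        "real_near_generated": in_gen_ball.any(axis=1).mean(),
        # Share of real balls that contain at least one generated sample.
        "real_balls_hit": in_real_ball.any(axis=1).mean(),
    }

rng = np.random.default_rng(0)
real = rng.normal(size=(50, 6))
same = knn_metrics(real, real.copy())   # generated set identical to the real set
far = knn_metrics(real, real + 100.0)   # generated set shifted far away
```

Note that a generator that simply copies its training data scores perfectly on all three statistics, which is why a separate check that generated samples are distinct from the input samples is needed.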
Validation on brain imaging datasets. We will train the generative model on the large-scale UK Biobank
time series [14] (40k samples). The validation will consist of i) assessing whether the effects of some covariates
or treatments (age, disease, education) can be captured and reproduced by the generated data, for example
when creating explicit counterfactuals, and ii) assessing the utility of the data generation in downstream
prediction. When the quality is good enough, we will apply the generative model to augment datasets from
other cohorts with similar population profiles (CamCAN [15, 16], 1000 brains [17]).
In particular, we will study the adaptation of the generators to the specific settings of new cohorts. To
handle potential covariate shifts, we will consider Riemannian approaches [18, 8] as well as optimal transport
techniques [5].
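
One simple Riemannian device sometimes used against cohort-level shifts is re-centering: each cohort is whitened by its own geometric mean so that all cohorts are centered at the identity. The sketch below illustrates that idea under stated assumptions (NumPy only, synthetic data, hypothetical function names). Whether the approaches of [18, 8] take this form is not specified here.

```python
import numpy as np

def sym_apply(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * fn(eigvals)) @ eigvecs.T

def inv_sqrt(mat):
    """Inverse matrix square root of an SPD matrix."""
    return sym_apply(mat, lambda w: 1.0 / np.sqrt(w))

def frechet_mean(covs, n_iter=100, tol=1e-10):
    """Affine-invariant (geometric) mean of SPD matrices by fixed-point iteration."""
    mean = np.mean(covs, axis=0)
    for _ in range(n_iter):
        sqrt_m = sym_apply(mean, np.sqrt)
        whitener = inv_sqrt(mean)
        # Average of the data in the tangent space at the current estimate.
        tangent = np.mean(
            [sym_apply(whitener @ c @ whitener, np.log) for c in covs], axis=0
        )
        mean = sqrt_m @ sym_apply(tangent, np.exp) @ sqrt_m
        mean = (mean + mean.T) / 2.0  # remove round-off asymmetry
        if np.linalg.norm(tangent) < tol:
            break
    return mean

def recenter(covs):
    """Whiten a cohort by its geometric mean, which moves that mean to the identity."""
    whitener = inv_sqrt(frechet_mean(covs))
    return [whitener @ c @ whitener for c in covs]

# Two synthetic cohorts whose covariances differ by a per-region scaling.
rng = np.random.default_rng(0)
scale = np.array([1.0, 2.0, 3.0, 4.0])
cohort_a = [np.cov(rng.normal(size=(200, 4)), rowvar=False) for _ in range(30)]
cohort_b = [np.cov(rng.normal(size=(200, 4)) * scale, rowvar=False) for _ in range(30)]
centered_a = recenter(cohort_a)
centered_b = recenter(cohort_b)
```

After re-centering, the average matrix logarithm of each cohort is numerically zero, which is the condition for its geometric mean to be the identity. A generator trained on one re-centered cohort can then be compared with, or adapted to, another cohort in a common reference frame.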

Main activities

The project will produce experimental code in Python, available as an open source project, for the sake of
scientific reproducibility. Care will be taken to have good compatibility with existing frameworks such as
The best parts of the code could be incorporated into Nilearn or Geomstats.
The experiments will be carried out using data hosted on the protected MIND servers.


