2023-05769 - Engineer F/M: Benchmarking of applications' I/O behavior and storage systems (performance and energy consumption)

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Position : Temporary scientific engineer

Level of experience : From 3 to 5 years

About the research centre or Inria department

The goal of the TADaaM project is to design and build a stateful system-wide service layer for HPC systems. This layer will be twofold. First, it will abstract low-level features of the system (e.g. topology, network, resource usage) and of the software stack (e.g. threads, data, runtime system). Second, applications will be able to register their needs and behaviors thanks to a carefully designed API. With these two sets of information, the layer will optimize the execution of all the running applications in a coordinated fashion and at system-scale.

Context

This position is part of Exadost, a Numpex project carried out in collaboration with Inria Rennes, La Maison de la Simulation, CEA-DAM, and DDN.

The project aims to build the I/O infrastructure of next-generation HPC machines.

Assignment

The recruited person will be responsible for characterizing the I/O behavior of applications chosen as representative of the French HPC workload. This characterization will rely on profiling tools such as Darshan and Tau, tracing tools such as Recorder, and inspection of the applications' source code. We are interested in developing I/O kernels: codes that mimic the I/O activity (accesses to persistent data) of the applications and make it easier to evaluate that activity on different platforms.

The person will also run experiments on different I/O infrastructures to characterize their behavior and how it depends on the characteristics of the accesses. Existing benchmarks such as IOR and mdtest will be used first, but new benchmarks may need to be developed.

Selecting the benchmarks and access patterns will involve studying the research literature.

The expected deliverables are a benchmark suite that can easily be applied to new platforms, the I/O kernels, a database of the results obtained, and a report.

To learn more about the proposed research topic:

Examples of similar work by the same research team:

https://hal.inria.fr/hal-03753813

https://hal.inria.fr/hal-03808833/

Main activities

Main activities:

  • Studying papers on the workloads of real large-scale HPC machines and on those imposed by known classes of applications (for example, machine learning);
  • Running applications and benchmarks on HPC systems using scripts, then processing and plotting the results;
  • Studying large HPC applications (usually written in C/C++ or Fortran) to understand their I/O behavior;
  • Developing I/O kernels and benchmarks in C/C++ with MPI-IO;
  • Performing statistical analysis and modeling of the results (Python or R).

Additional activities:

  • Writing reports and research papers (LaTeX).

Skills

Technical skills and level required:

  • C/C++;
  • Fortran: not required, but the candidate should be able to read and understand Fortran code;
  • Scripting (Bash, Python, etc.);
  • Using an HPC infrastructure (Slurm, SSH, etc.).

Research experience, especially in HPC, would be a plus.

Languages:

  • English mandatory, French appreciated.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

2724€ / month (before taxes)