PhD Position F/M: Generation of software variants

Contract type : Fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Other valued qualifications : PhD

Function : PhD Position

About the research centre or Inria department

The Inria Centre at Rennes University is one of Inria's eight centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technology research institute, etc.

Context

This PhD thesis will be carried out in the DiverSE team (https://www.diverse-team.fr/) which is located in Rennes. DiverSE's research is in the area of software engineering.

Assignment

Many software systems leverage different mechanisms (feature toggles, compiler flags, configuration files, command-line parameters, etc.) and offer numerous configuration options (or features) that can be combined to generate variants [4]. All these mechanisms aim at augmenting the configurability and features of the system, with positive effects on functionality and performance. The generative nature of LLMs makes them good candidates for producing software variants, with several possible applications: software product lines, self-adaptive systems, or simply software systems that aim to offer more variants to fit different requirements. Several research topics are thus of interest at the intersection of LLMs and configurable systems.
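To make the mechanisms above concrete, here is a minimal Python sketch (all option names are illustrative, not from an existing system) showing how feature toggles and command-line parameters combine into distinct variants:

```python
# A minimal sketch of two common variability mechanisms: feature
# toggles (set at build/run time) and command-line parameters.
# The features and their behaviors are toy stand-ins.
import argparse

FEATURES = {"compression": True, "encryption": False}  # feature toggles

def process(data: str, features: dict) -> str:
    """Each enabled feature changes the behavior of the variant."""
    if features["compression"]:
        data = data.replace("  ", " ")  # stand-in for real compression
    if features["encryption"]:
        data = data[::-1]               # stand-in for real encryption
    return data

def make_parser() -> argparse.ArgumentParser:
    # Command-line parameters expose the same options to end users;
    # each combination of flags selects a distinct variant behavior.
    parser = argparse.ArgumentParser()
    parser.add_argument("--compression", action="store_true")
    parser.add_argument("--encryption", action="store_true")
    return parser
```

With n independent boolean options, the variant space already grows as 2**n, which is why exploring, testing, and generating variants is hard and why automation is attractive.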
A first line of research is to use LLMs to synthesize software variations based on requirements provided as prompts or high-level feature descriptions. As explored in [1], the idea is to use LLMs as compilers capable of synthesizing code variants, corresponding to features, out of prompts written in natural language. In [1], we showed how LLMs can assist developers in implementing variability in different programming languages (C, Rust, Java, TikZ, etc.) and mechanisms (conditional compilation, feature toggles, command-line parameters, templates, etc.). With "features as prompts", there is hope to raise the level of abstraction, increase automation, and bring more flexibility when synthesizing and exploring software variants. The applicability of LLMs for synthesizing code variants seems broad (e.g., we envision synthesizing configuration files in the context of infrastructure as code) but deserves more research. However, there is a major barrier: LLMs are by construction stochastic, non-deterministic, and highly sensitive to prompt variations -- and so are the corresponding implementations of features and variability.
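The "features as prompts" idea can be sketched as follows; this is a hypothetical illustration, where `complete` is a placeholder for any actual LLM API and the template wording is an assumption, not the prompt used in [1]:

```python
# Sketch of "features as prompts": feature descriptions in natural
# language are assembled into a prompt asking an LLM to emit a code
# variant. `complete` is a hypothetical stand-in for an LLM client.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in an actual LLM client here")

PROMPT_TEMPLATE = (
    "Implement the following program in {language}.\n"
    "Base behavior: {base}\n"
    "Optional features: {features}\n"
    "Use {mechanism} so that each feature can be enabled or disabled.\n"
)

def build_prompt(base: str, features: list[str],
                 language: str = "C",
                 mechanism: str = "conditional compilation (#ifdef)") -> str:
    return PROMPT_TEMPLATE.format(
        language=language,
        base=base,
        features=", ".join(features),
        mechanism=mechanism,
    )

def synthesize_variant(base: str, features: list[str]) -> str:
    # The LLM plays the role of the "compiler" from features to code.
    return complete(build_prompt(base, features))
```

Keeping prompt construction separate from the LLM call makes the prompt itself an artifact that can be versioned, perturbed, and benchmarked, which matters given the sensitivity to prompt variations noted above.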
Second, LLMs can handle high-level requirements and objectives to suggest influential and interpretable configuration options, helping developers make well-informed configuration decisions. Though statistical learning has been largely employed in this context [3], we believe that LLMs can also bring value by offering recommender and predictive models that estimate software performance based on various configuration settings. These models can communicate to developers or end-users interpretable insights about the complex relationships between configurations and system performance. Hence, LLMs are complementary to statistical learning and symbolic reasoning when handling large variant spaces. An interesting perspective is to leverage, in addition to the code, different sources of information (mailing lists, documentation, man pages, issues, and discussions) to integrate configuration knowledge. Finally, we will investigate the use of LLMs for automatic feature identification and modeling within software systems. We aim to develop techniques that can identify and represent configurable units (features) effectively [2]. Preliminary experiments suggest that LLMs can either locate features in unmanaged code variants or refactor a configurable system with another set of (meaningful) features.
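The kind of interpretable insight at stake can be illustrated with a toy sketch (invented measurements and option names): a simple influence estimate per option, of the sort statistical learning computes [3] and an LLM could explain or enrich in natural language:

```python
# Toy sketch: estimating each boolean option's influence on a measured
# performance metric. Data and option names are invented for
# illustration, not taken from a real system.
measurements = [
    # (configuration, execution time in seconds)
    ({"cache": True,  "logging": True},  1.9),
    ({"cache": True,  "logging": False}, 1.2),
    ({"cache": False, "logging": True},  3.1),
    ({"cache": False, "logging": False}, 2.4),
]

def option_influence(option: str) -> float:
    """Average time difference between enabling and disabling one option."""
    on  = [t for cfg, t in measurements if cfg[option]]
    off = [t for cfg, t in measurements if not cfg[option]]
    return sum(on) / len(on) - sum(off) / len(off)

# In this toy data, cache lowers execution time (influence ~ -1.2 s)
# while logging raises it (~ +0.7 s).
```

Such per-option effects are exactly the kind of "complex relationship between configurations and performance" that a model must surface in an interpretable way to guide configuration decisions.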
All of these research directions need further inquiry to propose valid approaches. One challenge is that LLMs are sensitive to perturbations: small variations in prompts or temperature may introduce significant variability-related issues and lead to incorrectly generated variants. We aim to create specific benchmarks for the task of generating software variants, with the objective of continuously evaluating the robustness of LLMs in handling large variant spaces. Such benchmarks will also guide the development of automated techniques to refine prompts or provide additional context, thereby enhancing the LLMs' understanding and the quality of the generated variants.
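A robustness benchmark of this kind could be structured as follows; this is a hedged sketch where the "generated variants" are executable stand-ins for code an LLM might produce from differently phrased prompts:

```python
# Sketch of a robustness benchmark: the same feature request is
# phrased in several ways; every generated variant is checked against
# one functional oracle, and robustness is the fraction of phrasings
# whose variant passes. All names and variants are illustrative.
def oracle(variant_fn) -> bool:
    """Functional tests that every correct variant must pass."""
    return variant_fn("abc") == "ABC" and variant_fn("") == ""

def robustness(generated_variants: dict) -> float:
    """Maps each prompt phrasing to the behavior it produced."""
    passed = sum(oracle(fn) for fn in generated_variants.values())
    return passed / len(generated_variants)

# Stand-ins for code generated from three perturbed prompts:
variants = {
    "uppercase the input":             lambda s: s.upper(),
    "convert text to capital letters": lambda s: s.upper(),
    "capitalize the string":           lambda s: s.capitalize(),  # wrong
}
# Here 2 of the 3 phrasings yield a correct variant.
```

Running such an oracle over many features, prompt perturbations, and temperatures would give the continuous robustness measurements the benchmark aims for.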

 

[1] Mathieu Acher, José Angel Galindo Duarte, and Jean-Marc Jézéquel. On programming variability with large language model-based assistant. In Paolo Arcaini, Maurice H. ter Beek, Gilles Perrouin, Iris Reinhartz-Berger, Miguel R. Luaces, Christa Schwanninger, Shaukat Ali, Mahsa Varshosaz, Angelo Gargantini, Stefania Gnesi, Malte Lochau, Laura Semini, and Hironori Washizaki, editors, Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, SPLC 2023, Tokyo, Japan, 28 August - 1 September 2023, pages 8–14. ACM, 2023.

[2] Mathieu Acher and Jabier Martinez. Generative AI for reengineering variants into software product lines: An experience report. In Paolo Arcaini, Maurice H. ter Beek, Gilles Perrouin, Iris Reinhartz-Berger, Ivan Machado, Silvia Regina Vergilio, Rick Rabiser, Tao Yue, Xavier Devroey, Mónica Pinto, and Hironori Washizaki, editors, Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume B, SPLC 2023, Tokyo, Japan, 28 August - 1 September 2023, pages 57–66. ACM, 2023.

[3] Juliana Alves Pereira, Hugo Martin, Mathieu Acher, Jean-Marc Jézéquel, Goetz Botterweck, and Anthony Ventresque. Learning Software Configuration Spaces: A Systematic Literature Review. Journal of Systems and Software, 182:111044, August 2021.

[4] S. Apel, D. Batory, C. Kästner, and G. Saake. Feature-Oriented Software Product Lines: Concepts and Implementation. Springer Berlin Heidelberg, 2013.

 

Main activities

The PhD candidate will investigate the following research questions:

- How can LLMs be used to generate software variants?

- What is the most suitable granularity for generating variants?

- What contextual information needs to be built up for effective generation?

Skills

You need to:

  • have (or soon receive) a Master's degree in computer science/engineering, informatics, or a related field
  • be ok with assisting in teaching and taking courses where needed
  • be ok with investing 3+ years as a "research apprentice" (aka PhD student)

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Possibility of teleworking (90 days per year) and flexible organization of working hours
  • Partial payment of insurance costs

Remuneration

Monthly gross salary: 2100€ during the first 2 years and 2200€ during the 3rd year.