M2 internship - Unsupervised language-aided landmark discovery and matching for visual localization in complex environments

Contract type: Internship agreement

Level of qualifications required: Master's or equivalent

Function: Research internship

Assignment

1. General information

- Position: M2 internship
- Duration: 5 to 6 months, starting in February 2025 (flexible)
- Location: Inria, Nancy, France
- Affiliation: TANGRAM team
- Supervisors: Vincent Gaudillière, Marie-Odile Berger and Gilles Simon

2. Context, description and objectives

This internship will deal with the problem of visual localization, which involves determining a camera's viewpoint by automatically matching features in an image with elements from a known 3D model of the environment. These features are referred to as landmarks.

Object-based localization uses "high-level" landmarks, such as objects (e.g., chairs, tables, cupboards), as opposed to the more commonly used "low-level" keypoints (e.g., SIFT, ORB). This approach offers the advantage of relying on fewer, more discriminative landmarks but is currently limited to environments that are rich in common objects, often artificially created for research purposes. Moreover, creating the 3D model requires manually matching objects detected across multiple images, a process that can be time-consuming and tedious.

For this internship, we will focus on complex industrial environments (e.g., factories, power plants, ships) where the concept of an object is not always clearly defined. The goal is to automatically identify high-level landmarks in each image and to automatically match the detected landmarks across different images. To achieve this, we will employ "unsupervised" methods, which do not require environment-specific training, and explore the role of language in describing objects.

3. Candidate profile

- The candidate is in the final year of a Master's or engineering degree in Computer Vision, Electrical Engineering, Computer Science, Applied Mathematics or a related field.
- A strong background in image processing and/or computer vision is required.
- Strong Python (or Matlab) programming skills are required.
- An interest in deep learning frameworks (PyTorch) is also required.
- Commitment, teamwork and a critical mind.
- Good oral and written communication skills in English.

Main activities

The first part of the internship will involve a literature review on unsupervised object localization in images and the use of vision-language models (e.g., CLIP). The second part will involve applying some of these methods to images of industrial environments and analyzing their results in terms of relevance and repeatability. The final part will focus on proposing methods for automatically matching detected landmarks across different images.

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

€4.35/hour