Post-Doctoral Research Visit F/M: Data Management and Job Scheduling for Geo-distributed Workflows
Contract type: Fixed-term contract (CDD)
Renewable contract: Yes
Required level of education: PhD or equivalent
Position: Post-doctoral researcher
About the centre or functional department
The Inria Centre at Rennes University is one of Inria's nine centres and has more than thirty research teams. The Inria Centre is a major and recognized player in the field of digital sciences. It is at the heart of a rich R&D and innovation ecosystem: highly innovative SMEs, large industrial groups, competitiveness clusters, research and higher education players, laboratories of excellence, a technological research institute, etc.
Context and advantages of the position
Financial and working environment.
This postdoc position is part of the DXP (Data Exchange Platform) project of IPCEI-CIS (Important Project of Common European Interest – Next Generation Cloud Infrastructure and Services), which involves Amadeus and three Inria research teams (COAST, CEDAR and MAGELLAN). The project aims to design and develop an open-source management solution for a federated and distributed data exchange platform (DXP) operating in an open, scalable and massively distributed environment (the cloud-edge continuum). The postdoctoral researcher will be recruited and hosted at the Inria Centre at Rennes University, and the work will be carried out within the MAGELLAN team in collaboration with the other partners.
The position is for one year, with the possibility of an extension to 24 months.
Assigned mission
Context:
Data are usually hosted in multiple geo-distributed locations (private and public clouds, distributed caches, and across the edge-to-cloud continuum). Processing these data collectively (e.g., distributed data analysis and queries) is essential, but it poses several challenges to existing data-intensive distributed workflow frameworks (e.g., MapReduce [1], Spark [2], TensorFlow [3] and Dataflow [4]). These challenges stem from the limited bandwidth of wide area network (WAN) links and from the heterogeneity of networks, computing power and monetary cost in geo-distributed environments [5].
Much effort has been devoted to optimizing the performance of geo-distributed workflows [5, 6, 7, 8, 9]. These efforts mainly focus on reducing cross-data-center data transfers and on placing tasks optimally according to the performance heterogeneity of the data centers [5, 6, 7]. However, they do not consider the heterogeneity of storage services [10] (input and intermediate data are stored on different devices), the heterogeneity of monetary cost in terms of computing, storage and network, or multi-tenancy when multiple workflows run concurrently.
Objectives:
We will study the interplay and correlation between the different factors that contribute to the performance of geo-distributed workflows (e.g., input data, intermediate data, number of iterations, site capacity). Building on this, we will design scheduling policies and associated data movement strategies that improve overall performance, monetary cost and resource utilization when scheduling multiple workflows across massively distributed environments, taking into account the input data location, the network cost and status, the intermediate data, and the capacity of the different sites.
[1] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, pages 107–113, January 2008.
[2] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster Computing with Working Sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10), page 10, USENIX Association, 2010.
[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI'16), 2016.
[4] Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J. Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, and Sam Whittle. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Proceedings of the VLDB Endowment, 2015.
[5] Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, and Ion Stoica. Low Latency Geo-Distributed Data Analytics. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM'15), pages 421–434, ACM, 2015.
[6] Sharad Agarwal, John Dunagan, Navendu Jain, Stefan Saroiu, Alec Wolman, and Harbinder Bhogan. Volley: Automated Data Placement for Geo-Distributed Cloud Services. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI'10), page 2, USENIX Association, 2010.
[7] Qingyuan Wang, Bin Gao, Zhi Zhou, et al. DAG-Aware Optimization for Geo-Distributed Data Analytics. In Proceedings of the 52nd International Conference on Parallel Processing, pages 472–481, 2023.
[8] Ashish Vulimiri, Carlo Curino, Philip Brighten Godfrey, Thomas Jungblut, Konstantinos Karanasos, Jitendra Padhye, and George Varghese. WANalytics: Geo-Distributed Analytics for a Data Intensive World. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD'15), pages 1087–1092, ACM, 2015.
[9] Chien-Chun Hung, Leana Golubchik, and Minlan Yu. Scheduling Jobs across Geo-Distributed Datacenters. In Proceedings of the Sixth ACM Symposium on Cloud Computing (SoCC'15), pages 111–124, ACM, 2015.
[10] Hao Wu, Junxiao Deng, Hao Fan, Shadi Ibrahim, Song Wu, and Hai Jin. QoS-Aware and Cost-Efficient Dynamic Resource Allocation for Serverless ML Workflows. In 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 886–896, IEEE, 2023.
Main activities
- Review and synthesize the relevant literature.
- Design new scheduling policies and data management strategies for geo-distributed workflows.
- Implementation and large-scale validation.
- Participate in project meetings and discussions with other partners.
- Write research papers and disseminate results through presentations at project meetings, conferences, and workshops.
Skills
- A Ph.D. in computer science
- A solid background in the area of distributed systems
- Ability to conduct experimental systems research
- Experience with building systems and tools
- Work experience in Big Data management, cloud computing or data analytics is an advantage
- Very good communication skills in oral and written English
Benefits
- Subsidized meals
- Partial reimbursement of public transport costs
- Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
- Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
- Professional equipment available (videoconferencing, loan of computer equipment, etc.)
- Social, cultural and sports events and activities
- Access to vocational training
- Social security coverage
Remuneration
Monthly gross salary: 2,788 euros
General information
- Theme/Domain: Distributed systems and middleware
Systems & networks (BAP E)
- City: Rennes
- Inria Centre: Inria Centre at Rennes University
- Desired start date: 2025-11-01
- Contract duration: 12 months
- Application deadline: 2025-09-18
Note: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
Application instructions
Please submit online: your CV, your cover letter and, if applicable, letters of recommendation.
Defence security:
This position is likely to be assigned to a restricted-access zone (ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorisation to access a zone is granted by the head of the establishment, following a favourable ministerial opinion, as defined in the order of 3 July 2012 on the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.
Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Contacts
- Inria team: MAGELLAN
- Recruiter:
Ibrahim Shadi / Shadi.Ibrahim@inria.fr
About Inria
Inria is the national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally shared with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on a wide range of talent in more than forty different professions. 900 research and innovation support staff help scientific and entrepreneurial projects that have an impact on the world to emerge and grow. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.