Master Thesis, Internship
Internship or Master Thesis on ML-based optimization of LLM kernels for multiple platforms
Ref. 2024_024
Project description
Manually optimized kernels (e.g., flash attention) are critical for the performance of LLM inference and training. However, most of these kernels have been carefully optimized for one specific GPU platform, which can pose a serious obstacle to the portability of LLM applications. Consequently, to achieve high performance on different GPUs, LLM kernels need to be re-implemented or manually re-optimized for each platform.
OpenAI Triton (https://github.com/triton-lang/triton) has recently emerged as a promising open-source alternative to writing custom CUDA kernels. It enables one to write GPU kernels in simple Python code. Triton kernels can be both highly performant and portable across different GPU architectures. For this reason, Triton is growing in popularity, and many LLM inference frameworks, e.g., vLLM (https://github.com/vllm-project/vllm), already include several kernels written in Triton.
Although Triton promises to be adaptable to many different GPU platforms, adapting a kernel still requires manual performance fine-tuning in practice. In this context, we aim to answer the following research questions:
Qualifications:
Preferred Qualifications:
Diversity
IBM is committed to diversity at the workplace. With us you will find an open, multicultural environment. Excellent flexible working arrangements enable all genders to strike the desired balance between their professional development and their personal lives.
Please submit your application through the link below. This position is available starting immediately or at a later date.
13-03-2025