Your mission
We are looking for a motivated student assistant to support our R&D team in setting up and maintaining large language model (LLM) inference environments and related API services. The role involves hands-on work with modern inference frameworks and GPU-based infrastructure, both cloud-hosted and on-premises. Typical tasks include:
- Setting up, configuring, and maintaining LLM inference frameworks such as vLLM, TensorRT-LLM, llama.cpp, Ollama, and SGLang.
- Deploying and managing API endpoints for model inference on self-hosted GPU servers and cloud GPU instances (e.g., RunPod, Hetzner, AWS); a typical workflow is sketched after this list.
- Performing DevOps-related activities such as container setup, port forwarding, reverse proxy configuration, and HTTPS endpoint deployment.
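
To give a flavor of the day-to-day work, here is a minimal sketch of serving a model with vLLM's OpenAI-compatible server and querying the resulting endpoint from Python. The model name, port, and API key are placeholder assumptions for illustration, not part of the role description:

```python
# Minimal sketch -- model name, port, and API key below are placeholders.
# 1) Start an OpenAI-compatible inference server with vLLM, e.g.:
#      vllm serve mistralai/Mistral-7B-Instruct-v0.3 --port 8000
# 2) Query the endpoint with the official OpenAI Python client:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible API root
    api_key="EMPTY",  # vLLM accepts any key unless one is configured
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",  # must match the served model
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

In a deployed setup, an endpoint like this would typically sit behind a reverse proxy with TLS termination, which is where the DevOps activities listed above come in.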