Job Responsibilities:

  • Design and implement traditional ML and LLM-based systems and applications
  • Optimize model inference performance and cost efficiency
  • Fine-tune foundation models for specific use cases and domains
  • Implement diverse prompt engineering strategies
  • Build robust backend infrastructure for AI-powered applications
  • Implement and maintain MLOps pipelines for AI lifecycle management
  • Design and implement comprehensive traditional ML and LLM monitoring and evaluation systems
  • Develop automated testing frameworks for model quality and performance tracking

Basic Qualifications:

  • 4–8 years of relevant experience in LLMs, Backend Engineering, and MLOps.
  • LLM Expertise:
      • Model Fine-tuning: Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, adapter layers)
      • Inference Optimization: Knowledge of quantization, pruning, caching strategies, and serving optimizations
      • Prompt Engineering: Prompt design, few-shot learning, chain-of-thought prompting, and retrieval-augmented generation (RAG)
      • Model Evaluation: Experience with AI evaluation frameworks and metrics for different use cases
      • Monitoring & Testing: Design of automated evaluation pipelines, A/B testing for models, and continuous monitoring systems
  • Backend Engineering:
      • Languages: Proficiency in Python, with experience in FastAPI, Flask, or similar frameworks
      • APIs: Design and implementation of RESTful APIs and real-time systems
      • Databases: Experience with both vector and traditional databases
      • Cloud Platforms: AWS, GCP, or Azure, with a focus on ML services
  • MLOps & Infrastructure:
      • Deployment: Experience with model serving frameworks (vLLM, SGLang, TensorRT)
      • Containerization: Docker and Kubernetes for ML workloads
      • Monitoring: ML model monitoring, performance tracking, and alerting systems
      • Evaluation Systems: Building automated evaluation pipelines with custom metrics and benchmarks
      • CI/CD: MLOps pipelines for automated testing and deployment
      • Orchestration: Experience with workflow tools such as Airflow

Preferred Qualifications:

  • LLM Frameworks: Hands-on experience with Transformers, LangChain, LlamaIndex, or similar
  • Monitoring Platforms: Knowledge of LLM-specific monitoring tools and general ML monitoring
  • Distributed Training and Inference: Experience with multi-GPU and distributed training and inference setups
  • Model Compression: Knowledge of techniques like distillation, quantization, and efficient architectures
  • Production Scale: Experience deploying models that meet high-throughput, low-latency requirements
  • Research Background: Familiarity with recent LLM research and ability to implement novel techniques

Tools & Technologies We Use:

  • Frameworks: PyTorch, Transformers, TensorFlow
  • Serving: vLLM, TensorRT-LLM, SGLang, OpenAI API
  • Infrastructure: Kubernetes, Docker, AWS/GCP
  • Databases: PostgreSQL, Redis, Vector DBs

Job Summary:

Company: ShyftLabs
Location: Noida, Uttar Pradesh
Type: Full-Time
Level: Mid-level
Domain: AI / Data Science