AI Engineer
Noida, Uttar Pradesh | Full-Time | Mid-level | AI / Data Science
Skills
Python, Flask, FastAPI, REST, PostgreSQL, Redis, AWS, GCP, Azure, Kubernetes, Docker, CI/CD, Machine learning, TensorFlow, PyTorch, LLM, Airflow, RAG, Large language models, Transformers, Prompt engineering, Few-shot prompting, LoRA, Retrieval-augmented generation, LangChain, LlamaIndex, Model serving, Inference optimization, Quantization, TensorRT, Distributed training, Model evaluation
Job Responsibilities:
- Design and implement traditional ML and LLM-based systems and applications
- Optimize model inference performance and cost efficiency
- Fine-tune foundation models for specific use cases and domains
- Implement diverse prompt engineering strategies
- Build robust backend infrastructure for AI-powered applications
- Implement and maintain MLOps pipelines for AI lifecycle management
- Design and implement comprehensive traditional ML and LLM monitoring and evaluation systems
- Develop automated testing frameworks for model quality and performance tracking
Basic Qualifications:
- 4–8 years of relevant experience in LLMs, Backend Engineering, and MLOps.
- LLM Expertise
- Model Fine-tuning: Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, adapter layers)
- Inference Optimization: Knowledge of quantization, pruning, caching strategies, and serving optimizations
- Prompt Engineering: Prompt design, few-shot learning, chain-of-thought prompting, and retrieval-augmented generation (RAG)
- Model Evaluation: Experience with AI evaluation frameworks and metrics for different use cases
- Monitoring & Testing: Design of automated evaluation pipelines, A/B testing for models, and continuous monitoring systems
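To make the prompt-engineering expectations above concrete, here is a minimal, illustrative sketch of assembling a prompt that combines retrieved context (RAG-style), few-shot examples, and a chain-of-thought instruction. The function and example names are hypothetical, not part of the role's actual stack.

```python
# Illustrative few-shot examples; a real system would curate these per task.
EXAMPLES = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context chunks and few-shot examples into one prompt.

    Hypothetical helper: shows the shape of a RAG + few-shot +
    chain-of-thought prompt, not a production template.
    """
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Use the context below to answer. Think step by step.\n"
        f"Context:\n{context}\n\n{shots}\n\nQ: {question}\nA:"
    )
```

In practice the template, example selection, and context budget would all be tuned and evaluated per use case.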
- Backend Engineering
- Languages: Proficiency in Python, with experience in FastAPI, Flask, or similar frameworks
- APIs: Design and implementation of RESTful APIs and real-time systems
- Databases: Experience with vector databases and traditional databases
- Cloud Platforms: AWS, GCP, or Azure with focus on ML services
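The vector-database requirement boils down to nearest-neighbor search over embeddings. A toy sketch of the underlying operation, using plain Python and cosine similarity over an in-memory index (all names here are illustrative; production systems use a dedicated vector store with approximate search):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the ids of the k index entries most similar to the query.

    `index` is a list of (doc_id, embedding) pairs -- a stand-in for a
    real vector database.
    """
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

Real deployments replace the exhaustive sort with an ANN index (e.g. HNSW) for scale, but the retrieval contract is the same.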
- MLOps & Infrastructure
- Deployment: Experience with model serving frameworks (vLLM, SGLang, TensorRT)
- Containerization: Docker and Kubernetes for ML workloads
- Monitoring: ML model monitoring, performance tracking, and alerting systems
- Evaluation Systems: Building automated evaluation pipelines with custom metrics and benchmarks
- CI/CD: MLOps pipelines for automated testing and deployment
- Orchestration: Experience with workflow tools like Airflow
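The "automated evaluation pipelines with custom metrics" item above can be sketched in a few lines: run a model function over a labeled dataset and average a metric. This is a hedged, minimal example with hypothetical names, not a prescribed framework; real pipelines add per-example logging, multiple metrics, and regression thresholds in CI.

```python
def exact_match(prediction: str, reference: str) -> float:
    """A simple custom metric: case- and whitespace-insensitive equality."""
    return float(prediction.strip().lower() == reference.strip().lower())

def evaluate(model_fn, dataset, metric=exact_match) -> float:
    """Score model_fn over (input, reference) pairs and return the mean.

    `model_fn` stands in for any callable wrapping a model or API.
    """
    scores = [metric(model_fn(example), reference) for example, reference in dataset]
    return sum(scores) / len(scores) if scores else 0.0
```

Wiring this into CI (fail the build if the score drops below a baseline) is what turns an evaluation script into a monitoring gate.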
Preferred Qualifications:
- LLM Frameworks: Hands-on experience with Transformers, LangChain, LlamaIndex, or similar
- Monitoring Platforms: Knowledge of LLM-specific monitoring tools and general ML monitoring
- Distributed Training and Inference: Experience with multi-GPU and distributed training and inference setups
- Model Compression: Knowledge of techniques like distillation, quantization, and efficient architectures
- Production Scale: Experience deploying models handling high-throughput, low-latency requirements
- Research Background: Familiarity with recent LLM research and ability to implement novel techniques
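As a rough intuition for the model-compression item, here is a toy sketch of symmetric per-tensor int8 quantization on a plain list of weights. It is illustrative only; real inference stacks (TensorRT, vLLM, etc.) use calibrated, kernel-level quantization, and the function names here are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127].

    Returns the integer codes and the scale needed to recover them.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 codes and the scale."""
    return [c * scale for c in codes]
```

Round-to-nearest bounds the per-weight reconstruction error by half the scale, which is why quantization error shrinks as the tensor's dynamic range does.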
Tools & Technologies We Use:
- Frameworks: PyTorch, Transformers, TensorFlow
- Serving: vLLM, TensorRT-LLM, SGLang, OpenAI API
- Infrastructure: Kubernetes, Docker, AWS/GCP
- Databases: PostgreSQL, Redis, Vector DBs
