$ whoami
Hi, I'm Mukesh N
Senior AI Engineer architecting healthcare AI at CVS Health. I ship multi-agent systems, HIPAA-compliant RAG pipelines, and GPU inference platforms that serve millions — and I build the evaluation harnesses that prove they work.
Platform users
Ask AT&T — enterprise GenAI
Annual infra savings
Intelligent model routing
Production SLA
vLLM on GKE GPU clusters
Retrieval precision
Hybrid search + re-ranking
Inference throughput
NVIDIA Triton on AKS
Faster claims processing
45 min → 12 min with RAG
// 01. about
Production AI, not demos.
I build production AI systems that Fortune 500 companies actually ship to millions of users. Not demos. Not POCs that die in a slide deck — systems with SLAs, audit logs, and evaluation gates.
My path: I started in the engine room at DXC Technologies building CI/CD pipelines and GPU infrastructure. Then I spent two years scaling Ask AT&T, the internal generative AI platform serving 68,000+ employees, where I owned the core frontend, LLM tool orchestration, and the evaluation pipelines behind 8 production use cases.
Today I architect CVS Health's AI-native consumer engagement platform, connecting CVS Pharmacy, Caremark, and Aetna through LangGraph multi-agent systems, HIPAA-compliant RAG, and GPU-backed inference on GCP Vertex AI.
What separates me from most AI engineers is evaluation. Anyone can call an LLM API. I build the harnesses that measure whether AI systems actually work — and the infrastructure to keep them working in production.
$ mukesh --profile
// 02. experience
Where I've shipped.
Three chapters: building AI infrastructure, scaling enterprise GenAI to 68,000+ users, and now architecting healthcare AI for millions.
CVS Health
May 2025 – Present · Senior AI Engineer · Dallas, TX
Architecting the AI-native consumer engagement platform connecting CVS Pharmacy, Caremark, and Aetna for millions of members — from GPU inference clusters to React agent-config portals.
AT&T
Jan 2023 – Apr 2025 · Full Stack Engineer · Dallas, TX
Core engineer on Ask AT&T — the internal generative AI platform serving 68,000+ employees with millions of monthly interactions at sub-2-second response times.
DXC Technologies
May 2020 – Jun 2022 · Python Developer & AI Infrastructure Engineer · Bengaluru, India
Built the infrastructure layer for ML at scale — GPU Kubernetes clusters, IaC-driven provisioning, and observability from scratch.
// 03. the differentiator
“I build the harnesses that measure whether AI systems actually work.”
Anyone can call an LLM API. The hard part is knowing — with evidence — that your system is accurate, safe, and not silently degrading. Evaluation engineering is my edge.
LLM-as-Judge Pipelines
8 production use cases
Automated scoring of accuracy, task completion, and tone using calibrated judge models, with regression suites that run before every model update ships.
Golden Datasets
pharmacy · insurance · care delivery
Curated, version-controlled test sets that encode what 'correct' means per domain: the ground truth every deploy is measured against.
Hallucination Detection
-25% hallucination rate
NeMo Guardrails as a real-time safety layer: hallucination detection, toxicity filtering, and PII redaction on every patient-facing output.
Drift Monitoring
continuous, in production
Live tracking of quality metrics across production workloads, catching silent degradation before users do, with weekly reporting to leadership.
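To make the idea of a golden dataset feeding a pre-deploy regression gate concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the production pipeline: `GoldenCase`, `keyword_judge`, `evaluate`, `passes_gate`, the two toy models, and the sample prompts and answers are all hypothetical, and the keyword-overlap judge is a simple stand-in for a calibrated LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class GoldenCase:
    """One versioned test case: what 'correct' means for a prompt in a domain."""
    domain: str
    prompt: str
    expected: str


def keyword_judge(expected: str, answer: str) -> float:
    """Stand-in judge (assumption): fraction of expected tokens found in the answer.

    A real pipeline would call a calibrated judge model here instead.
    """
    expected_tokens = set(expected.lower().split())
    if not expected_tokens:
        return 1.0
    answer_tokens = set(answer.lower().split())
    return len(expected_tokens & answer_tokens) / len(expected_tokens)


def evaluate(
    cases: List[GoldenCase],
    model: Callable[[str], str],
    judge: Callable[[str, str], float],
) -> Dict[str, float]:
    """Return the mean judge score per domain for one model."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for case in cases:
        score = judge(case.expected, model(case.prompt))
        totals[case.domain] = totals.get(case.domain, 0.0) + score
        counts[case.domain] = counts.get(case.domain, 0) + 1
    return {domain: totals[domain] / counts[domain] for domain in totals}


def passes_gate(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    max_drop: float = 0.02,
) -> bool:
    """Pass only if no domain regresses by more than max_drop versus baseline."""
    return all(
        candidate.get(domain, 0.0) >= score - max_drop
        for domain, score in baseline.items()
    )


# Made-up golden cases and toy models, for illustration only.
GOLDEN = [
    GoldenCase("pharmacy", "refill window?", "refills allowed after 25 days"),
    GoldenCase("insurance", "copay?", "copay is 20 dollars"),
]

BASELINE_ANSWERS = {
    "refill window?": "refills allowed after 25 days",
    "copay?": "copay is 20 dollars",
}
CANDIDATE_ANSWERS = {
    "refill window?": "refills allowed after 25 days",
    "copay?": "I am not sure",
}


def baseline_model(prompt: str) -> str:
    return BASELINE_ANSWERS[prompt]


def candidate_model(prompt: str) -> str:
    return CANDIDATE_ANSWERS[prompt]


baseline_scores = evaluate(GOLDEN, baseline_model, keyword_judge)
candidate_scores = evaluate(GOLDEN, candidate_model, keyword_judge)
ship_candidate = passes_gate(candidate_scores, baseline_scores)
```

Scoring per domain rather than as one global average is a deliberate choice in this sketch: a regression confined to one domain cannot be masked by strong results in the others. The same per-domain scores, logged over time, are one possible input for drift tracking.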
// 04. projects
Open source & side builds.
Public code you can actually read — RAG engines, serverless AI infrastructure, and Terraform you can deploy.
// 05. skills
The toolkit.
AI / LLM Systems
Languages & Frameworks
Data & Retrieval
Cloud & MLOps
Certifications
- AWS Certified Machine Learning Engineer - Associate · Amazon Web Services
- Generative AI with Large Language Models · DeepLearning.AI × AWS
Education
- M.S. Computer Science (AI & Machine Learning) · University of South Florida · Deep Learning, NLP, Distributed Systems, Reinforcement Learning
- B.Tech Computer Science & Engineering · Geethanjali College of Engineering and Technology
// 06. contact
Let’s build something that actually works.
Open to conversations about AI engineering, multi-agent systems, and roles where evaluation rigor matters.
Dallas, TX · open to remote