$ whoami

Hi, I'm Mukesh N

I build production-grade RAG Systems

Senior AI Engineer architecting healthcare AI at CVS Health. I ship multi-agent systems, HIPAA-compliant RAG pipelines, and GPU inference platforms that serve millions — and I build the evaluation harnesses that prove they work.

LangGraph·MCP·vLLM·Vertex AI·RAG·FastAPI·React·NeMo Guardrails·Kubernetes·Terraform·LangChain·Pinecone·AWS Bedrock·Azure OpenAI·NVIDIA Triton·TypeScript
68K+

Platform users

Ask AT&T — enterprise GenAI

$1M+

Annual infra savings

Intelligent model routing

99.9%

Production SLA

vLLM on GKE GPU clusters

+35%

Retrieval precision

Hybrid search + re-ranking

4x

Inference throughput

NVIDIA Triton on AKS

73%

Faster claims processing

45 min → 12 min with RAG

// 01. about

Production AI, not demos.

I build production AI systems that Fortune 500 companies actually ship to millions of users. Not demos. Not POCs that die in a slide deck — systems with SLAs, audit logs, and evaluation gates.

My path: I started in the engine room at DXC Technologies building CI/CD pipelines and GPU infrastructure. Then two years scaling Ask AT&T — the internal generative AI platform serving 68,000+ employees — where I owned core frontend, LLM tool orchestration, and the evaluation pipelines behind 8 production use cases.

Today I architect CVS Health's AI-native consumer engagement platform, connecting CVS Pharmacy, Caremark, and Aetna through LangGraph multi-agent systems, HIPAA-compliant RAG, and GPU-backed inference on GCP Vertex AI.

What separates me from most AI engineers is evaluation. Anyone can call an LLM API. I build the harnesses that measure whether AI systems actually work — and the infrastructure to keep them working in production.

$ mukesh --profile

// 02. experience

Where I've shipped.

Three chapters: building AI infrastructure, scaling enterprise GenAI to 68,000 users, and now architecting healthcare AI for millions.

CVS Health

May 2025 — Present

Senior AI Engineer · Dallas, TX

Architecting the AI-native consumer engagement platform connecting CVS Pharmacy, Caremark, and Aetna for millions of members — from GPU inference clusters to React agent-config portals.

+35% retrieval precision
$1M+ annual savings
99.9% SLA on GPU inference
LangGraph·MCP·Vertex AI·vLLM·NeMo Guardrails·Pinecone·OpenSearch·FastAPI·React·Vue 3·GKE·LiteLLM

AT&T

Jan 2023 — Apr 2025

Full Stack Engineer · Dallas, TX

Core engineer on Ask AT&T — the internal generative AI platform serving 68,000+ employees with millions of monthly interactions at sub-2-second response times.

68K+ employees served
4x inference throughput
-25% hallucination rate
React·Azure OpenAI·Semantic Kernel·MCP·LangChain·Haystack·NVIDIA Triton·AKS·Redis·Kafka·Cosmos DB·Flask

DXC Technologies

May 2020 — Jun 2022

Python Developer & AI Infrastructure Engineer · Bengaluru, India

Built the infrastructure layer for ML at scale — GPU Kubernetes clusters, IaC-driven provisioning, and observability from scratch.

days → min env provisioning
50K+ labeled samples
10+ validated datasets
Python·Terraform·Pulumi·Kubernetes·EKS·Ray·Jenkins·Prometheus·Grafana·MLflow·Django·Ansible

// 03. the differentiator

I build the harnesses that measure whether AI systems actually work.

Anyone can call an LLM API. The hard part is knowing — with evidence — that your system is accurate, safe, and not silently degrading. Evaluation engineering is my edge.

LLM-as-Judge Pipelines

8 production use cases

Automated scoring of accuracy, task completion, and tone using calibrated judge models — regression suites that run before every model update ships.

Golden Datasets

pharmacy · insurance · care delivery

Curated, version-controlled test sets that encode what 'correct' means per domain — the ground truth every deploy is measured against.

Hallucination Detection

-25% hallucination rate

NeMo Guardrails as a real-time safety layer: hallucination detection, toxicity filtering, and PII redaction on every patient-facing output.

Drift Monitoring

continuous, in production

Live tracking of quality metrics across production workloads — catching silent degradation before users do, with weekly reporting to leadership.
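The gate-and-monitor pattern behind these four pieces can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production code: `GoldenExample`, `exact_match_judge`, `evaluate`, `passes_gate`, and `has_drifted` are hypothetical names, the thresholds are placeholders, and the exact-match judge is a toy stand-in for a calibrated judge model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenExample:
    """One versioned test case: a prompt and the answer treated as ground truth."""
    prompt: str
    expected: str


def exact_match_judge(answer: str, expected: str) -> float:
    """Toy judge: 1.0 when the normalized strings match, else 0.0.

    Assumption: a real pipeline would call a judge model here instead.
    """
    return 1.0 if answer.strip().lower() == expected.strip().lower() else 0.0


def evaluate(
    model: Callable[[str], str],
    golden: Sequence[GoldenExample],
    judge: Callable[[str, str], float] = exact_match_judge,
) -> float:
    """Return the mean judge score of `model` over the golden set."""
    if not golden:
        raise ValueError("golden set must not be empty")
    return mean(judge(model(ex.prompt), ex.expected) for ex in golden)


def passes_gate(candidate: float, baseline: float, max_regression: float = 0.02) -> bool:
    """Deploy gate: allow the candidate only if it stays within
    `max_regression` of the baseline score (placeholder threshold)."""
    return candidate >= baseline - max_regression


def has_drifted(recent_scores: Sequence[float], baseline: float, tolerance: float = 0.05) -> bool:
    """Drift check: flag when the mean of recent production scores falls
    more than `tolerance` below the baseline (placeholder threshold)."""
    return mean(recent_scores) < baseline - tolerance


if __name__ == "__main__":
    golden = [
        GoldenExample("2 + 2", "4"),
        GoldenExample("capital of France", "Paris"),
    ]
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    baseline = evaluate(lambda p: answers[p], golden)
    candidate = evaluate(lambda p: "4", golden)  # a model that always answers "4"
    print("gate passed:", passes_gate(candidate, baseline))
    print("drifted:", has_drifted([0.80, 0.82], baseline))
```

Keeping the judge as a plain callable means the same gate can run against a cheap deterministic check in CI and a model-based judge before release, with the golden set as the only shared contract.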

// 05. skills

The toolkit.

AI / LLM Systems

LangGraph·MCP·LangChain·CrewAI·Haystack·RAG (hybrid search, re-ranking)·vLLM·NVIDIA Triton·NeMo Guardrails·Semantic Kernel·LLM-as-judge evaluation·Prompt engineering·MLflow·LangSmith·Weights & Biases

Languages & Frameworks

Python·TypeScript·JavaScript·SQL·FastAPI·Django·Flask·React·Vue 3·Node.js·GraphQL·WebSockets·Tailwind·Streamlit

Data & Retrieval

PostgreSQL·Redis·MongoDB·Cosmos DB·Pinecone·Qdrant·OpenSearch·Snowflake·Kafka·Spark·Pandas·ETL pipelines

Cloud & MLOps

GCP Vertex AI·AWS Bedrock·Azure OpenAI·Kubernetes·GKE / EKS / AKS·Terraform·Pulumi·Docker·Helm·GitHub Actions·Prometheus / Grafana·OpenTelemetry

Certifications

  • AWS Certified Machine Learning Engineer - Associate, Amazon Web Services
  • Generative AI with Large Language Models, DeepLearning.AI × AWS

Education

  • M.S. Computer Science (AI & Machine Learning), University of South Florida (Deep Learning · NLP · Distributed Systems · Reinforcement Learning)
  • B.Tech Computer Science & Engineering, Geethanjali College of Engineering and Technology

// 06. contact

Let’s build something that actually works.

Open to conversations about AI engineering, multi-agent systems, and roles where evaluation rigor matters.

Dallas, TX · open to remote