$ whoami

Hi, I'm Mukesh N

I build production-grade RAG Systems

Senior AI Engineer architecting healthcare AI at CVS Health. I ship multi-agent systems, HIPAA-compliant RAG pipelines, and GPU inference platforms that serve millions — and I build the evaluation harnesses that prove they work.

View Work

GitHub

LangGraph·MCP·vLLM·Vertex AI·RAG·FastAPI·React·NeMo Guardrails·Kubernetes·Terraform·LangChain·Pinecone·AWS Bedrock·Azure OpenAI·NVIDIA Triton·TypeScript·

Platform users

Ask AT&T — enterprise GenAI

$0M+

Annual infra savings

Intelligent model routing

0.0%

Production SLA

vLLM on GKE GPU clusters

+0%

Retrieval precision

Hybrid search + re-ranking

Inference throughput

NVIDIA Triton on AKS

Faster claims processing

45 min → 12 min with RAG

// 01. about

Production AI, not demos.

I build production AI systems that Fortune 500 companies actually ship to millions of users. Not demos. Not POCs that die in a slide deck — systems with SLAs, audit logs, and evaluation gates.

My path: I started in the engine room at DXC Technologies building CI/CD pipelines and GPU infrastructure. Then two years scaling Ask AT&T — the internal generative AI platform serving 68,000+ employees — where I owned core frontend, LLM tool orchestration, and the evaluation pipelines behind 8 production use cases.

Today I architect CVS Health's AI-native consumer engagement platform, connecting CVS Pharmacy, Caremark, and Aetna through LangGraph multi-agent systems, HIPAA-compliant RAG, and GPU-backed inference on GCP Vertex AI.

What separates me from most AI engineers is evaluation. Anyone can call an LLM API. I build the harnesses that measure whether AI systems actually work — and the infrastructure to keep them working in production.

mukesh --profile

$ mukesh --profile

// 02. experience

Where I've shipped.

Three chapters: building AI infrastructure, scaling enterprise GenAI to 68,000 users, and now architecting healthcare AI for millions.

CVS Health

May 2025 — Present

Senior AI Engineer · Dallas, TX

Architecting the AI-native consumer engagement platform connecting CVS Pharmacy, Caremark, and Aetna for millions of members — from GPU inference clusters to React agent-config portals.

+35%retrieval precision

$1M+annual savings

99.9%SLA on GPU inference

LangGraphMCPVertex AIvLLMNeMo GuardrailsPineconeOpenSearchFastAPIReactVue 3GKELiteLLM

AT&T

Jan 2023 — Apr 2025

Full Stack Engineer · Dallas, TX

Core engineer on Ask AT&T — the internal generative AI platform serving 68,000+ employees with millions of monthly interactions at sub-2-second response times.

68K+employees served

4xinference throughput

-25%hallucination rate

ReactAzure OpenAISemantic KernelMCPLangChainHaystackNVIDIA TritonAKSRedisKafkaCosmos DBFlask

DXC Technologies

May 2020 — Jun 2022

Python Developer & AI Infrastructure Engineer · Bengaluru, India

Built the infrastructure layer for ML at scale — GPU Kubernetes clusters, IaC-driven provisioning, and observability from scratch.

days→minenv provisioning

50K+labeled samples

10+validated datasets

PythonTerraformPulumiKubernetesEKSRayJenkinsPrometheusGrafanaMLflowDjangoAnsible

eval run #2847 — golden_dataset_pharmacy_v12✓ accuracy: 0.94 (threshold 0.90)✓ hallucination_rate: 0.018 (threshold 0.05)✓ task_completion: 0.91judge: claude — scoring 412 test cases…✓ pii_redaction: 100% (0 leaks detected)drift check: retrieval_precision Δ +0.2% (7d)✓ regression suite passed — 412/412deploy gate: APPROVED → rolling to prodguardrails: 3 toxic outputs blocked (24h)eval run #2848 — golden_dataset_claims_v8✓ entity_extraction_f1: 0.96latency p99: 840ms (sla 1000ms)✓ cost_per_query: $0.0041 (-12% w/w)eval run #2847 — golden_dataset_pharmacy_v12✓ accuracy: 0.94 (threshold 0.90)✓ hallucination_rate: 0.018 (threshold 0.05)✓ task_completion: 0.91judge: claude — scoring 412 test cases…✓ pii_redaction: 100% (0 leaks detected)drift check: retrieval_precision Δ +0.2% (7d)✓ regression suite passed — 412/412deploy gate: APPROVED → rolling to prodguardrails: 3 toxic outputs blocked (24h)eval run #2848 — golden_dataset_claims_v8✓ entity_extraction_f1: 0.96latency p99: 840ms (sla 1000ms)✓ cost_per_query: $0.0041 (-12% w/w)

// 03. the differentiator

“I build the harnesses that measure whether AI systems actually work.”

Anyone can call an LLM API. The hard part is knowing — with evidence — that your system is accurate, safe, and not silently degrading. Evaluation engineering is my edge.

LLM-as-Judge Pipelines

8 production use cases

Automated scoring of accuracy, task completion, and tone using calibrated judge models — regression suites that run before every model update ships.

Golden Datasets

pharmacy · insurance · care delivery

Curated, version-controlled test sets that encode what 'correct' means per domain — the ground truth every deploy is measured against.

Hallucination Detection

-25% hallucination rate

NeMo Guardrails as a real-time safety layer: hallucination detection, toxicity filtering, and PII redaction on every patient-facing output.

Drift Monitoring

continuous, in production

Live tracking of quality metrics across production workloads — catching silent degradation before users do, with weekly reporting to leadership.

// 04. projects

Open source & side builds.

Public code you can actually read — RAG engines, serverless AI infrastructure, and Terraform you can deploy.

Advanced RAG Slack Bot

Production RAG engine deployed as an AWS Lambda-backed Slack bot — semantic retrieval over ingested documents, answered in-channel.

Full pipeline: data ingestion, S3 archiving, a custom RAG engine, and enhanced Lambda handlers — all provisioned with Terraform IaC.

PythonAWS LambdaRAGTerraformS3Slack API

Serverless Bedrock RAG Platform

Modular Terraform platform for AWS Bedrock agents — reusable modules for Bedrock Guardrails, API Gateway, DynamoDB, and Lambda.

Infrastructure-as-code architecture: each capability (agents, guardrails, gateway, storage) is an independent, composable Terraform module.

TerraformAWS BedrockGuardrailsAPI GatewayDynamoDBLambda

Bedrock Lambda Integration

Clean reference integration of AWS Bedrock foundation models behind Lambda — IAM permission templates and JSON payload contracts included.

The minimal, secure pattern for invoking Bedrock from serverless: scoped IAM policies, typed payloads, and a Python handler.

PythonAWS BedrockLambdaIAM

also on github:Tic-Tac-Toe AI Sentiment Analysis

// 05. skills

The toolkit.

AI / LLM Systems

LangGraphMCPLangChainCrewAIHaystackRAG (hybrid search, re-ranking)vLLMNVIDIA TritonNeMo GuardrailsSemantic KernelLLM-as-judge evaluationPrompt engineeringMLflowLangSmithWeights & Biases

Languages & Frameworks

PythonTypeScriptJavaScriptSQLFastAPIDjangoFlaskReactVue 3Node.jsGraphQLWebSocketsTailwindStreamlit

Data & Retrieval

PostgreSQLRedisMongoDBCosmos DBPineconeQdrantOpenSearchSnowflakeKafkaSparkPandasETL pipelines

Cloud & MLOps

GCP Vertex AIAWS BedrockAzure OpenAIKubernetesGKE / EKS / AKSTerraformPulumiDockerHelmGitHub ActionsPrometheus / GrafanaOpenTelemetry

Certifications

AWS Certified Machine Learning Engineer — AssociateAmazon Web Services
Generative AI with Large Language ModelsDeepLearning.AI × AWS

Education

M.S. Computer Science (AI & Machine Learning)University of South FloridaDeep Learning · NLP · Distributed Systems · Reinforcement Learning
B.Tech Computer Science & EngineeringGeethanjali College of Engineering and Technology

// 06. contact

Let’s build something that actually works.

Open to conversations about AI engineering, multi-agent systems, and roles where evaluation rigor matters.

nmukeshoff@gmail.com

GitHub

Dallas, TX · open to remote