00 — Portfolio · 2026

Hi, I'm Vishal.

AI Engineering Manager and full-stack builder. I build the platforms that make engineers faster and the cultures that make them stay, from RAG incident response to payments at $1B+/month. Previously platform, payments, and reliability leadership at Recharge, Nua, and Wayfair.

Years: 20+
Scale: $1B+/mo
Focus: AI · Platforms

01 — About

Two decades of engineering, now building the AI era.

I'm an AI Engineering Manager and a Google Cloud certified Generative AI Leader. Most of my career has gone into building production systems the hard way: backend services, distributed architectures, payments, and the infrastructure that keeps them reliable at scale. That foundation is exactly what the AI era needs now.

Most teams reaching for AI are missing the discipline that makes it safe in production: observability, idempotency, error budgets, and the operational reflexes you only earn from running systems at $1B+/month. I bring that to the new layer, with production RAG, agentic systems, and AI-assisted development across 200+ engineers, so the AI actually ships and stays up.

And I lead from the front. I still design systems, set technical standards, and build alongside the teams I run. That's the bridge: hard-won engineering rigor, pointed at where the field is going next.

Most engineering problems aren't technical problems. They're trust problems.

01 / 04

Platform & Reliability

Two decades on the backbone: distributed systems, event-driven services, and payments at $1B+/month. Infrastructure that holds up under real load and tells the truth about system health.

02 / 04

Engineering Leadership

Setting technical standards for 200+ engineers, building orgs and hiring pipelines from the ground up, and owning multi-year roadmaps across teams in Boston, Berlin, and the Bay.

03 / 04

AI Engineering

Shipping AI in production: a RAG incident-response platform on vector embeddings, agentic systems, and org-wide AI-assisted development. The new layer, built with old-school reliability discipline.

04 / 04

AI Hackathons

Recognized across multiple AI hackathons, with prizes and awards along the way. My team was named the winner of the company-wide AI hackathon, and the idea we built was carried into the product roadmap.

02 — Professional Journey

The work I've done, and where I've done it.

Dec 2022 – Nov 2025 Recharge

Engineering Manager

Recharge powers 71% of Shopify subscriptions and processes $1B+/month. At that scale, reliability isn't optional. I led the Product Reliability org, setting standards for 200+ engineers, then moved to own the Payments platform, making sure money moved correctly every time.

$1B+ processed / month
200+ engineers
71% of Shopify subs

  • Shipped a production RAG incident-response platform (vector embeddings) that cut time-to-remediation for revenue-critical merchant incidents.
  • Drove org-wide adoption of AI-assisted development (Cursor) and set technical standards for 200+ engineers.
  • Presented the company's first AI demo at the weekly demo, built solo end to end. It earned recognition from group leadership and encouraged other engineers to start sharing their own AI work.
  • Led the Product Reliability org, then owned Payments platform reliability across $1B+/month.

Feb 2021 – Apr 2022 Nua

Director of Engineering

A fast-growing D2C brand had a monolith that was becoming its ceiling. I rebuilt the backend from the ground up: migrated to Go microservices, cut infra costs 50%, and built an engineering org that could keep up with the business.

  • Scaled the platform to 10M+ backend requests/day and cut infra costs 50%.
  • Migrated the monolith to Go-based microservices and led the next-gen D2C platform.
  • Built the engineering org and hiring pipeline from the ground up.

Sep 2019 – Dec 2021 TorchFi

Head of Engineering

A food-tech startup needed a production-grade distributed platform, fast. I designed and built it end to end: .NET Core services, real-time Kafka streams, merchant analytics, and a QR ordering system that helped merchants grow revenue 20%+.

  • Integrated Redis and Apache Kafka for caching and real-time streaming.
  • Deployed containerized services on Linux using DevOps best practices.
  • Delivered a QR-based ordering solution that lifted merchant revenue 20%+.

May 2018 – Aug 2019 Wayfair

Engineering Manager

Global order systems at one of the world's largest furniture retailers were slow, brittle, and hard to observe. I led 15 engineers across U.S. and European time zones to cut checkout latency 40%, build SLO frameworks, and make the platform trustworthy at scale.

  • Led 15 engineers across U.S. and European time zones on global order systems.
  • Reduced checkout latency 40% via a distributed caching architecture.
  • Built SLI/SLO monitoring frameworks adopted across global services.

Accenture · Software Development Manager · 2007–2014

Delivered enterprise platforms for global clients as a forward-deployed engineer, leading and coordinating teams across U.S., European, and Indian time zones.

Earlier still: founding engineer to director at an HR-tech SaaS company, and co-founder of a software company in Mumbai (2002).

03 — Skills

AI-native platform engineering, where reliable systems meet the teams that ship them.

Capabilities I lead with, shaped at Recharge and Wayfair.

AI & Agentic Engineering

Agentic systems · Multi-agent orchestration · RAG & vector embeddings · Prompt engineering · Claude Agent SDK · MCP · AI-assisted DevEx (Cursor, Claude Code) · LLM evals & observability · AIOps

Platform & Reliability

Distributed systems · Event-driven microservices · SLOs & error budgets · Distributed tracing · Grafana & Prometheus · Idempotency & retries · Incident response · SRE practices

Backend & Cloud

Go · Python · .NET · Node.js · Kafka · Redis · SQL · AWS (Aurora, EC2, S3) · Docker & Kubernetes · API design · Payments (Stripe, Shopify) · CI/CD

Frontend

JavaScript ES6 · React · Next.js · Redux

Engineering Leadership

Scaling orgs to 200+ · Hiring & career ladders · Multi-year roadmapping · Stakeholder management · Mentorship · Build-vs-buy strategy

04 — Contact

If you're building something hard and need someone who's done it before, let's talk.

05 — Certificates

Education and credentials.

Google Cloud

Generative AI Leader · 2025

Anthropic

Claude Certified Architect — Foundations · in progress, 2026

Amazon Web Services

AWS Certified Developer — Associate

Cornell

Machine Learning

Stanford

Algorithms: Design & Analysis

University of Iowa

Master's, Business Administration

University of Mumbai

Bachelor's, Computer Science

06 — Projects

Things I've built in the open.

Open source / Fintech

Payment-Agent

An AI-driven investigative tool that automatically traces payment anomalies across multi-ledger environments using LLMs.

LLMs · Ledgers · Recon

Infrastructure / Systems

Reliable Orders

A high-throughput idempotency framework for distributed transactional systems, with self-healing retry logic and circuit breaking.

Idempotency · Retries · Go

07 — Writing

Thinking out loud.

I write about reliability, engineering leadership, and what happens when agents reach production.

Recent

A framework decision guide for building agents

When to reach for the Claude Agent SDK, OpenAI Agents SDK, AWS Strands, and others, and the trade-offs that actually matter once agents hit production.

Claude Agent SDK · OpenAI Agents SDK · AWS Strands
Read Vishal's Archive on Substack