Building an AI SDR Agent That Runs Itself: Zero Incidents in 100+ Days

Azumo engineered a self-governing autonomous outbound system on Claude that sources, qualifies, personalizes, and sends cold email end to end, with mandatory human approval on every send. Across 409 logged autonomous runs it recorded zero uncaught errors and zero failed sends, and it has had zero duplicate-send incidents in the 100+ days since its dedup safeguard shipped.

Client: AI SDR Agent
Industry: B2B / Sales Technology
Service: AI Development: Autonomous Agent Engineering
Technologies: Claude Sonnet 4.6, Apollo, Gmail, Slack, Railway
Outcome: 409 logged autonomous runs; 100% clean completion or intentional safe stop; 0 duplicate-send incidents in 100+ days
Engagement: Internal Build

AI SDR Agent

Azumo built an autonomous AI SDR agent that sources, qualifies, personalizes, and sends outbound email end to end on Claude Sonnet 4.6 — running daily against hard safety limits with mandatory human approval on every send.

Most "AI SDR" tools don't fall down on writing an email — they fall down on running unattended without making a mess. Azumo set out to build an autonomous outbound agent where that couldn't happen quietly: a system that sources, qualifies, personalizes, and sends cold email end to end, paces itself against hard safety limits, and can prove it operated correctly every single day.

The result has been in production for roughly 3.5 months across three live campaigns on one shared engine, logging 409 autonomous runs with zero uncaught errors, zero failed sends, and zero duplicate-send incidents in the 100+ days since its core safeguard shipped. It runs on Claude Sonnet 4.6, with Apollo for sourcing, Gmail for delivery, Slack for human approval, and Railway for tracking.

Results:

100% Reliable Completion: 409 logged runs, zero uncaught errors, zero failed sends

0 Duplicate-Send Incidents: in the 100+ days since the dedup safeguard shipped

1.98s Median Step Latency: p95 of 15.9s, bounded by a 45s-per-company enrichment timeout

The Challenge

Most "AI SDR" tools fall down in the same place: not in writing an email, but in running unattended without making a mess. The moment a system sends on its own, three failure modes show up fast — it double-sends and burns a prospect, it ignores sending limits and damages domain reputation, or it crashes mid-run and leaves the pipeline in an unknown state.

Azumo set out to build an autonomous outbound agent where none of that could happen quietly. The bar wasn't "can it draft a good email" — Claude handles that. The bar was: can it operate on its own, every day, against hard safety limits, and prove it did so correctly? A system you can't trust unattended isn't autonomous; it's just a faster way to create incidents.

The Solution

Azumo built a multi-stage pipeline where every step is logged and every send passes through the same safety gates. The agent sources leads from Apollo, qualifies and enriches them with Claude, drafts a per-company personalized email, routes it through human approval in Slack, and sends it on a disciplined schedule with full deliverability and deduplication safeguards.

Apollo lead pull → Claude qualification + enrichment → per-company draft → human approval (Slack) → scheduled send (with guardrails) → open/click tracking → reply detection → reporting
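The "every step is logged" idea can be sketched as a small stage runner. This is an illustration only, not Azumo's implementation: `run_pipeline`, `StepRecord`, `GuardrailAbort`, and the three toy stages are hypothetical names, and the sketch assumes each stage is a plain function that takes and returns a state dict.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sdr_pipeline")  # hypothetical logger name


class GuardrailAbort(Exception):
    """Raised by a stage to stop the run on purpose (e.g. daily cap reached)."""


@dataclass
class StepRecord:
    stage: str
    status: str       # "ok" or "guardrail_abort"
    seconds: float    # wall-clock step latency
    detail: str = ""


def run_pipeline(stages: List[Callable[[dict], dict]], state: dict) -> List[StepRecord]:
    """Run stages in order and log one record per step.

    A GuardrailAbort is recorded as an intentional safe stop. Any other
    exception is not caught here, so it would surface as an uncaught error.
    """
    records: List[StepRecord] = []
    for stage in stages:
        started = time.perf_counter()
        try:
            state = stage(state)
            record = StepRecord(stage.__name__, "ok", time.perf_counter() - started)
        except GuardrailAbort as abort:
            record = StepRecord(stage.__name__, "guardrail_abort",
                                time.perf_counter() - started, str(abort))
        log.info("%s status=%s seconds=%.3f %s",
                 record.stage, record.status, record.seconds, record.detail)
        records.append(record)
        if record.status != "ok":
            break  # safe stop: later stages never run
    return records


# Toy stages standing in for the real ones (assumed, for illustration).
def pull_leads(state: dict) -> dict:
    return {**state, "leads": ["lead@example.com"]}


def scheduled_send(state: dict) -> dict:
    raise GuardrailAbort("campaign already at daily cap")


def track_opens(state: dict) -> dict:
    return state


records = run_pipeline([pull_leads, scheduled_send, track_opens], {})
```

In this toy run the send stage aborts on purpose, so the tracking stage is never reached and the log holds exactly two records.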

Personalization that targets the prospect's actual workflow. For every qualified lead, Claude Sonnet 4.6 fetches the company's homepage plus two linked pages and extracts an operational signal, a qualification reason, and a personalization angle. The result is a pitch aimed at the prospect's specific workflow — dispatch, intake, deal screening, and so on — instead of a generic SDR template. The pipeline has produced 520+ tracked draft records, each carrying its full enrichment payload and the workflow pain point it was built around.
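As a rough sketch of what one enrichment record and its time limit might look like: the three extracted fields come from the description above, and the 45-second default matches the per-company timeout cited in the results. Everything else is assumed, including the names `Enrichment` and `enrich_with_timeout`, the async call shape, and the sample values. The real Claude call is replaced by a stand-in function.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Enrichment:
    company: str
    operational_signal: str
    qualification_reason: str
    personalization_angle: str


async def enrich_with_timeout(
    company: str,
    enrich: Callable[[str], Awaitable[Enrichment]],
    timeout_s: float = 45.0,
) -> Optional[Enrichment]:
    """Return the enrichment, or None if the call exceeds timeout_s seconds."""
    try:
        return await asyncio.wait_for(enrich(company), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None  # caller can skip this company and move on with the batch


# Stand-in for the model-backed enrichment call (sample values are made up).
async def fake_enrich(company: str) -> Enrichment:
    await asyncio.sleep(0.01)
    return Enrichment(company, "manual dispatch board", "fits ICP", "dispatch automation")


async def slow_enrich(company: str) -> Enrichment:
    await asyncio.sleep(0.2)
    return Enrichment(company, "", "", "")


fast_result = asyncio.run(enrich_with_timeout("Acme Logistics", fake_enrich))
slow_result = asyncio.run(enrich_with_timeout("Slow Co", slow_enrich, timeout_s=0.01))
```

Bounding each company separately means one slow site cannot stall the whole batch; the timed-out company simply yields no record.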

The agent never sends unsupervised. Every outbound email is approved by a human in Slack before it leaves. Autonomy here means the system does all the work and paces itself safely — not that it operates without oversight.

The same guardrails run on every single send:

  • Per-recipient deduplication locks backed by a master sent-registry
  • Hard daily send caps with ramped warmup pacing
  • Minimum send-gap enforcement between emails
  • DKIM resolution check on all sending domains before any send
  • Mandatory human approval in Slack per send
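The guardrails listed above can be pictured as a single gate that every send must clear. The sketch below is hypothetical: only the 10-per-day hard cap comes from the results reported in this case study, while the warmup ramp values, the 20-minute gap, the check order, and all names (`SendContext`, `guardrail_block_reason`, `daily_cap`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

HARD_DAILY_CAP = 10               # from the case study
WARMUP_START = 2                  # assumed: sends allowed on day 0
WARMUP_STEP_PER_DAY = 2           # assumed: extra sends allowed per day
MIN_GAP = timedelta(minutes=20)   # assumed minimum gap between sends


def daily_cap(days_since_launch: int) -> int:
    """Ramp the allowance up linearly, never above the hard cap."""
    return min(HARD_DAILY_CAP, WARMUP_START + WARMUP_STEP_PER_DAY * days_since_launch)


@dataclass
class SendContext:
    already_sent: bool              # recipient found in the sent registry
    dkim_ok: bool                   # result of the DKIM check for the domain
    approved: bool                  # human approval recorded in Slack
    sent_today: int
    days_since_launch: int
    last_send_at: Optional[datetime]
    now: datetime


def guardrail_block_reason(ctx: SendContext) -> Optional[str]:
    """Return why the send is blocked, or None if every gate passes."""
    if ctx.already_sent:
        return "recipient already in sent registry"
    if not ctx.dkim_ok:
        return "DKIM check failed for sending domain"
    if ctx.sent_today >= daily_cap(ctx.days_since_launch):
        return "campaign already at daily cap"
    if ctx.last_send_at is not None and ctx.now - ctx.last_send_at < MIN_GAP:
        return "minimum send gap not elapsed"
    if not ctx.approved:
        return "awaiting human approval"
    return None


now = datetime(2025, 1, 15, 10, 0)
ok_ctx = SendContext(False, True, True, 3, 30, now - timedelta(hours=1), now)
capped_ctx = SendContext(False, True, True, 10, 30, now - timedelta(hours=1), now)
too_soon_ctx = SendContext(False, True, True, 3, 30, now - timedelta(minutes=5), now)
```

Returning a reason string rather than a bare boolean keeps each block explainable in the run log.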

Results

Reliability that's actually measured. Across 409 logged autonomous runs over 30 active days, the system recorded zero uncaught errors and zero failed sends. Of those runs, 68 ended in a deliberate guardrail abort and the rest completed cleanly. Every one of those aborts is the safety system working as designed (for example, "campaign already at daily cap"), not a failure. Effective reliability: 100% clean completion or intentional safe stop. Performance stayed tight throughout, with a median step latency of 1.98 seconds and a p95 of 15.9 seconds; the long tail belongs to the Claude enrichment call, which runs in batch behind a 45-second-per-company timeout.
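For readers who want to reproduce figures like the median and p95 from their own step logs, here is a minimal sketch. It assumes latencies are available as a list of seconds and uses the nearest-rank definition of a percentile. The case study does not say which percentile method was used, so results from other methods may differ slightly. The function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no latency samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(step_seconds: Sequence[float]) -> Dict[str, float]:
    return {
        "median": median(step_seconds),
        "p95": percentile_nearest_rank(step_seconds, 95),
    }


# Synthetic samples (1.0s .. 100.0s), not the system's real data.
summary = latency_summary([float(i) for i in range(1, 101)])
```

A wide gap between median and p95, as reported above, is what a few slow enrichment calls among many fast steps would produce.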

A strict, efficient qualification funnel. Across 30 Apollo sync batches, the agent pulled roughly 10,000 leads, enriched about 94% of them with Claude, and qualified approximately 15% against a deliberately strict ICP filter. That low pass rate is the point: the system discards the large majority of contacts rather than spray low-fit prospects, and it leaves a substantial enriched, qualified backlog ready to scale sending without new sourcing spend. In practice, the agent runs right at its self-imposed ceiling without exceeding it — averaging 9.3 emails a day against a 10-per-day cap. Disciplined, not reckless.

Early engagement signal. Outcome instrumentation is newer than the engine itself, so we treat these figures as directional. Among the productized SDR campaigns, opens are the strongest early indicator, at an open rate of roughly 23%, which is a reasonable signal for a young send cohort. As volume and follow-up cycles accumulate, these numbers will firm up.

The one incident — and why it's part of the story. On day one of operations, a race condition between concurrent background processes caused four emails to send multiple times. Azumo root-caused it the same day, implemented the deduplication lock and master sent-registry, and has had zero duplicate incidents in the 100+ days since. We include this deliberately: the difference between a reliable autonomous system and a risky one isn't whether something ever goes wrong — it's whether the system catches it, fixes it at the root, and proves it can't recur. The registry now tracks every send precisely, and the run logs make that claim auditable rather than aspirational.
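One common way to close this kind of race is to make "claim the recipient" an atomic write that can succeed only once, and to send only after the claim succeeds. The sketch below uses a SQLite primary key for that. It illustrates the technique and does not describe Azumo's registry: the storage engine, table layout, and function names are all assumptions.

```python
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a sent registry. Use a shared file path across processes;
    an in-memory database is private to one connection and only suits a demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_registry ("
        " recipient TEXT PRIMARY KEY,"
        " campaign TEXT NOT NULL,"
        " claimed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def claim_recipient(conn: sqlite3.Connection, recipient: str, campaign: str) -> bool:
    """Try to claim a recipient. True means this caller may send; False means
    someone already claimed it, so the send must be skipped."""
    key = recipient.strip().lower()  # normalize so case variants collide
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO sent_registry (recipient, campaign) VALUES (?, ?)",
                (key, campaign),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key already present: duplicate blocked


conn = open_registry()
first = claim_recipient(conn, "Pat@Example.com", "campaign-a")
second = claim_recipient(conn, "pat@example.com", "campaign-b")
other = claim_recipient(conn, "lee@example.com", "campaign-a")
```

Because the uniqueness check and the write happen in one statement, two workers racing on the same recipient cannot both get `True`, unlike a separate "check, then insert" sequence.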

Anyone can wire an LLM to an email API. The hard part — the part that determines whether you can leave it running — is the engineering discipline around it: deduplication, pacing, deliverability checks, human approval, and logging good enough to back up every claim. That's what Azumo built, and it's the foundation we extend into client engagements.

See how Azumo can build reliable AI agents for your team — talk to our team.
