engineering · 2026-04-30 · Orion Jones

Day-one abuse defense: the limits I shipped before customer 5

Concrete checklist of every limit, rate cap, and circuit breaker I built into Lander to prevent a single user from racking up our AWS bill.

If you're building a multi-tenant SaaS on AWS, the failure mode that ends you isn't the one your tests cover. It's a single customer accidentally going viral and racking up $500 in CloudFront egress overnight, or a misconfigured CI loop pushing 100 builds a day, or a bot finding your AI builder and burning $200 of Anthropic spend before you notice.

I built Lander to run customer code on real per-customer AWS Fargate. That means every customer can independently spike compute, bandwidth, or external API spend. Before I had even 5 paying customers, I shipped the following defenses. None of them are clever; all of them have prevented real billing surprises.

The cap inventory

Defense                        Threshold                        Enforcement
Per-env deploys/hour           10                               429 + Retry-After
Per-env deploys/day            50                               429 + Retry-After
Concurrent builds/user         3                                429 + Retry-After
Builder turns/min/user         5                                429 + Retry-After
Builder turns/month hard cap   4× included quota                402
Anthropic global daily spend   $50 default                      503
Builder request body size      1 MB                             413
Per-account custom domains     5 / 10 / 25 by plan              402
Builds per month               100 / 100 / 300 / 1000 by plan   402
Bandwidth alert                80% / 95% / 100% of plan cap     email
Fargate CPU anomaly            >80% sustained for 1 hour        email

What each one stopped

Per-env deploys/hour (10). A customer's CI pushed a tag every 30 seconds for an hour. Without the cap, that's ~120 builds at $0.025/build in CodeBuild cost, about $3. Trivial on its own. But the customer also had a recovery-loop bug that retried failed deploys, multiplying it. The cap returned 429s and the customer noticed within 10 minutes.

Concurrent builds/user (3). Same scenario, different angle. Without concurrency limits, queued builds pile up in CodeBuild and other customers wait. The 3-cap means even a misconfigured CI never starves the queue.
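The concurrency check can be sketched the same way as the other caps. This is a hypothetical illustration, not Lander's actual code: the names (canStartBuild, MAX_CONCURRENT_BUILDS_PER_USER) are mine, and where production would count running builds in Postgres, the count is passed in as a parameter so the decision logic stays pure.

```typescript
// Structured decision shape, matching the pattern the post describes.
type ThrottleDecision =
  | { allowed: true }
  | { allowed: false; reason: string; retryAfterSec: number }

// Tunable via env var, with the cap from the table as the default.
const MAX_CONCURRENT_BUILDS_PER_USER = Number(
  process.env.MAX_CONCURRENT_BUILDS_PER_USER ?? 3,
)

// In production `runningBuilds` would come from a query like:
//   SELECT COUNT(*) FROM "build" WHERE "userId" = $1 AND "status" = 'running'
function canStartBuild(runningBuilds: number): ThrottleDecision {
  if (runningBuilds >= MAX_CONCURRENT_BUILDS_PER_USER) {
    return {
      allowed: false,
      reason: `at most ${MAX_CONCURRENT_BUILDS_PER_USER} concurrent builds per user`,
      retryAfterSec: 60, // a finishing CodeBuild job usually frees a slot quickly
    }
  }
  return { allowed: true }
}
```

Taking the count as an argument also makes the limit trivially unit-testable without a database.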

Builder turns/min (5). Caught a bot that hit /api/build/generate from a residential proxy. Without the cap, the bot could have burned $50 of Sonnet calls in 2 minutes. With the cap, it got 429s and gave up after 10 attempts.

Anthropic global daily $. This is the safety net behind the per-user rate limit. If 100 users each fire 5 turns/min simultaneously (a legitimate scenario at scale), the per-user cap doesn't help. The global $50/day cap pauses the builder for everyone for the rest of the day.
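A minimal sketch of that circuit breaker, under my own assumptions: the real lib/anthropic-budget.ts presumably sums logged call costs from Postgres, while here the running total is a parameter, and checkBudget / ANTHROPIC_DAILY_BUDGET_USD are illustrative names.

```typescript
// Global budget, tunable via env var; $50/day default as described in the post.
const ANTHROPIC_DAILY_BUDGET_USD = Number(
  process.env.ANTHROPIC_DAILY_BUDGET_USD ?? 50,
)

type BudgetDecision = { allowed: true } | { allowed: false; status: 503 }

// `spentTodayUsd` would come from something like:
//   SELECT COALESCE(SUM("costUsd"), 0) FROM "anthropic_call"
//   WHERE "createdAt" >= DATE_TRUNC('day', NOW())
function checkBudget(
  spentTodayUsd: number,
  nextCallEstimateUsd: number,
): BudgetDecision {
  // Block BEFORE the call, not after: an over-budget call still bills you.
  if (spentTodayUsd + nextCallEstimateUsd > ANTHROPIC_DAILY_BUDGET_USD) {
    return { allowed: false, status: 503 } // builder pauses for everyone until midnight
  }
  return { allowed: true }
}
```

Checking the estimated cost of the next call (rather than only the spend so far) means the breaker trips before the budget is exceeded instead of one call after.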

Bandwidth alert. Not a hard cap; just an email. But the email lets me see "customer X is at 80% of their plan in week 1" before week 4 when they hit 200% and I'm eating the egress bill.
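The tiered alert needs one small piece of logic so each threshold emails exactly once per billing period. This sketch assumes the last-alerted tier is stored per customer; tierToAlert is a name I made up for illustration.

```typescript
// Alert tiers from the cap inventory: 80%, 95%, 100% of the plan's bandwidth cap.
const ALERT_TIERS = [0.8, 0.95, 1.0] as const

// Returns the highest newly crossed tier (the one to email about),
// or null if no new alert is due. `lastAlertedTier` is whatever was
// persisted for this customer after the previous alert.
function tierToAlert(
  usedFraction: number,
  lastAlertedTier: number | null,
): number | null {
  const crossed = ALERT_TIERS.filter((t) => usedFraction >= t)
  const highest = crossed.length > 0 ? crossed[crossed.length - 1] : null
  // Only alert when we've crossed a tier strictly above the last one emailed,
  // so a customer hovering at 81% doesn't get an email every polling cycle.
  if (highest !== null && (lastAlertedTier === null || highest > lastAlertedTier)) {
    return highest
  }
  return null
}
```

A customer who jumps straight from 70% to 97% gets one email (the 95% tier), not two, which is usually what you want from a polling-based alerter.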

The 5-line pattern

Every cap follows the same structure:

// lib/abuse-throttle.ts

// Structured decision: the caller maps it to a 429, a log line, an alert, etc.
export type ThrottleDecision =
  | { allowed: true }
  | { allowed: false; reason: string; retryAfterSec: number }

// Threshold lives in an env var so it's tunable per-environment without a redeploy.
const DEPLOYS_PER_HOUR_PER_ENV = Number(process.env.DEPLOYS_PER_HOUR_PER_ENV ?? 10)

export async function canDeploy(envId: string): Promise<ThrottleDecision> {
  // rawRows/sql are the project's Postgres query helpers. Counting straight
  // from the "deploy" table means no Redis and no in-memory state.
  const [hourRow] = await rawRows<{ n: number }>(sql`
    SELECT COUNT(*)::int AS n FROM "deploy"
    WHERE "environmentId" = ${envId}
      AND "startedAt" >= NOW() - INTERVAL '1 hour'
  `)
  if ((hourRow?.n ?? 0) >= DEPLOYS_PER_HOUR_PER_ENV) {
    return { allowed: false, reason: "...", retryAfterSec: 600 }
  }
  return { allowed: true }
}

The wins:

  1. Postgres-backed. No Redis, no in-memory state. Survives restarts.
  2. Returns a structured decision. The caller decides what to do: return 429, log, alert, etc.
  3. Stores the threshold in env vars. Tunable per-environment without redeploy.

What I haven't shipped yet

The honest list of gaps:

  • Bandwidth hard cap. Currently alerts only. Hard cap requires a WAF rate-limit rule keyed on the per-customer ALB target group. Working on it.
  • Auto-pause on sustained CPU anomaly. Currently emails. Should auto-pause Fargate after 24h of >80% CPU. Schema change blocking.
  • Per-IP signup rate limit. Auth.js owns the signin path; need to add Next.js middleware.
  • Stripe webhook idempotency. Current handlers are deterministic upserts so dupe events are no-ops, but adding event.id tracking would be cleaner.

Why ship before customer 5

The argument against building this stuff early is "premature optimization." That's wrong for abuse defense. The cost of hitting the failure mode (real $ to AWS, hard to refund) is way higher than the cost of building the defense.

Each cap took me 30-60 minutes to write. Total time on the inventory above: maybe 4 hours. That's nothing compared to one $500 surprise bill.

Build them before you ship the platform. They cost almost nothing in dev time and prevent the only class of failure that makes you actually go bankrupt.


Lander's defense matrix is visible at lander.host/admin/security (admin-only). The full source for lib/abuse-throttle.ts and lib/anthropic-budget.ts is in the lander-control repo.

Stop reading. Start shipping.
