Delivery watchdog

runDeliveryWatchdog lives in src/lib/delivery-watchdog.ts. It runs every 15 minutes as part of the */15 * * * * cron.

Why

For every channel (email, SMS, Beds24 thread), we log a row to outbound_log on send. Webhook handlers set delivered_at or failed_at when the terminal signal arrives. If neither is set within 10 minutes of sent_at, something failed silently. Typical causes:

  • Mailgun never accepted the message (rare; usually a queue or auth issue)
  • Twilio queue stuck without a status callback
  • Beds24 thread never confirmed via reconciliation

Query

SELECT id, channel, purpose, booking_id, recipient, message_id, sent_at
FROM outbound_log
WHERE delivered_at IS NULL
AND failed_at IS NULL
AND watchdog_alerted_at IS NULL
AND sent_at < {now - 10min}
ORDER BY sent_at ASC
LIMIT 20

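LIMIT 20 caps each run at 20 alerts: if 100 sends went sideways at once, we don’t blast Bill with 100 SMSes in one go. Because alerted rows are excluded and the query orders by sent_at, later cron runs pick up the remainder, oldest first.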
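
To make the query's semantics concrete, here is a minimal TypeScript sketch of the same selection as an in-memory filter. It is an illustration only, not the code in src/lib/delivery-watchdog.ts: the OutboundRow type, the selectStaleRows name, the two constants, and the ISO-string timestamps are all assumptions made for the example.

```typescript
// Hypothetical row shape mirroring the columns the query filters on.
interface OutboundRow {
  id: number;
  channel: string;
  sent_at: string; // assumed ISO 8601 timestamp
  delivered_at: string | null;
  failed_at: string | null;
  watchdog_alerted_at: string | null;
}

const STALE_AFTER_MS = 10 * 60 * 1000; // the 10-minute threshold
const MAX_ALERTS_PER_RUN = 20; // mirrors LIMIT 20

// In-memory equivalent of the WHERE / ORDER BY / LIMIT clauses.
function selectStaleRows(rows: OutboundRow[], now: Date): OutboundRow[] {
  const cutoff = now.getTime() - STALE_AFTER_MS;
  return rows
    .filter(
      (r) =>
        r.delivered_at === null &&
        r.failed_at === null &&
        r.watchdog_alerted_at === null &&
        Date.parse(r.sent_at) < cutoff
    )
    .sort((a, b) => Date.parse(a.sent_at) - Date.parse(b.sent_at))
    .slice(0, MAX_ALERTS_PER_RUN);
}
```

Note the strict comparison: a row sent exactly 10 minutes ago is not yet stale and is picked up on a later run.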

What it sends

For each stale row:

SMS to Bill:

Stale outbound: {channel} {purpose} to {recipient} sent {sent_at} — no delivery confirmation after 10min. Booking #{booking_id} Message-id: {message_id}
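
As a rough sketch, the template above could be filled in from a row like this. The StaleRow type and formatStaleAlert name are hypothetical, and the field types (numeric booking_id, string message_id) are assumptions; the real formatter may differ.

```typescript
// Hypothetical subset of an outbound_log row needed for the alert text.
interface StaleRow {
  channel: string;
  purpose: string;
  booking_id: number; // assumed numeric
  recipient: string;
  message_id: string; // assumed to be the provider's message id
  sent_at: string;
}

// Builds the SMS body from the template. "\u2014" is the dash used in the template.
function formatStaleAlert(row: StaleRow): string {
  return (
    `Stale outbound: ${row.channel} ${row.purpose} to ${row.recipient} ` +
    `sent ${row.sent_at} \u2014 no delivery confirmation after 10min. ` +
    `Booking #${row.booking_id} Message-id: ${row.message_id}`
  );
}
```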

Dedup

After sending, watchdog_alerted_at is set. The next cron run skips rows where watchdog_alerted_at IS NOT NULL. Each stale row triggers exactly one alert; after that, it needs manual investigation.
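
The send-then-stamp order can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: AlertableRow, WatchdogDeps, sendAlert, markAlerted, and alertStaleRows are hypothetical names, and the dependencies are injected so the sketch runs without a database or SMS provider.

```typescript
// Hypothetical minimal row: only the fields the dedup logic needs.
interface AlertableRow {
  id: number;
  watchdog_alerted_at: string | null;
}

// Hypothetical injected dependencies (SMS sender and DB update).
interface WatchdogDeps {
  sendAlert: (row: AlertableRow) => Promise<void>;
  markAlerted: (id: number, at: string) => Promise<void>;
}

// Alerts each row at most once: rows that already carry a stamp are skipped,
// and a row is stamped only after its alert has been sent.
async function alertStaleRows(
  rows: AlertableRow[],
  deps: WatchdogDeps,
  now: Date
): Promise<number> {
  let alerted = 0;
  for (const row of rows) {
    if (row.watchdog_alerted_at !== null) continue;
    await deps.sendAlert(row);
    await deps.markAlerted(row.id, now.toISOString());
    alerted++;
  }
  return alerted;
}
```

Stamping after the send means a row whose alert failed stays unstamped and is retried on the next run. The trade-off in this sketch is a possible duplicate SMS if the send succeeds but the stamp does not.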

Channel-specific considerations

  • Email: Mailgun’s delivered webhook should land within seconds. The watchdog catches cases where the entire webhook path is down.
  • SMS: Twilio’s status callback usually lands within seconds, though slow carriers can take longer.
  • Beds24 thread: Beds24 has no delivery webhook. Instead, the Beds24 reconciliation cron fetches the thread and marks delivered_at if the message is present. It runs before the watchdog on each */15 tick, so a message that did reach the thread is marked delivered before the watchdog checks it.

Source

  • src/lib/delivery-watchdog.ts
  • migrations/0024_delivery_observability.sql (creates outbound_log)
  • Tests: test/lib/delivery-watchdog.test.ts (6 cases)