Delivery watchdog
runDeliveryWatchdog in src/lib/delivery-watchdog.ts. Runs every 15
minutes (part of the */15 * * * * cron).
Why
For every channel (email, SMS, Beds24 thread), we log a row to
outbound_log on send. Webhook handlers update delivered_at or
failed_at when the terminal signal arrives. If neither lands within
10 minutes, something silent went wrong:
- Mailgun never accepted (rare — usually a queue or auth issue)
- Twilio queue stuck without a status callback
- Beds24 thread never confirmed via reconciliation
Query
SELECT id, channel, purpose, booking_id, recipient, message_id, sent_at
FROM outbound_log
WHERE delivered_at IS NULL
  AND failed_at IS NULL
  AND watchdog_alerted_at IS NULL
  AND sent_at < {now - 10min}
ORDER BY sent_at ASC
LIMIT 20
LIMIT 20 is a soft cap: if 100 sends went sideways at once, we
don’t blast Bill with 100 SMSes. The next cron run picks up the
remainder.
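As a rough illustration of what the query selects, the same predicate, ordering and cap can be mirrored over an in-memory array. This is a sketch, not the code in src/lib/delivery-watchdog.ts: the OutboundLogRow interface, the constant names and selectStaleRows are hypothetical, and the column types are assumed.

```typescript
// Hypothetical in-memory mirror of the outbound_log columns the query touches.
// Column types are assumptions; the real schema lives in the migration.
interface OutboundLogRow {
  id: number;
  channel: string;
  purpose: string;
  booking_id: number | null;
  recipient: string;
  message_id: string | null;
  sent_at: Date;
  delivered_at: Date | null;
  failed_at: Date | null;
  watchdog_alerted_at: Date | null;
}

const STALE_AFTER_MS = 10 * 60 * 1000; // 10-minute threshold from the text
const BATCH_LIMIT = 20; // soft cap per cron run

// Applies the WHERE clause, ORDER BY sent_at ASC and LIMIT 20 to an array.
function selectStaleRows(rows: OutboundLogRow[], now: Date): OutboundLogRow[] {
  const cutoff = now.getTime() - STALE_AFTER_MS;
  return rows
    .filter(
      (r) =>
        r.delivered_at === null &&
        r.failed_at === null &&
        r.watchdog_alerted_at === null &&
        r.sent_at.getTime() < cutoff,
    )
    .sort((a, b) => a.sent_at.getTime() - b.sent_at.getTime()) // oldest first
    .slice(0, BATCH_LIMIT);
}
```

With 25 stale rows, one call returns the 20 oldest. The remaining 5 are returned by a later call once the first 20 have watchdog_alerted_at set.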
What it sends
For each stale row:
SMS to Bill:
Stale outbound: {channel} {purpose} to {recipient} sent {sent_at} — no delivery confirmation after 10min. Booking #{id} Message-id: {id}
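A minimal sketch of how the alert text could be assembled from a stale row. Several things here are assumptions rather than facts from the source: that the two {id} placeholders refer to booking_id and message_id, that sent_at is rendered as an ISO 8601 string, and that missing values print as "n/a". StaleRow and formatStaleAlert are hypothetical names.

```typescript
// Hypothetical shape: only the columns the alert text needs.
interface StaleRow {
  channel: string;
  purpose: string;
  recipient: string;
  sent_at: Date;
  booking_id: number | null;
  message_id: string | null;
}

// Builds the SMS body following the template above.
// Assumption: {id} after "Booking #" is booking_id, and {id} after
// "Message-id:" is message_id. Null values fall back to "n/a" (assumed).
function formatStaleAlert(row: StaleRow): string {
  return (
    `Stale outbound: ${row.channel} ${row.purpose} to ${row.recipient} ` +
    `sent ${row.sent_at.toISOString()} \u2014 no delivery confirmation after 10min. ` +
    `Booking #${row.booking_id ?? "n/a"} Message-id: ${row.message_id ?? "n/a"}`
  );
}
```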
Dedup
After sending, watchdog_alerted_at is set. Next cron run skips rows
where watchdog_alerted_at IS NOT NULL. One alert per stale row,
forever — manual investigation required.
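The dedup rule can be sketched as a loop that sends first and stamps second, so a row that already carries watchdog_alerted_at is never alerted again. This is an illustration only: AlertableRow and alertOnce are hypothetical, and sendSms is a synchronous stand-in for whatever the real SMS call is.

```typescript
// Hypothetical minimal row shape for the dedup step.
interface AlertableRow {
  id: number;
  watchdog_alerted_at: Date | null;
}

// Sends one alert per row that has not been alerted yet, then stamps it.
// Returns how many alerts were sent on this pass.
function alertOnce(
  rows: AlertableRow[],
  now: Date,
  sendSms: (row: AlertableRow) => void, // stand-in for the real SMS sender
): number {
  let sent = 0;
  for (const row of rows) {
    if (row.watchdog_alerted_at !== null) continue; // already alerted: skip
    sendSms(row);
    row.watchdog_alerted_at = now; // stamp after sending, as described above
    sent += 1;
  }
  return sent;
}
```

Running alertOnce twice over the same rows sends nothing on the second pass, which is the "one alert per stale row" behaviour.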
Channel-specific considerations
- Email: Mailgun's delivered webhook should land within seconds.
  Watchdog catches scenarios where the entire webhook path is down.
- SMS: Twilio status callback usually lands within seconds. Slow
  carriers can take longer.
- Beds24 thread: Doesn’t have a webhook for delivery. The Beds24
  reconciliation cron fetches the thread and marks delivered_at if the
  message is present. Runs before the watchdog on each */15 tick to
  keep this honest.
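The Beds24 reconciliation step might look roughly like the sketch below. The source does not say how "present" is determined, so matching on message_id is an assumption. ThreadRow and markDeliveredFromThread are hypothetical names, and fetching the thread from Beds24 is left out entirely.

```typescript
// Hypothetical minimal row shape for reconciliation.
interface ThreadRow {
  message_id: string | null;
  delivered_at: Date | null;
}

// threadMessageIds: ids found in the already-fetched Beds24 thread.
// Assumption: a message counts as present when its message_id is in the thread.
// Returns how many rows were newly marked as delivered.
function markDeliveredFromThread(
  rows: ThreadRow[],
  threadMessageIds: Set<string>,
  now: Date,
): number {
  let marked = 0;
  for (const row of rows) {
    if (row.delivered_at !== null || row.message_id === null) continue;
    if (threadMessageIds.has(row.message_id)) {
      row.delivered_at = now;
      marked += 1;
    }
  }
  return marked;
}
```

Because this runs before the watchdog on the same tick, rows confirmed in the thread already have delivered_at set and fall outside the watchdog's delivered_at IS NULL filter.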
Source
- src/lib/delivery-watchdog.ts
- migrations/0024_delivery_observability.sql (creates outbound_log)
- Tests: test/lib/delivery-watchdog.test.ts (6 cases)