Operational runbook
Operator runbook for incidents, manual review, and support workflows.
Use this runbook for day-2 operations after launch.
Watch items
- operator API auth failures
- wallet adapter timeouts
- balance operations in unknown or manual_review
- webhook deliveries in failed or dead_letter
- unexpected market visibility complaints
First-response flow
- identify affected user or event
- capture UTC time window
- locate idempotency_key, event_id, order_id, or delivery ID
- determine if money movement may be affected
- stop retries or replays until state is understood
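The first-response steps above can be captured in a small record so nothing is skipped under pressure. This is a minimal sketch, not a platform schema: the class name IncidentRecord and all of its field and method names are illustrative assumptions. It rejects time windows that are not timezone-aware UTC and treats money movement as possible until someone rules it out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class IncidentRecord:
    """Illustrative first-response record; not an actual platform schema."""

    affected: str                          # affected user or event
    window_start: datetime                 # start of the UTC time window
    window_end: datetime                   # end of the UTC time window
    idempotency_key: Optional[str] = None
    event_id: Optional[str] = None
    order_id: Optional[str] = None
    delivery_id: Optional[str] = None
    money_movement_possible: bool = True   # assume yes until ruled out
    retries_stopped: bool = False

    def __post_init__(self) -> None:
        for ts in (self.window_start, self.window_end):
            # Naive timestamps return None here, so they are rejected too.
            if ts.utcoffset() != timedelta(0):
                raise ValueError("time window must be timezone-aware UTC")
        if self.window_end < self.window_start:
            raise ValueError("window_end precedes window_start")

    def has_locator(self) -> bool:
        """True once at least one of the four IDs has been located."""
        return any((self.idempotency_key, self.event_id,
                    self.order_id, self.delivery_id))

    def safe_to_act(self) -> bool:
        """Act only after an ID is located and retries/replays are stopped."""
        return self.has_locator() and self.retries_stopped
```

A responder would fill in the IDs as they are found and set retries_stopped once retries and replays are paused; safe_to_act() stays False until both are true.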
Wallet incidents
If balance operation is unknown:
- query operation lookup by same idempotency_key
- confirm final remote result
- only then decide whether replay, compensation, or manual review is needed
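The "confirm first, decide second" rule for unknown operations can be sketched as a guard. Everything here is an assumption for illustration: the lookup callable stands in for whatever operation-lookup call the wallet adapter exposes, and the status names in FINAL_STATES are hypothetical. The guard refuses to return anything until the remote result is final, so a replay or compensation cannot be chosen on a guess.

```python
from typing import Callable, Optional


class StateNotConfirmed(Exception):
    """Raised when the remote result is not final yet; do not replay."""


# Assumed terminal states of the remote wallet operation (hypothetical names).
FINAL_STATES = {"succeeded", "failed"}


def confirm_final_result(
    idempotency_key: str,
    lookup: Callable[[str], Optional[dict]],
) -> dict:
    """Look up the operation by its original key; return it only if final."""
    operation = lookup(idempotency_key)  # same key as the original request
    if operation is None:
        raise StateNotConfirmed(f"no operation found for {idempotency_key!r}")
    status = operation.get("status")
    if status not in FINAL_STATES:
        raise StateNotConfirmed(
            f"operation {idempotency_key!r} is still {status!r}"
        )
    return operation
```

For example, with an in-memory stand-in for the lookup:

```python
fake_store = {"key-1": {"status": "succeeded"}, "key-2": {"status": "unknown"}}
confirm_final_result("key-1", fake_store.get)   # returns the operation
confirm_final_result("key-2", fake_store.get)   # raises StateNotConfirmed
```

The choice between replay, compensation, and manual review is deliberately left to the operator; the sketch only enforces the ordering.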
If balance operation is manual_review:
- gather operator-side wallet logs
- gather related trade or redemption record
- escalate with exact IDs
Webhook incidents
If delivery is failed:
- inspect last status code and response body sample
- fix receiver issue
- wait for retry or replay manually if needed
If delivery is dead_letter:
- confirm receiver fix deployed
- replay from dashboard or operator API
- confirm latest attempt becomes sent
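The two webhook branches above can be sketched as pure functions over a delivery record. The record shape is an assumption, not the platform's actual payload: a dict with a status field and an attempts list whose entries carry status, status_code, and response_body. The returned action names are likewise illustrative labels.

```python
from typing import Optional, Tuple


def last_attempt_summary(
    delivery: dict, max_body: int = 200
) -> Tuple[Optional[int], str]:
    """Return the last status code and a truncated response body sample."""
    attempts = delivery.get("attempts", [])
    if not attempts:
        return None, ""
    last = attempts[-1]
    body = last.get("response_body") or ""
    return last.get("status_code"), body[:max_body]


def next_webhook_action(delivery: dict, receiver_fix_deployed: bool) -> str:
    """Map delivery status to the next runbook step (illustrative labels)."""
    status = delivery.get("status")
    if status == "failed":
        if not receiver_fix_deployed:
            return "inspect_and_fix_receiver"
        return "wait_for_retry_or_replay"
    if status == "dead_letter":
        # Never replay into a receiver that is still broken.
        return "replay" if receiver_fix_deployed else "fix_receiver_first"
    if status == "sent":
        return "none"
    return "investigate"


def replay_confirmed(delivery: dict) -> bool:
    """True when the latest attempt on the delivery is marked sent."""
    attempts = delivery.get("attempts", [])
    return bool(attempts) and attempts[-1].get("status") == "sent"
```

After a replay, re-fetch the delivery and check replay_confirmed before closing the incident.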
Key rotation runbook
- create replacement key
- deploy secret
- smoke-test one endpoint with the new key
- revoke old key
- monitor for INVALID_API_KEY spikes
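The final monitoring step can be approximated by comparing INVALID_API_KEY counts in equal-length windows before and after the old key is revoked. This is a rough sketch: the function name and both thresholds (min_count and ratio) are assumptions to tune against real traffic, and the inputs are simply the error codes collected from logs for each window.

```python
from collections import Counter
from typing import Iterable


def invalid_key_spike(
    codes_before: Iterable[str],
    codes_after: Iterable[str],
    min_count: int = 5,      # assumed floor to ignore background noise
    ratio: float = 2.0,      # assumed growth factor that counts as a spike
) -> bool:
    """Flag a spike in INVALID_API_KEY errors after revoking the old key."""
    before = Counter(codes_before)["INVALID_API_KEY"]
    after = Counter(codes_after)["INVALID_API_KEY"]
    return after >= min_count and after > ratio * before
```

A spike right after revocation usually means some caller is still using the old key, which is worth checking before assuming the new key is misconfigured.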
