# QRIS Soundbox Platform Operational Runbook

## Scope

This runbook is for pilot, staging, and production operators. All commands assume they are run from the repository root or the release directory.

## Pre-Deploy

1. Pull or build the release artifact.
2. Fill in the production environment and make sure no secret is left at its default value.
3. Run:

   ```bash
   npm ci
   npm run typecheck
   npm audit
   npm run db:migrate
   npm run deploy:check-env
   npm run mqtt:check-acl -- --file /etc/mosquitto/acl
   ```

4. Create or verify the production admin and merchant users:

   ```bash
   npm run admin:create-user -- --email --name --role admin --password
   npm run merchant:create-user -- --merchant --email --name --role owner --password
   ```

## Deploy

1. Run migrations before the new service receives traffic:

   ```bash
   npm run db:migrate
   ```

2. Start or restart the service with `LOG_FORMAT=json`.
3. Check:

   ```bash
   curl -fsS http://127.0.0.1:3000/health
   curl -fsS http://127.0.0.1:3000/health/deep
   ```

4. Check the admin-authenticated deep health endpoint:

   ```bash
   curl -fsS -H "Authorization: Bearer " http://127.0.0.1:3000/admin/health/deep
   ```

## Post-Deploy Smoke

```bash
npm run smoke:e2e
npm run ui:qa
npm run smoke:mqtt-real
MQTT_TEST_DEVICE_A_USERNAME= MQTT_TEST_DEVICE_A_PASSWORD= MQTT_TEST_DEVICE_B_USERNAME= npm run smoke:mqtt-acl
```

For a staging or production-like baseline:

```bash
BASE_URL=https://staging.example.com npm run load:test:staging
```

Store the `reports/load-staging-*.json` report together with the release notes.

## Backup

Before any major deploy, and at least daily:

```bash
npm run backup:production -- --out /var/backups/qris --include-mosquitto
```

Make sure backups are copied to secure, encrypted storage. Important files:

- Postgres dump `.dump`
- Mosquitto `passwd` file
- Mosquitto ACL
- Environment/secret references, kept in the secret manager rather than in plain-text files

## Restore Drill

1. Prepare a disposable database.
2. Show the restore plan:

   ```bash
   npm run restore:plan -- --backup /var/backups/qris/.dump
   ```

3. Run the restore only against the disposable database:

   ```bash
   npm run restore:plan -- --backup /var/backups/qris/.dump --execute
   ```

4. Start the service pointing at the restored database.
5. Validate:

   ```bash
   npm run restore:validate
   ```

## Rollback

1. Stop traffic to the new release.
2. Roll the service image/release back to the previous version.
3. If the new migrations are purely additive, do not roll back the database.
4. If the database must be reverted, first restore the latest backup into a disposable database, then promote it according to the infrastructure procedure.
5. Check `/health` and `/admin/health/deep`, and run a minimal smoke test.

## Incident Response

### Rising API latency or error rate

1. Check `/admin/observability/summary`.
2. Check the logs by `request_id`/`trace_id`.
3. Check Postgres connections and slow queries.
4. Reduce traffic or tighten rate limits if needed.

### MQTT publish/subscribe failures

1. Check `/admin/mqtt/status`.
2. Check the broker service, certificates, ACL, and `passwd` file.
3. Run `npm run smoke:mqtt-real`.
4. To rotate device credentials, use the UI or `npm run mqtt:provision-device`.

### Stuck exports

1. Check the `export_jobs` section of `/admin/observability/summary`.
2. Make sure `EXPORT_STORAGE_DIR` is writable.
3. Restart the worker/app to reset stale running jobs.
4. If the file has expired, ask the user to create a new export.

### Login brute force

1. Check the audit log for `admin.login.failed` and `merchant.login.failed` actions.
2. Tighten the login rate limit by lowering `RATE_LIMIT_LOGIN_MAX`.
3. Temporarily disable suspicious users via the database or admin tooling.

## Routine Operations

- Daily: check health and deep health, backups, MQTT status, and failed notifications.
- Weekly: run a sample restore drill, review failed-login audit entries, and review export storage usage.
- Before piloting a new device: provision its credentials, update the broker `passwd` file, validate the ACL, and run the MQTT ACL smoke test.
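Deploy step 3, Rollback step 5, and the daily health check all depend on the health endpoints answering, but a freshly restarted service may need a few seconds before it is ready. Below is a minimal sketch of a retry wrapper, assuming a POSIX-compatible shell with `curl` installed. `retry` and `check_health` are hypothetical helper names (not scripts in the repo), and the budget of 10 attempts with a 3-second delay is an arbitrary value to tune:

```bash
# Hypothetical post-deploy helpers (not shipped with the repo).

# retry MAX_ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0. Returns 1 once MAX_ATTEMPTS attempts have failed.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# check_health BASE_URL [ADMIN_TOKEN]
# Checks the endpoints from the Deploy section; /admin/health/deep is only
# checked when a bearer token is given. 10 attempts x 3 s is an arbitrary budget.
check_health() {
  local base="$1" token="${2:-}" path
  for path in /health /health/deep; do
    if ! retry 10 3 curl -fsS -o /dev/null "${base}${path}"; then
      echo "FAIL ${path}" >&2
      return 1
    fi
    echo "OK ${path}"
  done
  if [ -n "$token" ]; then
    if ! retry 10 3 curl -fsS -o /dev/null \
        -H "Authorization: Bearer ${token}" "${base}/admin/health/deep"; then
      echo "FAIL /admin/health/deep" >&2
      return 1
    fi
    echo "OK /admin/health/deep"
  fi
}
```

For example, after sourcing the file: `check_health http://127.0.0.1:3000 "$ADMIN_TOKEN"`, where `ADMIN_TOKEN` is a placeholder for the admin bearer token. Without a token, only the two unauthenticated checks run. Because `curl -f` turns HTTP error statuses into a non-zero exit code, a 5xx from `/health/deep` counts as a failed attempt.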
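The Backup section asks for at least one backup per day, and the daily routine includes checking it. Below is a sketch of a freshness check. It assumes `backup:production` writes Postgres dumps with a `.dump` extension somewhere under the `--out` directory, because the exact file layout is not documented here. `backup_is_fresh` is a hypothetical name:

```bash
# Hypothetical daily check (not shipped with the repo).
# Assumption: backup:production writes Postgres dumps named *.dump under --out.

# backup_is_fresh BACKUP_DIR [MAX_AGE_HOURS]
# Exit 0: a *.dump newer than MAX_AGE_HOURS (default 24) exists.
# Exit 1: no recent dump. Exit 2: BACKUP_DIR does not exist.
backup_is_fresh() {
  local dir="$1" max_hours="${2:-24}" recent
  if [ ! -d "$dir" ]; then
    echo "backup dir not found: $dir" >&2
    return 2
  fi
  # find -mmin -N matches files modified less than N minutes ago.
  recent=$(find "$dir" -type f -name '*.dump' -mmin "-$((max_hours * 60))" | head -n 1)
  if [ -n "$recent" ]; then
    echo "fresh backup: $recent"
    return 0
  fi
  echo "no *.dump newer than ${max_hours}h under $dir" >&2
  return 1
}
```

`backup_is_fresh /var/backups/qris 24` exits 0 when a recent dump exists, 1 when none does, and 2 when the directory is missing, so it can be wired into cron or a monitoring check. It only looks at the Postgres dump. The Mosquitto files and the encrypted off-host copy still need their own checks.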
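The Restore Drill and Rollback steps require that a restore always goes to a disposable database first. One way to make that harder to get wrong is to run a name-based guard before the `--execute` step. This sketch depends on two assumptions that do not come from the repo. First, it assumes a hypothetical naming convention in which disposable databases are named `restore_drill_<suffix>`. Second, it assumes a simple `postgres://user:pass@host:port/dbname` URL whose password contains no `/` or `?`. Both helper names are hypothetical:

```bash
# Hypothetical safety guard (not shipped with the repo).
# Assumptions: disposable restore databases follow the naming convention
# restore_drill_<suffix>, and the URL looks like postgres://user:pass@host:port/dbname
# with no '/' or '?' inside the password.

# db_name_from_url URL -> prints the database name, without any ?query part
db_name_from_url() {
  local url="${1%%[?]*}"
  printf '%s\n' "${url##*/}"
}

# require_disposable_db URL
# Returns 1 unless the target database name matches restore_drill_?*.
require_disposable_db() {
  local name
  name=$(db_name_from_url "$1")
  case "$name" in
    restore_drill_?*)
      echo "target looks disposable: $name"
      ;;
    *)
      echo "refusing: '$name' is not a restore_drill_* database" >&2
      return 1
      ;;
  esac
}
```

Run `require_disposable_db "$RESTORE_TARGET_URL"` first, and continue to the `--execute` step only if it succeeds. `RESTORE_TARGET_URL` is a placeholder for whatever connection string the restore script is configured with. A naming check is a guardrail against typos. It does not replace verifying the target host.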
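Step 2 of the stuck-exports procedure asks whether `EXPORT_STORAGE_DIR` is writable. The sketch below answers that by creating and then deleting a probe file, which is the same kind of operation an export job needs, instead of only inspecting permission bits. `check_export_dir` is a hypothetical helper. Run it as the same OS user as the app/worker, because writability depends on the user:

```bash
# Hypothetical helper (not shipped with the repo).

# check_export_dir [DIR]   (defaults to $EXPORT_STORAGE_DIR)
# Exit 0: a file could be created and removed. Exit 1: missing or not writable.
# Exit 2: no directory given and EXPORT_STORAGE_DIR is unset/empty.
check_export_dir() {
  local dir="${1:-${EXPORT_STORAGE_DIR:-}}" probe
  if [ -z "$dir" ]; then
    echo "EXPORT_STORAGE_DIR is not set" >&2
    return 2
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # Create and delete a probe file, as an export job would need to do.
  probe=$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null) || {
    echo "not writable: $dir" >&2
    return 1
  }
  rm -f "$probe"
  echo "writable: $dir"
}
```

The probe file is removed immediately, so the check leaves no residue in the export directory.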