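---
name: implementing-deadman-switch
description: "Guides implementation of deadman switch (dead hand system) and heartbeat mechanism in watchdog-agent for authorization enforcement. Use when modifying heartbeat intervals, failure thresholds, or business process termination logic. Keywords: deadman, heartbeat, agent, authorization, sigterm, fail-count, self-destruct."
argument-hint: ": agent-heartbeat | fail-threshold | kill-logic | interval-config"
allowed-tools:
  - Read
  - Glob
  - Grep
  - Bash
  - Edit
  - Write
---

# Implementing Deadman Switch

watchdog-agent has a built-in deadman switch (dead hand system) that terminates the business process once consecutive authorization failures reach the threshold.

## Dynamic Context Injection

```bash
# Locate the agent heartbeat implementation
!`grep -rn "heartbeat\|Heartbeat" rmdc-watchdog-agent/`

# Locate the kill logic (-r is required because the target is a directory)
!`grep -rn "SIGTERM\|Kill\|Signal" rmdc-watchdog-agent/`
```

## Plan

Determine the scope of the change from `$ARGUMENTS`:

| Component | Files involved | Key parameters |
|-----------|----------------|----------------|
| agent-heartbeat | Agent heartbeat module | HeartbeatRequest/Response |
| fail-threshold | Failure-counting logic | maxRetryCount=12 |
| kill-logic | Process termination logic | SIGTERM signal |
| interval-config | Heartbeat interval configuration | 2h after success / 1h after failure |

**Deliverables**:
- Agent heartbeat loop implementation
- Failure counting and threshold check
- Business process termination logic

## Verify

- [ ] Failure threshold: maxRetryCount = 12
- [ ] Heartbeat interval: 2 hours after a success, 1 hour after a failure
- [ ] TOTP verification: the secret is obtained on the first connection; later requests are verified in both directions
- [ ] Termination signal: SIGTERM (graceful termination), not SIGKILL
- [ ] Counter reset: failCount = 1 (not 0) after a successful authorization
- [ ] Timestamp check: `|now - timestamp| < 5 minutes`

```bash
# Verify that the agent compiles
!`cd rmdc-watchdog-agent && go build ./...`

# Verify the heartbeat logic
!`cd rmdc-watchdog-agent && go test ./... -v -run TestHeartbeat`
```

## Execute

### Heartbeat Loop Implementation

```go
func (a *Agent) heartbeatLoop() {
    failCount := 0
    for {
        resp, err := a.sendHeartbeat()
        if err != nil || !resp.Authorized {
            failCount++
            if failCount >= 12 { // maxRetryCount
                a.killBusiness()
                return
            }
            time.Sleep(1 * time.Hour) // wait 1 hour after a failure
        } else {
            failCount = 1             // reset to 1 (not 0) after a success
            time.Sleep(2 * time.Hour) // wait 2 hours after a success
        }
    }
}
```

### Business Termination Implementation

```go
func (a *Agent) killBusiness() {
    log.Warn("deadman switch triggered, terminating business process")
    // SIGTERM lets the business process shut down gracefully.
    a.businessProcess.Signal(syscall.SIGTERM)
}
```

### First-Connection Handling

```go
func (a *Agent) sendHeartbeat() (*HeartbeatResponse, error) {
    req := &HeartbeatRequest{
        HostInfo:  a.hostInfo,
        EnvInfo:   a.envInfo,
        Timestamp: time.Now().UnixMilli(),
        TOTPCode:  "", // empty on the first connection
    }

    // Not the first connection: generate a TOTP code
    if a.tierTwoSecret != "" {
        req.TOTPCode = totp.GenerateTierTwo(a.tierTwoSecret)
    }

    resp, err := a.httpClient.Post(a.watchdogURL+"/api/heartbeat", req)
    if err != nil {
        return nil, err
    }

    // First connection: store the secret returned by the server
    if resp.SecondTOTPSecret != "" {
        a.tierTwoSecret = resp.SecondTOTPSecret
    }

    // Verify the server's TOTP code (mutual verification)
    if req.TOTPCode != "" && !totp.VerifyTierTwo(resp.TOTPCode, a.tierTwoSecret) {
        return nil, errors.New("invalid server totp")
    }

    return resp, nil
}
```

## Pitfalls

1. **failCount reset value**: set it to 1 rather than 0 after a success, to avoid boundary-condition errors
2. **SIGKILL misuse**: use SIGTERM so the business process can exit gracefully
3. **Heartbeat blocking**: sendHeartbeat must set a timeout so that network problems cannot hang the loop
4. **Missing mutual verification**: the TOTP code returned by the server must be verified
5. **First-connection special case**: when TOTPCode is empty the agent is obtaining its secret; this does not count as a failure
6. **Hard-coded intervals**: intervals should be configurable so that different projects can adjust them
7. **Log leakage**: never print the TOTP secret in logs

## Reference

- [Heartbeat parameter configuration](reference/heartbeat-params.md)
- [Agent lifecycle](reference/agent-lifecycle.md)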
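
The threshold, interval, and timestamp rules from the Verify checklist can be expressed as a small, configurable step function that is testable without sleeping. The following is a minimal sketch under stated assumptions: `HeartbeatConfig`, `DefaultConfig`, `NextStep`, and `TimestampFresh` are hypothetical names introduced here for illustration and are not taken from the watchdog-agent codebase; only the numeric values (12 retries, 2h/1h intervals, 5-minute window, reset to 1) come from this document.

```go
package main

import (
	"fmt"
	"time"
)

// HeartbeatConfig is a hypothetical container for the values this guide
// lists as key parameters, so they are not hard-coded in the loop.
type HeartbeatConfig struct {
	MaxRetryCount   int           // failure threshold
	SuccessInterval time.Duration // wait after an authorized heartbeat
	FailureInterval time.Duration // wait after a failed heartbeat
	MaxClockSkew    time.Duration // allowed |now - timestamp|
}

// DefaultConfig returns the values stated in the Verify checklist.
func DefaultConfig() HeartbeatConfig {
	return HeartbeatConfig{
		MaxRetryCount:   12,
		SuccessInterval: 2 * time.Hour,
		FailureInterval: 1 * time.Hour,
		MaxClockSkew:    5 * time.Minute,
	}
}

// NextStep applies one heartbeat result to the failure counter. It returns
// the new counter, how long to wait before the next heartbeat, and whether
// the deadman switch should fire.
func (c HeartbeatConfig) NextStep(failCount int, authorized bool) (int, time.Duration, bool) {
	if authorized {
		// The guide resets the counter to 1, not 0, after a success.
		return 1, c.SuccessInterval, false
	}
	failCount++
	if failCount >= c.MaxRetryCount {
		return failCount, 0, true
	}
	return failCount, c.FailureInterval, false
}

// TimestampFresh reports whether |nowMs - tsMs| is strictly below the
// allowed skew. Both arguments are Unix timestamps in milliseconds.
func (c HeartbeatConfig) TimestampFresh(nowMs, tsMs int64) bool {
	diff := nowMs - tsMs
	if diff < 0 {
		diff = -diff
	}
	return diff < c.MaxClockSkew.Milliseconds()
}

func main() {
	cfg := DefaultConfig()
	failCount := 0
	// Simulate consecutive failures until the switch fires.
	for i := 1; ; i++ {
		var kill bool
		failCount, _, kill = cfg.NextStep(failCount, false)
		if kill {
			fmt.Printf("switch fires after %d consecutive failures\n", i)
			break
		}
	}
	now := time.Now().UnixMilli()
	fmt.Println("fresh timestamp accepted:", cfg.TimestampFresh(now, now-1000))
}
```

Keeping the decision in a pure function means a `TestHeartbeat`-style test can cover the threshold and reset behaviour directly, while the real loop only has to call `time.Sleep` with the returned duration.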