name: implementing-deadman-switch
description: Guides implementation of deadman switch (dead hand system) and heartbeat mechanism in watchdog-agent for authorization enforcement. Use when modifying heartbeat intervals, failure thresholds, or business process termination logic. Keywords: deadman, heartbeat, agent, authorization, sigterm, fail-count, self-destruct.
argument-hint: ": agent-heartbeat | fail-threshold | kill-logic | interval-config"
allowed-tools:
- Read
- Glob
- Grep
- Bash
- Edit
- Write
Implementing Deadman Switch
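watchdog-agent has a built-in dead hand system that terminates the business process once consecutive authorization failures reach the threshold.
Dynamic Context Injection
# Locate the Agent heartbeat implementation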
!`grep -rn "heartbeat\|Heartbeat" rmdc-watchdog-agent/`
# Locate the kill logic
!`grep -rn "SIGTERM\|Kill\|Signal" rmdc-watchdog-agent/`
Plan
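Determine the scope of the change from $ARGUMENTS:
| Component | Files involved | Key parameters |
|---|---|---|
| agent-heartbeat | Agent heartbeat module | HeartbeatRequest/Response |
| fail-threshold | Failure counting logic | maxRetryCount=12 |
| kill-logic | Process termination logic | SIGTERM signal |
| interval-config | Heartbeat interval configuration | 2h after success / 1h after failure |
Deliverables:
- Agent heartbeat loop implementation
- Failure counting and threshold check
- Business process termination logic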
Verify
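- Failure threshold: maxRetryCount = 12
- Heartbeat interval: 2 hours after a success, 1 hour after a failure
- TOTP verification: the first connection obtains the secret; later requests are verified in both directions
- Termination signal: use SIGTERM (graceful termination), not SIGKILL
- Counter reset: after a successful authorization, failCount = 1 (not 0)
- Timestamp check: |now - timestamp| < 5 minutes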
# Verify that the Agent builds
!`cd rmdc-watchdog-agent && go build ./...`
# Verify the heartbeat logic
!`cd rmdc-watchdog-agent && go test ./... -v -run TestHeartbeat`
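The timestamp rule in the checklist above can be sketched as a small pure function. This is only an illustration, assuming timestamps are Unix milliseconds as in `HeartbeatRequest.Timestamp`; the names `timestampFresh` and `maxSkewMillis` are hypothetical and not taken from the watchdog-agent codebase.

```go
package main

import (
	"fmt"
	"time"
)

// maxSkewMillis is the 5-minute tolerance from the checklist, in milliseconds.
// The constant name is an assumption made for this sketch.
const maxSkewMillis = int64(5 * time.Minute / time.Millisecond)

// timestampFresh reports whether |now - timestamp| < 5 minutes.
// Both arguments are Unix timestamps in milliseconds.
func timestampFresh(nowMillis, tsMillis int64) bool {
	diff := nowMillis - tsMillis
	if diff < 0 {
		diff = -diff
	}
	return diff < maxSkewMillis
}

func main() {
	now := time.Now().UnixMilli()
	fmt.Println(timestampFresh(now, now-4*60*1000)) // 4 minutes old: true
	fmt.Println(timestampFresh(now, now+6*60*1000)) // 6 minutes ahead: false
}
```

Taking the absolute difference rejects timestamps that are too far in the future as well as stale ones, which matches the `|now - timestamp|` form of the rule.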
Execute
Heartbeat loop implementation
func (a *Agent) heartbeatLoop() {
	failCount := 0
	for {
		resp, err := a.sendHeartbeat()
		if err != nil || !resp.Authorized {
			failCount++
			if failCount >= 12 { // maxRetryCount
				a.killBusiness()
				return
			}
			time.Sleep(1 * time.Hour) // wait 1 hour after a failure
		} else {
			failCount = 1             // reset to 1 after a success
			time.Sleep(2 * time.Hour) // wait 2 hours after a success
		}
	}
}
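To make the counting rules easier to exercise from a test such as `TestHeartbeat`, the decision inside the loop can be pulled out into a pure function. The following is a minimal sketch under that assumption; `step` and the three constants are hypothetical names, while the values (12, 2h, 1h, reset to 1) come from the Verify checklist.

```go
package main

import (
	"fmt"
	"time"
)

// Values from the Verify checklist; the constant names are assumptions.
const (
	maxRetryCount   = 12
	successInterval = 2 * time.Hour
	failureInterval = 1 * time.Hour
)

// step applies one heartbeat result to the failure counter. It returns the
// new counter, how long to sleep before the next heartbeat, and whether the
// deadman switch should fire.
func step(failCount int, authorized bool) (next int, wait time.Duration, kill bool) {
	if authorized {
		return 1, successInterval, false // reset to 1, per the checklist
	}
	next = failCount + 1
	if next >= maxRetryCount {
		return next, 0, true
	}
	return next, failureInterval, false
}

func main() {
	count := 0
	for i := 1; ; i++ {
		var kill bool
		count, _, kill = step(count, false)
		if kill {
			fmt.Printf("switch fires on consecutive failure #%d\n", i)
			return
		}
	}
}
```

Starting from a counter of 0 this prints `switch fires on consecutive failure #12`. Because a success resets the counter to 1 rather than 0, the same threshold is reached after 11 consecutive failures following a success, which is worth keeping in mind when writing boundary tests.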
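Business termination implementation
func (a *Agent) killBusiness() {
	log.Warn("deadman switch triggered, terminating business process")
	// SIGTERM lets the business process shut down gracefully; do not use SIGKILL.
	a.businessProcess.Signal(syscall.SIGTERM)
}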
First-connection handling
func (a *Agent) sendHeartbeat() (*HeartbeatResponse, error) {
	req := &HeartbeatRequest{
		HostInfo:  a.hostInfo,
		EnvInfo:   a.envInfo,
		Timestamp: time.Now().UnixMilli(),
		TOTPCode:  "", // empty on the first connection
	}
	// Not the first connection: generate a TOTP code
	if a.tierTwoSecret != "" {
		req.TOTPCode = totp.GenerateTierTwo(a.tierTwoSecret)
	}
	resp, err := a.httpClient.Post(a.watchdogURL+"/api/heartbeat", req)
	if err != nil {
		return nil, err
	}
	// First connection: store the secret returned by the server
	if resp.SecondTOTPSecret != "" {
		a.tierTwoSecret = resp.SecondTOTPSecret
	}
	// Verify the server's TOTP code (two-way verification)
	if req.TOTPCode != "" && !totp.VerifyTierTwo(resp.TOTPCode, a.tierTwoSecret) {
		return nil, errors.New("invalid server totp")
	}
	return resp, nil
}
Pitfalls
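- failCount reset value: set it to 1 rather than 0 after a success, to avoid boundary-condition errors
- SIGKILL misuse: use SIGTERM so the business process can exit gracefully
- Heartbeat blocking: sendHeartbeat must set a timeout so that network problems cannot hang the loop
- Missing two-way verification: the TOTP code returned by the server must be verified
- First-connection special case: when TOTPCode is empty, obtain the secret; this does not count as a failure
- Hard-coded intervals: make them configurable so that different projects can tune them
- Log leakage: never print the TOTP secret in logs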
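Two of the pitfalls above (heartbeat blocking and hard-coded intervals) can be addressed together with a small configuration struct. This is a sketch, not the agent's actual configuration: `Config`, `withDefaults`, and `newHTTPClient` are hypothetical names, and the 30-second request timeout is an assumed value that does not come from the source. Only `http.Client.Timeout` is standard-library behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Config gathers the tunable deadman-switch parameters (hypothetical type).
type Config struct {
	SuccessInterval time.Duration // wait after an authorized heartbeat
	FailureInterval time.Duration // wait after a failed heartbeat
	MaxRetryCount   int           // failure threshold
	RequestTimeout  time.Duration // upper bound for one heartbeat request
}

// withDefaults fills unset fields with the values from the Verify checklist.
func (c Config) withDefaults() Config {
	if c.SuccessInterval <= 0 {
		c.SuccessInterval = 2 * time.Hour
	}
	if c.FailureInterval <= 0 {
		c.FailureInterval = 1 * time.Hour
	}
	if c.MaxRetryCount <= 0 {
		c.MaxRetryCount = 12
	}
	if c.RequestTimeout <= 0 {
		c.RequestTimeout = 30 * time.Second // assumed default, not from the source
	}
	return c
}

// newHTTPClient returns a client whose requests fail instead of hanging
// once RequestTimeout has elapsed.
func newHTTPClient(c Config) *http.Client {
	return &http.Client{Timeout: c.RequestTimeout}
}

func main() {
	cfg := Config{FailureInterval: 10 * time.Minute}.withDefaults()
	client := newHTTPClient(cfg)
	fmt.Println(cfg.SuccessInterval, cfg.FailureInterval, cfg.MaxRetryCount, client.Timeout)
}
```

A timed-out request surfaces as an error from the client, so the heartbeat loop would treat it like any other failed heartbeat rather than blocking indefinitely.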