Convert the RMDC system design documents wholesale into a SKILL
101
1-AgentSkills/developing-watchdog/SKILL.md
Normal file
@@ -0,0 +1,101 @@
---
name: developing-watchdog
description: Guides development of rmdc-watchdog edge agent module including K8S operations, MQTT messaging, authorization management, and node/agent coordination. Use when implementing watchdog features, adding K8S actions, modifying heartbeat logic, or debugging authorization flows. Keywords: watchdog, edge-agent, k8s-operator, mqtt, authorization, heartbeat, node, agent.
argument-hint: "<feature-type>: k8s-action | heartbeat | mqtt-handler | node-comm | auth-flow"
allowed-tools:
  - Read
  - Glob
  - Grep
  - Bash
  - Edit
  - Write
---
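
# Developing rmdc-watchdog

rmdc-watchdog is the edge agent deployed in each project environment. It serves as the tier-two authorization center, proxies K8S operations, receives and executes commands, and reports monitoring data.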

## Dynamic context injection

```bash
# Show the project layout
!`ls -la rmdc-watchdog/internal/`

# Find existing Handler implementations
!`grep -rn "func.*Handler" rmdc-watchdog/internal/handler/`

# Find the MQTT message routing
!`grep -n "case\|switch" rmdc-watchdog/internal/service/message_router.go`
```

## Plan

Determine the development type from `$ARGUMENTS`:

| Type | Artifacts | Affected modules |
|------|-----------|------------------|
| k8s-action | `pkg/k8s/client.go`, `service/k8s_service.go` | exchange-hub command definitions |
| heartbeat | `handler/heartbeat_handler.go`, `service/auth_service.go` | watchdog-agent must be changed in step |
| mqtt-handler | `service/mqtt_service.go`, `service/message_router.go` | exchange-hub Topic contract |
| node-comm | `service/node_service.go` | watchdog-node API must stay in sync |
| auth-flow | `service/auth_service.go`, `dao/auth_dao.go` | project-management authorization contract |
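
**Decision points**:
1. Does the change add a new MQTT message type? → Sync with exchange-hub
2. Does the change modify the heartbeat structure? → Sync with watchdog-agent
3. Does the change modify K8S command parameters? → Sync with octopus-operator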

## Verify

- [ ] TOTP verification logic: tier one (8 digits / 30 minutes / SHA256) vs tier two (6 digits / 30 seconds / SHA1)
- [ ] K8S operation boundary: only audited operations are allowed (logs/exec/scale/restart/delete/get/apply)
- [ ] MQTT Topic format: `wdd/RDMC/{command|message}/{up|down}/{project_id}`
- [ ] Timestamp check: |now - timestamp| < 5 minutes
- [ ] Node communication: HTTP + Tier-Two TOTP authentication
- [ ] Execution result report: includes command_id, status, exit_code, output, duration

```bash
# Verify the build
!`cd rmdc-watchdog && go build ./...`

# Run the unit tests
!`cd rmdc-watchdog && go test ./internal/... -v`
```
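
The timestamp item in the checklist can be sketched as a small pure function. This is only an illustration: it assumes timestamps are Unix seconds (the source does not state the unit), and `isFresh` and `maxSkewSeconds` are hypothetical names, not taken from rmdc-watchdog.

```go
package main

import (
	"fmt"
	"time"
)

// maxSkewSeconds mirrors the checklist rule |now - timestamp| < 5 minutes.
const maxSkewSeconds = 5 * 60

// isFresh reports whether ts lies strictly within maxSkewSeconds of nowUnix,
// in either direction. Both values are assumed to be Unix seconds.
func isFresh(nowUnix, ts int64) bool {
	diff := nowUnix - ts
	if diff < 0 {
		diff = -diff
	}
	return diff < maxSkewSeconds
}

func main() {
	now := time.Now().Unix()
	fmt.Println(isFresh(now, now-299)) // inside the window
	fmt.Println(isFresh(now, now+300)) // exactly 5 minutes ahead: rejected
}
```

Taking `nowUnix` as a parameter instead of calling `time.Now()` inside the function keeps the check deterministic in unit tests.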

## Execute

### Adding a new K8S operation

1. Add the K8S API method in `pkg/k8s/client.go`
2. Add a case to the switch in `internal/service/k8s_service.go`
3. Update the `K8sExecCommand` struct (if new parameters are needed)
4. Update the exchange-hub command dispatch definition to match
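
Step 2 above might look roughly like the following. This is a sketch under stated assumptions: the fields of `K8sExecCommand` are not shown in this document, so the struct here is a hypothetical stand-in, and `dispatch` is an illustrative name rather than the service's real entry point.

```go
package main

import (
	"errors"
	"fmt"
)

// K8sExecCommand is a hypothetical stand-in; the real struct's fields are
// not shown in this document.
type K8sExecCommand struct {
	CommandID string
	Action    string
}

var errActionNotAllowed = errors.New("k8s action not allowed")

// dispatch accepts only the audited actions from the Verify checklist.
// Each action has its own case so that a per-action grep for `case "<action>"`
// (as in verify-watchdog.sh) finds it. Go cases do not fall through, so an
// empty case simply leaves the switch.
func dispatch(cmd K8sExecCommand) (string, error) {
	switch cmd.Action {
	case "logs":
	case "exec":
	case "scale":
	case "restart":
	case "delete":
	case "get":
	case "apply":
	default:
		return "", fmt.Errorf("%w: %q", errActionNotAllowed, cmd.Action)
	}
	// A real service would call the matching pkg/k8s client method here.
	return "accepted:" + cmd.Action, nil
}

func main() {
	for _, action := range []string{"scale", "port-forward"} {
		out, err := dispatch(K8sExecCommand{CommandID: "c-1", Action: action})
		fmt.Println(out, err)
	}
}
```

Rejecting unknown actions in the `default` branch keeps the whitelist closed: a new action only becomes reachable once someone adds an explicit case for it.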

### Adding a new command type

1. Add a routing branch in `message_router.go`
2. Create the corresponding Handler and Service
3. Update the exchange-hub command dispatch to match

### Changing the heartbeat logic

1. Modify `VerifyHeartbeat` in `auth_service.go`
2. Modify the heartbeat sender in watchdog-agent to match
3. Update the DTO structures
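
## Pitfalls

1. **Confusing the TOTP tiers**: tier-one authorization (project-management↔watchdog) and tier-two authorization (watchdog↔agent/node) use different parameters
2. **Unhandled time offset**: the authorization file must compute `timeOffset = now - firstAuthTime`
3. **Node offline not detected**: call `CheckHostOnline(host_id)` before forwarding a host command
4. **Log truncation missed**: business failure logs return only the most recent 300 lines
5. **Secrets sent over the public network**: tier_one_secret/tier_two_secret must be deployed offline through the configuration file; sending them over MQTT is forbidden
6. **Missing response TOTP**: mutual verification requires the server to return a TOTP for the client to check
7. **Inconsistent heartbeat intervals**: watchdog→exchange-hub every 5 seconds; agent/node→watchdog every 10 seconds (default)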
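
Pitfall 4 (returning only the most recent 300 lines) can be sketched as below. The names `tailLines` and `maxLogLines` are hypothetical, and the sketch assumes the log is already in memory as a newline-separated string; the real service may stream logs instead.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLogLines is the limit named in the pitfall list.
const maxLogLines = 300

// tailLines returns at most n trailing lines of log. Trailing newlines are
// dropped first so they do not count as extra empty lines.
func tailLines(log string, n int) string {
	if n < 0 {
		n = 0
	}
	lines := strings.Split(strings.TrimRight(log, "\n"), "\n")
	if len(lines) > n {
		lines = lines[len(lines)-n:]
	}
	return strings.Join(lines, "\n")
}

func main() {
	var b strings.Builder
	for i := 1; i <= 305; i++ {
		fmt.Fprintf(&b, "line %d\n", i)
	}
	out := tailLines(b.String(), maxLogLines)
	fmt.Println(strings.SplitN(out, "\n", 2)[0]) // first kept line: "line 6"
}
```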

## Reference

- [State machines](reference/state-machine.md)
- [MQTT Topics](reference/mqtt-topics.md)
- [API endpoints](reference/api-endpoints.md)
- [Security mechanisms](reference/security-mechanisms.md)
56
1-AgentSkills/developing-watchdog/reference/api-endpoints.md
Normal file
@@ -0,0 +1,56 @@
# Watchdog API endpoints

## Watchdog HTTP API (Port: 8990)

| Path | Method | Description | Auth |
|------|--------|-------------|------|
| `/api/heartbeat` | POST | Agent heartbeat endpoint | Tier-Two TOTP |
| `/api/heartbeat/hosts` | GET | List all hosts that send heartbeats | Internal call |
| `/api/node/info` | POST | Node information report endpoint | Tier-Two TOTP |
| `/api/node/list` | GET | List all Nodes | Internal call |
| `/api/node/metrics/:node_id` | GET | Get runtime metrics for the given Node | Internal call |
| `/api/authorization/generate` | GET | Generate the authorization file | Internal call |
| `/api/authorization/auth` | POST | Receive the authorization code | Tier-One TOTP |
| `/api/authorization/hosts` | GET | List all authorized hosts | Internal call |

## Node HTTP API (Port: 8081)

| Path | Method | Description | Auth |
|------|--------|-------------|------|
| `/api/exec` | POST | Execute a command | Tier-Two TOTP |
| `/api/info` | GET | Get host information | Tier-Two TOTP |
| `/api/metrics` | GET | Get runtime metrics | Tier-Two TOTP |
| `/api/dltu` | POST | Image operations (Download-Load-Tag-Upload) | Tier-Two TOTP |

## Request/response structures

### HeartbeatRequest
```go
type HeartbeatRequest struct {
	HostInfo  HostInfo `json:"host_info"`
	EnvInfo   EnvInfo  `json:"env_info"`
	Timestamp int64    `json:"timestamp"`
	TOTPCode  string   `json:"totp_code"` // Tier-Two TOTP code sent by the agent
}
```

### HeartbeatResponse
```go
type HeartbeatResponse struct {
	Authorized       bool   `json:"authorized"`
	TOTPCode         string `json:"totp_code"` // returned so the client can verify the server
	Timestamp        int64  `json:"timestamp"`
	SecondTOTPSecret string `json:"second_totp_secret,omitempty"`
}
```

### NodeInfoRequest
```go
type NodeInfoRequest struct {
	NodeID    string             `json:"node_id"`
	HostInfo  HostInfo           `json:"host_info"`
	Metrics   NodeRuntimeMetrics `json:"metrics"`
	Timestamp int64              `json:"timestamp"`
	TOTPCode  string             `json:"totp_code"` // Tier-Two TOTP code sent by the node
}
```
50
1-AgentSkills/developing-watchdog/reference/mqtt-topics.md
Normal file
@@ -0,0 +1,50 @@
# MQTT Topic definitions

## Uplink (Watchdog → Exchange-Hub)

| Topic | Message type | Description |
|-------|--------------|-------------|
| `wdd/RDMC/command/up` | register | Project registration |
| `wdd/RDMC/command/up` | auth_request | Authorization request |
| `wdd/RDMC/message/up` | register_complete | Registration completion confirmation |
| `wdd/RDMC/message/up` | heartbeat | Heartbeat data |
| `wdd/RDMC/message/up` | monitor | Monitoring data report |
| `wdd/RDMC/message/up` | exec_result | Command execution result |
| `wdd/RDMC/message/up` | log_result | Log query result |
| `wdd/RDMC/message/up` | alert | Alert information |

## Downlink (Exchange-Hub → Watchdog)

| Topic | Message type | Description |
|-------|--------------|-------------|
| `wdd/RDMC/command/down/{project_id}` | auth_response | Authorization response |
| `wdd/RDMC/command/down/{project_id}` | auth_revoke | Authorization revocation |
| `wdd/RDMC/command/down/{project_id}` | log_query | Log query command |
| `wdd/RDMC/command/down/{project_id}` | host_exec | Host execution command |
| `wdd/RDMC/command/down/{project_id}` | k8s_exec | K8S execution command |
| `wdd/RDMC/command/down/{project_id}` | update | Business update command |
| `wdd/RDMC/message/down/{project_id}` | register_ack | Registration acknowledgement |

## Topic naming convention

- Prefix: `wdd/RDMC/`
- Type: `command` or `message`
- Direction: `up` (uplink) or `down` (downlink)
- Project ID: downlink Topics must include `{project_id}` for routing
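
The naming rules above can be captured in a small helper. This is a sketch only: `buildTopic` is a hypothetical name, and it follows the tables in this file, where uplink Topics carry no project ID and downlink Topics require one.

```go
package main

import (
	"errors"
	"fmt"
)

const topicPrefix = "wdd/RDMC/"

// buildTopic assembles a Topic from type, direction and project ID.
// projectID is ignored for uplink Topics and required for downlink Topics.
func buildTopic(kind, direction, projectID string) (string, error) {
	if kind != "command" && kind != "message" {
		return "", errors.New("kind must be command or message")
	}
	switch direction {
	case "up":
		return topicPrefix + kind + "/up", nil
	case "down":
		if projectID == "" {
			return "", errors.New("downlink topics require a project_id")
		}
		return topicPrefix + kind + "/down/" + projectID, nil
	default:
		return "", errors.New("direction must be up or down")
	}
}

func main() {
	topic, err := buildTopic("command", "down", "p-42")
	fmt.Println(topic, err)
}
```

Building Topics in one place makes it harder for a handler to publish on a downlink Topic without a project ID, which would break routing.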

## Message structure

```go
type BaseMessage struct {
	MessageID string `json:"message_id"`
	Type      string `json:"type"` // command | message
	ProjectID string `json:"project_id"`
	Timestamp int64  `json:"timestamp"`
}

type DataMessage struct {
	BaseMessage
	DataType string      `json:"data_type"` // concrete message type
	Payload  interface{} `json:"payload"`
}
```
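
To show what these structs look like on the wire, the sketch below marshals a `DataMessage` with Go's standard `encoding/json`. The struct definitions are repeated from above so the example compiles on its own; the `encode` helper, the sample IDs and the payload are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type BaseMessage struct {
	MessageID string `json:"message_id"`
	Type      string `json:"type"` // command | message
	ProjectID string `json:"project_id"`
	Timestamp int64  `json:"timestamp"`
}

type DataMessage struct {
	BaseMessage
	DataType string      `json:"data_type"` // concrete message type
	Payload  interface{} `json:"payload"`
}

// encode serializes a message to its JSON wire form.
func encode(msg DataMessage) (string, error) {
	b, err := json.Marshal(msg)
	return string(b), err
}

func main() {
	msg := DataMessage{
		BaseMessage: BaseMessage{MessageID: "m-1", Type: "message", ProjectID: "p-42", Timestamp: 1700000000},
		DataType:    "heartbeat",
		Payload:     map[string]string{"status": "ok"}, // hypothetical payload
	}
	s, err := encode(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Because `BaseMessage` is embedded without a JSON tag, `encoding/json` promotes its fields to the top level, so `message_id`, `type`, `project_id` and `timestamp` appear next to `data_type` and `payload` rather than inside a nested object.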
@@ -0,0 +1,43 @@
# Security mechanisms summary

## Communication security

| Scenario | Security mechanism | Parameters |
|----------|--------------------|------------|
| Center ↔ Watchdog | Tier-One TOTP + AES-GCM | 8-digit code, 30-minute validity, SHA256 |
| Watchdog ↔ Agent | Tier-Two TOTP | 6-digit code, 30-second validity, SHA1 |
| Watchdog ↔ Node | Reuses Tier-Two TOTP | Intranet HTTP + TOTP authentication |
| HTTP fallback interface | Reuses the Tier-Two TOTP secret | TOTP authentication required |
| Message transport | TLS encryption | MQTT over TLS |
| Sensitive data | AES-256-GCM encryption | Authorization codes, secrets, etc. |
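
The difference between the two TOTP tiers is easiest to see in code. The sketch below assumes both tiers follow RFC 6238 and that the "validity" column is the TOTP time step; the source does not confirm either point. The function names are hypothetical and the secret is the public RFC 6238 test value, not a real key.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"hash"
)

// totp computes an RFC 6238 code for the given hash, time step and digit count.
func totp(newHash func() hash.Hash, secret []byte, unixSeconds, stepSeconds int64, digits int) string {
	var counter [8]byte
	binary.BigEndian.PutUint64(counter[:], uint64(unixSeconds/stepSeconds))
	mac := hmac.New(newHash, secret)
	mac.Write(counter[:])
	sum := mac.Sum(nil)
	// Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
	offset := sum[len(sum)-1] & 0x0f
	code := binary.BigEndian.Uint32(sum[offset:offset+4]) & 0x7fffffff
	mod := uint32(1)
	for i := 0; i < digits; i++ {
		mod *= 10
	}
	return fmt.Sprintf("%0*d", digits, code%mod)
}

// tierOneCode uses the Center ↔ Watchdog parameters: 8 digits, 30 minutes, SHA256.
func tierOneCode(secret []byte, unixSeconds int64) string {
	return totp(sha256.New, secret, unixSeconds, 30*60, 8)
}

// tierTwoCode uses the Watchdog ↔ Agent parameters: 6 digits, 30 seconds, SHA1.
func tierTwoCode(secret []byte, unixSeconds int64) string {
	return totp(sha1.New, secret, unixSeconds, 30, 6)
}

func main() {
	secret := []byte("12345678901234567890") // RFC 6238 test secret
	fmt.Println(tierTwoCode(secret, 59))     // 287082 (RFC 6238 SHA1 vector, last 6 digits)
	fmt.Println(tierOneCode(secret, 59))
}
```

Keeping the parameters inside two named wrappers, rather than passing them at every call site, is one way to avoid mixing up the tiers.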
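
## Identity authentication

| Mechanism | Description |
|-----------|-------------|
| Host information | Hardware fingerprint binding: MachineID+CPU+Memory+Serial |
| Mutual TOTP verification | The requester sends a TOTP; the responder returns a new TOTP |
| Challenge-response | A 32-character random challenge code confirms the identity of both parties |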
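
## Authorization protection

| Mechanism | Description |
|-----------|-------------|
| Dead man's switch | Self-destruct on heartbeat failure: 12 consecutive failures trigger SIGTERM |
| Authorization time check | Detects clock tampering; an abnormal timeOffset triggers degradation |
| Authorization revocation | Supports remote revocation of a project's authorization |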
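
The dead man's switch rule (12 consecutive heartbeat failures) comes down to a counter that resets on success. The sketch below is an assumption about how such a counter could look; `deadMan` and `record` are hypothetical names, and the signal delivery is left as a comment so the example stays safe to run.

```go
package main

import "fmt"

// maxConsecutiveFailures is the threshold from the table above.
const maxConsecutiveFailures = 12

// deadMan counts consecutive heartbeat failures.
type deadMan struct {
	failures int
}

// record notes one heartbeat result and reports whether the threshold has
// been reached. Any success resets the counter.
func (d *deadMan) record(ok bool) bool {
	if ok {
		d.failures = 0
		return false
	}
	d.failures++
	return d.failures >= maxConsecutiveFailures
}

func main() {
	var d deadMan
	for i := 1; i <= maxConsecutiveFailures; i++ {
		if d.record(false) {
			fmt.Println("threshold reached after", i, "failures")
			// A real agent would terminate itself here, for example by
			// sending SIGTERM to its own process.
		}
	}
}
```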

## Secret transport principles

- tier_one_secret and tier_two_secret are generated when project-management creates the project
- Secrets are deployed offline to Watchdog through the project configuration file
- **Never send secrets over public-network MQTT**
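
## Operation audit

| Operation type | Audit requirement |
|----------------|-------------------|
| K8S operations | Record command_id, action, and execution result |
| Host commands | Record script, args, and exit_code |
| Authorization changes | Record grant/revocation time and operator |
| Data export | Requires signature + TOTP verification; write an audit log entry |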
45
1-AgentSkills/developing-watchdog/reference/state-machine.md
Normal file
@@ -0,0 +1,45 @@
# Watchdog state machines

## Connection state machine

```
State flow: offline -> connecting -> verifying -> online -> disconnecting -> offline
```

| State | Trigger | Next state |
|-------|---------|------------|
| offline | Initial state / heartbeat timeout of 30 seconds | connecting |
| connecting | MQTT connection attempt | verifying |
| verifying | Mutual TOTP verification | online/offline |
| online | Verification succeeded | disconnecting |
| disconnecting | Active disconnect / network failure | offline |
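
The transition table above can be encoded as a lookup map so that illegal jumps are rejected in one place. This is a sketch; the `State` type, the constants and `canTransition` are hypothetical names, and only the transitions listed in the table are allowed.

```go
package main

import "fmt"

// State is a connection state from the table above.
type State string

const (
	Offline       State = "offline"
	Connecting    State = "connecting"
	Verifying     State = "verifying"
	Online        State = "online"
	Disconnecting State = "disconnecting"
)

// transitions lists the allowed next states for each state.
var transitions = map[State][]State{
	Offline:       {Connecting},
	Connecting:    {Verifying},
	Verifying:     {Online, Offline}, // verification can succeed or fail
	Online:        {Disconnecting},
	Disconnecting: {Offline},
}

// canTransition reports whether moving from one state to another is allowed.
func canTransition(from, to State) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Verifying, Offline)) // true
	fmt.Println(canTransition(Offline, Online))    // false
}
```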

## Authorization state machine

```
Uninitialized -> Collecting host info -> Awaiting authorization -> Authorized
                                                                       ↓
Authorization expired/revoked -> Unauthorized -> Awaiting authorization (re-apply)
```

## State transition details

### Uninitialized → Collecting host info
- Trigger: first connection from a Node/Agent
- Action: AddHostInfo()

### Collecting host info → Awaiting authorization
- Trigger: GenerateAuthorizationFile()
- Action: publish the authorization request Command to MQTT

### Awaiting authorization → Authorized
- Trigger: a valid authorization code is received
- Action: decrypt and persist the authorization information

### Authorized → Authorization expired
- Trigger: clock tampering is detected
- Action: set initialized=false

### Authorized → Unauthorized
- Trigger: an auth_revoke command is received
- Action: clear the local authorization store
89
1-AgentSkills/developing-watchdog/scripts/verify-watchdog.sh
Normal file
@@ -0,0 +1,89 @@
#!/bin/bash
# verify-watchdog.sh - verification script for the Watchdog module
# Requires: go 1.21+, golangci-lint (optional)
# Usage: ./verify-watchdog.sh [watchdog_dir]

set -e

WATCHDOG_DIR="${1:-./rmdc-watchdog}"

echo "=== Watchdog module verification ==="
echo "Target directory: $WATCHDOG_DIR"
echo ""

# Check that the directory exists
if [ ! -d "$WATCHDOG_DIR" ]; then
    echo "Error: directory does not exist: $WATCHDOG_DIR"
    exit 1
fi

# 1. Build check
echo "[1/5] Build check..."
cd "$WATCHDOG_DIR"
if go build ./... 2>&1; then
    echo "✓ Build passed"
else
    echo "✗ Build failed"
    exit 1
fi

# 2. Unit tests
echo ""
echo "[2/5] Unit tests..."
if go test ./internal/... -v -cover 2>&1; then
    echo "✓ Unit tests passed"
else
    echo "⚠ Some tests failed, please check"
fi

# 3. Lint check
echo ""
echo "[3/5] Lint check..."
if command -v golangci-lint &> /dev/null; then
    if golangci-lint run ./... 2>&1; then
        echo "✓ Lint check passed"
    else
        echo "⚠ Lint check reported warnings"
    fi
else
    echo "⚠ golangci-lint is not installed, skipping"
fi

# 4. TOTP parameter check
echo ""
echo "[4/5] TOTP parameter check..."

# Tier-One: 8 digits / 30 minutes
if grep -rq "Digits.*8" pkg/totp/ 2>/dev/null || grep -rq "8.*Digits" pkg/totp/ 2>/dev/null; then
    echo "✓ Tier-One TOTP digit configuration found"
else
    echo "⚠ Tier-One TOTP digit definition not found (expected 8 digits)"
fi

# Tier-Two: 6 digits / 30 seconds
if grep -rq "Digits.*6" pkg/totp/ 2>/dev/null || grep -rq "6.*Digits" pkg/totp/ 2>/dev/null; then
    echo "✓ Tier-Two TOTP digit configuration found"
else
    echo "⚠ Tier-Two TOTP digit definition not found (expected 6 digits)"
fi

# 5. K8S action whitelist check
echo ""
echo "[5/5] K8S action whitelist check..."
ALLOWED_ACTIONS="logs exec scale restart delete get apply"
K8S_SERVICE="internal/service/k8s_service.go"

if [ -f "$K8S_SERVICE" ]; then
    for action in $ALLOWED_ACTIONS; do
        if grep -q "case \"$action\"" "$K8S_SERVICE" 2>/dev/null; then
            echo "✓ K8S action '$action' is implemented"
        else
            echo "⚠ K8S action '$action' not found"
        fi
    done
else
    echo "⚠ K8S service file does not exist: $K8S_SERVICE"
fi

echo ""
echo "=== Verification complete ==="