大量更新

This commit is contained in:
zeaslity
2026-03-18 16:16:47 +08:00
parent 8efefcc230
commit ed945abdf1
136 changed files with 28252 additions and 16 deletions


@@ -0,0 +1,438 @@
---
name: backend-go-gin-gorm
description: >
使用 Gin + GORM 生成、编写、修改、评审 production-ready 的 Go 后端代码(Generate & Review Go backend code with Gin/GORM)。
强制分层架构 handler → service → dao/repository(避免业务逻辑堆在 handler,DAO/Repo 只做数据访问与查询组装),并统一 API 响应包装
consistent response envelope(code/message/data + request_id/trace_id 等可观测字段)。接口风格默认推荐 POST + JSON RequestBody
as default(必要时遵循 REST 语义与幂等约定),规范 DTO/VO/DO 命名与字段映射 conventions(入参 DTO、出参 VO、持久化 DO/Model)。
代码注释使用中文(Chinese comments for maintainability),时间处理默认 Asia/Shanghai(time zone aware time handling),
采用结构化日志 structured logging(携带 request_id/trace_id/user_id/path/latency 等上下文),并遵循 Gin/GORM 工程化最佳实践
(transactions, context propagation, error wrapping, pagination, soft delete, optimistic locking when needed)。
触发场景 Trigger: Go 后端开发 / Gin Handler 创建 / GORM DAO/Repository 实现 / 代码走查与 Review(refactor suggestions, bug fixes, performance tips)。
argument-hint: "<动作 action> <目标 target>" 例如/ e.g.:
"create user-handler", "review service/order.go", "scaffold api/v1/product", "add repo for table/users", "optimize gorm query"
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# Go GIN/GORM 开发规范 Skill
## 触发条件
- 用户请求创建/修改 Go 后端代码
- 用户请求代码审查
- 用户提及 API 开发、数据库操作、统一响应、日志、时间处理
- 用户请求设计 API 接口、DTO 结构
## 上下文收集
执行前先收集项目信息:
!`ls -la go.mod go.sum 2>/dev/null || echo "No go.mod found"`
!`head -20 go.mod 2>/dev/null || echo ""`
## $ARGUMENTS 解析
期望格式:`<action> <target>`
| action | 说明 |
|--------|------|
| `create` | 创建新文件(handler/service/dao/dto) |
| `review` | 审查现有代码 |
| `scaffold` | 生成完整模块骨架 |
| `fix` | 修复不符合规范的代码 |
---
## Plan 阶段
### 产物清单(按 action 确定)
| action | 产物 |
|--------|------|
| `create handler` | `/api/xxx_handler.go` 或 `/internal/handler/xxx.go` |
| `create service` | `/internal/service/xxx_service.go` |
| `create dao` | `/internal/dao/xxx_dao.go` |
| `create dto` | `/internal/model/dto/xxx_dto.go` |
| `scaffold` | 上述全部 + entity |
### 决策点
1. **目录风格**:检查项目是用 `/api` 还是 `/internal/handler`
2. **模块命名**:从 $ARGUMENTS 提取资源名(如 `user`、`order`)
3. **是否已存在**:先 Glob 检查目标文件
---
## Execute 阶段
### Handler 层编写规则
```
1. 仅做:参数解析 → 调用 service → 返回响应
2. 禁止:编写业务逻辑、直接操作数据库
3. 必须:使用 common.ResponseSuccess / common.ResponseError
4. 错误处理:gorm.ErrRecordNotFound → CodeNotFound
```
### Service 层编写规则
```
1. 编排 dao 层完成业务
2. 记录关键业务日志(Info 级别)
3. 错误包装:fmt.Errorf("xxx: %w", err)
4. 业务异常记录 Warning 级别日志
```
### DAO 层编写规则
```
1. 封装所有 GORM 操作
2. 禁止在 service 层写 SQL
3. 复杂查询用 Raw/Exec
4. 善用链式调用,但复杂场景优先原生 SQL
```
### 统一响应格式(强制)
```go
// 成功
common.ResponseSuccess(c, data)
common.ResponseSuccessWithMessage(c, data, "创建成功")
// 失败
common.ResponseError(c, common.CodeParamError, "参数错误")
common.ResponseErrorWithDetail(c, common.CodeServerError, "系统错误", err)
```
错误码定义 → 读取 `reference/error-codes.go`
### 注释规范(强制中文)
```go
// GetUserByID 根据用户ID获取用户信息
// @param ctx context.Context - 请求上下文
// @param userID int64 - 用户唯一ID
// @return *model.User - 用户信息,未找到返回nil
// @return error - 查询错误
func (s *UserService) GetUserByID(ctx context.Context, userID int64) (*model.User, error)
```
---
## API 设计规范(强制)
### 核心原则:POST + RequestBody
```
所有 API 优先使用 POST 方法,参数通过 RequestBody 传递
避免使用 PathVariables 和 RequestParams
```
### 禁止与推荐
| 禁止 | 推荐 |
|------|------|
| `GET /api/projects/{project_id}` | `POST /api/projects/detail` + RequestBody |
| `GET /api/users?role=admin&page=1` | `POST /api/users/list` + RequestBody |
| URL 中传递敏感信息 | RequestBody 传递所有参数 |
### API 路径命名规范
| 操作 | 后缀 | 示例 |
|------|------|------|
| 列表查询 | `/list` | `POST /api/projects/list` |
| 详情查询 | `/detail` | `POST /api/projects/detail` |
| 创建 | `/create` | `POST /api/projects/create` |
| 更新 | `/update` | `POST /api/projects/update` |
| 删除 | `/delete` | `POST /api/projects/delete` |
| 同步 | `/sync` | `POST /api/jenkins/organizations/sync` |
| 触发 | `/trigger` | `POST /api/builds/trigger` |
### DTO 命名规范
| 类型 | 命名格式 | 示例 |
|------|----------|------|
| 列表请求 | `List{资源}Request` | `ListBuildsRequest` |
| 详情请求 | `Get{资源}Request` | `GetBuildRequest` |
| 创建请求 | `Create{资源}Request` | `CreateProjectRequest` |
| 更新请求 | `Update{资源}Request` | `UpdateProjectRequest` |
| 删除请求 | `Delete{资源}Request` | `DeleteProjectRequest` |
| 列表响应 | `List{资源}Response` | `ListBuildsResponse` |
| 详情响应 | `{资源}DetailResponse` | `BuildDetailResponse` |
### 通用分页结构
```go
// 请求
type PageRequest struct {
    Page     int `json:"page" binding:"required,min=1"`
    PageSize int `json:"page_size" binding:"required,min=1,max=100"`
}

// 响应
type ListResponse struct {
    List     []interface{} `json:"list"`
    Total    int64         `json:"total"`
    Page     int           `json:"page"`
    PageSize int           `json:"page_size"`
}
```
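分页参数到查询偏移量、总页数的换算可以按下面的方式草拟(独立示意,省略 binding 标签,实际以项目 pkg/common 中的实现为准):

```go
package main

import "fmt"

// PageRequest 通用分页请求(与上文结构一致,省略标签)
type PageRequest struct {
	Page     int
	PageSize int
}

// Offset 计算查询偏移量:第 1 页偏移为 0,可直接传给 GORM 的 Offset
func (p PageRequest) Offset() int {
	return (p.Page - 1) * p.PageSize
}

// TotalPages 根据记录总数计算总页数(向上取整)
func TotalPages(total int64, pageSize int) int64 {
	if pageSize <= 0 {
		return 0
	}
	return (total + int64(pageSize) - 1) / int64(pageSize)
}

func main() {
	p := PageRequest{Page: 3, PageSize: 20}
	fmt.Println(p.Offset())          // 40
	fmt.Println(TotalPages(101, 20)) // 6
}
```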
### 模块错误码范围
| 范围 | 模块 |
|------|------|
| 0 | 成功 |
| 1000-1999 | 通用错误 |
| 2000-2999 | 用户/权限 |
| 3000-3999 | Jenkins |
| 4000-4999 | 项目管理 |
| 5000-5999 | Exchange-Hub |
详细规范 → 读取 `reference/api-design-spec.md`
---
## 日志规范(强制)
### 指定框架
项目统一使用 `rmdc-common/wdd_log/log_utils.go`
### 日志级别使用场景
| 级别 | 使用场景 | 示例 |
|------|----------|------|
| `Debug` | 开发调试,详细流程、变量值 | `log.Debug(ctx, "查询参数", map[string]interface{}{"userID": id})` |
| `Info` | 关键业务节点 | `log.Info(ctx, "用户登录成功", ...)` / `log.Info(ctx, "订单创建成功", ...)` |
| `Warning` | 可预期非致命异常,程序可继续 | `log.Warning(ctx, "外部API超时,启用备用方案", ...)` |
| `Error` | 严重错误,业务流程中断 | `log.Error(ctx, "数据库连接失败", ...)` 必须记录堆栈 |
### 日志内容要求
```
1. 简练、关键
2. 必须包含 TraceID、UserID 等追溯信息
3. Error 级别必须记录完整错误堆栈
```
### 日志记录位置
| 层级 | 记录内容 |
|------|----------|
| Handler | 使用 `ResponseErrorWithDetail` 自动记录 Error 日志 |
| Service | 关键业务操作记录 Info,业务异常记录 Warning |
| DAO | 一般不记录日志,错误向上抛出 |
---
## 时间处理(强制东八区)
### 核心规则
```
时区:Asia/Shanghai (UTC+8)
格式:RFC3339
```
### 禁止与必须
| 禁止 | 必须使用 |
|------|----------|
| `time.Now()` | `TimeUtils.Now()` |
| `time.Parse()` | `TimeUtils.Parse()` |
| 直接格式化 | `TimeUtils.Format()` |
### 工具库位置
- 后端:`rmdc-common/utils/TimeUtils.go`
- 前端:`TonyMask/src/utils/timeUtils.ts`
### 使用示例
```go
// ✅ 正确
now := TimeUtils.Now()
timestamp := TimeUtils.Now().Format(time.RFC3339)
// ❌ 错误
now := time.Now() // 禁止直接使用
```
---
## 框架使用规范
### GIN 框架
#### 路由组织(强制分组)
```go
// ✅ 正确:使用路由分组
v1 := r.Group("/api/v1")
{
    users := v1.Group("/users")
    {
        users.GET("/:id", userHandler.GetByID)
        users.POST("/", userHandler.Create)
    }
}

// ❌ 错误:扁平路由
r.GET("/api/v1/users/:id", ...)
```
#### 中间件使用
```go
// 全局中间件
r.Use(middleware.Recovery()) // 恢复
r.Use(middleware.Logger()) // 日志
r.Use(middleware.CORS()) // 跨域
// 路由组中间件
authGroup := r.Group("/admin")
authGroup.Use(middleware.Auth())
```
#### 响应规范
```
所有 API 响应必须通过 pkg/common 统一响应函数
禁止直接使用 c.JSON()、c.String() 等
```
### GORM 框架
#### 操作位置
```
所有 GORM 操作必须在 dao 层
严禁在 service 层拼接查询
```
#### 链式调用 vs 原生 SQL
| 场景 | 推荐方式 |
|------|----------|
| 简单 CRUD | 链式调用 `db.Where().First()` |
| 复杂查询(多表 JOIN、子查询) | `Raw()` / `Exec()` 原生 SQL |
| 批量操作 | `Raw()` / `Exec()` 保证性能 |
```go
// 简单查询 - 链式调用
db.Where("status = ?", 1).Find(&users)
// 复杂查询 - 原生 SQL
db.Raw(`
    SELECT u.*, COUNT(o.id) as order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    WHERE u.status = ?
    GROUP BY u.id
`, 1).Scan(&results)
```
#### 错误处理
```go
// 必须处理 ErrRecordNotFound
if errors.Is(err, gorm.ErrRecordNotFound) {
    common.ResponseError(c, common.CodeNotFound, "资源不存在")
    return
}
```
---
## Verify 阶段 Checklist
### 结构检查
- [ ] 依赖方向正确(handler → service → dao),无反向引用
- [ ] handler 层无业务逻辑
- [ ] dao 层无 service 引用
- [ ] 使用 internal 包保护私有代码
### 响应检查
- [ ] 所有 API 使用 `common.ResponseSuccess/Error`
- [ ] 错误码来自 `common.Code*` 常量
- [ ] 时间戳格式为 RFC3339
- [ ] 无直接 `c.JSON()` 调用
### 代码检查
- [ ] 公开函数/结构体有中文注释
- [ ] 注释格式:`// 函数名 功能描述`
- [ ] 无直接 `time.Now()` 调用
- [ ] 无丢弃的 error(`_ = err` 禁止)
- [ ] 包名小写无下划线
### 日志检查
- [ ] 使用项目统一日志库
- [ ] Error 日志包含完整堆栈
- [ ] 关键业务操作有 Info 日志
- [ ] 日志包含 TraceID 等追溯信息
### GORM 检查
- [ ] `gorm.ErrRecordNotFound` 已处理
- [ ] 复杂查询在 dao 层使用 Raw/Exec
- [ ] 无 service 层直接 DB 操作
### GIN 检查
- [ ] 使用路由分组组织 API
- [ ] 通用逻辑使用中间件处理
- [ ] 响应通过统一函数返回
### API 设计检查
- [ ] 使用 POST + RequestBody(非 GET + PathVariables)
- [ ] API 路径使用正确后缀(/list, /detail, /create 等)
- [ ] DTO 命名符合规范(List/Get/Create/Update/Delete + 资源 + Request/Response)
- [ ] 分页请求嵌入 PageRequest
- [ ] 分页响应包含 list/total/page/page_size
- [ ] 敏感信息不在 URL 中
- [ ] 请求体必须验证(ShouldBindJSON)
---
## 常见陷阱
| 陷阱 | 正确做法 |
|------|----------|
| handler 写业务逻辑 | 移到 service 层 |
| 直接 `c.JSON()` | 用 `common.ResponseSuccess()` |
| 忽略 `ErrRecordNotFound` | 转为 `CodeNotFound` 返回 |
| `time.Now()` | `TimeUtils.Now()` |
| 英文注释 | 改为中文 |
| dao 引用 service | 违反依赖原则,重构 |
| service 写 SQL | 移到 dao 层 |
| 扁平路由 | 使用 Router Group |
| 日志缺少上下文 | 添加 TraceID、UserID |
| Error 日志无堆栈 | 记录完整错误信息 |
| `GET /api/users/{id}` | `POST /api/users/detail` + RequestBody |
| URL 传参数 `?page=1` | RequestBody 传递 |
| DTO 命名不规范 | 使用 `List/Get/Create/Update/Delete` + 资源名 |
| 敏感信息在 URL | 移到 RequestBody |
---
## Reference 文件索引
| 场景 | 读取文件 |
|------|----------|
| 需要完整目录结构说明 | `reference/project-structure.md` |
| 需要响应结构体定义 | `reference/api-response-spec.md` |
| 需要错误码完整列表 | `reference/error-codes.go` |
| 需要编码规范细节 | `reference/coding-standards.md` |
| 需要日志使用详细说明 | `reference/logging-standards.md` |
| 需要时间处理详细说明 | `reference/time-handling.md` |
| 需要框架使用详细说明 | `reference/framework-usage.md` |
| 需要 API 设计详细说明 | `reference/api-design-spec.md` |
| 需要代码示例 | `examples/*.go` |
---
## 快速命令
验证项目结构:
```bash
./scripts/validate-structure.sh
```


@@ -0,0 +1,55 @@
package dao

import (
    "context"

    "gorm.io/gorm"

    "my-project/internal/model/entity"
)

// UserDAO 用户数据访问对象
type UserDAO struct {
    db *gorm.DB
}

// NewUserDAO 创建用户DAO实例
// @param db *gorm.DB - 数据库连接
// @return *UserDAO - DAO实例
func NewUserDAO(db *gorm.DB) *UserDAO {
    return &UserDAO{db: db}
}

// FindByID 根据ID查询用户
// @param ctx context.Context - 请求上下文
// @param id int64 - 用户ID
// @return *entity.User - 用户实体
// @return error - 查询错误,未找到返回gorm.ErrRecordNotFound
func (d *UserDAO) FindByID(ctx context.Context, id int64) (*entity.User, error) {
    var user entity.User
    if err := d.db.WithContext(ctx).First(&user, id).Error; err != nil {
        return nil, err
    }
    return &user, nil
}

// Create 创建用户
// @param ctx context.Context - 请求上下文
// @param user *entity.User - 用户实体
// @return error - 创建错误
func (d *UserDAO) Create(ctx context.Context, user *entity.User) error {
    return d.db.WithContext(ctx).Create(user).Error
}

// FindByEmail 根据邮箱查询用户
// @param ctx context.Context - 请求上下文
// @param email string - 用户邮箱
// @return *entity.User - 用户实体
// @return error - 查询错误
func (d *UserDAO) FindByEmail(ctx context.Context, email string) (*entity.User, error) {
    var user entity.User
    if err := d.db.WithContext(ctx).Where("email = ?", email).First(&user).Error; err != nil {
        return nil, err
    }
    return &user, nil
}


@@ -0,0 +1,70 @@
package handler

import (
    "errors"
    "strconv"

    "github.com/gin-gonic/gin"
    "gorm.io/gorm"

    "my-project/internal/model/dto"
    "my-project/internal/service"
    "my-project/pkg/common"
)

// UserHandler 用户相关API处理器
type UserHandler struct {
    userService *service.UserService
}

// NewUserHandler 创建用户Handler实例
// @param userService *service.UserService - 用户服务
// @return *UserHandler - Handler实例
func NewUserHandler(userService *service.UserService) *UserHandler {
    return &UserHandler{userService: userService}
}

// GetUserByID 根据ID获取用户信息
// @param c *gin.Context - GIN上下文
func (h *UserHandler) GetUserByID(c *gin.Context) {
    // 1. 参数解析
    idStr := c.Param("id")
    userID, err := strconv.ParseInt(idStr, 10, 64)
    if err != nil {
        common.ResponseError(c, common.CodeParamError, "用户ID格式错误")
        return
    }
    // 2. 调用Service
    user, err := h.userService.GetUserByID(c.Request.Context(), userID)
    if err != nil {
        if errors.Is(err, gorm.ErrRecordNotFound) {
            common.ResponseError(c, common.CodeNotFound, "用户不存在")
            return
        }
        common.ResponseErrorWithDetail(c, common.CodeServerError, "获取用户失败", err)
        return
    }
    // 3. 成功响应
    common.ResponseSuccess(c, user)
}

// CreateUser 创建用户
// @param c *gin.Context - GIN上下文
func (h *UserHandler) CreateUser(c *gin.Context) {
    var req dto.CreateUserRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        common.ResponseErrorWithDetail(c, common.CodeValidationFail, "参数验证失败", err)
        return
    }
    user, err := h.userService.CreateUser(c.Request.Context(), &req)
    if err != nil {
        common.ResponseErrorWithDetail(c, common.CodeBusiness, "创建用户失败", err)
        return
    }
    common.ResponseSuccessWithMessage(c, user, "用户创建成功")
}


@@ -0,0 +1,60 @@
package service

import (
    "context"
    "fmt"

    "my-project/internal/dao"
    "my-project/internal/model/dto"
    "my-project/internal/model/entity"
    "my-project/pkg/log"
)

// UserService 用户业务服务
type UserService struct {
    userDAO *dao.UserDAO
}

// NewUserService 创建用户服务实例
// @param userDAO *dao.UserDAO - 用户数据访问对象
// @return *UserService - 服务实例
func NewUserService(userDAO *dao.UserDAO) *UserService {
    return &UserService{userDAO: userDAO}
}

// GetUserByID 根据用户ID获取用户信息
// @param ctx context.Context - 请求上下文
// @param userID int64 - 用户唯一ID
// @return *entity.User - 用户实体
// @return error - 查询错误
func (s *UserService) GetUserByID(ctx context.Context, userID int64) (*entity.User, error) {
    user, err := s.userDAO.FindByID(ctx, userID)
    if err != nil {
        return nil, fmt.Errorf("查询用户失败: %w", err)
    }
    return user, nil
}

// CreateUser 创建新用户
// @param ctx context.Context - 请求上下文
// @param req *dto.CreateUserRequest - 创建请求
// @return *entity.User - 创建的用户实体
// @return error - 创建错误
func (s *UserService) CreateUser(ctx context.Context, req *dto.CreateUserRequest) (*entity.User, error) {
    user := &entity.User{
        Username: req.Username,
        Email:    req.Email,
    }
    if err := s.userDAO.Create(ctx, user); err != nil {
        return nil, fmt.Errorf("创建用户失败: %w", err)
    }
    // 记录关键业务日志
    log.Info(ctx, "用户创建成功", map[string]interface{}{
        "userID":   user.ID,
        "username": user.Username,
    })
    return user, nil
}


@@ -0,0 +1,332 @@
# API 设计规范
## 核心原则
### 1. 使用 POST + RequestBody
> **核心规范**: 所有 API 优先使用 POST 方法,参数通过 RequestBody 传递
```go
// ✅ 推荐方式
POST /api/jenkins/builds/list
{
    "organization_folder": "Backend",
    "repository_name": "cmii-fly-center",
    "branch_name": "master",
    "page": 1,
    "page_size": 10
}

// ❌ 避免使用
GET /api/jenkins/organizations/{org}/repositories/{repo}/branches/{branch}/builds?page=1&page_size=10
```
### 2. 避免 PathVariables
```go
// ❌ 不推荐
GET /api/projects/{project_id}
GET /api/builds/{build_id}/console

// ✅ 推荐
POST /api/projects/detail
{
    "project_id": "namespace_abc12345"
}

POST /api/builds/console
{
    "organization_folder": "Backend",
    "repository_name": "cmii-fly-center",
    "branch_name": "master",
    "build_number": 123
}
```
### 3. 避免 RequestParams
```go
// ❌ 不推荐
GET /api/users/list?role=admin&status=active&page=1

// ✅ 推荐
POST /api/users/list
{
    "role": "admin",
    "status": "active",
    "page": 1,
    "page_size": 20
}
```
---
## 统一响应格式
### 成功响应
```json
{
    "code": 0,
    "message": "success",
    "data": {
        // 业务数据
    }
}
```
### 分页响应
```json
{
    "code": 0,
    "message": "success",
    "data": {
        "list": [...],
        "total": 100,
        "page": 1,
        "page_size": 20
    }
}
```
### 错误响应
```json
{
    "code": 1001,
    "message": "参数错误: organization_folder不能为空",
    "data": null
}
```
---
## 请求结构规范
### 通用分页请求
```go
type PageRequest struct {
    Page     int `json:"page" binding:"required,min=1"`
    PageSize int `json:"page_size" binding:"required,min=1,max=100"`
}
```
### 通用筛选请求
```go
type ListRequest struct {
    PageRequest
    Keyword   string `json:"keyword,omitempty"`    // 搜索关键词
    Status    string `json:"status,omitempty"`     // 状态筛选
    SortBy    string `json:"sort_by,omitempty"`    // 排序字段
    SortOrder string `json:"sort_order,omitempty"` // asc/desc
}
```
---
## API 命名规范
### 操作类型后缀
| 操作 | 后缀 | 示例 |
|------|------|------|
| 列表查询 | `/list` | `/api/projects/list` |
| 详情查询 | `/detail` | `/api/projects/detail` |
| 创建 | `/create` | `/api/projects/create` |
| 更新 | `/update` | `/api/projects/update` |
| 删除 | `/delete` | `/api/projects/delete` |
| 同步 | `/sync` | `/api/jenkins/organizations/sync` |
| 触发 | `/trigger` | `/api/builds/trigger` |
| 导出 | `/export` | `/api/projects/export` |
### 模块前缀
| 模块 | 前缀 |
|------|------|
| Jenkins | `/api/jenkins/` |
| 项目管理 | `/api/projects/` |
| 用户 | `/api/users/` |
| 权限 | `/api/permissions/` |
| 权限-Jenkins | `/api/permissions/jenkins/` |
| 权限-项目 | `/api/permissions/projects/` |
| 审计 | `/api/audit/` |
| Exchange-Hub | `/api/exchange-hub/` |
| DCU | `/api/dcu/` |
---
## Handler 实现模板
```go
// ListBuilds 获取构建列表
// @Summary 获取构建列表
// @Tags 构建管理
// @Accept json
// @Produce json
// @Param request body dto.ListBuildsRequest true "请求参数"
// @Success 200 {object} response.Response{data=dto.ListBuildsResponse}
// @Router /api/jenkins/builds/list [post]
func (h *BuildHandler) ListBuilds(c *gin.Context) {
    var req dto.ListBuildsRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        response.ParamError(c, err)
        return
    }
    resp, err := h.buildService.ListBuilds(c.Request.Context(), &req)
    if err != nil {
        response.Error(c, err)
        return
    }
    response.Success(c, resp)
}
```
---
## DTO 设计规范
### 请求 DTO 命名
```go
// 列表请求: List{资源}Request
type ListBuildsRequest struct {
    PageRequest
    OrganizationFolder string `json:"organization_folder" binding:"required"`
    RepositoryName     string `json:"repository_name" binding:"required"`
    BranchName         string `json:"branch_name,omitempty"`
}

// 详情请求: Get{资源}Request 或 {资源}DetailRequest
type GetBuildRequest struct {
    OrganizationFolder string `json:"organization_folder" binding:"required"`
    RepositoryName     string `json:"repository_name" binding:"required"`
    BranchName         string `json:"branch_name" binding:"required"`
    BuildNumber        int    `json:"build_number" binding:"required"`
}

// 创建请求: Create{资源}Request
type CreateProjectRequest struct {
    Name      string `json:"name" binding:"required"`
    Namespace string `json:"namespace" binding:"required"`
    Province  string `json:"province" binding:"required"`
    City      string `json:"city" binding:"required"`
}

// 更新请求: Update{资源}Request
type UpdateProjectRequest struct {
    ProjectID string `json:"project_id" binding:"required"`
    Name      string `json:"name,omitempty"`
    Province  string `json:"province,omitempty"`
    City      string `json:"city,omitempty"`
}

// 删除请求: Delete{资源}Request
type DeleteProjectRequest struct {
    ProjectID string `json:"project_id" binding:"required"`
}
```
### 响应 DTO 命名
```go
// 列表响应: List{资源}Response
type ListBuildsResponse struct {
    List     []*BuildDTO `json:"list"`
    Total    int64       `json:"total"`
    Page     int         `json:"page"`
    PageSize int         `json:"page_size"`
}

// 详情响应: {资源}DetailResponse 或直接使用 {资源}DTO
type BuildDetailResponse struct {
    *BuildDTO
    ConsoleOutput string `json:"console_output,omitempty"`
}
```
---
## 错误码规范
### 错误码范围
| 范围 | 模块 |
|------|------|
| 1000-1999 | 通用错误 |
| 2000-2999 | 用户/权限 |
| 3000-3999 | Jenkins模块 |
| 4000-4999 | 项目管理 |
| 5000-5999 | Exchange-Hub |
| 6000-6999 | Watchdog |
### 通用错误码
| 错误码 | 说明 |
|--------|------|
| 0 | 成功 |
| 1001 | 参数错误 |
| 1002 | 未授权 |
| 1003 | 禁止访问 |
| 1004 | 资源不存在 |
| 1005 | 内部错误 |
---
## 前端调用示例
```typescript
// api/modules/jenkins.ts
export const jenkinsApi = {
    // 获取构建列表
    listBuilds: (data: ListBuildsRequest) =>
        request.post<ListBuildsResponse>('/api/jenkins/builds/list', data),
    // 触发构建
    triggerBuild: (data: TriggerBuildRequest) =>
        request.post<TriggerBuildResponse>('/api/jenkins/builds/trigger', data),
    // 获取构建详情
    getBuildDetail: (data: GetBuildRequest) =>
        request.post<BuildDetailResponse>('/api/jenkins/builds/detail', data),
};
```
---
## 安全规范
### 1. 敏感字段不出现在 URL
```go
// ❌ 敏感信息泄露到URL
GET /api/auth/login?username=admin&password=123456

// ✅ 使用RequestBody
POST /api/auth/login
{
    "username": "admin",
    "password": "123456"
}
```
### 2. 必须验证请求体
```go
func (h *Handler) CreateProject(c *gin.Context) {
    var req dto.CreateProjectRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        response.ParamError(c, err)
        return
    }
    // 后续处理...
}
```
### 3. 审计敏感操作
所有写操作需通过审计中间件记录。


@@ -0,0 +1,35 @@
# API 响应规范
## 统一响应结构
```go
type Response struct {
    Code      int         `json:"code"`              // 业务状态码,0=成功
    Status    int         `json:"status"`            // HTTP 状态码
    Timestamp string      `json:"timestamp"`         // RFC3339 东八区
    Data      interface{} `json:"data"`              // 业务数据
    Message   string      `json:"message,omitempty"` // 消息
    Error     string      `json:"error,omitempty"`   // 错误详情
}
```
## 使用函数
| 场景 | 函数 |
|------|------|
| 查询成功 | `ResponseSuccess(c, data)` |
| 操作成功 | `ResponseSuccessWithMessage(c, data, "msg")` |
| 普通错误 | `ResponseError(c, code, "msg")` |
| 详细错误 | `ResponseErrorWithDetail(c, code, "msg", err)` |
## HTTP 状态码映射
| 业务码 | HTTP 状态码 |
|--------|-------------|
| CodeSuccess | 200 |
| CodeParamError, CodeValidationFail | 400 |
| CodeUnauthorized | 401 |
| CodeForbidden | 403 |
| CodeNotFound | 404 |
| CodeTimeout | 408 |
| 其他 | 500 |
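该映射可以实现为一个纯函数草图(业务码常量取值参照 reference/error-codes.go;具体归属以项目 pkg/common 的实现为准):

```go
package main

import (
	"fmt"
	"net/http"
)

// 业务状态码(与 reference/error-codes.go 保持一致)
const (
	CodeSuccess        = 0
	CodeServerError    = 10001
	CodeParamError     = 10002
	CodeUnauthorized   = 10003
	CodeForbidden      = 10004
	CodeNotFound       = 10005
	CodeTimeout        = 10006
	CodeValidationFail = 10007
)

// HTTPStatus 将业务码映射为 HTTP 状态码,未知码一律归入 500
func HTTPStatus(code int) int {
	switch code {
	case CodeSuccess:
		return http.StatusOK
	case CodeParamError, CodeValidationFail:
		return http.StatusBadRequest
	case CodeUnauthorized:
		return http.StatusUnauthorized
	case CodeForbidden:
		return http.StatusForbidden
	case CodeNotFound:
		return http.StatusNotFound
	case CodeTimeout:
		return http.StatusRequestTimeout
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	fmt.Println(HTTPStatus(CodeParamError)) // 400
	fmt.Println(HTTPStatus(20001))          // 500
}
```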


@@ -0,0 +1,44 @@
# 编码规范
## 命名规范
| 类型 | 规则 | 示例 |
|------|------|------|
| 包名 | 小写单词,无下划线 | `service`, `utils` |
| 变量/函数 | 驼峰命名 | `getUserByID` |
| 公开标识 | 首字母大写 | `GetUserByID` |
| 接口 | 单方法以 `er` 结尾 | `Reader`, `Writer` |
## 注释规范(中文,必须)
```go
// GetUserByID 根据用户ID获取用户信息
// @param ctx context.Context - 请求上下文
// @param userID int64 - 用户唯一ID
// @return *model.User - 用户信息
// @return error - 查询错误
func (s *UserService) GetUserByID(ctx context.Context, userID int64) (*model.User, error)
```
## 错误处理
1. 必须 `if err != nil` 处理
2.`fmt.Errorf("xxx: %w", err)` 包装
3. 禁止 `_ = err` 丢弃错误
4. Handler 层必须通过统一响应返回
## 日志级别
| 级别 | 用途 |
|------|------|
| Debug | 开发调试,详细流程 |
| Info | 关键业务节点 |
| Warning | 可预期非致命异常 |
| Error | 严重错误,必须记录堆栈 |
## 时间处理
- 时区:Asia/Shanghai (UTC+8)
- 格式:RFC3339
- 禁止:`time.Now()`
- 使用:`TimeUtils.Now()`


@@ -0,0 +1,35 @@
package common

// 业务状态码常量
const (
    CodeSuccess        = 0     // 成功
    CodeServerError    = 10001 // 服务器内部错误
    CodeParamError     = 10002 // 参数错误
    CodeUnauthorized   = 10003 // 未授权
    CodeForbidden      = 10004 // 禁止访问
    CodeNotFound       = 10005 // 资源不存在
    CodeTimeout        = 10006 // 请求超时
    CodeValidationFail = 10007 // 验证失败
    CodeBusiness       = 20001 // 业务逻辑错误 (20001-29999)
)

// CodeMessage 错误码消息映射
var CodeMessage = map[int]string{
    CodeSuccess:        "success",
    CodeServerError:    "服务器内部错误",
    CodeParamError:     "参数错误",
    CodeUnauthorized:   "未授权,请先登录",
    CodeForbidden:      "权限不足,禁止访问",
    CodeNotFound:       "请求的资源不存在",
    CodeTimeout:        "请求超时",
    CodeValidationFail: "数据验证失败",
    CodeBusiness:       "业务处理失败",
}

// GetMessage 根据错误码获取默认消息
func GetMessage(code int) string {
    if msg, ok := CodeMessage[code]; ok {
        return msg
    }
    return "未知错误"
}


@@ -0,0 +1,264 @@
# 框架使用规范
## GIN 框架
### 路由组织
#### 强制使用路由分组 (Router Group)
```go
func SetupRouter(r *gin.Engine) {
    // API 版本分组
    v1 := r.Group("/api/v1")
    {
        // 用户模块
        users := v1.Group("/users")
        {
            users.GET("/", userHandler.List)
            users.GET("/:id", userHandler.GetByID)
            users.POST("/", userHandler.Create)
            users.PUT("/:id", userHandler.Update)
            users.DELETE("/:id", userHandler.Delete)
        }
        // 订单模块
        orders := v1.Group("/orders")
        {
            orders.GET("/", orderHandler.List)
            orders.GET("/:id", orderHandler.GetByID)
            orders.POST("/", orderHandler.Create)
        }
    }
}
```
#### 禁止扁平路由
```go
// ❌ 错误:扁平路由,难以维护
r.GET("/api/v1/users", ...)
r.GET("/api/v1/users/:id", ...)
r.POST("/api/v1/users", ...)
r.GET("/api/v1/orders", ...)
```
### 中间件使用
#### 全局中间件
```go
func SetupMiddleware(r *gin.Engine) {
    // Recovery - 恢复 panic,防止程序崩溃
    r.Use(middleware.Recovery())
    // Logger - 请求日志记录
    r.Use(middleware.Logger())
    // CORS - 跨域处理
    r.Use(middleware.CORS())
    // TraceID - 请求追踪
    r.Use(middleware.TraceID())
}
```
#### 路由组中间件
```go
// 需要认证的路由组
authGroup := r.Group("/api/v1/admin")
authGroup.Use(middleware.Auth())
{
    authGroup.GET("/dashboard", adminHandler.Dashboard)
    authGroup.GET("/users", adminHandler.ListUsers)
}

// 需要特定权限的路由组
superAdmin := authGroup.Group("/super")
superAdmin.Use(middleware.RequireRole("super_admin"))
{
    superAdmin.DELETE("/users/:id", adminHandler.DeleteUser)
}
```
#### 常用中间件职责
| 中间件 | 职责 |
|--------|------|
| Recovery | 捕获 panic,返回 500 错误 |
| Logger | 记录请求日志(方法、路径、耗时等) |
| CORS | 处理跨域请求 |
| Auth | 验证用户身份(JWT/Session) |
| TraceID | 生成/传递请求追踪 ID |
| RateLimit | 请求频率限制 |
### 响应规范
#### 强制使用统一响应
```go
// ✅ 正确:使用统一响应函数
common.ResponseSuccess(c, data)
common.ResponseError(c, common.CodeParamError, "参数错误")
// ❌ 错误:直接使用 GIN 原生方法
c.JSON(200, data)
c.String(200, "success")
c.AbortWithStatusJSON(400, gin.H{"error": "bad request"})
```
---
## GORM 框架
### 操作位置规范
```
所有 GORM 操作必须在 dao 层实现
严禁在 service 层直接操作数据库
```
### 查询方式选择
#### 简单 CRUD - 链式调用
```go
// 单条查询
var user entity.User
db.Where("id = ?", userID).First(&user)

// 列表查询
var users []entity.User
db.Where("status = ?", 1).
    Order("created_at DESC").
    Limit(10).
    Offset(0).
    Find(&users)

// 创建
db.Create(&user)

// 更新
db.Model(&user).Updates(map[string]interface{}{
    "name":   "new name",
    "status": 1,
})

// 删除
db.Delete(&user, userID)
```
#### 复杂查询 - Raw/Exec
**推荐场景**
- 多表 JOIN
- 子查询
- 复杂聚合
- 批量操作
- 性能敏感场景
```go
// 多表 JOIN 查询
type UserWithOrderCount struct {
    entity.User
    OrderCount int64 `json:"order_count"`
}

var results []UserWithOrderCount
db.Raw(`
    SELECT u.*, COUNT(o.id) as order_count
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    WHERE u.status = ?
    GROUP BY u.id
    ORDER BY order_count DESC
    LIMIT ?
`, 1, 10).Scan(&results)

// 批量更新
db.Exec(`
    UPDATE orders
    SET status = ?
    WHERE user_id = ? AND status = ?
`, "completed", userID, "pending")

// 复杂子查询
db.Raw(`
    SELECT * FROM users
    WHERE id IN (
        SELECT user_id FROM orders
        WHERE amount > ?
        GROUP BY user_id
        HAVING COUNT(*) > ?
    )
`, 1000, 5).Scan(&users)
```
### 错误处理
#### 必须处理 ErrRecordNotFound
```go
// DAO 层
func (d *UserDAO) FindByID(ctx context.Context, id int64) (*entity.User, error) {
    var user entity.User
    if err := d.db.WithContext(ctx).First(&user, id).Error; err != nil {
        return nil, err // 包含 ErrRecordNotFound
    }
    return &user, nil
}

// Handler 层
user, err := h.userService.GetUserByID(ctx, userID)
if err != nil {
    if errors.Is(err, gorm.ErrRecordNotFound) {
        common.ResponseError(c, common.CodeNotFound, "用户不存在")
        return
    }
    common.ResponseErrorWithDetail(c, common.CodeServerError, "查询失败", err)
    return
}
```
### 事务处理
```go
// Service 层事务
func (s *OrderService) CreateOrder(ctx context.Context, req *dto.CreateOrderRequest) error {
    return s.db.Transaction(func(tx *gorm.DB) error {
        // 1. 创建订单
        order := &entity.Order{...}
        if err := tx.Create(order).Error; err != nil {
            return fmt.Errorf("创建订单失败: %w", err)
        }
        // 2. 扣减库存
        if err := tx.Model(&entity.Product{}).
            Where("id = ? AND stock >= ?", req.ProductID, req.Quantity).
            Update("stock", gorm.Expr("stock - ?", req.Quantity)).Error; err != nil {
            return fmt.Errorf("扣减库存失败: %w", err)
        }
        // 3. 创建支付记录
        payment := &entity.Payment{...}
        if err := tx.Create(payment).Error; err != nil {
            return fmt.Errorf("创建支付记录失败: %w", err)
        }
        return nil
    })
}
```
### Context 传递
```go
// 必须使用 WithContext 传递上下文
db.WithContext(ctx).First(&user, id)
db.WithContext(ctx).Create(&order)
// 支持超时控制和取消
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
db.WithContext(ctx).Find(&users)
```


@@ -0,0 +1,100 @@
# 日志规范
## 指定框架
项目统一使用内部日志库:`rmdc-common/wdd_log/log_utils.go`
## 日志级别定义
### Debug
- **用途**:开发调试,记录程序执行流程、变量值等详细信息
- **场景**:默认开发日志级别
- **示例**
```go
log.Debug(ctx, "开始处理用户请求", map[string]interface{}{
    "userID":    userID,
    "requestID": requestID,
})
```
### Info
- **用途**:记录关键业务操作节点
- **场景**:用户登录、订单创建、支付成功等关键业务
- **示例**
```go
log.Info(ctx, "用户登录成功", map[string]interface{}{
    "userID":   user.ID,
    "username": user.Username,
    "ip":       c.ClientIP(),
})

log.Info(ctx, "订单创建成功", map[string]interface{}{
    "orderID": order.ID,
    "amount":  order.Amount,
    "userID":  order.UserID,
})
```
### Warning
- **用途**:记录可预期的、非致命的异常情况,程序仍可继续运行
- **场景**:外部 API 超时启用备用方案、配置缺失使用默认值等
- **示例**
```go
log.Warning(ctx, "外部API调用超时,已启用备用方案", map[string]interface{}{
    "api":      "payment-gateway",
    "timeout":  "5s",
    "fallback": "local-cache",
})
```
### Error
- **用途**:记录严重错误,导致当前业务流程无法继续
- **场景**:数据库连接失败、关键参数校验失败等
- **要求**:必须详细记录错误信息和堆栈
- **示例**
```go
log.Error(ctx, "数据库连接失败", map[string]interface{}{
    "host":  dbConfig.Host,
    "port":  dbConfig.Port,
    "error": err.Error(),
    "stack": debug.Stack(),
})
```
## 日志内容规范
### 必须包含
1. **TraceID** - 请求追踪 ID
2. **UserID** - 用户标识(如适用)
3. **操作描述** - 简练的中文描述
4. **关键参数** - 与操作相关的关键数据
### 格式要求
```go
log.Info(ctx, "操作描述", map[string]interface{}{
    "key1": value1,
    "key2": value2,
})
```
## 各层日志职责
### Handler 层
- 使用 `ResponseErrorWithDetail` 自动记录 Error 日志
- 一般不主动记录日志
### Service 层
- **Info**:关键业务操作成功(创建订单、支付、用户注册等)
- **Warning**:业务逻辑异常但可处理
- **Error**:通过 ResponseErrorWithDetail 在 Handler 层统一记录
### DAO 层
- 一般不记录日志
- 错误向上抛出,由 Handler 层统一处理
## 禁止事项
1. 禁止在日志中记录敏感信息(密码、Token、完整银行卡号等)
2. 禁止使用 `fmt.Println``log.Println`
3. 禁止在循环中大量记录日志
4. Error 日志禁止缺少堆栈信息


@@ -0,0 +1,39 @@
# 项目目录结构规范
## 核心目录
| 目录 | 职责 | 禁止事项 |
|------|------|----------|
| `/api` 或 `/internal/handler` | GIN Handler 层,解析请求、调用 service、返回响应 | 禁止写业务逻辑 |
| `/internal/service` | 业务逻辑核心,编排 dao 完成功能 | - |
| `/internal/dao` 或 `/internal/repository` | 数据访问层,封装 GORM 操作 | 禁止引用 service |
| `/internal/model/entity` | 数据库表结构对应的持久化对象 | - |
| `/internal/model/dto` | API 数据传输对象(请求/响应) | - |
| `/pkg/common` | 统一响应、错误码、公共工具 | - |
| `/configs` | 配置文件 | - |
| `/cmd` | main.go 入口 | - |
## 依赖规则
```
handler → service → dao
   ↓         ↓        ↓
      pkg/common (任意层可引用)
```
**严禁反向或跨层依赖**
## go.mod 内部模块引用
```go
module my-project

go 1.24

require (
    wdd.io/TonyCommon v1.0.0
)

// 本地开发使用 replace
replace wdd.io/TonyCommon => ../TonyCommon
```


@@ -0,0 +1,120 @@
# 时间处理规范
## 核心原则
所有在前端和后端之间传输、以及在数据库中存储的时间,**必须统一为东八区时间 (Asia/Shanghai, UTC+8)**。
## 指定工具库
| 端 | 工具库路径 |
|----|-----------|
| 后端 | `rmdc-common/utils/TimeUtils.go` |
| 前端 | `TonyMask/src/utils/timeUtils.ts` |
## 时间格式
- API 响应中的 `timestamp` 字段统一使用 **RFC3339** 格式
- 示例:`2024-01-15T14:30:00+08:00`
## 禁止与必须
### 禁止直接使用
```go
// ❌ 禁止
time.Now()
time.Parse(layout, value)
t.Format(layout)
```
### 必须使用工具库
```go
// ✅ 正确
TimeUtils.Now()
TimeUtils.Parse(layout, value)
TimeUtils.Format(t, layout)
```
## 常用场景示例
### 获取当前时间
```go
// ❌ 错误
now := time.Now()
// ✅ 正确
now := TimeUtils.Now()
```
### 格式化时间戳
```go
// ❌ 错误
timestamp := time.Now().Format(time.RFC3339)
// ✅ 正确
timestamp := TimeUtils.Now().Format(time.RFC3339)
```
### 解析时间字符串
```go
// ❌ 错误
t, err := time.Parse(time.RFC3339, timeStr)
// ✅ 正确
t, err := TimeUtils.Parse(time.RFC3339, timeStr)
```
### 数据库时间字段
```go
type Order struct {
    ID        int64     `gorm:"primaryKey"`
    CreatedAt time.Time `gorm:"autoCreateTime"` // GORM 自动处理
    UpdatedAt time.Time `gorm:"autoUpdateTime"` // GORM 自动处理
    ExpireAt  time.Time // 业务时间使用 TimeUtils
}

// 设置业务时间
order.ExpireAt = TimeUtils.Now().Add(24 * time.Hour)
```
### API 响应时间
```go
type Response struct {
    Code      int         `json:"code"`
    Status    int         `json:"status"`
    Timestamp string      `json:"timestamp"` // RFC3339 格式
    Data      interface{} `json:"data"`
}

// 构建响应
resp := Response{
    Timestamp: TimeUtils.Now().Format(time.RFC3339),
    // ...
}
```
## TimeUtils 常用方法
| 方法 | 说明 |
|------|------|
| `Now()` | 获取当前东八区时间 |
| `Parse(layout, value)` | 解析时间字符串(东八区) |
| `Format(t, layout)` | 格式化时间 |
| `StartOfDay(t)` | 获取当天零点 |
| `EndOfDay(t)` | 获取当天 23:59:59 |
| `AddDays(t, days)` | 增加天数 |
## 时区配置
确保服务器和数据库时区配置正确:
```go
// 数据库连接配置
dsn := "user:pass@tcp(host:3306)/db?charset=utf8mb4&parseTime=True&loc=Asia%2FShanghai"
```


@@ -0,0 +1,51 @@
#!/bin/bash
# 验证 Go GIN/GORM 项目结构
set -e
echo "=== Go 项目结构验证 ==="
# 检查 go.mod
if [ ! -f "go.mod" ]; then
    echo "❌ 缺少 go.mod"
    exit 1
fi
echo "✅ go.mod 存在"
# 检查核心目录
DIRS=("internal/service" "internal/dao" "internal/model" "pkg/common")
for dir in "${DIRS[@]}"; do
    if [ -d "$dir" ]; then
        echo "✅ $dir 存在"
    else
        echo "⚠️ $dir 不存在"
    fi
done
# 检查 handler 目录(两种风格)
if [ -d "api" ] || [ -d "internal/handler" ]; then
    echo "✅ handler 目录存在"
else
    echo "⚠️ 缺少 api/ 或 internal/handler/"
fi
# 检查反向依赖dao 不应引用 service
echo ""
echo "=== 检查依赖方向 ==="
if grep -r "internal/service" internal/dao/ 2>/dev/null; then
    echo "❌ dao 层存在对 service 的反向依赖"
    exit 1
fi
echo "✅ 无反向依赖"
# 检查 time.Now() 使用
echo ""
echo "=== 检查 time.Now() 使用 ==="
if grep -rn "time\.Now()" --include="*.go" internal/ api/ 2>/dev/null | grep -v "_test.go"; then
    echo "⚠️ 发现直接使用 time.Now(),应使用 TimeUtils.Now()"
else
    echo "✅ 无直接 time.Now() 调用"
fi
echo ""
echo "=== 验证完成 ==="


@@ -0,0 +1,391 @@
---
name: dds-to-skill
description: >
将 DDS(详细设计说明书)/ PRD / 架构文档转换为一套可落地的 Claude Code Agent Skills(Converts DDS/PRD/Architecture docs into production-ready Agent Skills),
包含系统级 Skill、模块级 Skills、横切 Skills 的完整生成流程,涵盖设计细节抽取、reference 分层、frontmatter 规范、质量自检。
触发场景 Trigger: 当用户需要将 DDS 文档转为 Skills / 需要从架构设计文档生成开发指导 Skill / 需要批量创建模块级 Skill 套件。
关键词 Keywords: DDS, PRD, 架构说明, 设计文档, skill 生成, skill 套件, agent skill, 模块拆分, reference 抽取, 契约, API, 状态机, 事件, Schema。
argument-hint: "<dds-file-path> [--output-dir <skills-output-dir>] [--project-name <name>]"
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# DDS-to-Skill从设计文档生成 Agent Skills
本 Skill 指导你将一份 DDSDetailed Design Specification或 PRD / 架构说明文档,转换为一套**可落地、含设计细节**的 Claude Code Agent Skills 套件。
> **核心理念**:生成的不是"空洞的工作流提示词",而是**绑定了 DDS 设计细节**、能指导真实开发/审查的 Skill 套件。
---
## Phase 0读取与理解 DDS
### 0.1 动态注入读取(必须执行)
```bash
# 动态注入:查看源文档目录上下文
!`ls -la $(dirname "$ARGUMENTS")`
# 动态注入:读取 DDS 正文(至少 3 段,覆盖全文)
!`sed -n '1,150p' "$ARGUMENTS"`
!`sed -n '151,300p' "$ARGUMENTS"`
!`sed -n '301,500p' "$ARGUMENTS"`
# 动态注入:抽取章节标题(构建 TOC
!`grep -nE '^(#{1,6}\s+|[0-9]+(\.[0-9]+){0,3}\s+|第[一二三四五六七八九十]+章|第[0-9]+章)' "$ARGUMENTS" | head -n 80`
```
### 0.2 设计要素定向扫描(至少执行 3 项)
```bash
# API/接口
!`grep -nE "API|接口|路径|路由|request|response|错误码|error|handler" "$ARGUMENTS" | head -n 60`
# 事件/消息/Topic
!`grep -nE "事件|event|MQTT|topic|outbox|消息|payload|幂等|retry|publish|subscribe" "$ARGUMENTS" | head -n 60`
# 数据库/Schema
!`grep -nE "表|schema|字段|索引|unique|constraint|migration|DDL|PostgreSQL|MySQL|GORM" "$ARGUMENTS" | head -n 60`
# 状态机/流程
!`grep -nE "状态机|state|transition|流转|工单|workflow|回调|补偿|lifecycle" "$ARGUMENTS" | head -n 60`
# 安全/授权
!`grep -nE "RBAC|DAC|鉴权|JWT|claim|授权|TOTP|权限|auth|token|session" "$ARGUMENTS" | head -n 60`
# 模块/服务/依赖
!`grep -nE "模块|module|service|微服务|依赖|dependency|import|gateway" "$ARGUMENTS" | head -n 60`
```
### 0.3 无法读取时的降级
若无法读取文件,**必须停止**,输出"继续所需的最小信息清单"
1. 系统模块列表(名称 + 职责 + 关键技术)
2. 每个模块的接口/API 列表
3. 事件/Topic 定义
4. 数据库表结构
5. 状态机/流程定义
6. 授权模型
7. 模块间依赖关系
**禁止在缺少源文档的情况下臆造设计细节。**
---
## Phase 1分析与规划
### 1.1 模块识别
从 DDS 中识别所有业务模块,生成模块清单表:
| 模块名 | 职责概述 | 关键技术 | Skill 类型 |
|--------|---------|---------|-----------|
| *从 DDS 抽取* | *从 DDS 抽取* | *从 DDS 抽取* | 系统级/模块级/横切 |
### 1.2 Skill 三层架构规划
必须生成 3 类 Skills
**A) 系统级 Skill1 个)**
- 跨模块一致性、依赖规则、全局变更流程
- 命名:`developing-<system-name>-system`
**B) 模块级 SkillsN 个,每模块 1 个)**
- 高频开发指导:实现步骤 + 依赖影响检查
- 命名:`developing-<module-name>`
**C) 横切 Skills≥ 3 个)**
- 基于 DDS 内容选择,常见横切关注点:
| 横切主题 | 适用场景 | 参考命名 |
|---------|---------|---------|
| API/事件/Schema 契约 | 有跨模块接口定义 | `designing-contracts` |
| 数据库迁移 | 有 DB Schema 定义 | `managing-db-migrations` |
| 可观测性/审计 | 有日志/监控/审计需求 | `managing-observability` |
| 安全/认证 | 有 RBAC/JWT/授权体系 | `implementing-auth` |
| 前端开发规范 | 有前端架构设计 | `frontend-<framework>` |
| 后端编码规范 | 有后端技术栈规范 | `backend-<framework>` |
| 部署/运维 | 有 K8S/Docker/CI 设计 | `deploying-<target>` |
> 实际横切 Skills 必须根据 DDS 内容动态决定,不可少于 3 个。
### 1.3 Name 候选与确认
为每个 Skill 提供 2~3 个命名候选,从中选择 1 个并说明理由。命名规则:
- 动名词形式(如 `developing-*``managing-*``implementing-*`
- 小写字母 + 数字 + 连字符
- ≤ 64 字符
- 包含模块名或领域名
---
## Phase 2DDS 设计细节抽取
### 2.1 章节提取与 reference 目录构建
> **详细规则见** `reference/dds-extraction-guide.md`
从 DDS 章节标题构建 `reference/` 分层目录:
```
<skill-name>/reference/
├── 01-<section-slug>/
│ ├── apis.md
│ ├── db-schema.md
│ └── events-topics.md
├── 02-<section-slug>/
│ └── state-machine.md
└── 03-<section-slug>/
└── security-model.md
```
**目录命名规范**
- 有序前缀 `01-``02-`... + slug
- slug全小写非字母数字字符替换为 `-`,连续 `-` 合并,≤ 48 字符
### 2.2 六类设计要素抽取(必须覆盖)
每个模块级 Skill 的 reference/ 必须覆盖**至少 3 类**
| 要素类型 | 抽取内容 | reference 文件名 |
|---------|---------|-----------------|
| **API/接口** | 路径、方法、请求/响应字段、错误码 | `apis.md` |
| **事件/Topic** | 字段、版本、幂等键、重试语义 | `events-topics.md` |
| **DB Schema** | 字段、索引、约束、迁移策略 | `db-schema.md` |
| **状态机/流程** | 状态、转移、守卫条件、回调、补偿 | `state-machine.md` |
| **授权模型** | JWT claims、RBAC/DAC、权限层级 | `security-model.md` |
| **依赖关系** | 跨模块调用链路、协议、集成点 | `dependencies.md` |
### 2.3 reference 条目格式(强制)
每条 reference 必须包含溯源信息:
```markdown
## <设计要素名称>
- **DDS-Section**: <章节标题原文>
- **DDS-Lines**: L120-L168或近似行号
### Extract
<结构化内容表格/列表/代码块>
```
### 2.4 TBD 标注
如果 DDS 中某个设计要素写得不清楚或缺失:
- **必须标注 `[TBD]`**
- 输出"最小补充信息清单"
- **禁止脑补细节**
---
## Phase 3逐个生成 SKILL.md
### 3.1 SKILL.md 结构模板
> **详细模板见** `reference/skill-templates.md`
每个 SKILL.md 必须包含以下结构:
```markdown
---
name: <skill-name>
description: <单行,< 1024 字符,中英文混合,第三人称,含功能+触发场景+关键词>
argument-hint: "<参数格式说明>"
allowed-tools:
- Read
- Write # 按需
- Edit # 按需
- Glob
- Grep
- Bash # 按需
---
# <Skill 标题>
<一段话概述本 Skill 的用途和适用范围>
## Quick Context
<动态注入命令,至少 2 处 !`command`>
## Plan
### 产物清单
### 决策点
## Verify
<按类别组织的 Checklist可勾选>
## Execute
<分步骤的可操作指令>
## Pitfalls
<3~8 条与该模块/主题强相关的常见坑,至少 2 条引用 reference>
## Related References
<指向 reference/ 的链接列表,说明何时查阅>
```
### 3.2 Frontmatter 编写规则
> **详细规范见** `reference/frontmatter-spec.md`
**关键要点**
- `description` **必须单行**,否则 skill 触发会失败
- 必须中英文混合,确保中文和英文查询都能命中
- 必须包含:功能说明 + 触发场景 + 关键词(含模块名)
- `allowed-tools` 遵循最小授权原则
### 3.3 内容编写原则
1. **删除常识**:只保留 DDS 特有设计与可操作步骤
2. **解释 Why**:对重要约束解释原因,不要堆砌 MUST/ALWAYS
3. **可执行动作**:禁止空话(如"检查 API 兼容"),必须写成具体审查动作
4. **设计细节绑定**Pitfalls 和 Verify 中至少 2 处引用 `reference/` 的具体内容
5. **行数限制**SKILL.md 主体 < 500 行
**示例 — 空话 vs 可执行动作**
```
❌ "检查事件一致性"
✅ "在 reference/events-topics.md 找到 topic 列表,对照仓库 grep 出 publish/subscribe 点"
❌ "验证 JWT 安全"
✅ "校验 JWT claims 是否包含 tenant_id/project_id/role来自 reference/security-model.md"
❌ "检查 migration 可回滚"
✅ "migration 必须包含 down SQLverify.sh grep 检查 `-- +migrate Down` 或回滚段落存在"
```
---
## Phase 4生成 Supporting Files
### 4.1 目录结构
每个 Skill 遵循标准目录模板
```
<skill-name>/
├── SKILL.md # 主文件(< 500 行)
├── reference/ # 设计细节(按章节分层)
│ ├── 01-<section>/
│ │ ├── apis.md
│ │ ├── db-schema.md
│ │ └── ...
│ └── 02-<section>/
│ └── ...
├── examples/ # 骨架代码示例
│ └── ...
└── scripts/ # 验证脚本
└── verify.sh # 必须提供
```
### 4.2 verify.sh 编写要求
每个 Skill 必须至少包含 1 `verify.sh`
```bash
#!/bin/bash
# verify.sh - <skill-name> Skill 结构与内容验证
set -e
PASS=0; FAIL=0
# 注意:不要写 ((PASS++)) —— 当变量为 0 时该算术表达式的退出码为 1
# 在 set -e 下会直接终止脚本;统一用 VAR=$((VAR+1))
check() {
if eval "$2"; then
echo "✅ PASS: $1"; PASS=$((PASS+1))
else
echo "❌ FAIL: $1"; FAIL=$((FAIL+1))
fi
}
SKILL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
# 结构检查
check "SKILL.md 存在" "test -f '$SKILL_DIR/SKILL.md'"
check "reference/ 目录存在" "test -d '$SKILL_DIR/reference'"
check "SKILL.md < 500 行" "[ $(wc -l < '$SKILL_DIR/SKILL.md') -lt 500 ]"
# 内容检查
check "frontmatter 包含 name" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^name:'"
check "frontmatter 包含 description" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^description:'"
check "包含 Plan 章节" "grep -q '## Plan' '$SKILL_DIR/SKILL.md'"
check "包含 Verify 章节" "grep -q '## Verify' '$SKILL_DIR/SKILL.md'"
check "包含 Execute 章节" "grep -q '## Execute' '$SKILL_DIR/SKILL.md'"
check "包含 Pitfalls 章节" "grep -q '## Pitfalls' '$SKILL_DIR/SKILL.md'"
# reference 检查
check "reference 有章节子目录" "find '$SKILL_DIR/reference' -maxdepth 1 -type d -name '0*' | grep -q ."
check "reference 文件含 DDS-Section" "grep -rq 'DDS-Section:' '$SKILL_DIR/reference/' 2>/dev/null"
echo ""
echo "=== 结果: $PASS PASS / $FAIL FAIL ==="
[ $FAIL -eq 0 ] && exit 0 || exit 1
```
### 4.3 examples/ 编写要求
- 只放**骨架与关键接口签名**,不放完整实现
- 与模块职责强相关
- 注释说明关键设计决策
---
## Phase 5全局自检
### 5.1 输出顺序(必须遵守)
1. **Skills 清单表**(系统级 / 模块级 / 横切,含最终 name 与理由)
2. **总目录树**Unix 路径风格)
3. **每个 SKILL.md**:完整内容
4. **Supporting files**:按 `文件路径 → 文件内容` 逐个输出
5. **全局自检结果**:逐条 PASS/FAIL + 修复建议
### 5.2 自检 Checklist
按以下维度逐条检查:
**结构完整性**
- [ ] 系统级 Skill 存在1 个)
- [ ] 模块级 Skills 数量 = 模块数
- [ ] 横切 Skills ≥ 3 个
- [ ] 每个 Skill 都有 SKILL.md + reference/ + scripts/verify.sh
**Frontmatter 规范**
- [ ] description 为单行
- [ ] description < 1024 字符
- [ ] 中英文混合
- [ ] 包含触发场景和关键词
- [ ] allowed-tools 最小授权
**内容质量**
- [ ] SKILL.md < 500 行
- [ ] 包含 Plan/Verify/Execute/Pitfalls 四个章节
- [ ] ≥ 2 处 `!command` 动态注入
- [ ] Pitfalls 中 ≥ 2 条引用 reference
- [ ] 无空话(如"检查 XX 一致性"这类无具体动作的描述)
**Reference 质量**
- [ ] 每个模块 Skill 覆盖 ≥ 3 类设计要素
- [ ] reference 有章节分层目录(非扁平)
- [ ] 每条 reference 有 DDS-Section + DDS-Lines 溯源
- [ ] DDS 缺失内容标注 [TBD]
- [ ] 无脑补设计细节
---
## Quick Reference
| 需要了解... | 查阅... |
|------------|--------|
| DDS 抽取的详细方法 | `reference/dds-extraction-guide.md` |
| SKILL.md 模板系统/模块/横切 | `reference/skill-templates.md` |
| Frontmatter 详细规范 | `reference/frontmatter-spec.md` |
| 质量自检的完整清单 | `reference/quality-checklist.md` |
| 成功案例的目录结构 | `examples/` |

@@ -0,0 +1,137 @@
# DDS-to-Skill 转换实例:完整系统(多模块)
本文展示将一个包含多模块的 DDS 文档转换为完整 Skill 套件的过程。
---
## 1. 系统概述(模拟 DDS
```
系统名称ProjectMoneyX个人财务管理系统
技术栈Go + Gin + GORM / Vue3 + TypeScript + Vuetify3
模块列表:
- 账单导入模块bill-import
- 多维分析模块analysis
- 预算管理模块budget
- 账户管理模块account
- 规则引擎模块rules
```
---
## 2. Skill 套件规划
### 2.1 Skills 清单
| 类型 | Skill Name | 职责 |
|------|-----------|------|
| 系统级 | `developing-moneyx-system` | 跨模块架构、技术栈规范、依赖管理 |
| 模块级 | `developing-bill-import` | 账单导入 ETL 流水线 |
| 模块级 | `developing-analysis` | 多维财务分析与图表 |
| 模块级 | `developing-budget` | 预算创建与跟踪 |
| 模块级 | `developing-account` | 账户 CRUD 与余额同步 |
| 模块级 | `developing-rules` | 分类规则引擎 |
| 横切 | `designing-contracts` | API/DTO 契约规范 |
| 横切 | `managing-db-migrations` | 数据库迁移策略 |
| 横切 | `managing-observability` | 日志、错误追踪 |
### 2.2 总目录树
```
1-AgentSkills/
├── developing-moneyx-system/
│ ├── SKILL.md
│ ├── reference/
│ │ ├── 01-architecture/
│ │ │ └── dependencies.md
│ │ └── 02-tech-stack/
│ │ └── conventions.md
│ └── scripts/
│ └── verify.sh
├── developing-bill-import/
│ ├── SKILL.md
│ ├── reference/
│ │ ├── 01-etl-pipeline/
│ │ │ └── pipeline-design.md
│ │ ├── 02-api-design/
│ │ │ └── apis.md
│ │ └── 03-data-model/
│ │ └── db-schema.md
│ ├── examples/
│ │ └── etl-processor.go
│ └── scripts/
│ └── verify.sh
├── developing-analysis/
│ ├── ...(同上结构)
├── developing-budget/
│ ├── ...
├── developing-account/
│ ├── ...
├── developing-rules/
│ ├── ...
├── designing-contracts/
│ ├── SKILL.md
│ ├── reference/
│ │ └── api-response-spec.md
│ └── scripts/
│ └── verify.sh
├── managing-db-migrations/
│ ├── SKILL.md
│ ├── reference/
│ │ └── migration-conventions.md
│ └── scripts/
│ └── verify.sh
└── managing-observability/
├── SKILL.md
├── reference/
│ └── logging-standards.md
└── scripts/
└── verify.sh
```
---
## 3. 关键转换决策
### 3.1 模块边界划分
> **决策依据**DDS 中每个"章节"对应一个业务域,每个业务域生成一个模块级 Skill。
### 3.2 横切关注点识别
从 DDS 全文 grep 识别跨模块使用的技术点:
```bash
# 发现所有模块都用了统一响应格式 → designing-contracts
grep -c "ResponseError\|ResponseSuccess" *.go
# 发现多个模块有 migration 文件 → managing-db-migrations
find . -name "*migration*" -o -name "*migrate*"
# 发现多个模块有日志调用 → managing-observability
grep -rn "log\.\(Info\|Error\|Debug\)" --include="*.go" | wc -l
```
### 3.3 Reference 深度决策
| 要素 | 模块 | DDS 覆盖度 | reference 策略 |
|------|------|-----------|---------------|
| API | bill-import | 完整 | 全量抽取到 apis.md |
| DB Schema | budget | 部分 | 抽取已有 + [TBD] 标注缺失 |
| 事件 | analysis | 无 | 跳过,无需创建事件 reference |
| 状态机 | bill-import | 完整 | ETL 状态流转到 state-machine.md |
---
## 4. 输出示例:系统级 Skill
```yaml
---
name: developing-moneyx-system
description: >
指导 ProjectMoneyX 个人财务管理系统的全局架构决策与跨模块一致性Guides system-level architecture for ProjectMoneyX personal finance system
包含:模块注册、技术栈规范、依赖管理、响应格式统一。
触发场景 Trigger: 新增模块 / 跨模块变更 / 架构决策 / 技术栈选型。
关键词 Keywords: moneyx, system, architecture, 架构, 财务, finance, 模块, cross-module。
---
```

@@ -0,0 +1,95 @@
# DDS-to-Skill 转换实例:工单流程模块
本文展示了一个从 DDS 片段到完整 Skill 的转换过程。
---
## 1. DDS 原文片段(模拟)
```markdown
## 5. 工单管理模块rmdc-work-procedure
### 5.1 模块职责
负责工单生命周期管理,包括创建、审批、执行、完成等流转。
### 5.2 数据库设计
#### workflows 主表
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | BIGINT | PK, AUTO_INCREMENT | 工单ID |
| type | VARCHAR(50) | NOT NULL | 工单类型 |
| status | VARCHAR(30) | NOT NULL, DEFAULT 'pending' | 当前状态 |
| creator_id | BIGINT | NOT NULL, FK → users.id | 创建人 |
| assignee_id | BIGINT | FK → users.id | 处理人 |
| version | INT | NOT NULL, DEFAULT 1 | 乐观锁版本号 |
### 5.3 状态机
- pending → submitted → under_review → approved/rejected
- submitted → revoked创建人可撤销
- 终态approved, rejected, revoked, closed
### 5.4 API 接口
- POST /api/workflow/create - 创建工单
- POST /api/workflow/transition - 状态转换
- POST /api/workflow/callback - 业务回调
- POST /api/workflow/list - 工单列表
```
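5.3 节的状态机在实现时通常整理成一张"转换表"。以下为示意性 Go 草图(函数名与变量名均为假设,非 DDS 原文),演示如何机械校验 from → to 是否合法、终态是否受保护:

```go
package main

import "fmt"

// transitions 由 DDS 5.3 整理:每个状态允许流向的下一批状态
var transitions = map[string][]string{
	"pending":      {"submitted"},
	"submitted":    {"under_review", "revoked"},
	"under_review": {"approved", "rejected"},
}

// terminal 终态集合:不允许再发起任何转换
var terminal = map[string]bool{
	"approved": true, "rejected": true, "revoked": true, "closed": true,
}

// CanTransition 校验 from → to 是否为合法流转
func CanTransition(from, to string) bool {
	if terminal[from] {
		return false // 终态保护
	}
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition("pending", "submitted"))  // true
	fmt.Println(CanTransition("approved", "submitted")) // false终态不可再转换
}
```

转换表与终态集合正是后续 `reference/03-workflow-engine/state-machine.md` 要抽取的结构化内容。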
---
## 2. 转换步骤演示
### 步骤 1模块识别
| 模块 | 职责 | 技术 | Skill 类型 |
|------|------|------|-----------|
| rmdc-work-procedure | 工单生命周期管理 | Go + Gin + PostgreSQL | 模块级 |
### 步骤 2设计要素抽取
从 DDS 中识别到 4 类要素:
- ✅ API/接口 → 4 个 API 端点
- ✅ DB Schema → workflows 主表
- ✅ 状态机 → 5 个状态 + 转换规则
- ❌ 事件/Topic → DDS 未提及 → 标注 [TBD]
- ❌ 授权模型 → DDS 未提及 → 标注 [TBD]
- ✅ 依赖关系 → 业务模块回调
### 步骤 3reference 文件生成
```
reference/
├── 01-data-model/
│ └── db-schema.md # workflows 表结构
├── 02-api-design/
│ └── apis.md # 4 个 API 定义
├── 03-workflow-engine/
│ └── state-machine.md # 状态机定义
└── 04-integration/
└── dependencies.md # 回调接口
```
### 步骤 4SKILL.md 关键段落
```markdown
## Pitfalls
1. **版本号遗漏**: 更新工单时忘记传递 `version` 字段,导致乐观锁失效
(参考 `reference/01-data-model/db-schema.md` 中 workflows.version 字段定义)
2. **终态误转换**: 对 approved/rejected/revoked/closed 状态尝试非法转换
(参考 `reference/03-workflow-engine/state-machine.md` 中的终态定义)
3. **事件推送遗漏**: 状态变更后忘记通知相关方 [TBD - DDS 未定义事件机制]
```
---
## 3. 自检结果
| # | 检查项 | 结果 | 说明 |
|---|-------|------|------|
| S4 | SKILL.md 存在 | ✅ PASS | |
| R1 | 设计要素 ≥ 3 类 | ✅ PASS | API + DB + 状态机 + 依赖 = 4 类 |
| R5 | TBD 标注 | ✅ PASS | 事件和授权标注了 [TBD] |
| C7 | Pitfalls 引用 reference | ✅ PASS | 2 条引用了 reference 路径 |
| R6 | 无脑补 | ✅ PASS | 缺失内容均标注 [TBD] |

@@ -0,0 +1,260 @@
# DDS 设计细节抽取指南
本文档详细说明如何从 DDS详细设计说明书中抽取设计细节并组织到 reference/ 目录中。
---
## 1. 章节标题提取
### 1.1 标题识别规则(按优先级)
| 优先级 | 格式 | 示例 |
|-------|------|------|
| 1 | Markdown 标题 | `# 系统架构``## 接口设计` |
| 2 | 编号标题 | `1 概述``2.3 数据库设计` |
| 3 | 中文章标题 | `第一章 总体设计``第3章` |
| 4 | 中文小节 | `一、系统概述``(二)接口规范` |
### 1.2 提取命令
```bash
# 综合提取(推荐首选)
grep -nE '^(#{1,6}\s+|[0-9]+(\.[0-9]+){0,3}\s+|第[一二三四五六七八九十]+章|第[0-9]+章|[一二三四五六七八九十]+、)' "$DDS_FILE" | head -n 120
# 如果上面匹配不足,尝试更宽松的模式
sed -n '1,200p' "$DDS_FILE" | nl -ba | sed -n '1,120p'
```
### 1.3 降级策略
当标题提取不足(少于 3 个)或 DDS 格式混乱时:
```
reference/00-unknown/
├── 01-apis/
├── 02-events/
├── 03-db/
├── 04-state-machine/
└── 05-security/
```
同时在自检中标记 FAIL
- **原因**DDS 标题结构不可识别
- **建议**:提供 Markdown 标题 / 章节目录 / md 格式导出版本
---
## 2. 六类设计要素抽取方法
### 2.1 API/接口
**扫描关键词**
```bash
grep -nE "API|接口|路径|路由|request|response|错误码|error|handler|endpoint|method" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| 路径 | `/api/v1/users/list` |
| 方法 | POST / GET 等 |
| 请求字段 | 字段名、类型、是否必须、校验规则 |
| 响应字段 | 字段名、类型、说明 |
| 错误码 | code + message + 触发场景 |
| 鉴权要求 | JWT / API Key / 公开 |
**输出格式**
```markdown
## POST /api/v1/users/list
- **DDS-Section**: 3.2 用户管理接口
- **DDS-Lines**: L120-L168
### Request
| 字段 | 类型 | 必须 | 说明 |
|------|------|------|------|
| page | int | N | 页码,默认 1 |
| page_size | int | N | 每页数量,默认 20 |
### Response
| 字段 | 类型 | 说明 |
|------|------|------|
| list | []User | 用户列表 |
| total | int | 总数 |
### 错误码
| code | message | 触发场景 |
|------|---------|---------|
| 1001 | 参数校验失败 | 字段格式错误 |
```
### 2.2 事件/Topic/消息
**扫描关键词**
```bash
grep -nE "事件|event|MQTT|topic|outbox|消息|payload|幂等|retry|publish|subscribe|Kafka|RabbitMQ" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| Topic/Queue 名 | `cmii/rmdc/{project_id}/command` |
| 方向 | Publish / Subscribe |
| Payload 字段 | 字段名、类型、说明 |
| QoS / 可靠性 | At-least-once / Exactly-once |
| 幂等键 | 用于去重的唯一标识字段 |
| 重试策略 | 重试间隔、最大次数、死信队列 |
### 2.3 数据库/Schema
**扫描关键词**
```bash
grep -nE "表|schema|字段|索引|unique|constraint|migration|DDL|PostgreSQL|MySQL|GORM|column|CREATE TABLE" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| 表名 | `users``workflows` |
| 字段定义 | 名称、类型、约束、默认值 |
| 索引 | 类型(唯一/普通/组合)、字段 |
| 外键关系 | 引用表、级联策略 |
| 迁移策略 | 向前兼容 / 字段演进方案 |
### 2.4 状态机/流程
**扫描关键词**
```bash
grep -nE "状态机|state|transition|流转|工单|workflow|回调|补偿|lifecycle|FSM|guard" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| 状态枚举 | 名称、值、描述 |
| 转换规则 | from → to、触发动作、守卫条件 |
| 角色权限 | 谁可以触发哪些转换 |
| 回调/副作用 | 状态变更后执行的操作 |
| 补偿机制 | 转换失败时的回滚策略 |
### 2.5 授权模型
**扫描关键词**
```bash
grep -nE "RBAC|DAC|鉴权|JWT|claim|授权|TOTP|权限|auth|token|session|role|permission" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| 认证方式 | JWT / Session / OAuth |
| JWT Claims | 包含的字段tenant_id, role 等) |
| 角色定义 | 角色名、权限描述 |
| 权限矩阵 | 角色 × 资源 × 操作 |
| 层级设计 | 一级授权 / 二级授权 |
### 2.6 依赖关系
**扫描关键词**
```bash
grep -nE "模块|module|service|依赖|dependency|import|gateway|调用|集成|protocol" "$DDS_FILE" | head -n 80
```
**抽取内容**
| 字段 | 说明 |
|------|------|
| 源模块 | 调用方 |
| 目标模块 | 被调用方 |
| 协议 | HTTP / gRPC / MQTT / 内部调用 |
| 关键接口 | 跨模块调用的接口清单 |
| 失败处理 | 超时、重试、熔断策略 |
---
## 3. reference 目录组织
### 3.1 命名规范
```
reference/
├── 01-architecture-overview/ # 章节序号 + slug
│ └── dependencies.md
├── 02-api-design/
│ ├── apis.md
│ └── error-codes.md
├── 03-data-model/
│ └── db-schema.md
├── 04-message-system/
│ └── events-topics.md
├── 05-workflow-engine/
│ └── state-machine.md
└── 06-security/
└── security-model.md
```
**Slug 生成规则**
1. 全小写
2. 非字母数字字符替换为 `-`
3. 连续 `-` 合并为单个
4. 截断到 48 字符以内
5. 序号来自 DDS 中的章节顺序
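上述规则可以写成一个很小的函数来保证 slug 生成的一致性。以下为示意性 Go 草图(函数名为假设;注意本实现把非 ASCII 字符(含中文)一律视为非字母数字,因此生成前应先把章节名转写为英文关键词):

```go
package main

import (
	"fmt"
	"strings"
)

// Slugify 按规则生成章节目录 slug小写、非字母数字替换为 -、连续 - 合并、截断到 48
func Slugify(title string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(title) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			b.WriteRune(r)
		} else {
			b.WriteRune('-') // 非字母数字(含中文、空格、符号)一律替换
		}
	}
	s := b.String()
	for strings.Contains(s, "--") { // 合并连续 -
		s = strings.ReplaceAll(s, "--", "-")
	}
	s = strings.Trim(s, "-")
	if len(s) > 48 {
		s = strings.TrimRight(s[:48], "-")
	}
	return s
}

func main() {
	fmt.Println(Slugify("API Design & Error Codes")) // api-design-error-codes
}
```

纯中文标题会被整体替换为 `-` 后裁剪为空,因此转换时建议先将章节名译为英文(如"接口设计"→ `api-design`),序号仍来自 DDS 章节顺序。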
### 3.2 SKILL.md 中的引用方式
```markdown
## Pitfalls
1. **MQTT Topic 命名冲突**:新增 topic 前必须检查
`reference/04-message-system/events-topics.md` 中的 topic 清单
## Related References
| 需要了解... | 查阅... |
|------------|--------|
| API 完整定义 | `reference/02-api-design/apis.md` |
| 数据库表结构 | `reference/03-data-model/db-schema.md` |
```
### 3.3 扁平化兼容
当 DDS 章节结构不明显时,也可以采用扁平 reference但需在自检中说明
```
reference/
├── apis.md
├── db-schema.md
├── events-topics.md
├── state-machine.md
├── security-model.md
└── dependencies.md
```
---
## 4. TBD 标注规范
当 DDS 中某个设计要素不完整或不清晰时:
```markdown
## 消息重试策略
- **DDS-Section**: 4.3 消息可靠性
- **DDS-Lines**: L245-L260
### Extract
| 配置项 | 值 |
|-------|---|
| 最大重试次数 | [TBD - DDS 未明确指定] |
| 重试间隔 | [TBD - DDS 未明确指定] |
| 死信队列 | [TBD - DDS 仅提及概念,未给出配置] |
### 最小补充信息清单
1. 重试次数上限(建议 3~5 次)
2. 重试间隔策略(固定 / 指数退避)
3. 死信队列名称与消费策略
```

@@ -0,0 +1,162 @@
# Frontmatter 编写规范
Frontmatter 是 Skill 的"身份证",决定了 Skill 何时被触发、是否被正确识别。编写不当会导致 Skill 永远不会被使用。
---
## 1. 必须字段
### 1.1 name
**规则**
- 小写字母 + 数字 + 连字符(`-`
- 动名词形式开头(`developing-``managing-``implementing-``designing-`
- ≤ 64 字符
- 包含模块名或领域名
**常用前缀**
| 前缀 | 适用场景 |
|------|---------|
| `developing-` | 模块开发、功能实现 |
| `managing-` | 管理类操作DB、配置、部署 |
| `implementing-` | 特定技术方案实现 |
| `designing-` | 设计阶段的规范和契约 |
| `writing-` | 编写文档、脚本、测试 |
**示例**
```yaml
# ✅ 正确
name: developing-work-procedure
name: managing-db-migrations
name: implementing-totp-auth
# ❌ 错误
name: WorkProcedure # 不能大写
name: work_procedure # 不能用下划线
name: wp # 太短,无法触发
```
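上述 name 规则(小写 + 连字符、动名词前缀、≤ 64 字符)可以用一个正则做机械校验。以下为示意性 Go 草图(前缀列表取自上表,函数名为假设):

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern常用动名词前缀开头后接若干 "-小写字母/数字" 段
var namePattern = regexp.MustCompile(`^(developing|managing|implementing|designing|writing)(-[a-z0-9]+)+$`)

// ValidName 校验 skill name 的格式与长度
func ValidName(name string) bool {
	return len(name) <= 64 && namePattern.MatchString(name)
}

func main() {
	fmt.Println(ValidName("developing-work-procedure")) // true
	fmt.Println(ValidName("WorkProcedure"))             // false大写
}
```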
### 1.2 description
**这是最关键的字段** —— 决定 Skill 是否能被正确触发。
**硬性规则**
1. **必须单行**(不换行) —— 换行会导致 YAML 解析出错Skill 会静默失败
2. **< 1024 字符**
3. **第三人称**描述
4. **中英文混合** —— 确保中文和英文查询都能命中
5. **包含触发场景**Trigger**关键词**Keywords
**结构模板**
```
<功能概述(中英文)>。包含:<具体能力列表>。触发场景 Trigger: <场景列表>。关键词 Keywords: <关键词列表>。
```
**示例**
```yaml
# ✅ 正确(单行,中英文混合,包含触发场景和关键词)
description: 指导 rmdc-work-procedure 工单流程模块的开发Guides development of rmdc-work-procedure workflow module。包含状态机实现、工单 CRUD、并发控制、WebSocket 事件。触发场景 Trigger: 修改工单表 / 添加工单类型 / 变更状态转换 / 实现工单 API。关键词 Keywords: workflow, work-procedure, state-machine, 工单, 状态机, 流转。
# ❌ 错误 - 多行(会静默失败!)
description: |
指导工单模块开发。
包含状态机实现。
# ❌ 错误 - 太短,无关键词
description: 工单模块开发指导
# ❌ 错误 - 纯英文,中文查询无法命中
description: Guides the development of workflow module with state machine
```
**推动触发的技巧**
- Claude 有"不触发"的倾向,所以 description 应该稍微"激进"一些
- 多列出触发场景,覆盖用户可能的表述方式
- 包含同义词(如 工单/workflow/ticket
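前几条硬性规则是可以机械化校验的。以下为示意性 Go 草图(函数名为假设,不解析完整 YAML只对已提取出的 description 字符串做检查):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// CheckDescription 机械校验 description 的硬性规则,返回问题列表(空即通过)
func CheckDescription(desc string) []string {
	var problems []string
	if strings.Contains(desc, "\n") {
		problems = append(problems, "含换行YAML 中会变成多行,导致触发静默失败")
	}
	if len([]rune(desc)) >= 1024 {
		problems = append(problems, "超过 1024 字符")
	}
	hasHan, hasLatin := false, false
	for _, r := range desc {
		if unicode.Is(unicode.Han, r) {
			hasHan = true // 含中文
		}
		if unicode.IsLetter(r) && r < 128 {
			hasLatin = true // 含英文字母
		}
	}
	if !hasHan || !hasLatin {
		problems = append(problems, "未同时包含中文与英文")
	}
	return problems
}

func main() {
	fmt.Println(CheckDescription("指导工单模块开发 Guides workflow module development")) // 通过
	fmt.Println(CheckDescription("工单模块开发指导"))                                       // 缺英文
}
```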
### 1.3 argument-hint
**规则**
- 说明 `$ARGUMENTS` 的期望格式
- 给出 2~3 个具体示例
**示例**
```yaml
argument-hint: "<action> <target> - e.g., 'create handler user', 'add api /workflow/create', 'update schema workflows'"
```
---
## 2. 可选字段
### 2.1 allowed-tools
**原则**:最小授权 —— 只声明 Skill 真正需要的工具
| 工具 | 适用场景 |
|------|---------|
| `Read` | 读取文件(几乎总是需要) |
| `Glob` | 搜索文件(几乎总是需要) |
| `Grep` | 搜索文件内容(几乎总是需要) |
| `Bash` | 执行 shell 命令(按需) |
| `Write` | 创建新文件(开发类 Skill 需要) |
| `Edit` | 编辑现有文件(开发类 Skill 需要) |
**示例**
```yaml
# 只读 Skill审查/分析类)
allowed-tools:
- Read
- Glob
- Grep
# 开发 Skill需要写文件
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
```
---
## 3. YAML 格式注意事项
### 3.1 多行 description 的安全写法
如果 description 确实很长,可使用 `>` 折叠块语法(注意:最终仍会被合并解析为单行):
```yaml
description: >
指导 rmdc-work-procedure 工单流程模块的开发。
包含状态机实现、工单 CRUD、并发控制。
触发场景 Trigger: 修改工单表、添加工单类型。
```
> ⚠️ 使用 `>` 时YAML 会将换行替换为空格,最终合并为单行。这是安全的。
> ❌ 绝不要使用 `|`(保留换行块语法),那会导致多行 description。
### 3.2 特殊字符转义
```yaml
# 包含冒号时用引号包裹
argument-hint: "<action>: <target>"
# 包含 # 时用引号
description: "指导 C# 项目开发"
```
---
## 4. 自检清单
- [ ] `name` 为动名词形式,小写 + 连字符,≤ 64 字符
- [ ] `description` 在最终 YAML 中为单行
- [ ] `description` < 1024 字符
- [ ] `description` 包含中文和英文
- [ ] `description` 包含触发场景Trigger
- [ ] `description` 包含关键词Keywords
- [ ] `argument-hint` 有具体示例
- [ ] `allowed-tools` 遵循最小授权

@@ -0,0 +1,114 @@
# 全局质量自检清单
DDS-to-Skill 转换完成后,必须按以下清单逐条检查。每条标记 PASS 或 FAILFAIL 必须附修复建议。
---
## 1. 结构完整性
| # | 检查项 | PASS 条件 |
|---|-------|----------|
| S1 | 系统级 Skill 存在 | 恰好 1 个 `developing-*-system` Skill |
| S2 | 模块级 Skills 数量 | = DDS 中识别的模块数 |
| S3 | 横切 Skills 数量 | ≥ 3 个 |
| S4 | 每个 Skill 有 SKILL.md | 所有 Skill 目录下存在 SKILL.md |
| S5 | 每个 Skill 有 reference/ | 所有 Skill 目录下存在 reference/ |
| S6 | 每个 Skill 有 verify.sh | 所有 Skill 的 scripts/ 下存在 verify.sh |
| S7 | 目录命名规范 | 全小写、连字符、动名词形式 |
---
## 2. Frontmatter 规范
| # | 检查项 | PASS 条件 |
|---|-------|----------|
| F1 | description 单行 | YAML 解析后 description 为单行字符串 |
| F2 | description 长度 | < 1024 字符 |
| F3 | description 中英文 | 同时包含中文和英文描述 |
| F4 | description 含触发场景 | 包含 "触发场景" 或 "Trigger" 关键词 |
| F5 | description 含关键词 | 包含 "关键词" 或 "Keywords" |
| F6 | name 格式 | 小写字母 + 数字 + 连字符,动名词开头 |
| F7 | argument-hint 存在 | frontmatter 中包含 argument-hint 字段 |
| F8 | allowed-tools 最小授权 | 只读 Skill 不包含 Write/Edit |
---
## 3. 内容质量
| # | 检查项 | PASS 条件 |
|---|-------|----------|
| C1 | SKILL.md 行数 | < 500 行 |
| C2 | 包含 Plan 章节 | grep 到 `## Plan` |
| C3 | 包含 Verify 章节 | grep 到 `## Verify` |
| C4 | 包含 Execute 章节 | grep 到 `## Execute` |
| C5 | 包含 Pitfalls 章节 | grep 到 `## Pitfalls` |
| C6 | 动态注入 | ≥ 2 处 `!` + 反引号命令 |
| C7 | Pitfalls 引用 reference | ≥ 2 条 Pitfall 中出现 `reference/` 路径 |
| C8 | 无空话 | 不含"检查 XX 一致性"这类无具体动作的描述 |
| C9 | 无常识内容 | 不含 Claude 已知的通用知识(如 HTTP 状态码定义) |
| C10 | 术语一致 | 同一概念在所有 Skill 中使用相同术语 |
---
## 4. Reference 质量
| # | 检查项 | PASS 条件 |
|---|-------|----------|
| R1 | 设计要素覆盖率 | 每个模块 Skill 覆盖 ≥ 3 类API/事件/DB/状态机/权限/依赖) |
| R2 | 章节分层 | reference/ 下存在 `01-*` 等编号目录,或使用扁平结构+说明 |
| R3 | DDS 溯源 | 每条 reference 含 `DDS-Section:` 字段 |
| R4 | DDS 行号 | 每条 reference 含 `DDS-Lines:` 字段 |
| R5 | TBD 标注 | DDS 缺失内容标注 `[TBD]`,并附最小补充清单 |
| R6 | 无脑补 | 所有设计细节可溯源到 DDS 原文 |
| R7 | 内容充分 | reference 包含足够的结构化数据(表格/列表/代码块) |
---
## 5. 跨 Skill 一致性
| # | 检查项 | PASS 条件 |
|---|-------|----------|
| X1 | 模块名一致 | 所有 Skill 中模块名拼写相同 |
| X2 | 错误码不冲突 | 相同错误码在不同 Skill 中含义相同 |
| X3 | API 路径不冲突 | 不同模块的 API 路径无重叠 |
| X4 | 事件/Topic 定义一致 | 同一 Topic 在发布方和订阅方 Skill 中定义相同 |
| X5 | 授权模型一致 | JWT Claims、角色定义在所有 Skill 中一致 |
---
## 6. 自检输出格式
```markdown
# 全局自检结果
## 结构完整性
- ✅ S1 PASS: 系统级 Skill `developing-xxx-system` 存在
- ✅ S2 PASS: 模块级 Skills 数量 = 5匹配 DDS 中的 5 个模块)
- ❌ S3 FAIL: 横切 Skills 仅 2 个,少于要求的 3 个
- **修复**: 从 DDS 中识别出缓存策略章节,建议增加 `managing-cache` Skill
- ✅ S4 PASS: 所有 Skill 目录下存在 SKILL.md
## Frontmatter 规范
- ✅ F1 PASS: 所有 description 为单行
- ❌ F2 FAIL: `developing-core` 的 description 超过 1024 字符1156 字符)
- **修复**: 精简触发场景描述,移除重复关键词
## 内容质量
- ✅ C1 PASS: 所有 SKILL.md < 500
- ❌ C8 FAIL: `developing-gateway` 的 Verify 包含"检查 API 一致性"
- **修复**: 改为"对照 reference/02-api-design/apis.md 中的接口清单grep 仓库中的 handler 注册点,确认路径和方法一致"
## 总计: XX PASS / YY FAIL
```
---
## 7. 常见 FAIL 及修复方案
| FAIL 类型 | 常见原因 | 修复方案 |
|----------|---------|---------|
| description 多行 | 使用了 `\|` 语法 | 改用 `>` 或单行字符串 |
| reference 不足 | DDS 内容被遗漏 | 重新扫描 DDS补充缺失要素 |
| 空话 | 直接复制 DDS 原文 | 转化为可执行的审查动作 |
| 脑补 | DDS 未提及的细节 | 标注 [TBD] 并列出补充清单 |
| 横切不足 | 未充分分析 DDS | DDS 中识别更多跨模块关注点 |

View File

@@ -0,0 +1,255 @@
# SKILL.md 模板库
本文档包含系统级 Skill、模块级 Skill、横切 Skill 的 SKILL.md 模板,供 DDS-to-Skill 转换时参照。
---
## 1. 系统级 Skill 模板
```markdown
---
name: developing-<system>-system
description: >
指导 <系统名> 系统级开发决策与跨模块一致性Guides system-level development for <system>)。
包含:架构总览、模块注册、依赖规则、全局变更流程、版本兼容策略、技术栈规范。
触发场景 Trigger: 新增模块 / 跨模块变更 / 全局架构决策 / 技术栈选型。
关键词 Keywords: <system>, system, architecture, 架构, 模块, 依赖, 兼容, cross-module。
argument-hint: "<module-name|change-type> - 指定涉及的模块名或变更类型"
allowed-tools:
- Read
- Glob
- Grep
- Bash
---
# Developing <System> System
<一段话描述系统整体架构技术栈模块组成>
## Quick Context
```bash
# 动态注入:查看系统模块结构
!`ls -la <project-root>/`
# 动态注入:搜索模块间依赖
!`grep -rnE "import|module|service" <project-root>/ | head -30`
```
## Architecture Overview
<ASCII 架构图或层次说明>
## Module Registry
| 模块 | 职责 | 技术 | Skill |
|------|------|------|-------|
| ... | ... | ... | `developing-<module>` |
## Plan
### 产物清单
- [ ] 确定变更涉及的模块列表
- [ ] 确认是否涉及跨模块通信
- [ ] 确认是否涉及契约变更
- [ ] 确认是否需要数据库迁移
### 决策点
1. 变更是否影响多个模块?
2. 是否需要版本兼容处理?
3. 是否需要全局配置变更?
## Verify
- [ ] 模块间依赖无循环
- [ ] 共享契约版本一致
- [ ] 全局配置项完整
- [ ] 技术栈版本对齐
## Execute
### 添加新模块
1. 在项目根目录创建模块目录...
2. 注册到路由/网关...
3. 更新模块依赖图...
### 跨模块变更
1. 列出所有受影响模块...
2. 按依赖顺序逐个修改...
3. 运行集成测试...
## Pitfalls
1. **循环依赖**: 模块间禁止直接 import必须通过共享接口定义
2. **版本不一致**: 修改共享结构需同步更新所有消费方
3. ...
## Related References
- [模块依赖关系](reference/dependencies.md)
- [技术栈规范](reference/tech-stack.md)
```
---
## 2. 模块级 Skill 模板
```markdown
---
name: developing-<module>
description: >
指导 <module> 模块的开发Guides development of <module> module
包含:<模块职责概述>、API 实现、数据库操作、状态管理、安全校验。
触发场景 Trigger: 开发/修改 <module> 相关功能 / <模块特定场景>。
关键词 Keywords: <module>, <技术关键词>, <业务关键词>。
argument-hint: "<action> <target> - e.g., 'create handler', 'add api', 'update schema'"
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# Developing <Module>
<一段话描述模块职责、技术栈、在系统中的位置>
## Quick Context
```bash
# 动态注入:查看模块结构
!`find . -name "*.go" -path "*/<module>/*" | head -20`
# 动态注入:查看现有接口
!`grep -rn "func.*Handler\|func.*Service" ./<module>/ | head -20`
```
## Plan
### 产物清单
- [ ] <根据 DDS 列出具体产物>
### 决策点
1. < DDS 抽取的关键决策>
2. ...
## Verify
### <验证类别 1>
- [ ] <具体检查项引用 reference>
### <验证类别 2>
- [ ] <具体检查项>
## Execute
### 1. <步骤标题>
```bash
# 具体操作命令
```
### 2. <步骤标题>
```go
// 关键代码骨架
```
## Pitfalls
1. **<坑名>**: <描述>(参考 `reference/<file>.md`
2. ...(至少 3 条,至少 2 条引用 reference
## Related References
- [API 定义](reference/01-<section>/apis.md)
- [数据库 Schema](reference/02-<section>/db-schema.md)
```
---
## 3. 横切 Skill 模板
```markdown
---
name: <crosscut-skill-name>
description: >
<横切关注点>的统一规范与实现指导Guides <crosscut concern> across all modules
包含:<具体内容列表>。
触发场景 Trigger: <触发场景列表>。
关键词 Keywords: <关键词列表>。
argument-hint: "<module-name|file-path> - 指定要应用规范的模块或文件"
allowed-tools:
- Read
- Glob
- Grep
- Bash
---
# <横切 Skill 标题>
<描述这个横切关注点在系统中的重要性和适用范围>
## Quick Context
```bash
# 动态注入
!`<扫描所有模块中与该横切主题相关的文件>`
```
## Plan
### 产物清单
- [ ] <横切维度的产物>
### 决策点
1. <跨模块的统一决策>
2. ...
## Verify
- [ ] <跨模块一致性检查>
- [ ] <规范合规检查>
- [ ] ...
## Execute
### 全局规范
<适用于所有模块的规则>
### 模块适配
<各模块的特殊处理>
## Pitfalls
1. **<跨模块一致性问题>**: <描述>
2. ...
## Related References
- [全局规范定义](reference/<global-spec>.md)
```
---
## 4. 模板使用注意事项
### 4.1 必须自定义的部分
- `<尖括号>` 中的所有占位符
- Plan 的产物清单和决策点必须来自 DDS
- Verify 的检查项必须与模块设计细节对应
- Pitfalls 必须与模块/主题强相关,不可用通用建议填充
### 4.2 禁止照搬模板
模板是结构参考,不是内容来源。以下行为将导致自检 FAIL
- 产物清单中出现模板占位符
- Pitfalls 与模块无关(如:在前端 Skill 中出现数据库 Pitfall
- Verify 中没有引用任何 reference
### 4.3 按 DDS 内容增减
- 如果 DDS 中没有状态机,模块 Skill 可以不包含状态机相关 Verify
- 如果 DDS 中有额外的关注点(如性能优化、缓存策略),应增加对应章节
- 横切 Skill 的数量和主题必须由 DDS 内容决定

@@ -0,0 +1,214 @@
#!/bin/bash
# verify-skill-output.sh
# 验证 DDS-to-Skill 转换输出的完整性和质量
#
# 用法:./verify-skill-output.sh <skills-output-dir>
# 示例:./verify-skill-output.sh /path/to/1-AgentSkills
#
# 依赖bash, grep, sed, find, wc
set -e
SKILLS_DIR="${1:-.}"
PASS=0
FAIL=0
WARN=0
# 颜色输出
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[0;33m'
NC='\033[0m' # No Color
# 注意:在 set -e 下不能用 ((PASS++)) —— 当变量为 0 时该算术表达式退出码为 1
# 会直接终止脚本;统一改用 VAR=$((VAR+1))
pass() {
echo -e "${GREEN}✅ PASS${NC}: $1"
PASS=$((PASS+1))
}
fail() {
echo -e "${RED}❌ FAIL${NC}: $1"
echo -e " ${RED}修复${NC}: $2"
FAIL=$((FAIL+1))
}
warn() {
echo -e "${YELLOW}⚠️ WARN${NC}: $1"
WARN=$((WARN+1))
}
echo "============================================"
echo " DDS-to-Skill 输出质量验证"
echo " 目标目录: $SKILLS_DIR"
echo "============================================"
echo ""
# ============================================
# 1. 结构完整性检查
# ============================================
echo "--- 1. 结构完整性 ---"
# S1: 检查是否有系统级 Skill
SYSTEM_SKILLS=$(find "$SKILLS_DIR" -maxdepth 1 -type d -name "*-system*" 2>/dev/null | wc -l)
if [ "$SYSTEM_SKILLS" -ge 1 ]; then
pass "S1: 存在系统级 Skill ($SYSTEM_SKILLS 个)"
else
warn "S1: 未找到系统级 Skill名称包含 '-system'"
fi
# S4: 每个 Skill 都有 SKILL.md
SKILL_DIRS=$(find "$SKILLS_DIR" -maxdepth 1 -type d ! -name "$(basename "$SKILLS_DIR")" 2>/dev/null)
MISSING_SKILLMD=0
for dir in $SKILL_DIRS; do
if [ ! -f "$dir/SKILL.md" ]; then
fail "S4: $dir 缺少 SKILL.md" "创建该目录下的 SKILL.md"
MISSING_SKILLMD=$((MISSING_SKILLMD+1))
fi
done
if [ "$MISSING_SKILLMD" -eq 0 ]; then
pass "S4: 所有 Skill 目录都有 SKILL.md"
fi
# S5: 每个 Skill 都有 reference/
MISSING_REF=0
for dir in $SKILL_DIRS; do
if [ ! -d "$dir/reference" ]; then
warn "S5: $dir 缺少 reference/ 目录"
MISSING_REF=$((MISSING_REF+1))
fi
done
if [ "$MISSING_REF" -eq 0 ]; then
pass "S5: 所有 Skill 目录都有 reference/"
fi
echo ""
# ============================================
# 2. Frontmatter 规范检查
# ============================================
echo "--- 2. Frontmatter 规范 ---"
for dir in $SKILL_DIRS; do
SKILL_FILE="$dir/SKILL.md"
[ ! -f "$SKILL_FILE" ] && continue
SKILL_NAME=$(basename "$dir")
# F1: name 字段
if head -20 "$SKILL_FILE" | grep -q '^name:'; then
pass "F1 [$SKILL_NAME]: frontmatter 包含 name"
else
fail "F1 [$SKILL_NAME]: 缺少 name 字段" "在 frontmatter 中添加 name 字段"
fi
# F2: description 字段
if head -20 "$SKILL_FILE" | grep -q '^description:'; then
pass "F2 [$SKILL_NAME]: frontmatter 包含 description"
else
fail "F2 [$SKILL_NAME]: 缺少 description 字段" "在 frontmatter 中添加 description 字段"
fi
# C1: 行数 < 500
LINE_COUNT=$(wc -l < "$SKILL_FILE")
if [ "$LINE_COUNT" -lt 500 ]; then
pass "C1 [$SKILL_NAME]: SKILL.md = $LINE_COUNT 行 (< 500)"
else
fail "C1 [$SKILL_NAME]: SKILL.md = $LINE_COUNT 行 (>= 500)" "将冗长内容移到 reference/ 中"
fi
done
echo ""
# ============================================
# 3. 内容质量检查
# ============================================
echo "--- 3. 内容质量 ---"
for dir in $SKILL_DIRS; do
SKILL_FILE="$dir/SKILL.md"
[ ! -f "$SKILL_FILE" ] && continue
SKILL_NAME=$(basename "$dir")
# C2-C5: 必须章节
for section in "Plan" "Verify" "Execute" "Pitfalls"; do
if grep -q "## $section" "$SKILL_FILE"; then
pass "C [$SKILL_NAME]: 包含 ## $section"
else
warn "C [$SKILL_NAME]: 缺少 ## $section 章节"
fi
done
# C6: 动态注入
# grep -c 无匹配时仍输出 0 但退出码为 1再 echo 0 会得到两行输出;用 || true 兜底
INJECT_COUNT=$(grep -c '!`' "$SKILL_FILE" 2>/dev/null || true)
INJECT_COUNT=${INJECT_COUNT:-0}
if [ "$INJECT_COUNT" -ge 2 ]; then
pass "C6 [$SKILL_NAME]: $INJECT_COUNT 处动态注入 (>= 2)"
else
warn "C6 [$SKILL_NAME]: 仅 $INJECT_COUNT 处动态注入 (建议 >= 2)"
fi
# C7: Pitfalls 引用 reference
REF_IN_PITFALLS=$(sed -n '/## Pitfalls/,/## /p' "$SKILL_FILE" | grep -c 'reference/' || true)
REF_IN_PITFALLS=${REF_IN_PITFALLS:-0}
if [ "$REF_IN_PITFALLS" -ge 2 ]; then
pass "C7 [$SKILL_NAME]: Pitfalls 中 $REF_IN_PITFALLS 处引用 reference (>= 2)"
else
warn "C7 [$SKILL_NAME]: Pitfalls 中仅 $REF_IN_PITFALLS 处引用 reference (建议 >= 2)"
fi
done
echo ""
# ============================================
# 4. Reference 质量检查
# ============================================
echo "--- 4. Reference 质量 ---"
for dir in $SKILL_DIRS; do
[ ! -d "$dir/reference" ] && continue
SKILL_NAME=$(basename "$dir")
# R2: 章节分层
SECTION_DIRS=$(find "$dir/reference" -maxdepth 1 -type d -name '0*' 2>/dev/null | wc -l)
if [ "$SECTION_DIRS" -ge 1 ]; then
pass "R2 [$SKILL_NAME]: reference 有 $SECTION_DIRS 个章节子目录"
else
warn "R2 [$SKILL_NAME]: reference 无章节子目录(使用扁平结构)"
fi
# R3: DDS 溯源
DDS_SECTION_COUNT=$(grep -r 'DDS-Section:' "$dir/reference/" 2>/dev/null | wc -l)
if [ "$DDS_SECTION_COUNT" -ge 1 ]; then
pass "R3 [$SKILL_NAME]: $DDS_SECTION_COUNT 处 DDS-Section 溯源"
else
warn "R3 [$SKILL_NAME]: 无 DDS-Section 溯源标记"
fi
# R5: TBD 标注TBD 数量本身无对错,仅作信息性输出;原条件 -ge 0 恒为真,已移除)
TBD_COUNT=$(grep -r '\[TBD' "$dir/reference/" 2>/dev/null | wc -l)
pass "R5 [$SKILL_NAME]: $TBD_COUNT 处 [TBD] 标注"
# R1: 设计要素类型数
REF_FILES=$(find "$dir/reference" -name "*.md" 2>/dev/null | wc -l)
if [ "$REF_FILES" -ge 3 ]; then
pass "R1 [$SKILL_NAME]: $REF_FILES 个 reference 文件 (>= 3)"
else
warn "R1 [$SKILL_NAME]: 仅 $REF_FILES 个 reference 文件 (建议 >= 3)"
fi
done
echo ""
# ============================================
# 总结
# ============================================
echo "============================================"
echo " 验证完成"
echo " ✅ PASS: $PASS"
echo " ❌ FAIL: $FAIL"
echo " ⚠️ WARN: $WARN"
echo "============================================"
if [ "$FAIL" -gt 0 ]; then
exit 1
else
exit 0
fi

@@ -0,0 +1,60 @@
#!/bin/bash
# verify.sh - dds-to-skill Skill 自身结构验证
#
# 验证本 Skill 的文件结构和内容完整性
# 用法cd dds-to-skill && ./scripts/verify.sh
set -e
PASS=0; FAIL=0
# set -e 下避免 ((PASS++)):变量为 0 时该算术表达式退出码为 1会直接终止脚本
check() {
if eval "$2"; then
echo "✅ PASS: $1"; PASS=$((PASS+1))
else
echo "❌ FAIL: $1"; FAIL=$((FAIL+1))
fi
}
SKILL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
echo "=== dds-to-skill Skill 自检 ==="
echo "目录: $SKILL_DIR"
echo ""
# 结构检查
check "SKILL.md 存在" "test -f '$SKILL_DIR/SKILL.md'"
check "reference/ 目录存在" "test -d '$SKILL_DIR/reference'"
check "examples/ 目录存在" "test -d '$SKILL_DIR/examples'"
check "scripts/ 目录存在" "test -d '$SKILL_DIR/scripts'"
# SKILL.md 内容检查
check "SKILL.md < 500 行" "[ \$(wc -l < '$SKILL_DIR/SKILL.md') -lt 500 ]"
check "包含 name 字段" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^name:'"
check "包含 description 字段" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^description:'"
check "包含 argument-hint" "head -20 '$SKILL_DIR/SKILL.md' | grep -q 'argument-hint:'"
# 阶段结构检查
check "包含 Phase 0读取" "grep -q 'Phase 0' '$SKILL_DIR/SKILL.md'"
check "包含 Phase 1分析" "grep -q 'Phase 1' '$SKILL_DIR/SKILL.md'"
check "包含 Phase 2抽取" "grep -q 'Phase 2' '$SKILL_DIR/SKILL.md'"
check "包含 Phase 3生成" "grep -q 'Phase 3' '$SKILL_DIR/SKILL.md'"
check "包含 Phase 4支撑文件" "grep -q 'Phase 4' '$SKILL_DIR/SKILL.md'"
check "包含 Phase 5自检" "grep -q 'Phase 5' '$SKILL_DIR/SKILL.md'"
# Reference 文件检查
check "dds-extraction-guide.md 存在" "test -f '$SKILL_DIR/reference/dds-extraction-guide.md'"
check "skill-templates.md 存在" "test -f '$SKILL_DIR/reference/skill-templates.md'"
check "frontmatter-spec.md 存在" "test -f '$SKILL_DIR/reference/frontmatter-spec.md'"
check "quality-checklist.md 存在" "test -f '$SKILL_DIR/reference/quality-checklist.md'"
# Examples 检查
check "至少 1 个转换示例" "find '$SKILL_DIR/examples' -name '*.md' | grep -q ."
# 动态注入检查
INJECT_COUNT=$(grep -c '!\`' "$SKILL_DIR/SKILL.md" 2>/dev/null || true); INJECT_COUNT=${INJECT_COUNT:-0}
check "SKILL.md 包含动态注入 (>= 2 处)" "[ $INJECT_COUNT -ge 2 ]"
echo ""
echo "=== 结果: $PASS PASS / $FAIL FAIL ==="
[ $FAIL -eq 0 ] && exit 0 || exit 1

View File

@@ -0,0 +1,267 @@
---
name: developing-projectmoneyx
description: >
指导 ProjectMoneyX 多源账单数据治理系统的全栈开发Guides full-stack development of ProjectMoneyX bill data governance system
包含ETL Pipeline 编排Parse → Normalize → Dedup → Link → Rule → Export、插件化解析器对接、三层去重策略、规则引擎映射、Firefly III 适配、SQLite 数据模型、审计追溯。
触发场景 Trigger: 开发/修改 ProjectMoneyX 的 Parser / Pipeline / 去重 / 规则 / 导入导出 / 审计 / 前端页面 / API 接口。
关键词 Keywords: ProjectMoneyX, 账单, bill, ETL, parser, dedup, 去重, 链路合并, transfer link, rule engine, 规则引擎, Firefly III, 导入, import, export, audit, 审计, SQLite, GORM, GIN, Vue3, Vuetify。
argument-hint: "<action> <target>" 例如/ e.g.:
"add parser for ccb", "implement dedup scorer", "create rule handler",
"update transaction schema", "build import preview page"
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash
---
# Developing ProjectMoneyX
ProjectMoneyX 是 Firefly III 生态的**本地化多源账单数据治理中间件**,技术栈为 Go (GIN + GORM) + Vue3 (TypeScript + Vuetify) + SQLite。系统核心是一条 ETL Pipeline`Parse → Normalize → Dedup → Link → Rule → Export`,将支付宝/微信/银行账单标准化后推送至 Firefly III。
> **架构关键词**DDD 分层 · 插件化 Adapter · 三层去重 · 规则可解释 · 全链路审计
## Quick Context
```bash
# 动态注入:后端项目结构
!`find projectmoneyx-server/internal -type f -name "*.go" | head -40`
# 动态注入:前端项目结构
!`find projectmoneyx-web/src -type f \( -name "*.ts" -o -name "*.vue" \) | head -30`
# 动态注入:数据库表定义
!`grep -rn "TableName\|func.*TableName" projectmoneyx-server/internal/ | head -20`
# 动态注入API 路由注册
!`grep -rn "Group\|GET\|POST\|PUT\|DELETE" projectmoneyx-server/internal/handler/ | head -30`
```
## Architecture Overview
```
┌─────────────────────────────────────────────────────┐
│ 展现层: Vue3 + TypeScript + Vuetify │
│ 导入中心 / 清洗预览 / 去重处理 / 规则管理 / 导入任务 / 审计 │
├─────────────────────────────────────────────────────┤
│ 接入层: GIN RESTful API (/api/v1/*) │
│ import / transactions / dedup / rules / export / audit │
├─────────────────────────────────────────────────────┤
│ 应用服务层: Pipeline 编排 │
│ ImportBatchService → PipelineService │
├─────────────────────────────────────────────────────┤
│ 业务逻辑层 (ETL Core Domain) │
│ Parser(插件) → Normalize → Match → Link → Rule → Export │
├─────────────────────────────────────────────────────┤
│ 数据持久层: GORM + SQLite (WAL) │
│ 11 张核心表,分阶段事务 │
└─────────────────────────────────────────────────────┘
```
**分层依赖规则**handler → service → domainentity/repository← dao。Parser/Matcher/Linker/Rule/Exporter 为独立可测试组件。
## Module Registry
| 模块 | 包路径 | 职责 | 优先级 |
|------|--------|------|--------|
| 导入中心 | `handler/import` + `service/import_batch` | 文件上传、批次管理 | P0 |
| 解析引擎 | `parser/` | 插件化平台解析器 | P0 |
| 标准化引擎 | `normalize/` | 异构字段 → 统一 Transaction 模型 | P0 |
| 去重引擎 | `matcher/` | 严格去重 + 模糊去重(P1) | P0/P1 |
| 链路引擎 | `linker/` | 转账闭环 + 订单链路合并 | P0 |
| 规则引擎 | `rule/` | 6 类规则按序执行 | P0/P1 |
| 导出引擎 | `exporter/` | Firefly API/CSV 导出 | P0 |
| 审计中心 | `service/audit` | 全链路追溯 | P0 |
| 系统设置 | `handler/settings` + `config/` | Firefly 连接、阈值参数 | P1 |
## Plan
### 产物清单
| 动作 | 产物 |
|------|------|
| `add parser` | `parser/<platform>/<platform>_parser.go` — 实现 `BillParser` 接口 |
| `create handler` | `handler/<resource>_handler.go` — GIN Handler |
| `create service` | `service/<resource>_service.go` — 应用服务 |
| `create dao` | `dao/<resource>_dao.go` — GORM 数据访问 |
| `create entity` | `domain/entity/<resource>.go` — 领域实体 |
| `add rule type` | `rule/<type>_mapper.go` — 规则映射器 |
| `scaffold module` | 上述全部 + DTO + repository 接口 |
### 决策点
1. **Parser 选择**:先检查 `parser/registry.go` 中已注册的解析器,确认目标平台是否已有实现
2. **去重层级**:严格去重(P0) vs 模糊去重(P1) — 新功能默认只实现严格去重
3. **规则执行顺序**:必须遵守 6 步固定顺序(`reference/04-rule-engine/rule-execution.md`
4. **事务边界**ETL 每阶段独立事务,禁止跨阶段长事务
5. **SQLite 约束**:单写连接 `MaxOpenConns=1`,启用 WAL 模式
---
## Execute
### 1. 新增平台解析器
```go
// 1. 实现 BillParser 接口 (parser/<platform>/<platform>_parser.go)
type Parser struct{}
func (p *Parser) Platform() string { return "<platform>" }
func (p *Parser) Detect(meta FileMeta, header []string) bool {
// 基于文件名/表头特征判定
return false // TODO: 实现判定逻辑
}
func (p *Parser) Parse(ctx context.Context, reader io.Reader) ([]RawBillRecord, error) {
// 逐行读取 → 填充 RawBillRecord.RawFields
// 必须设置 SourcePlatform, SourceRecordID, RowNo, RowFingerprint
return nil, nil // TODO: 实现解析逻辑
}
// 2. 注册到 Registry (parser/registry.go)
r.Register(&<platform>.Parser{})
```
**字段映射要求**(参考 `reference/02-parser-engine/field-mappings.md`
- `trade_time`:统一 UTC+8`time.Time`
- `amount`:去除货币符号,正数 `decimal(18,6)`
- `direction``income` / `expense` / `transfer` / `refund` / `fee` / `other`
- `category_raw`:保留原始分类,不在 Parser 中做映射
- `order_id`:去除空格,作为唯一标识
### 2. ETL Pipeline 阶段开发
每个阶段必须:
1. 接收 `context.Context` + 数据切片
2. 返回处理后切片 + error
3. 在独立事务中持久化(`db.Transaction`,每批 500 条 `CreateInBatches`
4. 更新批次状态
```go
// 阶段签名模式
func (s *StageService) Execute(ctx context.Context, txns []*Transaction) ([]*Transaction, error)
```
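上面的分阶段持久化(每批 500 条)可以草拟如下分批切片逻辑(纯 Go 示意,不依赖 GORM`chunk` 为假设的辅助函数名;实际项目中 `CreateInBatches` 已在内部完成分批,此处仅说明分批语义):

```go
package main

import "fmt"

// chunk 将切片按 size 均分为若干批,最后一批可能不足 size。
// 每一批可在独立的 db.Transaction 中持久化,避免跨阶段长事务。
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	ids := make([]int, 1200)
	for i := range ids {
		ids[i] = i
	}
	batches := chunk(ids, 500)
	fmt.Println(len(batches)) // 3 批500 + 500 + 200
}
```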
### 3. 规则引擎扩展
新增规则类型时:
1.`rule/engine.go``executionOrder` 中确认位置
2. 实现 `MatchConditions(txn)``ApplyActions(txn)` 方法
3. 确保 `RuleHit` 记录命中日志(含 `BeforeValue` / `AfterValue`
4. 规则条件 JSON 存储,参考 `reference/04-rule-engine/rule-conditions.md`
### 4. API 开发
遵循统一响应格式:
```go
type Response struct {
Code int `json:"code"` // 0=成功
Message string `json:"message"`
Data interface{} `json:"data"`
}
```
路由分组:`/api/v1/import/*``/api/v1/transactions/*``/api/v1/dedup/*``/api/v1/rules/*``/api/v1/export/*``/api/v1/audit/*``/api/v1/settings/*`
### 5. 前端页面开发
7 个核心页面,全部使用 Vue3 + Composition API + TypeScript
- 导入中心:`FileUploader.vue` + 拖拽上传 + 进度条
- 清洗预览:`TransactionTable.vue` + `v-data-table` + 行展开对比
- 去重处理:`DedupCompare.vue` + 左右分栏 + 评分因子展开
- 规则管理:`RuleEditor.vue` + 条件构建器 + 测试预览
- 导入任务:统计概览 + 失败列表 + 单条/批量重试
- 审计追溯:`AuditTimeline.vue` + `v-timeline` + 快照展开
- 系统设置Firefly 连接配置 + 测试连接 + 去重参数配置
---
## Verify
### 架构层级检查
- [ ] handler 层不包含业务逻辑,仅做参数绑定 + 调用 service + 返回响应
- [ ] service 层不直接操作 `*gorm.DB`,通过 repository 接口访问数据
- [ ] domain/entity 不依赖 handler/service
- [ ] 无循环依赖handler → service → domain ← dao
### Parser 检查
- [ ] 新增 Parser 实现了 `BillParser` 接口的全部 3 个方法(`Platform()`, `Detect()`, `Parse()`
- [ ] 已注册到 `parser/registry.go``reference/02-parser-engine/parser-interface.md`
- [ ] 字段映射覆盖了所有原始字段(对照 `reference/02-parser-engine/field-mappings.md`
- [ ] `amount` 为正数,`direction` 独立表达收支方向
- [ ] `RowFingerprint` 使用 SHA256 生成(`reference/03-dedup-engine/fingerprint.md`
### 去重与链路检查
- [ ] 严格去重判定键按优先级 3 级执行(`reference/03-dedup-engine/strict-dedup.md`
- [ ] 模糊去重评分因子 6 项,阈值可配置(`reference/03-dedup-engine/fuzzy-dedup.md`
- [ ] 转账闭环 5 条件全部满足才匹配(`reference/03-dedup-engine/transfer-link.md`
- [ ] 疑似重复60-84 分)进入 `PENDING_REVIEW` 人工确认队列
### 规则引擎检查
- [ ] 6 类规则按固定顺序执行:对手方归一 → 商户归一 → 分类 → 账户 → 标签 → Firefly`reference/04-rule-engine/rule-execution.md`
- [ ] 同类型内按 `priority` 升序执行,首条命中即停止
- [ ] 每条命中记录 `RuleHit`,含 `BeforeValue` / `AfterValue`
- [ ] 规则条件 JSON 结构正确(`reference/04-rule-engine/rule-conditions.md`
### 数据库检查
- [ ] 表结构 11 张表齐全(`reference/05-database/db-schema.md`
- [ ] 关键索引已创建(`reference/05-database/indexes.md`
- [ ] SQLite 配置:`MaxOpenConns=1`, WAL 模式, `cache_size=-64000`
- [ ] ETL 每阶段独立事务,`CreateInBatches` 每批 500 条
### API 检查
- [ ] 路由路径遵循 `reference/06-api-design/api-catalog.md`
- [ ] 统一 `Response` / `PageResponse` 结构
- [ ] 导入前 6 项校验完整(`reference/07-export-engine/import-validation.md`
### 前端检查
- [ ] 所有页面使用 `<script setup lang="ts">` + Composition API
- [ ] 数据表格使用 `v-data-table` + `fixed-header`
- [ ] 处理三种 UI 状态加载中skeleton、空数据empty-state、错误snackbar + 重试)
- [ ] 路由定义匹配 `reference/08-frontend/routes.md`
---
## Pitfalls
1. **支付宝分类是全局基准字典**:系统使用支付宝 22 种分类作为统一标准。微信/银行等平台必须映射到此分类枚举,不要在 Parser 中创造新的分类值体系(参考 `reference/02-parser-engine/field-mappings.md` 中的分类枚举表)
2. **微信分类需要"交易类型 + 商品"联合推断**:微信的"交易类型"是支付动作(商户消费/转账/红包),不是消费语义。必须结合"商品"字段做关键词推断,未命中的归入"其他"并记录。不要直接把微信交易类型当作分类使用(参考 `reference/02-parser-engine/field-mappings.md` 中的微信推断规则流程图)
3. **规则执行顺序不可变**6 类规则的执行顺序是固定的(对手方归一 → 商户归一 → 分类 → 账户 → 标签 → Firefly先归一再分类可提升命中率。修改顺序会导致规则相互依赖断裂参考 `reference/04-rule-engine/rule-execution.md`
4. **SQLite 单写连接**`MaxOpenConns` 必须设为 1SQLite 不支持多写。ETL Pipeline 已通过分阶段事务避免超长锁定。如果遇到 `database is locked` 错误,检查是否有未关闭的事务(参考 `reference/05-database/db-schema.md` 中的 SQLite 配置)
5. **转账闭环需 5 条件全部满足**:金额一致 + 方向互补 + 时间窗口内 + 不同平台 + 非退款/手续费。漏掉任何一项都会导致误合并。特别注意退款交易direction=refund不应参与转账闭环参考 `reference/03-dedup-engine/transfer-link.md`
6. **行指纹是分钟粒度**`GenerateRowFingerprint` 使用 `2006-01-02 15:04` 格式(分钟级),不是秒级。这是为了容忍不同平台对同一交易记录的秒级时间差异(参考 `reference/03-dedup-engine/fingerprint.md`
7. **批次状态机严格受控**:批次状态只能按 `CREATED → UPLOADED → PARSING → ... → IMPORT_SUCCESS` 的线性路径推进,不可跳跃。失败时只能回退至上一可重试状态(参考 `reference/01-architecture/batch-state-machine.md`
---
## Related References
| 需要了解... | 查阅... |
|------------|--------|
| 系统架构与模块依赖 | `reference/01-architecture/system-overview.md` |
| 批次状态机定义 | `reference/01-architecture/batch-state-machine.md` |
| Parser 接口与注册 | `reference/02-parser-engine/parser-interface.md` |
| 平台字段映射规则 | `reference/02-parser-engine/field-mappings.md` |
| 严格去重判定键 | `reference/03-dedup-engine/strict-dedup.md` |
| 模糊去重评分模型 | `reference/03-dedup-engine/fuzzy-dedup.md` |
| 转账闭环识别规则 | `reference/03-dedup-engine/transfer-link.md` |
| 行指纹生成算法 | `reference/03-dedup-engine/fingerprint.md` |
| 规则条件 JSON 格式 | `reference/04-rule-engine/rule-conditions.md` |
| 规则执行顺序与可解释性 | `reference/04-rule-engine/rule-execution.md` |
| 数据库 11 张表结构 | `reference/05-database/db-schema.md` |
| 关键索引设计 | `reference/05-database/indexes.md` |
| API 接口目录 | `reference/06-api-design/api-catalog.md` |
| 统一响应与错误码 | `reference/06-api-design/response-format.md` |
| Firefly 导出适配 | `reference/07-export-engine/firefly-mapping.md` |
| 导入前校验清单 | `reference/07-export-engine/import-validation.md` |
| 前端路由与页面 | `reference/08-frontend/routes.md` |
| 前端组件交互 | `reference/08-frontend/components.md` |
| 非功能设计要求 | `reference/09-nonfunctional/performance.md` |
| 部署与安全 | `reference/09-nonfunctional/deployment.md` |

View File

@@ -0,0 +1,57 @@
## 批次状态机
- **DDS-Section**: 5.2 批次状态机
- **DDS-Lines**: L362-L387
### Extract
#### 状态枚举
| 状态 | 说明 | 备注 |
|------|------|------|
| `CREATED` | 批次已创建 | 初始状态 |
| `UPLOADED` | 文件上传完成 | |
| `PARSING` | 正在解析 | |
| `PARSED` | 解析完成 | |
| `NORMALIZING` | 正在标准化 | |
| `NORMALIZED` | 标准化完成 | |
| `MATCHING` | 正在去重/链路合并 | |
| `MATCHED` | 去重/链路完成 | |
| `RULE_APPLYING` | 正在应用规则 | |
| `PREVIEW_READY` | 规则映射完成,可预览 | 用户可在此阶段查看预览结果、人工确认去重 |
| `IMPORTING` | 正在导入 | 用户确认后触发 |
| `IMPORT_SUCCESS` | 全部导入成功 | 终态 |
| `PARTIAL_FAILED` | 部分失败 | 可重试 |
| `IMPORT_FAILED` | 全部失败 | 可重试 |
| `RETRYING` | 重试中 | |
#### 状态转换规则
```
[*] → CREATED: 创建批次
CREATED → UPLOADED: 文件上传完成
UPLOADED → PARSING: 触发解析
PARSING → PARSED: 解析完成
PARSING → UPLOADED: 解析失败(回退)
PARSED → NORMALIZING: 触发标准化
NORMALIZING → NORMALIZED: 标准化完成
NORMALIZED → MATCHING: 触发去重/链路
MATCHING → MATCHED: 去重/链路完成
MATCHED → RULE_APPLYING: 触发规则映射
RULE_APPLYING → PREVIEW_READY: 规则映射完成
PREVIEW_READY → IMPORTING: 用户确认导入
IMPORTING → IMPORT_SUCCESS: 全部成功
IMPORTING → PARTIAL_FAILED: 部分失败
IMPORTING → IMPORT_FAILED: 全部失败
PARTIAL_FAILED → RETRYING: 用户重试
IMPORT_FAILED → RETRYING: 用户重试
RETRYING → IMPORT_SUCCESS: 重试成功
RETRYING → PARTIAL_FAILED: 仍有失败
```
#### 实现要点
- 解析失败时状态回退至 `UPLOADED`,记录错误信息
- `PREVIEW_READY` 是用户交互节点,用户可查看预览结果和人工确认去重
- 失败状态支持重试,不需要整批重做
- 状态变更必须记录到 `audit_logs`
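上述转换规则可以草拟为邻接表加校验函数(示意实现,`CanTransit` 为假设命名,仅收录本节定义的合法转换):

```go
package main

import "fmt"

// allowedTransitions 批次状态机邻接表key 为当前状态value 为允许的下一状态集合
var allowedTransitions = map[string][]string{
	"CREATED":        {"UPLOADED"},
	"UPLOADED":       {"PARSING"},
	"PARSING":        {"PARSED", "UPLOADED"}, // 解析失败回退至 UPLOADED
	"PARSED":         {"NORMALIZING"},
	"NORMALIZING":    {"NORMALIZED"},
	"NORMALIZED":     {"MATCHING"},
	"MATCHING":       {"MATCHED"},
	"MATCHED":        {"RULE_APPLYING"},
	"RULE_APPLYING":  {"PREVIEW_READY"},
	"PREVIEW_READY":  {"IMPORTING"},
	"IMPORTING":      {"IMPORT_SUCCESS", "PARTIAL_FAILED", "IMPORT_FAILED"},
	"PARTIAL_FAILED": {"RETRYING"},
	"IMPORT_FAILED":  {"RETRYING"},
	"RETRYING":       {"IMPORT_SUCCESS", "PARTIAL_FAILED"},
}

// CanTransit 校验状态转换是否合法;非法转换应拒绝并写入 audit_logs
func CanTransit(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransit("PARSING", "UPLOADED"))       // true解析失败回退
	fmt.Println(CanTransit("CREATED", "IMPORT_SUCCESS")) // false不可跳跃
}
```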

View File

@@ -0,0 +1,113 @@
## 系统架构总览
- **DDS-Section**: 3. 总体架构设计
- **DDS-Lines**: L77-L164
### Extract
#### 分层架构
| 层级 | 技术选型 | 核心职责 |
|------|----------|----------|
| **展现层** | Vue3 + TS + Vuetify | 上传文件、批次管理、预览确认、规则配置、人工确认、导入结果展示 |
| **接入层** | GIN Framework | RESTful 接口,统一参数校验、错误处理、响应封装 |
| **应用服务层** | Go Service | 编排完整业务流程ETL Pipeline不承载具体解析规则 |
| **Adapter 层** | Go Plugin Interface | 按平台解析原始文件,输出平台原始记录 DTO |
| **Normalize 层** | Go Service | 统一字段、金额、方向、时间、分类原始值 |
| **Match 层** | Go Service | 严格去重、模糊去重(多因子评分) |
| **Link 层** | Go Service | 转账闭环、订单链路聚合 |
| **Rule 层** | Go Service | 分类、账户、对手方、标签映射 |
| **Export 层** | Go Service | 适配 Firefly III / Data Importer API 或 CSV/JSON |
| **Repository 层** | GORM | 隔离数据库访问,面向领域对象持久化 |
| **数据持久层** | SQLite | 本地数据库,存储全量数据与审计链路 |
#### 数据流转拓扑
```
多源账单导入 → 解析器(Parser) → 标准化入库 → 去重与链路合并 → 规则映射 → 导出/推送
(文件上传) (Adapter层) (SQLite) (Dedup+Link) (Rule Engine) (API/CSV)
```
#### 后端包结构
```
projectmoneyx-server/
├── cmd/server/main.go # 程序入口
├── internal/
│ ├── config/ # 配置管理
│ ├── handler/ # GIN Handler接入层
│ ├── middleware/ # 中间件
│ ├── service/ # 应用服务层
│ ├── domain/ # 领域层
│ │ ├── entity/ # 领域实体
│ │ ├── valueobject/ # 值对象
│ │ └── repository/ # 仓储接口
│ ├── parser/ # Adapter 解析层(插件化)
│ ├── normalize/ # 标准化引擎
│ ├── matcher/ # 去重引擎
│ ├── linker/ # 链路合并引擎
│ ├── rule/ # 规则引擎
│ ├── exporter/ # 导出引擎
│ ├── dao/ # 数据访问层GORM 实现)
│ └── dto/ # 数据传输对象
├── migrations/ # 数据库迁移
├── web/ # 前端打包产物
└── go.mod
```
#### 前端项目结构
```
projectmoneyx-web/
├── src/
│ ├── api/ # API 调用封装
│ ├── views/ # 页面视图 (8 个核心页面)
│ ├── components/ # 可复用组件
│ ├── stores/ # Pinia 状态管理
│ ├── types/ # TypeScript 类型定义
│ ├── router/ # Vue Router
│ ├── plugins/ # Vuetify 等插件
│ ├── App.vue
│ └── main.ts
├── package.json
├── tsconfig.json
└── vite.config.ts
```
#### 模块清单
| 模块 | 包名 | 职责 | 优先级 |
|------|------|------|--------|
| 导入中心 | `import-center` | 文件上传、批次管理、来源识别 | P0 |
| 解析引擎 | `parser-engine` | 平台解析器注册、装载与执行 | P0 |
| 标准化引擎 | `normalize-engine` | 统一模型转换 | P0 |
| 去重引擎 | `dedup-engine` | 严格去重与模糊去重 | P0/P1 |
| 链路引擎 | `link-engine` | 转账闭环与订单链路合并 | P0 |
| 规则引擎 | `rule-engine` | 分类/账户/标签/商户归一化 | P0/P1 |
| 导入编排 | `import-orchestrator` | 导入预览、执行、重试 | P0 |
| 审计中心 | `audit-center` | 审计日志、处理链追溯 | P0 |
| 系统设置 | `settings-center` | Firefly 配置、阈值参数 | P1 |
#### 核心 Service 清单
```go
type ImportBatchService struct { ... } // 批次管理
type PipelineService struct { ... } // ETL 流水线编排
type ParserRegistry struct { ... } // 解析器注册中心
type TransactionNormalizeService struct { ... } // 标准化服务
type DedupMatchService struct { ... } // 去重匹配服务
type TransferLinkService struct { ... } // 转账链路合并服务
type RuleApplyService struct { ... } // 规则应用服务
type FireflyExportService struct { ... } // Firefly 导出服务
type AuditTraceService struct { ... } // 审计追溯服务
```
#### 关键设计约束
| # | 约束 | 说明 |
|---|------|------|
| 1 | 本地优先 | 财务数据敏感,必须本地部署 |
| 2 | 插件化解析器 | 平台格式变化频繁,适配逻辑隔离在 Adapter 层 |
| 3 | 统一交易模型稳定 | 避免下游 Firefly 或上游平台格式污染核心域模型 |
| 4 | 支付宝分类为标准 | 支付宝 22 种分类最丰富,其他平台映射到此体系 |
| 5 | 微信分类需推断 | 微信"交易类型"粗粒度,结合"商品"字段推断分类 |

View File

@@ -0,0 +1,156 @@
## 平台字段映射规则
- **DDS-Section**: 6.3 支付宝解析规则 + 6.4 微信解析规则 + 6.5 统一交易模型
- **DDS-Lines**: L500-L668
### Extract
#### 支付宝原始字段
```
交易时间 | 交易分类 | 交易对方 | 对方账号 | 商品说明 | 收/支 | 金额 | 收/付款方式 | 交易状态 | 交易订单号 | 商家订单号 | 备注
```
#### 支付宝字段映射表
| 原字段 | 目标字段 | 映射说明 |
|--------|----------|----------|
| 交易时间 | `trade_time` | 解析为 `time.Time`,统一 UTC+8 |
| 交易分类 | `category_raw` | 直接作为原始分类22 种标准分类) |
| 交易对方 | `counterparty` | 交易对手名称 |
| 对方账号 | `counterparty_account` | 存入扩展字段 |
| 商品说明 | `merchant_name` / `note` | 优先作为商户名,辅助作为备注 |
| 收/支 | `direction` | "收入" → income, "支出" → expense, "其他" → other |
| 金额 | `amount` | 去除 ¥ 符号,解析为 Decimal 正数 |
| 收/付款方式 | `payment_method` | 存入扩展字段,可用于账户映射 |
| 交易状态 | `trade_status` | "交易成功" / "退款成功" 等 |
| 交易订单号 | `source_record_id` / `order_id` | 去除空格,作为唯一标识 |
| 商家订单号 | `merchant_order_id` / `parent_order_id` | 可用于链路关联 |
| 备注 | `note` | 补充备注 |
#### 支付宝标准分类枚举22 类)— 全局统一基准
| 编号 | 分类名称 | 编号 | 分类名称 |
|------|----------|------|----------|
| 1 | 餐饮美食 | 12 | 退款 |
| 2 | 投资理财 | 13 | 教育培训 |
| 3 | 日用百货 | 14 | 住房物业 |
| 4 | 数码电器 | 15 | 酒店旅游 |
| 5 | 交通出行 | 16 | 文化休闲 |
| 6 | 充值缴费 | 17 | 运动户外 |
| 7 | 信用借还 | 18 | 爱车养车 |
| 8 | 转账红包 | 19 | 商业服务 |
| 9 | 生活服务 | 20 | 母婴亲子 |
| 10 | 家居家装 | 21 | 收入 |
| 11 | 医疗健康 | 22 | 其他 |
> **关键设计决策**:支付宝拥有最丰富的 22 种交易分类,系统将其作为**全局统一分类基准字典**。所有其他平台的交易分类最终都应映射到此套分类枚举。
#### 微信原始字段
```
交易时间 | 交易类型 | 交易对方 | 商品 | 收/支 | 金额(元) | 支付方式 | 当前状态 | 交易单号 | 商户单号 | 备注
```
#### 微信字段映射表
| 原字段 | 目标字段 | 映射说明 |
|--------|----------|----------|
| 交易时间 | `trade_time` | 解析为 `time.Time`,统一 UTC+8 |
| 交易类型 | `category_raw` | 存储原始类型,需通过推断映射到标准分类 |
| 交易对方 | `counterparty` | 交易对手名称 |
| 商品 | `merchant_name` / `product_desc` | **关键字段**,用于推断实际消费分类 |
| 收/支 | `direction` | "收入" → income, "支出" → expense |
| 金额(元) | `amount` | 去除 ¥ 符号,解析为 Decimal 正数 |
| 支付方式 | `payment_method` | 存入扩展字段 |
| 当前状态 | `trade_status` | "支付成功" / "已退款" 等 |
| 交易单号 | `source_record_id` / `order_id` | 去除空格 |
| 商户单号 | `merchant_order_id` / `parent_order_id` | 链路关联 |
| 备注 | `note` | 补充备注 |
#### 微信分类推断规则
微信"交易类型"多为支付动作(商户消费、扫二维码付款、转账、红包等),无实际消费语义。推断流程:
1. **交易类型 = 转账** → 转账红包
2. **交易类型 = 微信红包** → 转账红包
3. **交易类型含 "退款"** → 退款
4. **交易类型 = 商户消费/扫二维码付款** → 基于"商品"字段关键词推断:
| 微信交易类型 | 商品关键词 | 推断标准分类 |
|-------------|-----------|-------------|
| 商户消费 | 美团 / 外卖 / 餐厅 / 咖啡 / 面包 | 餐饮美食 |
| 商户消费 | 滴滴 / 打车 / 地铁 / 高铁 / 加油 | 交通出行 |
| 商户消费 | 京东 / 超市 / 便利店 / 百货 | 日用百货 |
| 商户消费 | 电费 / 水费 / 话费 / 燃气 | 充值缴费 |
| 商户消费 | 医院 / 药店 / 体检 | 医疗健康 |
| 商户消费 | 电影 / 游戏 / 书籍 | 文化休闲 |
| 商户消费 | 酒店 / 景点 / 旅行 | 酒店旅游 |
| 商户消费 | 无法识别 | 其他(待人工补充) |
> **设计原则**:若商品内容无法识别关键词,先落入"其他"分类,并允许用户通过规则管理补充映射。系统应记录未命中规则的记录,便于后续规则完善。
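上述推断流程可以草拟为关键词查表(示意实现,关键词仅节选自上表,`InferWechatCategory` 为假设命名):

```go
package main

import (
	"fmt"
	"strings"
)

// keywordCategories 商品关键词 → 标准分类(节选,完整表由规则管理维护)
var keywordCategories = []struct {
	keywords []string
	category string
}{
	{[]string{"美团", "外卖", "餐厅", "咖啡", "面包"}, "餐饮美食"},
	{[]string{"滴滴", "打车", "地铁", "高铁", "加油"}, "交通出行"},
	{[]string{"京东", "超市", "便利店", "百货"}, "日用百货"},
	{[]string{"电费", "水费", "话费", "燃气"}, "充值缴费"},
}

// InferWechatCategory 按"交易类型 + 商品"联合推断标准分类
func InferWechatCategory(tradeType, product string) string {
	switch {
	case tradeType == "转账", tradeType == "微信红包":
		return "转账红包"
	case strings.Contains(tradeType, "退款"):
		return "退款"
	}
	for _, kc := range keywordCategories {
		for _, kw := range kc.keywords {
			if strings.Contains(product, kw) {
				return kc.category
			}
		}
	}
	return "其他" // 未命中:落入"其他",待用户通过规则管理补充
}

func main() {
	fmt.Println(InferWechatCategory("商户消费", "美团外卖订单")) // 餐饮美食
	fmt.Println(InferWechatCategory("转账", ""))               // 转账红包
}
```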
#### 统一交易模型 (Transaction)
```go
type Transaction struct {
ID string
TransactionID string // 业务唯一 ID
BatchID string
SourcePlatform string
SourceRecordID string
TradeTime time.Time
Amount decimal.Decimal // 正数,方向独立表达
Currency string // 默认 CNY
Direction string // income/expense/transfer/refund/fee/other
Counterparty string
MerchantName string
CategoryRaw string // 原始分类(来自平台)
CategoryMapped string // 映射后分类(规则引擎填充)
AccountMapped string // 映射账户
Tags string // 逗号分隔
OrderID string
ParentOrderID string
PaymentMethod string
Note string
RawPayload string // 原始记录完整 JSON 快照
RowFingerprint string
Status string // PENDING_CLEAN → CLEANED → ... → IMPORTED
FireflyTxnID string
}
```
#### 标准化规则
| 规则项 | 说明 |
|--------|------|
| 时间 | 统一存储为 `Asia/Shanghai (UTC+8)` |
| 金额 | 统一使用正数(`decimal(18,6)`),方向独立用 `direction` 表达 |
| 币种 | 默认 `CNY` |
| 状态 | 初始为 `PENDING_CLEAN` |
| 原始快照 | 完整写入 `raw_payload`JSON确保审计可追溯 |
| 指纹 | 对关键字段做 SHA256用于严格去重 |
#### Direction 枚举
| 值 | 说明 |
|----|------|
| `income` | 收入 |
| `expense` | 支出 |
| `transfer` | 内部转账 |
| `refund` | 退款 |
| `fee` | 手续费 |
| `other` | 其他 |

> 注:枚举值统一使用小写,与字段映射表及去重/链路代码中的 `direction` 比较保持一致。
#### TransactionStatus 枚举
| 值 | 说明 |
|----|------|
| `PENDING_CLEAN` | 待清洗(标准化完成) |
| `CLEANED` | 已清洗 |
| `PENDING_REVIEW` | 待人工确认(模糊去重疑似) |
| `READY_TO_IMPORT` | 可导入 |
| `IMPORTING` | 导入中 |
| `IMPORTED` | 已导入 |
| `FAILED` | 导入失败 |
| `DUPLICATE` | 重复记录 |

View File

@@ -0,0 +1,90 @@
## 解析器接口设计
- **DDS-Section**: 6.1 解析器接口设计 + 6.2 解析器注册中心
- **DDS-Lines**: L438-L498
### Extract
#### BillParser 接口
```go
// BillParser 是所有平台解析器必须实现的接口
type BillParser interface {
// Platform 返回平台标识符,如 "alipay", "wechat", "ccb"
Platform() string
// Detect 根据文件元信息和表头判断是否为本平台文件
Detect(fileMeta FileMeta, header []string) bool
// Parse 解析指定文件,返回原始记录列表
Parse(ctx context.Context, reader io.Reader) ([]RawBillRecord, error)
}
```
#### FileMeta 文件元信息
```go
type FileMeta struct {
FileName string
FileType string // csv, xlsx, txt
FileHash string
FileSize int64
}
```
#### RawBillRecord 原始账单记录
```go
type RawBillRecord struct {
SourcePlatform string
SourceRecordID string
RawFields map[string]string // 原始 K-V 字段
RowNo int
RowFingerprint string
}
```
#### Registry 解析器注册中心
```go
type Registry struct {
parsers []BillParser
}
func NewRegistry() *Registry {
r := &Registry{}
// 注册所有解析器
r.Register(&alipay.Parser{})
r.Register(&wechat.Parser{})
r.Register(&ccb.Parser{})
r.Register(&icbc.Parser{})
return r
}
// Detect 自动检测文件对应的解析器
func (r *Registry) Detect(meta FileMeta, header []string) (BillParser, error) {
for _, p := range r.parsers {
if p.Detect(meta, header) {
return p, nil
}
}
return nil, ErrUnknownPlatform
}
```
#### 新增解析器步骤
1.`parser/<platform>/` 目录下创建 `<platform>_parser.go`
2. 实现 `BillParser` 接口的 3 个方法
3.`parser/registry.go``NewRegistry()` 中调用 `r.Register()`
4. `Detect()` 方法基于文件名特征或 CSV 表头关键词判定
5. `Parse()` 逐行读取文件,填充 `RawBillRecord.RawFields`
#### V1.0 支持的平台
| 平台 | 标识符 | 文件格式 | 优先级 |
|------|--------|----------|--------|
| 支付宝 | `alipay` | CSV | P0优先 |
| 微信支付 | `wechat` | CSV | P0优先 |
| 建设银行 | `ccb` | CSV/Excel | P1次优先 |
| 工商银行 | `icbc` | CSV/Excel | P1次优先 |

View File

@@ -0,0 +1,41 @@
## 行指纹生成算法
- **DDS-Section**: 7.2 严格去重 — 行指纹算法
- **DDS-Lines**: L709-L726
### Extract
#### 指纹生成函数
```go
func GenerateRowFingerprint(t *Transaction) string {
raw := fmt.Sprintf("%s|%s|%s|%s|%s|%s",
t.TradeTime.Format("2006-01-02 15:04"), // 分钟粒度(非秒级)
t.Amount.String(),
t.Direction,
normalizeString(t.Counterparty),
normalizeString(t.MerchantName),
t.OrderID,
)
hash := sha256.Sum256([]byte(raw))
return hex.EncodeToString(hash[:])
}
```
#### 关键设计决策
- **分钟粒度**:使用 `2006-01-02 15:04` 格式(精确到分钟),不是 `15:04:05`(秒级)
- **原因**:不同平台对同一交易的记录时间可能有秒级差异(如支付宝记录 10:30:15微信记录 10:30:22分钟粒度可以容忍这种差异
- **normalizeString** 处理:去除前后空格、全角转半角、统一大小写
- **SHA256 输出**64 字符的十六进制字符串
#### 参与指纹的字段
| 字段 | 处理方式 |
|------|----------|
| `TradeTime` | `Format("2006-01-02 15:04")` — 分钟粒度 |
| `Amount` | `Decimal.String()` — 标准化数字字符串 |
| `Direction` | 原值income/expense/... |
| `Counterparty` | `normalizeString()` — 去空格、标准化 |
| `MerchantName` | `normalizeString()` — 去空格、标准化 |
| `OrderID` | 原值(已在 Parser 中去除空格) |

View File

@@ -0,0 +1,85 @@
## 模糊去重(多因子评分 — P1 阶段)
- **DDS-Section**: 7.3 模糊去重(多因子评分 — P1 阶段)
- **DDS-Lines**: L755-L818
### Extract
#### 多因子评分模型
| 因子 | 分值 | 评分说明 |
|------|------|----------|
| 时间在 ±5 分钟内 | 30 | 时间差越小得分越高,超出窗口直接 0 分 |
| 金额精确一致 | 30 | 金额一致得满分,差额在手续费容差内得部分分(20分) |
| 交易方向一致 | 10 | 方向相同得满分 |
| 订单号相同/相近 | 15 | 完全一致 15 分,包含关系 10 分 |
| 对手方相似 | 10 | Levenshtein 相似度 + contains 判定 |
| 来源关联规则命中 | 5 | 预配置的平台关联规则 |
**总分 = 100 分**
#### 判定阈值(可配置)
| 分值范围 | 判定结果 | 处理方式 |
|----------|----------|----------|
| ≥ 85 | 自动判定重复 | 自动标记 DUPLICATE |
| 60 ~ 84 | 疑似重复 | 标记 PENDING_REVIEW进入人工确认队列 |
| < 60 | 不判定重复 | 保留独立交易 |
#### 评分算法骨架
```go
type FuzzyScorer struct {
TimeWindow time.Duration // 默认 5 分钟
AmountEpsilon float64 // 金额容差(手续费)
}
func (s *FuzzyScorer) Score(a, b *Transaction) int {
score := 0
// 时间因子 (30分) — 线性衰减
timeDiff := math.Abs(a.TradeTime.Sub(b.TradeTime).Minutes())
if timeDiff <= s.TimeWindow.Minutes() {
score += int(30 * (1 - timeDiff/s.TimeWindow.Minutes()))
}
// 金额因子 (30分)
if a.Amount.Equal(b.Amount) {
score += 30
} else if a.Amount.Sub(b.Amount).Abs().LessThan(decimal.NewFromFloat(s.AmountEpsilon)) {
score += 20
}
// 方向因子 (10分)
if a.Direction == b.Direction { score += 10 }
// 订单号因子 (15分): 两侧均非空才参与比较,避免空串 Contains 恒真导致误加分
if a.OrderID != "" && b.OrderID != "" {
if a.OrderID == b.OrderID {
score += 15
} else if strings.Contains(a.OrderID, b.OrderID) || strings.Contains(b.OrderID, a.OrderID) {
score += 10
}
}
// 对手方因子 (10分) — Levenshtein 相似度
score += int(10 * counterpartySimilarity(a.Counterparty, b.Counterparty))
// 来源规则因子 (5分)
if s.platformLinked(a.SourcePlatform, b.SourcePlatform) { score += 5 }
return score
}
```
#### 配置项
| 配置键 | 默认值 | 说明 |
|--------|--------|------|
| `fuzzy_time_window` | 5 | 模糊匹配时间窗口(分钟) |
| `fuzzy_threshold_high` | 85 | 自动判定重复阈值 |
| `fuzzy_threshold_low` | 60 | 疑似重复阈值 |
| `amount_epsilon` | 0.01 | 金额容差(手续费) |
#### 性能优化
- 按 `trade_time` 时间分桶,避免全表 O(N²) 扫描
- 使用索引 `idx_txn_trade_time` 加速时间范围查询

View File

@@ -0,0 +1,48 @@
## 严格去重(精确匹配)
- **DDS-Section**: 7.2 严格去重(基础去重 — 精确匹配)
- **DDS-Lines**: L697-L753
### Extract
#### 三级唯一性判定键(按优先级)
| 优先级 | 判定键 | 适用场景 |
|--------|--------|----------|
| 1 | `source_platform` + `source_record_id` | 同一平台重复导入 |
| 2 | `source_file_hash` + `row_fingerprint` | 同一文件重复上传 |
| 3 | `order_id`(若可信) | 跨批次订单号匹配 |
#### 执行流程
```go
func (s *StrictDedup) Execute(ctx context.Context, txns []*Transaction) ([]*Transaction, error) {
var result []*Transaction
for _, txn := range txns {
// 判定键 1: platform + record_id
exists, existingID := s.repo.FindByPlatformAndRecordID(
ctx, txn.SourcePlatform, txn.SourceRecordID)
if exists {
s.createDedupRelation(ctx, txn.ID, existingID, "strict", 100)
txn.Status = "DUPLICATE"
continue
}
// 判定键 2: file_hash + fingerprint
exists, existingID = s.repo.FindByFingerprint(ctx, txn.RowFingerprint)
if exists {
s.createDedupRelation(ctx, txn.ID, existingID, "strict", 100)
txn.Status = "DUPLICATE"
continue
}
result = append(result, txn)
}
return result, nil
}
```
#### 实现要点
- 命中时创建 `dedup_relation`relation_type=`strict`, confidence=100
- 被判重的交易 status 设为 `DUPLICATE`
- 只返回未命中的交易进入下一阶段
- 使用数据库索引 `idx_txn_platform_record``idx_txn_fingerprint` 加速查询

View File

@@ -0,0 +1,78 @@
## 转账闭环识别
- **DDS-Section**: 7.4 链路合并(转账闭环 + 订单链路)
- **DDS-Lines**: L820-L871
### Extract
#### 典型场景
银行卡支出 1000 元(流向支付宝),支付宝收入 1000 元 → 合并为一笔内部转账。
#### 转账闭环识别规则5 项全部满足)
| # | 条件 | 说明 |
|---|------|------|
| 1 | 金额一致 | `a.Amount.Equal(b.Amount)` |
| 2 | 方向互补 | 一条 expense + 一条 income |
| 3 | 时间窗口内 | 默认 ±30 分钟(`transfer_time_window` 可配置) |
| 4 | 不同平台 | `a.SourcePlatform != b.SourcePlatform` |
| 5 | 非退款/手续费 | `direction` 不是 `refund``fee` |
#### 实现骨架
```go
func (l *TransferLinker) Detect(a, b *Transaction) *LinkResult {
// 条件 1: 金额一致
if !a.Amount.Equal(b.Amount) { return nil }
// 条件 2: 方向互补
if !((a.Direction == "expense" && b.Direction == "income") ||
(a.Direction == "income" && b.Direction == "expense")) { return nil }
// 条件 3: 时间窗口
if math.Abs(a.TradeTime.Sub(b.TradeTime).Minutes()) > l.TimeWindow { return nil }
// 条件 4: 不同平台
if a.SourcePlatform == b.SourcePlatform { return nil }
// 条件 5: 非退款/手续费
if a.Direction == "refund" || b.Direction == "refund" { return nil }
if a.Direction == "fee" || b.Direction == "fee" { return nil }
return &LinkResult{
ParentTransactionID: selectPrimary(a, b).ID,
ChildTransactionID: selectSecondary(a, b).ID,
LinkType: "transfer",
FromAccount: mapToAccount(getExpenseSide(a, b)),
ToAccount: mapToAccount(getIncomeSide(a, b)),
}
}
```
#### 订单链路合并
**典型场景**:京东订单 + 微信支付,一笔真实消费产生多条流水。
**合并策略**
- 保留更完整的业务记录为主交易(优先保留有商品详情的记录)
- 其他记录挂为关联来源
- 形成 `parent_order_id` 聚合链路
#### LinkResult 数据结构
```go
type LinkResult struct {
ParentTransactionID string
ChildTransactionID string
LinkType string // transfer / order / refund / fee
FromAccount string
ToAccount string
}
```
#### 配置项
| 配置键 | 默认值 | 说明 |
|--------|--------|------|
| `transfer_time_window` | 30 | 转账闭环时间窗口(分钟) |

View File

@@ -0,0 +1,63 @@
## 规则条件 JSON 格式
- **DDS-Section**: 8.2 规则匹配条件
- **DDS-Lines**: L938-L967
### Extract
#### 条件匹配维度
| 条件类型 | 字段 | 说明 | 示例 |
|----------|------|------|------|
| 平台过滤 | `platform` | 指定生效平台 | `"alipay"` |
| 原始分类 | `category_raw` | 原始分类匹配 | `"餐饮美食"` |
| 关键词 | `keywords` | 商品/商户名关键词 | `["美团", "外卖"]` |
| 正则 | `regex` | 正则表达式匹配 | `"^滴滴.*出行$"` |
| 金额范围 | `amount_range` | [min, max] | `[0, 50]` |
| 方向 | `direction` | 收支方向 | `"expense"` |
| 对手方 | `counterparty` | 对手方包含 | `"支付宝"` |
#### JSON 结构示例
```json
{
"platform": "wechat",
"conditions": {
"category_raw": "商户消费",
"keywords": ["美团", "外卖", "饿了么"],
"direction": "expense"
},
"actions": {
"category_mapped": "餐饮美食",
"merchant_normalized": "外卖平台"
}
}
```
#### Rule 数据模型
```go
type Rule struct {
ID string // UUID
RuleType string // 规则类型枚举
Priority int // 优先级(越小越高)
PlatformScope string // 平台范围all / alipay / wechat
ConditionsJSON string // 条件 JSON
ActionsJSON string // 动作 JSON
Enabled bool // 是否启用
Description string // 规则描述
CreatedAt time.Time
UpdatedAt time.Time
}
```
#### RuleType 枚举
| 类型 | 说明 |
|------|------|
| `COUNTERPARTY_NORMALIZE` | 对手方归一化 |
| `MERCHANT_NORMALIZE` | 商户名归一化 |
| `CATEGORY_MAPPING` | 分类映射 |
| `ACCOUNT_MAPPING` | 账户映射 |
| `TAG_MAPPING` | 标签映射 |
| `FIREFLY_FIELD_MAPPING` | Firefly 字段映射 |
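条件匹配可以按上表维度逐项判定,草拟如下(示意实现,仅覆盖原始分类、关键词、方向三个维度,`MatchConditions` 为假设命名):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Conditions 规则条件(节选维度)
type Conditions struct {
	CategoryRaw string   `json:"category_raw"`
	Keywords    []string `json:"keywords"`
	Direction   string   `json:"direction"`
}

// Txn 待匹配交易(节选字段)
type Txn struct {
	CategoryRaw  string
	MerchantName string
	Direction    string
}

// MatchConditions 所有已配置的条件维度均命中才算匹配;关键词维度为"任一命中"
func MatchConditions(conditionsJSON string, t Txn) (bool, error) {
	var c Conditions
	if err := json.Unmarshal([]byte(conditionsJSON), &c); err != nil {
		return false, err
	}
	if c.CategoryRaw != "" && c.CategoryRaw != t.CategoryRaw {
		return false, nil
	}
	if c.Direction != "" && c.Direction != t.Direction {
		return false, nil
	}
	if len(c.Keywords) > 0 {
		hit := false
		for _, kw := range c.Keywords {
			if strings.Contains(t.MerchantName, kw) {
				hit = true
				break
			}
		}
		if !hit {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	cond := `{"category_raw":"商户消费","keywords":["美团","外卖"],"direction":"expense"}`
	ok, _ := MatchConditions(cond, Txn{CategoryRaw: "商户消费", MerchantName: "美团外卖", Direction: "expense"})
	fmt.Println(ok) // true
}
```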

View File

@@ -0,0 +1,89 @@
## 规则执行顺序与可解释性
- **DDS-Section**: 8.3 规则执行顺序 + 8.4 规则引擎核心实现 + 8.5 可解释性设计
- **DDS-Lines**: L969-L1045
### Extract
#### 固定执行顺序(不可变)
```
1. 对手方归一化 (COUNTERPARTY_NORMALIZE)
2. 商户归一化 (MERCHANT_NORMALIZE)
3. 分类映射 (CATEGORY_MAPPING)
4. 账户映射 (ACCOUNT_MAPPING)
5. 标签映射 (TAG_MAPPING)
6. Firefly 字段映射 (FIREFLY_FIELD_MAPPING)
```
**设计原因**:先做归一再做分类,可提升规则命中率与稳定性。例如先将"美团外卖-北京"归一为"美团外卖",再匹配分类规则"美团 → 餐饮美食"。
#### 执行原则
- 同一类型内按 `priority` 升序执行(数字越小优先级越高)
- **首条命中即停止**(同类型中第一个匹配的规则生效,后续不再匹配)
- 每条交易记录所有命中的规则 ID 和前后字段对比
#### 规则引擎核心实现
```go
type Engine struct {
ruleRepo repository.RuleRepo
hitRepo repository.RuleHitRepo
}
func (e *Engine) Apply(ctx context.Context, txns []*Transaction) error {
ruleGroups := e.loadRulesGroupByType(ctx)
executionOrder := []string{
"COUNTERPARTY_NORMALIZE",
"MERCHANT_NORMALIZE",
"CATEGORY_MAPPING",
"ACCOUNT_MAPPING",
"TAG_MAPPING",
"FIREFLY_FIELD_MAPPING",
}
for _, txn := range txns {
for _, ruleType := range executionOrder {
rules := ruleGroups[ruleType]
for _, rule := range rules {
if !rule.MatchPlatform(txn.SourcePlatform) { continue }
if rule.MatchConditions(txn) {
before := txn.Snapshot()
rule.ApplyActions(txn)
after := txn.Snapshot()
e.hitRepo.Save(ctx, &RuleHit{
TransactionID: txn.ID,
RuleID: rule.ID,
MatchedCondition: rule.ConditionsJSON,
BeforeValue: before,
AfterValue: after,
})
break // 同类型首条命中即停止
}
}
}
}
return nil
}
```
#### 可解释性设计 — RuleHit 审计
每条交易保留完整的规则命中记录:
| 信息项 | 说明 |
|--------|------|
| 命中规则 ID | 关联 rules 表 |
| 命中条件摘要 | 匹配的具体关键词/正则 |
| 变更前值 | 规则执行前的字段值 |
| 变更后值 | 规则执行后的字段值 |
| 命中时间 | 规则执行时间戳 |
用于前端"为何被分到餐饮/交通"的解释展示。

View File

@@ -0,0 +1,252 @@
## 数据库表结构设计
- **DDS-Section**: 10. 数据库详细设计SQLite + GORM
- **DDS-Lines**: L1135-L1347
### Extract
#### ER 关系总览
```
IMPORT_BATCHES ──1:N──> SOURCE_FILES ──1:N──> RAW_RECORDS
IMPORT_BATCHES ──1:N──> TRANSACTIONS
RAW_RECORDS ──1:1──> TRANSACTIONS (normalizes_to)
TRANSACTIONS ──1:N──> DEDUP_RELATIONS
TRANSACTIONS ──1:N──> LINK_RELATIONS
TRANSACTIONS ──1:N──> RULE_HITS
RULES ──1:N──> RULE_HITS
IMPORT_BATCHES ──1:N──> IMPORT_TASKS ──1:N──> IMPORT_RESULTS
TRANSACTIONS ──1:N──> AUDIT_LOGS
```
**11 张核心表**
#### 表结构定义
##### 1. IMPORT_BATCHES — 导入批次
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| status | varchar(32) | | 批次状态 |
| total_files | int | | 文件总数 |
| total_records | int | | 记录总数 |
| success_count | int | | 成功数 |
| failed_count | int | | 失败数 |
| duplicate_count | int | | 重复数 |
| created_at | datetime | | 创建时间 |
| updated_at | datetime | | 更新时间 |
##### 2. SOURCE_FILES — 源文件
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| batch_id | varchar(36) | FK | 批次 ID |
| file_name | varchar(255) | | 原始文件名 |
| file_hash | varchar(64) | INDEX | 文件 SHA256 哈希 |
| source_platform | varchar(32) | | 来源平台 |
| file_type | varchar(16) | | csv/xlsx/txt |
| file_size | int | | 文件大小(bytes) |
| uploaded_at | datetime | | 上传时间 |
##### 3. RAW_RECORDS — 原始记录
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| source_file_id | varchar(36) | FK | 来源文件 ID |
| row_no | int | | 行号 |
| source_platform | varchar(32) | | 平台 |
| source_record_id | varchar(128) | | 原始流水号 |
| row_fingerprint | varchar(64) | INDEX | 行指纹 SHA256 |
| raw_payload | text | | 原始 JSON 快照 |
| parse_status | varchar(32) | | 解析状态 |
| parse_error | text | | 错误信息 |
##### 4. TRANSACTIONS — 统一交易记录(核心表)
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| transaction_id | varchar(64) | UNIQUE | 业务唯一 ID |
| batch_id | varchar(36) | INDEX | 导入批次 |
| raw_record_id | varchar(36) | | 原始记录 ID |
| source_platform | varchar(32) | INDEX(组合) | 来源平台 |
| source_record_id | varchar(128) | INDEX(组合) | 原始记录号 |
| trade_time | datetime | INDEX, NOT NULL | 交易时间 |
| amount | decimal(18,6) | NOT NULL | 金额 |
| currency | varchar(16) | DEFAULT 'CNY' | 币种 |
| direction | varchar(16) | NOT NULL | 方向 |
| counterparty | varchar(255) | | 对手方 |
| merchant_name | varchar(255) | | 商户名 |
| category_raw | varchar(128) | | 原始分类 |
| category_mapped | varchar(128) | | 映射分类 |
| account_mapped | varchar(128) | | 映射账户 |
| tags | varchar(512) | | 标签(逗号分隔) |
| order_id | varchar(128) | INDEX | 订单号 |
| parent_order_id | varchar(128) | | 父链路号 |
| payment_method | varchar(128) | | 支付方式 |
| note | text | | 备注 |
| raw_payload | text | | 原始记录 JSON |
| row_fingerprint | varchar(64) | INDEX | 行指纹 |
| status | varchar(32) | INDEX, DEFAULT 'PENDING_CLEAN' | 状态 |
| firefly_txn_id | varchar(128) | | Firefly 交易 ID |
| imported_at | datetime | | 导入时间 |
| created_at | datetime | | 创建时间 |
| updated_at | datetime | | 更新时间 |
##### 5. DEDUP_RELATIONS — 去重关系
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| src_transaction_id | varchar(36) | FK, INDEX | 原交易 ID |
| target_transaction_id | varchar(36) | FK | 目标交易 ID |
| relation_type | varchar(16) | | strict/fuzzy |
| confidence | int | | 置信度 0-100 |
| status | varchar(16) | | auto/confirmed/rejected |
| reason_json | text | | 判定依据 JSON |
| created_at | datetime | | 创建时间 |
##### 6. LINK_RELATIONS — 链路关系
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| parent_transaction_id | varchar(36) | FK | 主交易 ID |
| child_transaction_id | varchar(36) | FK | 子交易 ID |
| link_type | varchar(16) | | transfer/order/refund/fee |
| reason_json | text | | 关联依据 JSON |
| created_at | datetime | | 创建时间 |
##### 7. RULES — 规则定义
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| rule_type | varchar(32) | INDEX(组合) | 规则类型 |
| priority | int | INDEX(组合) | 优先级 |
| platform_scope | varchar(32) | | 平台范围 |
| conditions_json | text | | 条件 JSON |
| actions_json | text | | 动作 JSON |
| enabled | boolean | | 是否启用 |
| description | varchar(255) | | 规则描述 |
| created_at | datetime | | 创建时间 |
| updated_at | datetime | | 更新时间 |
##### 8. RULE_HITS — 规则命中记录
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| transaction_id | varchar(36) | FK | 交易 ID |
| rule_id | varchar(36) | FK | 规则 ID |
| matched_condition | text | | 命中条件摘要 |
| before_value | text | | 变更前值 |
| after_value | text | | 变更后值 |
| created_at | datetime | | 执行时间 |
##### 9. IMPORT_TASKS — 导入任务
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| batch_id | varchar(36) | FK | 批次 ID |
| export_mode | varchar(16) | | api/csv |
| status | varchar(32) | | pending/running/success/partial_failed/failed |
| total_count | int | | 总记录数 |
| success_count | int | | 成功数 |
| failed_count | int | | 失败数 |
| started_at | datetime | | 开始时间 |
| finished_at | datetime | | 完成时间 |
##### 10. IMPORT_RESULTS — 导入结果
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| task_id | varchar(36) | FK | 任务 ID |
| transaction_id | varchar(36) | FK | 交易 ID |
| status | varchar(16) | | success/failed |
| error_code | varchar(32) | | 错误码 |
| error_message | text | | 错误描述 |
| firefly_txn_id | varchar(128) | | Firefly 返回 ID |
| retry_count | int | | 重试次数 |
| created_at | datetime | | 创建时间 |
##### 11. AUDIT_LOGS — 审计日志
| 字段 | 类型 | 约束 | 说明 |
|------|------|------|------|
| id | varchar(36) | PK | UUID 主键 |
| entity_type | varchar(64) | | 实体类型 |
| entity_id | varchar(36) | | 实体 ID |
| action | varchar(32) | | 操作类型 |
| before_snapshot | text | | 变更前快照 |
| after_snapshot | text | | 变更后快照 |
| operator | varchar(64) | | 操作者 |
| created_at | datetime | | 操作时间 |
#### GORM Model 示例Transaction
```go
type Transaction struct {
ID string `gorm:"primaryKey;type:varchar(36)"`
TransactionID string `gorm:"uniqueIndex;type:varchar(64)"`
BatchID string `gorm:"index;type:varchar(36)"`
RawRecordID string `gorm:"type:varchar(36)"`
SourcePlatform string `gorm:"type:varchar(32);index:idx_platform_record"`
SourceRecordID string `gorm:"type:varchar(128);index:idx_platform_record"`
TradeTime time.Time `gorm:"index;not null"`
Amount decimal.Decimal `gorm:"type:decimal(18,6);not null"`
Currency string `gorm:"type:varchar(16);default:'CNY'"`
Direction string `gorm:"type:varchar(16);not null"`
Counterparty string `gorm:"type:varchar(255)"`
MerchantName string `gorm:"type:varchar(255)"`
CategoryRaw string `gorm:"type:varchar(128)"`
CategoryMapped string `gorm:"type:varchar(128)"`
AccountMapped string `gorm:"type:varchar(128)"`
Tags string `gorm:"type:varchar(512)"`
OrderID string `gorm:"index;type:varchar(128)"`
ParentOrderID string `gorm:"type:varchar(128)"`
PaymentMethod string `gorm:"type:varchar(128)"`
Note string `gorm:"type:text"`
RawPayload string `gorm:"type:text"`
RowFingerprint string `gorm:"index;type:varchar(64)"`
Status string `gorm:"index;type:varchar(32);default:'PENDING_CLEAN'"`
FireflyTxnID string `gorm:"type:varchar(128)"`
ImportedAt *time.Time
CreatedAt time.Time
UpdatedAt time.Time
}
func (Transaction) TableName() string { return "transactions" }
```
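模型中的 `RowFingerprint` 是去重判定键之一。下面是行指纹计算的一个最小示例(参与字段与拼接顺序为示例假设,实际必须与解析引擎的指纹规范保持一致):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// rowFingerprint 对关键字段拼接后做 SHA256返回 64 位十六进制串。
// 参与字段与拼接顺序为示例假设,需与解析引擎的指纹规范一致。
func rowFingerprint(platform, recordID, tradeTime, amount string) string {
	joined := strings.Join([]string{platform, recordID, tradeTime, amount}, "|")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])
}

func main() {
	fp := rowFingerprint("alipay", "2024010112345678", "2024-01-01 12:00:00", "12.340000")
	fmt.Println(len(fp)) // 64
}
```

相同输入必须产生相同指纹,因此金额、时间等字段在参与拼接前应先做规范化(统一精度、统一时区格式)。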
#### SQLite 性能配置
```go
// initDB 打开 SQLite 并设置连接池与 PRAGMA错误必须向上返回而非忽略
func initDB(dbPath string) (*gorm.DB, error) {
	db, err := gorm.Open(sqlite.Open(dbPath), &gorm.Config{})
	if err != nil {
		return nil, fmt.Errorf("打开 SQLite 失败: %w", err)
	}
	sqlDB, err := db.DB()
	if err != nil {
		return nil, err
	}
	sqlDB.SetMaxOpenConns(1) // SQLite 单写,避免 database is locked
	sqlDB.SetMaxIdleConns(1) // 空闲连接数不应超过最大连接数
	db.Exec("PRAGMA journal_mode=WAL")
	db.Exec("PRAGMA synchronous=NORMAL")
	db.Exec("PRAGMA cache_size=-64000") // 负值单位为 KB即 64MB 页缓存
	return db, nil
}
```
#### 事务边界设计
```
事务 1: 文件入库 + 原始记录入库
事务 2: 标准化结果落库
事务 3: 去重/链路关系落库
事务 4: 规则命中落库
事务 5: 导入结果落库
```
每阶段独立事务,使用 `CreateInBatches` 每批 500 条。
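`CreateInBatches` 内部会按批拆分写入;若需要在应用层自行控制批次(例如每批独立事务、单独记录进度),可以用一个通用的切片分批函数(通用写法示例,非项目既有代码):

```go
package main

import "fmt"

// chunk 将切片按 size 拆分为若干批,用于按批提交事务或记录进度
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	records := make([]int, 1200)
	batches := chunk(records, 500)
	fmt.Println(len(batches)) // 1200 条按 500 一批拆为 3 批500+500+200
}
```

每批落库后更新批次进度字段,即使中途失败也能从最后一个成功批次继续。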

## 关键索引设计
- **DDS-Section**: 10.2 关键索引设计
- **DDS-Lines**: L1296-L1310
### Extract
| 表名 | 索引名 | 索引列 | 用途 |
|------|--------|--------|------|
| `transactions` | `idx_txn_platform_record` | `source_platform, source_record_id` | 严格去重判定键 1 |
| `transactions` | `idx_txn_fingerprint` | `row_fingerprint` | 严格去重判定键 2 |
| `transactions` | `idx_txn_batch` | `batch_id` | 批次查询 |
| `transactions` | `idx_txn_trade_time` | `trade_time` | 模糊去重时间分桶 |
| `transactions` | `idx_txn_order` | `order_id` | 订单号匹配 |
| `transactions` | `idx_txn_status` | `status` | 状态过滤 |
| `source_files` | `idx_sf_hash` | `file_hash` | 文件重复上传拦截 |
| `raw_records` | `idx_rr_fingerprint` | `row_fingerprint` | 行级去重 |
| `rules` | `idx_rule_type_priority` | `rule_type, priority` | 规则执行顺序 |
| `dedup_relations` | `idx_dedup_src` | `src_transaction_id` | 去重关系查询 |
#### 性能说明
- `idx_txn_platform_record` 是严格去重最频繁命中的索引,组合索引比两个单列索引效率更高
- `idx_txn_trade_time` 用于模糊去重的时间分桶,避免全表 O(N²) 扫描
- `idx_rule_type_priority` 确保规则按类型分组、按优先级有序加载

## API 接口目录
- **DDS-Section**: 11. API 接口设计GIN RESTful
- **DDS-Lines**: L1350-L1451
### Extract
#### 导入中心 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `POST` | `/api/v1/import/batches` | 上传账单文件创建批次multipart/form-data |
| `GET` | `/api/v1/import/batches` | 获取批次列表 |
| `GET` | `/api/v1/import/batches/:batchId` | 获取批次详情 |
| `POST` | `/api/v1/import/batches/:batchId/process` | 触发解析与清洗流水线 |
| `GET` | `/api/v1/import/batches/:batchId/preview` | 获取清洗预览结果 |
| `DELETE` | `/api/v1/import/batches/:batchId` | 删除批次 |
##### 上传文件接口详情
```
POST /api/v1/import/batches
Content-Type: multipart/form-data
参数:
- files[] (必填) 账单文件
- sourcePlatform (可选) 指定来源平台: "alipay", "wechat"
- autoDetect (可选) 是否自动识别, 默认 true
响应:
{
"code": 0,
"message": "ok",
"data": {
"batchId": "550e8400-e29b-41d4-a716-446655440000",
"status": "UPLOADED",
"filesCount": 2,
"detectedPlatforms": ["alipay", "wechat"]
}
}
```
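`autoDetect` 的平台识别可以基于文件头部特征串做最简实现(特征关键字为示例假设,实际应以各平台导出文件的真实表头为准):

```go
package main

import (
	"fmt"
	"strings"
)

// detectPlatform 根据文件头部内容猜测来源平台,识别失败返回 "unknown"
func detectPlatform(head string) string {
	switch {
	case strings.Contains(head, "支付宝"):
		return "alipay"
	case strings.Contains(head, "微信支付"):
		return "wechat"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(detectPlatform("支付宝交易记录明细查询")) // alipay
}
```

识别结果仅作为默认值回填到 `sourcePlatform`,用户仍可在上传列表中手动修改。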
#### 交易记录 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `GET` | `/api/v1/transactions` | 分页查询交易记录 |
| `GET` | `/api/v1/transactions/:id` | 获取交易详情(含规则命中记录) |
| `GET` | `/api/v1/transactions/:id/trace` | 获取交易完整处理链路 |
#### 去重确认 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `GET` | `/api/v1/dedup/reviews` | 获取疑似重复列表 |
| `GET` | `/api/v1/dedup/reviews/:reviewId` | 获取重复详情(含评分因子) |
| `POST` | `/api/v1/dedup/reviews/:reviewId/confirm` | 确认合并 |
| `POST` | `/api/v1/dedup/reviews/:reviewId/reject` | 拒绝合并 |
#### 规则管理 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `GET` | `/api/v1/rules` | 获取规则列表(支持类型/平台过滤) |
| `POST` | `/api/v1/rules` | 创建规则 |
| `PUT` | `/api/v1/rules/:id` | 更新规则 |
| `DELETE` | `/api/v1/rules/:id` | 删除规则 |
| `POST` | `/api/v1/rules/evaluate` | 重新评估规则(修改规则后触发) |
| `POST` | `/api/v1/rules/:id/test` | 测试规则命中预览 |
#### 导入/导出 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `POST` | `/api/v1/import/tasks` | 创建导入任务(确认导入到 Firefly |
| `GET` | `/api/v1/import/tasks/:taskId` | 获取导入任务详情 |
| `POST` | `/api/v1/import/tasks/:taskId/retry` | 重试失败项 |
| `GET` | `/api/v1/export/csv/:batchId` | 导出批次为 CSV 文件 |
#### 审计与系统 API
| 方法 | 路径 | 说明 |
|------|------|------|
| `GET` | `/api/v1/audit/logs` | 获取操作日志列表 |
| `GET` | `/api/v1/settings` | 获取系统配置 |
| `PUT` | `/api/v1/settings` | 更新系统配置Firefly 连接等) |
| `POST` | `/api/v1/settings/test-connection` | 测试 Firefly III 连接 |

## 统一响应格式与错误码
- **DDS-Section**: 11.1 统一响应格式 + 14.4 错误处理策略
- **DDS-Lines**: L1352-L1369, L1787-L1816
### Extract
#### 统一响应结构
```go
// 普通响应
type Response struct {
Code int `json:"code"` // 0=成功, 非0=错误码
Message string `json:"message"` // 成功/错误说明
Data interface{} `json:"data"` // 业务数据
}
// 分页响应
type PageResponse struct {
Code int `json:"code"`
Message string `json:"message"`
Data interface{} `json:"data"`
Total int64 `json:"total"`
Page int `json:"page"`
Size int `json:"size"`
}
```
#### 错误码定义
| 错误码 | 常量名 | 说明 |
|--------|--------|------|
| 0 | `ErrCodeSuccess` | 成功 |
| 40000 | `ErrCodeBadRequest` | 通用参数错误 |
| 40001 | `ErrCodeFileParseError` | 文件解析失败 |
| 40002 | `ErrCodeUnknownPlatform` | 未识别的平台来源 |
| 40003 | `ErrCodeDuplicateFile` | 重复上传文件 |
| 40004 | `ErrCodeRuleInvalid` | 规则定义无效 |
| 40005 | `ErrCodeExportFailed` | 导出/推送失败 |
| 50000 | `ErrCodeInternal` | 服务器内部错误 |
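上表错误码对应的 Go 常量定义(常量名与数值均取自表中定义):

```go
package main

import "fmt"

// 业务错误码,与接口文档中的错误码表保持一致
const (
	ErrCodeSuccess         = 0     // 成功
	ErrCodeBadRequest      = 40000 // 通用参数错误
	ErrCodeFileParseError  = 40001 // 文件解析失败
	ErrCodeUnknownPlatform = 40002 // 未识别的平台来源
	ErrCodeDuplicateFile   = 40003 // 重复上传文件
	ErrCodeRuleInvalid     = 40004 // 规则定义无效
	ErrCodeExportFailed    = 40005 // 导出/推送失败
	ErrCodeInternal        = 50000 // 服务器内部错误
)

func main() {
	fmt.Println(ErrCodeDuplicateFile) // 40003
}
```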
#### 全局错误处理中间件
```go
func ErrorHandler() gin.HandlerFunc {
	return func(c *gin.Context) {
		c.Next()
		// 仅在存在错误且尚未写出响应时统一兜底,避免重复写入响应体
		if len(c.Errors) > 0 && !c.Writer.Written() {
			err := c.Errors.Last()
			c.JSON(http.StatusOK, Response{
				Code:    mapErrorCode(err),
				Message: err.Error(),
			})
		}
	}
}
```

## Firefly III 导出适配
- **DDS-Section**: 9. Firefly III / Data Importer 适配设计
- **DDS-Lines**: L1048-L1132
### Extract
#### 两段式规则映射策略
**第一阶段 — ProjectMoneyX 负责**
1. 字段级映射:异构字段 → 统一模型
2. 业务分类映射:对手方/描述 → 分类/标签
3. 商户名归一化:别名 → 统一名称
**第二阶段 — Firefly III 负责**
1. 最后一层字段适配
2. 临时补充规则
3. 导入格式兼容
#### 导出模式
##### 模式 AAPI 推送模式(优先)
```
Export Engine → Data Importer (POST /api/v1/import) → Firefly III
返回成功/失败明细 → 更新 import_results 表
```
##### 模式 B中间文件导出模式
- 生成完全符合 Data Importer 规范的标准 CSV / JSON
- 用户手动下载后在 Data Importer 中执行导入
- 适合 API 不可用或权限受限场景
#### Firefly 交易类型映射
| 内部 Direction | Firefly Type | 说明 |
|----------------|--------------|------|
| `expense` | `withdrawal` | 支出 |
| `income` | `deposit` | 收入 |
| `transfer` | `transfer` | 内部转账 |
| `refund` | `deposit` | 退款(作为收入处理) |
| `fee` | `withdrawal` | 手续费(作为支出处理) |
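上表的方向映射可以落为一个纯函数,未知方向显式返回错误而不是静默吞掉:

```go
package main

import "fmt"

// toFireflyType 将内部 direction 映射为 Firefly III 交易类型
func toFireflyType(direction string) (string, error) {
	switch direction {
	case "expense", "fee":
		return "withdrawal", nil // 支出与手续费均作为支出处理
	case "income", "refund":
		return "deposit", nil // 收入与退款均作为收入处理
	case "transfer":
		return "transfer", nil
	default:
		return "", fmt.Errorf("未知的交易方向: %s", direction)
	}
}

func main() {
	t, _ := toFireflyType("refund")
	fmt.Println(t) // deposit
}
```

映射失败的记录应写入 `import_results` 并标记为 FAILED而不是中断整批导出。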
#### ImportResult 数据结构
```go
type ImportResult struct {
TaskID string
TransactionID string
Status string // success / failed
ErrorCode string
ErrorMessage string
FireflyTxnID string // Firefly III 返回的交易 ID
RetryCount int
CreatedAt time.Time
}
```
#### 导入后反馈
- 导入成功/失败数量统计
- 失败原因分类展示(字段缺失、格式错误、账户不存在等)
- 失败记录可单独重试(无需整批重做)

## 导入前校验清单
- **DDS-Section**: 9.4 导入前校验清单
- **DDS-Lines**: L1103-L1112
### Extract
#### 6 项必须校验
| # | 校验项 | 说明 | 失败处理 |
|---|--------|------|----------|
| 1 | 必填字段完整性 | `amount`, `trade_time`, `direction` 不可为空 | 标记为 FAILED记录具体缺失字段 |
| 2 | 金额格式合法性 | 必须为正数,精度不超过 6 位小数 | 标记为 FAILED |
| 3 | 时间格式合法性 | 必须为有效日期时间 | 标记为 FAILED |
| 4 | 账户映射完整性 | 来源平台必须有对应的 Firefly 账户映射 | 标记为 FAILED提示配置账户映射 |
| 5 | 重复导入拦截 | 检查 `transaction_id` 是否已在 Firefly III 中存在 | 跳过并标记 |
| 6 | 未确认记录检查 | 是否存在 `PENDING_REVIEW` 状态的疑似重复记录 | 阻断导入,提示先处理待确认记录 |
#### 校验实现要点
- 校验在 Export Engine 的 `validate()` 方法中统一执行
- 校验失败的记录不参与推送,但不影响其他记录
- 校验结果写入 `import_results` 表,前端可展示失败原因
- 第 6 项(未确认记录)为全局阻断性校验,有未确认记录时整批不可导入

## 前端核心组件与交互
- **DDS-Section**: 12.2 页面职责与交互说明
- **DDS-Lines**: L1489-L1572
### Extract
#### 核心可复用组件
| 组件 | 文件 | 用途 |
|------|------|------|
| 文件上传器 | `FileUploader.vue` | 拖拽 + 点击上传,进度条,平台自动检测 |
| 交易表格 | `TransactionTable.vue` | `v-data-table` + 行展开 + 批量操作 |
| 规则编辑器 | `RuleEditor.vue` | 条件构建器 + 动作配置 + 测试预览 |
| 去重对比 | `DedupCompare.vue` | 左右分栏对比 + 评分因子 + 差异高亮 |
| 审计时间线 | `AuditTimeline.vue` | `v-timeline` + 快照展开 |
#### 1. FileUploader.vue
- 使用 Vuetify `v-file-input` + 自定义拖拽区域
- 文件选择后立即调用 `POST /api/v1/import/batches`
- 上传进度条实时展示
- 自动检测文件来源平台,允许手动修改
- 批量文件列表显示文件名、大小、检测到的平台
#### 2. TransactionTable.vue
- 使用 Vuetify `v-data-table` 实现分页排序
- 行展开(`expanded`)显示详情面板:原始字段 vs 标准字段对比
- 规则命中说明显示该条交易命中的具体规则
- 状态标记:以颜色标签区分 待清洗/已清洗/重复/待确认
- 筛选器:支持按来源平台、分类、方向、状态过滤
- 必须 `fixed-header` + 列宽显式指定
#### 3. DedupCompare.vue
- 左右分栏对比布局
- 差异字段高亮显示
- 评分因子可展开查看6 项因子各自得分)
- 操作按钮:确认合并 / 拒绝合并 / 暂时跳过
- 链路视图展示已识别的转账闭环和订单链路
#### 4. RuleEditor.vue
- 条件构建器:支持多条件 AND/OR 组合
- 使用 Vuetify `v-select``v-text-field``v-chip` 构建
- 按类型分组展示规则列表,支持拖拽排序优先级
- 测试按钮触发 `POST /api/v1/rules/:id/test`
- 启用/禁用一键切换
#### 5. AuditTimeline.vue
- 使用 Vuetify `v-timeline` 组件
- 每个节点可展开查看详细快照数据
- 处理链路:原始文件 → 原始记录 → 标准化 → 规则命中 → 导入结果
#### Pinia Store 清单
| Store | 文件 | 管理的状态 |
|-------|------|-----------|
| Import Store | `importStore.ts` | 批次列表、上传状态、处理进度 |
| Transaction Store | `transactionStore.ts` | 交易记录分页、筛选条件、详情 |
| Rule Store | `ruleStore.ts` | 规则列表、编辑表单、测试结果 |
#### API 模块清单
| 模块 | 文件 | 封装的 API 组 |
|------|------|-------------|
| Import API | `api/import.ts` | 批次上传、列表、详情、处理 |
| Transaction API | `api/transaction.ts` | 交易查询、详情、链路 |
| Dedup API | `api/dedup.ts` | 去重列表、确认、拒绝 |
| Rule API | `api/rule.ts` | 规则 CRUD、测试、评估 |
| Export API | `api/export.ts` | 导入任务、重试、CSV 导出 |
| Audit API | `api/audit.ts` | 审计日志查询 |
#### TypeScript 类型清单
| 文件 | 定义的类型 |
|------|-----------|
| `types/transaction.ts` | Transaction, Direction, TransactionStatus, RawRecord |
| `types/rule.ts` | Rule, RuleType, RuleCondition, RuleHit |
| `types/common.ts` | Response, PageResponse, ImportBatch, SourceFile, ImportTask |

## 前端路由与页面
- **DDS-Section**: 12.3 前端路由设计 + 12.1 信息架构与导航
- **DDS-Lines**: L1454-L1628
### Extract
#### 信息架构
```
侧边导航栏 → 导入中心
→ 数据清洗
→ 去重处理
→ 规则管理
→ 导入任务
→ 数据审计
→ 系统设置
导入中心 → 文件上传页 / 批次列表页 / 批次详情页
数据清洗 → 清洗结果预览页
去重处理 → 重复记录处理页
规则管理 → 规则列表页 / 规则编辑页
导入任务 → 导入任务列表页 / 导入结果详情页
数据审计 → 审计追溯页 / 交易处理链路页
系统设置 → Firefly 连接配置 / 去重参数配置
```
#### 路由定义
```typescript
const routes = [
{ path: '/', redirect: '/import' },
{
path: '/import',
name: 'ImportCenter',
component: () => import('@/views/ImportCenterView.vue'),
meta: { title: '导入中心', icon: 'mdi-upload' }
},
{
path: '/import/batch/:batchId',
name: 'BatchDetail',
component: () => import('@/views/BatchDetailView.vue'),
meta: { title: '批次详情' }
},
{
path: '/preview/:batchId',
name: 'Preview',
component: () => import('@/views/PreviewView.vue'),
meta: { title: '清洗预览' }
},
{
path: '/dedup',
name: 'DedupReview',
component: () => import('@/views/DedupReviewView.vue'),
meta: { title: '去重处理', icon: 'mdi-content-duplicate' }
},
{
path: '/rules',
name: 'RuleConfig',
component: () => import('@/views/RuleConfigView.vue'),
meta: { title: '规则管理', icon: 'mdi-cog-outline' }
},
{
path: '/tasks',
name: 'ImportTask',
component: () => import('@/views/ImportTaskView.vue'),
meta: { title: '导入任务', icon: 'mdi-export' }
},
{
path: '/audit',
name: 'AuditTrace',
component: () => import('@/views/AuditTraceView.vue'),
meta: { title: '数据审计', icon: 'mdi-history' }
},
{
path: '/settings',
name: 'Settings',
component: () => import('@/views/SettingsView.vue'),
meta: { title: '系统设置', icon: 'mdi-tune' }
}
]
```
#### 页面文件清单
| 页面 | 文件 | 用途 |
|------|------|------|
| 导入中心 | `ImportCenterView.vue` | 文件上传 + 批次列表 |
| 批次详情 | `BatchDetailView.vue` | 单批次详情 |
| 清洗预览 | `PreviewView.vue` | 标准化结果预览 |
| 去重处理 | `DedupReviewView.vue` | 疑似重复确认 |
| 规则配置 | `RuleConfigView.vue` | 规则 CRUD |
| 导入任务 | `ImportTaskView.vue` | 导入结果展示 |
| 审计追溯 | `AuditTraceView.vue` | 全链路追溯 |
| 系统设置 | `SettingsView.vue` | Firefly 配置 |

## 部署与安全设计
- **DDS-Section**: 13.2 安全设计 + 13.5 部署架构
- **DDS-Lines**: L1663-L1732
### Extract
#### 安全措施
| 安全措施 | 说明 |
|----------|------|
| 本地部署 | 默认本地运行,敏感账单数据不上传云端 |
| API Token 加密 | Firefly III API Token 使用 AES 加密存储 |
| 审计日志脱敏 | 日志中账号、订单号局部遮罩(如 `138****1234` |
| 文件安全 | 上传文件限制类型和大小(默认最大 50MB |
| CORS 配置 | 仅允许本地来源访问 API |
#### 部署架构
```
Docker 容器 / 本地部署
├── 前端静态资源 (Vue3 Build → /web)
├── 后端服务 (Go Binary :8080)
└── SQLite 数据库 (/data/projectmoneyx.db)
外部依赖(可选)
├── Firefly III (API 推送)
└── Data Importer (API 推送)
```
#### 部署方式
1. **Docker 部署**(推荐):单容器包含前后端 + SQLite
2. **二进制部署**:交叉编译为单体可执行文件,前端资源使用 Go `embed` 嵌入
#### Dockerfile 示例
```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
# gorm.io/driver/sqlite 基于 go-sqlite3需要 CGOalpine 需安装 C 工具链
RUN apk add --no-cache gcc musl-dev
COPY . .
RUN CGO_ENABLED=1 go build -o projectmoneyx ./cmd/server
FROM alpine:latest
WORKDIR /app
COPY --from=builder /app/projectmoneyx .
COPY --from=builder /app/web ./web
VOLUME /data
EXPOSE 8080
CMD ["./projectmoneyx", "--db", "/data/projectmoneyx.db"]
```
#### 配置管理
```go
// 注意:标准 yaml 解析器不会处理 `default` 标签,
// 需配合 creasty/defaults 之类的库,或在加载配置后手动填充默认值
type Config struct {
Server ServerConfig `yaml:"server"`
Database DatabaseConfig `yaml:"database"`
Firefly FireflyConfig `yaml:"firefly"`
Dedup DedupConfig `yaml:"dedup"`
}
type ServerConfig struct {
Port int `yaml:"port" default:"8080"`
Mode string `yaml:"mode" default:"release"`
}
type DatabaseConfig struct {
Path string `yaml:"path" default:"./data/projectmoneyx.db"`
}
type FireflyConfig struct {
BaseURL string `yaml:"base_url"`
APIToken string `yaml:"api_token"` // AES 加密存储
ImporterURL string `yaml:"importer_url"`
Enabled bool `yaml:"enabled"`
}
type DedupConfig struct {
FuzzyTimeWindow int `yaml:"fuzzy_time_window" default:"5"`
FuzzyThresholdHigh int `yaml:"fuzzy_threshold_high" default:"85"`
FuzzyThresholdLow int `yaml:"fuzzy_threshold_low" default:"60"`
TransferTimeWindow int `yaml:"transfer_time_window" default:"30"`
AmountEpsilon float64 `yaml:"amount_epsilon" default:"0.01"`
}
```

## 性能设计
- **DDS-Section**: 13.1 性能设计 + 13.3 可维护性设计
- **DDS-Lines**: L1632-L1681
### Extract
#### 性能目标
单次导入 1 万条记录时,解析与清洗在主流程内完成,去重计算应在 30 秒内完成。
#### 优化策略
| 策略 | 说明 |
|------|------|
| 批量插入 | `raw_records``transactions` 使用 GORM `CreateInBatches`,每批 500 条 |
| 关键索引 | `source_platform + source_record_id``batch_id``trade_time``order_id` |
| 模糊去重分桶 | 按 `trade_time` 时间分桶,避免全表扫描 |
| 规则预筛选 | 按平台和启用状态预加载规则,减少无效匹配 |
| 异步处理 | ETL Pipeline 使用 goroutine 异步执行,前端轮询状态 |
| 连接池 | SQLite 使用 WAL 模式提升并发读写性能 |
#### SQLite 性能配置
```go
// initDB 打开 SQLite 并设置连接池与 PRAGMA错误必须向上返回而非忽略
func initDB(dbPath string) (*gorm.DB, error) {
	db, err := gorm.Open(sqlite.Open(dbPath), &gorm.Config{})
	if err != nil {
		return nil, fmt.Errorf("打开 SQLite 失败: %w", err)
	}
	sqlDB, err := db.DB()
	if err != nil {
		return nil, err
	}
	sqlDB.SetMaxOpenConns(1) // SQLite 单写,避免 database is locked
	sqlDB.SetMaxIdleConns(1) // 空闲连接数不应超过最大连接数
	db.Exec("PRAGMA journal_mode=WAL")
	db.Exec("PRAGMA synchronous=NORMAL")
	db.Exec("PRAGMA cache_size=-64000") // 负值单位为 KB即 64MB 页缓存
	return db, nil
}
```
#### 可维护性设计
| 原则 | 说明 |
|------|------|
| 解析器插件化 | 新增平台只需实现 `BillParser` 接口并注册 |
| 规则条件 JSON 化 | 规则存储为 JSON灵活扩展匹配条件 |
| 导入器解耦 | Export 层独立,可替换下游目标 |
| 分层 DTO/VO/Entity | Handler → DTO → Service → Entity → DAO |
| 事务分阶段 | 每个 ETL 阶段独立事务,避免超长事务 |
#### 可追溯性设计
| 追溯能力 | 实现方式 |
|----------|----------|
| 任一导入结果 → 原始文件 | `transaction.raw_record_id → raw_record.source_file_id → source_file` |
| 任一规则命中 → 解释说明 | `rule_hits` 表记录命中条件和前后字段值对比 |
| 任一合并操作 → 判定依据 | `dedup_relations.reason_json``link_relations.reason_json` |
| 任一操作 → 操作日志 | `audit_logs` 表记录实体变更和操作者信息 |

#!/bin/bash
# verify.sh - developing-projectmoneyx Skill 结构与内容验证
set -e
PASS=0; FAIL=0
check() {
if eval "$2"; then
echo "✅ PASS: $1"; ((PASS++))
else
echo "❌ FAIL: $1"; ((FAIL++))
fi
}
SKILL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
# ========================
# 结构完整性检查
# ========================
check "SKILL.md 存在" "test -f '$SKILL_DIR/SKILL.md'"
check "reference/ 目录存在" "test -d '$SKILL_DIR/reference'"
check "scripts/ 目录存在" "test -d '$SKILL_DIR/scripts'"
check "SKILL.md < 500 行" "[ $(wc -l < '$SKILL_DIR/SKILL.md') -lt 500 ]"
# ========================
# Frontmatter 检查
# ========================
check "frontmatter 包含 name" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^name:'"
check "frontmatter 包含 description" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^description:'"
check "frontmatter 包含 argument-hint" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^argument-hint:'"
check "frontmatter 包含 allowed-tools" "head -20 '$SKILL_DIR/SKILL.md' | grep -q '^allowed-tools:'"
# ========================
# 章节检查
# ========================
check "包含 Quick Context 章节" "grep -q '## Quick Context' '$SKILL_DIR/SKILL.md'"
check "包含 Plan 章节" "grep -q '## Plan' '$SKILL_DIR/SKILL.md'"
check "包含 Verify 章节" "grep -q '## Verify' '$SKILL_DIR/SKILL.md'"
check "包含 Execute 章节" "grep -q '## Execute' '$SKILL_DIR/SKILL.md'"
check "包含 Pitfalls 章节" "grep -q '## Pitfalls' '$SKILL_DIR/SKILL.md'"
check "包含 Related References 章节" "grep -q '## Related References' '$SKILL_DIR/SKILL.md'"
# ========================
# 动态注入检查
# ========================
check "包含至少 2 处动态注入命令" "[ $(grep -c '!\`' '$SKILL_DIR/SKILL.md') -ge 2 ]"
# ========================
# Pitfalls 引用 reference 检查
# ========================
check "Pitfalls 引用 reference/ 至少 2 处" "[ $(grep -A 100 '## Pitfalls' '$SKILL_DIR/SKILL.md' | grep -c 'reference/') -ge 2 ]"
# ========================
# reference 目录结构检查
# ========================
check "reference 有 01-architecture 子目录" "test -d '$SKILL_DIR/reference/01-architecture'"
check "reference 有 02-parser-engine 子目录" "test -d '$SKILL_DIR/reference/02-parser-engine'"
check "reference 有 03-dedup-engine 子目录" "test -d '$SKILL_DIR/reference/03-dedup-engine'"
check "reference 有 04-rule-engine 子目录" "test -d '$SKILL_DIR/reference/04-rule-engine'"
check "reference 有 05-database 子目录" "test -d '$SKILL_DIR/reference/05-database'"
check "reference 有 06-api-design 子目录" "test -d '$SKILL_DIR/reference/06-api-design'"
check "reference 有 07-export-engine 子目录" "test -d '$SKILL_DIR/reference/07-export-engine'"
check "reference 有 08-frontend 子目录" "test -d '$SKILL_DIR/reference/08-frontend'"
check "reference 有 09-nonfunctional 子目录" "test -d '$SKILL_DIR/reference/09-nonfunctional'"
# ========================
# reference 内容检查
# ========================
check "reference 文件含 DDS-Section 溯源" "grep -rq 'DDS-Section:' '$SKILL_DIR/reference/' 2>/dev/null"
check "reference 文件含 DDS-Lines 溯源" "grep -rq 'DDS-Lines:' '$SKILL_DIR/reference/' 2>/dev/null"
# ========================
# 关键 reference 文件存在检查
# ========================
check "system-overview.md 存在" "test -f '$SKILL_DIR/reference/01-architecture/system-overview.md'"
check "batch-state-machine.md 存在" "test -f '$SKILL_DIR/reference/01-architecture/batch-state-machine.md'"
check "parser-interface.md 存在" "test -f '$SKILL_DIR/reference/02-parser-engine/parser-interface.md'"
check "field-mappings.md 存在" "test -f '$SKILL_DIR/reference/02-parser-engine/field-mappings.md'"
check "strict-dedup.md 存在" "test -f '$SKILL_DIR/reference/03-dedup-engine/strict-dedup.md'"
check "fuzzy-dedup.md 存在" "test -f '$SKILL_DIR/reference/03-dedup-engine/fuzzy-dedup.md'"
check "transfer-link.md 存在" "test -f '$SKILL_DIR/reference/03-dedup-engine/transfer-link.md'"
check "fingerprint.md 存在" "test -f '$SKILL_DIR/reference/03-dedup-engine/fingerprint.md'"
check "rule-conditions.md 存在" "test -f '$SKILL_DIR/reference/04-rule-engine/rule-conditions.md'"
check "rule-execution.md 存在" "test -f '$SKILL_DIR/reference/04-rule-engine/rule-execution.md'"
check "db-schema.md 存在" "test -f '$SKILL_DIR/reference/05-database/db-schema.md'"
check "indexes.md 存在" "test -f '$SKILL_DIR/reference/05-database/indexes.md'"
check "api-catalog.md 存在" "test -f '$SKILL_DIR/reference/06-api-design/api-catalog.md'"
check "response-format.md 存在" "test -f '$SKILL_DIR/reference/06-api-design/response-format.md'"
check "firefly-mapping.md 存在" "test -f '$SKILL_DIR/reference/07-export-engine/firefly-mapping.md'"
check "import-validation.md 存在" "test -f '$SKILL_DIR/reference/07-export-engine/import-validation.md'"
echo ""
echo "=== 结果: $PASS PASS / $FAIL FAIL ==="
[ $FAIL -eq 0 ] && exit 0 || exit 1

---
name: frontend-vue3-vuetify
description: Build production-grade Vue 3 + TypeScript + Vuetify 3 interfaces with architectural rigor. 构建生产级 Vue 3 + TypeScript + Vuetify 3 界面。Use when creating Vue components, pages, layouts, Pinia stores, or API modules. 用于创建 Vue 组件、页面、布局、Pinia 状态管理或 API 模块。Enforces strict typing, Composition API patterns, Material Design 3 aesthetics, and bulletproof data handling.
---
本技能指导构建架构严谨、类型安全、视觉精致的 Vue 3 + Vuetify 3 代码。每个组件都应该达到生产级代码库的标准——让资深工程师也引以为傲。
用户输入:$ARGUMENTS组件规格、页面需求、功能请求或架构问题
## 架构思维
动手写代码之前,先建立清晰认知:
- **组件身份**:这是页面(Page)、布局(Layout)、可复用组件(Component)、组合式函数(Composable)、状态仓库(Store),还是 API 模块?每种都有独特模式。
- **数据重力**状态住在哪里Props 向下流动Events 向上冒泡。跨组件状态用 Pinia。深层级传递用 `provide/inject`
- **滚动策略**:哪个容器拥有滚动权?永远不是 body。必须显式声明。必须可控。
- **失败模式**:数据为 `null` 时怎么办?空数组?网络超时?先为不幸路径设计。
**关键原则**:生产代码预判混乱。为一切加类型。为一切加守卫。让一切优雅降级。
## 核心信条
### TypeScript 绝对主义
- `<script setup lang="ts">` — 唯一可接受的写法
- `any` 被禁止 — 使用 `unknown` + 类型守卫、泛型、工具类型
- 每个 prop、emit、ref、API 响应都必须穿戴类型
- 类型定义放在 `@/types/`,按领域组织:`user.d.ts``order.d.ts`
### Composition API 纯粹性
- `ref``reactive``computed``watchEffect` — 掌握这四大金刚
- `shallowRef``readonly``toRaw` — 知道何时使用优化手段
- 生命周期用 `onMounted``onUnmounted` — 绝不混用 Options API
- Pinia stores类型化的 state、类型化的 getters、类型化的 actions — 无例外
### Vuetify 3 + Material Design 3
- 所有 UI 通过 Vuetify 组件实现 — UI 元素不使用原生 HTML
- 始终主题感知 — `rgb(var(--v-theme-surface))`,绝不 `#ffffff`
- `useDisplay()` 处理响应式逻辑 — 断点是一等公民
- 密度很重要 — 数据密集界面使用 `density="compact"`
### 布局哲学
```
┌─────────────────────────────────┐
│ 工具栏 (flex-shrink-0) │
├─────────────────────────────────┤
│ │
│ 内容区域 │
│ (flex-grow-1, overflow-y-auto) │
│ (min-height: 0) ← 关键! │
│ │
├─────────────────────────────────┤
│ 底部栏 (flex-shrink-0) │
└─────────────────────────────────┘
```
- **禁止 body 滚动** — 视口锁定,内容在容器中滚动
- **Flexbox 陷阱**`flex-grow-1` 子元素必须有 `min-height: 0`
- **粘性元素**:筛选栏、表头 — 滚动时始终可见
## 数据健壮性模式
将所有外部数据视为不可信:
```typescript
// 防御性访问
const userName = user?.profile?.name ?? '未知'
// 数组安全检查
const items = Array.isArray(response.data) ? response.data : []
// 模板中的存在性守卫
<template v-if="user">{{ user.name }}</template>
<v-empty-state v-else />
```
## UI 状态三位一体
每个数据驱动视图必须处理三种状态:
| 状态 | 组件 | 禁止行为 |
|------|------|----------|
| **加载中** | `v-skeleton-loader` | 显示过期数据或空白屏幕 |
| **空数据** | `v-empty-state` + 操作按钮 | 留下白茫茫一片 |
| **错误** | Snackbar + 重试选项 | 静默失败 |
## 表格与列表戒律
- 每个 `v-data-table` 都要 `fixed-header` — 没有商量余地
- 截断文本必须配 `v-tooltip` — 用户有权 hover 看到完整内容
- 100+ 条数据?用 `v-virtual-scroll` — DOM 节点数保持恒定
- 列宽显式指定 — 不玩布局抽奖
## 反模式(绝不允许)
- TypeScript 项目中出现 `.js` 文件
- 没有正当理由使用 `any`
- 硬编码颜色:`color="#1976d2"` → 应该用 `color="primary"`
- SPA 布局中出现 body 级滚动
- 表格没有固定表头
- 截断文本没有 tooltip
- 空状态真的"空空如也"
- 加载状态冻结 UI
- API 调用没有错误处理
## 参考文件
需要实现细节时查阅:
| 需求 | 文件 |
|------|------|
| 高级 TypeScript 模式 | `reference/typescript-rules.md` |
| 复杂布局结构 | `reference/layout-patterns.md` |
| API 客户端架构 | `reference/api-patterns.md` |
| 表格、列表、表单、反馈 | `reference/ui-interaction.md` |
## 项目结构
```
src/
├── api/ # Axios 实例 + 模块
├── components/ # 共享组件
├── composables/ # 可复用 hooks
├── layouts/ # 页面外壳
├── pages/ # 路由视图
├── plugins/ # Vuetify, Pinia, Router
├── store/ # Pinia stores
├── styles/ # 全局 SCSS
├── types/ # 类型定义
└── utils/ # 纯函数
```
## 输出规范
1. 陈述架构方案2-3 句话)
2. 列出要创建的文件及其用途
3. 完整实现每个文件 — 无占位符,无 TODO
4. 对照反模式清单验证
5. 指出任何假设或权衡取舍
---
记住:你不是在写"能跑的代码"。你是在写能跑、能扩展、能维护、能令人愉悦的代码。每个 `ref` 都有类型。每个边界情况都有处理。每个加载状态都很美观。这就是"生产级"的含义。

// @/api/modules/user.ts
import request from '@/api'
import type { PageParams, PageResult } from '@/types/api'
// ============================================
// 类型定义
// ============================================
export interface User {
id: string
name: string
email: string
avatar?: string
status: 'active' | 'disabled'
role: 'admin' | 'user' | 'guest'
createdAt: string
updatedAt: string
}
export interface CreateUserDto {
name: string
email: string
password: string
role?: User['role']
}
export interface UpdateUserDto {
name?: string
email?: string
status?: User['status']
role?: User['role']
}
export interface UserListParams extends PageParams {
search?: string
status?: User['status']
role?: User['role']
}
// ============================================
// API 封装
// ============================================
export const userApi = {
/**
* 获取用户分页列表
*/
getPage: (params: UserListParams) =>
request.get<PageResult<User>>('/users', { params }),
/**
* 获取用户列表(无分页,用于下拉选择等场景)
*/
getList: (params?: Partial<UserListParams>) =>
request.get<User[]>('/users/list', { params }),
/**
* 获取用户详情
*/
getById: (id: string) =>
request.get<User>(`/users/${id}`),
/**
* 检查邮箱是否已存在
*/
checkEmail: (email: string) =>
request.get<{ exists: boolean }>('/users/check-email', { params: { email } }),
/**
* 创建用户
*/
create: (data: CreateUserDto) =>
request.post<User>('/users', data),
/**
* 更新用户
*/
update: (id: string, data: UpdateUserDto) =>
request.put<User>(`/users/${id}`, data),
/**
* 删除用户
*/
remove: (id: string) =>
request.delete<void>(`/users/${id}`),
/**
* 批量删除用户
*/
batchRemove: (ids: string[]) =>
request.post<void>('/users/batch-delete', { ids }),
/**
* 启用/禁用用户
*/
toggleStatus: (id: string, status: User['status']) =>
request.patch<User>(`/users/${id}/status`, { status }),
/**
* 重置用户密码
*/
resetPassword: (id: string) =>
request.post<{ tempPassword: string }>(`/users/${id}/reset-password`),
/**
* 导出用户列表
*/
export: (params?: UserListParams) =>
request.get<Blob>('/users/export', {
params,
responseType: 'blob',
}),
}

<template>
<div class="d-flex flex-column h-100">
<!-- 页面头部 -->
<v-toolbar density="compact" class="flex-shrink-0">
<v-toolbar-title>订单管理</v-toolbar-title>
<v-spacer />
<v-btn
icon="mdi-refresh"
:loading="loading"
@click="fetchData"
/>
<v-btn
color="primary"
prepend-icon="mdi-download"
:loading="exporting"
@click="exportData"
>
导出
</v-btn>
</v-toolbar>
<!-- 筛选栏 - 粘性定位 -->
<v-sheet class="flex-shrink-0 pa-4 sticky-filter">
<v-row dense>
<v-col cols="12" sm="6" md="3">
<v-text-field
v-model="filters.search"
label="搜索订单号/客户名"
prepend-inner-icon="mdi-magnify"
clearable
hide-details
@update:model-value="debouncedFetch"
/>
</v-col>
<v-col cols="12" sm="6" md="2">
<v-select
v-model="filters.status"
:items="statusOptions"
label="状态"
clearable
hide-details
@update:model-value="fetchData"
/>
</v-col>
<v-col cols="12" sm="6" md="3">
<v-text-field
v-model="filters.dateRange"
label="日期范围"
prepend-inner-icon="mdi-calendar"
readonly
hide-details
@click="showDatePicker = true"
/>
</v-col>
<v-col cols="12" sm="6" md="2">
<v-btn
variant="outlined"
block
@click="resetFilters"
>
重置筛选
</v-btn>
</v-col>
</v-row>
</v-sheet>
<v-divider />
<!-- 主内容区 - 可滚动 -->
<div class="flex-grow-1 overflow-y-auto" style="min-height: 0">
<!-- 加载状态 -->
<v-skeleton-loader
v-if="loading && !orders.length"
type="table-heading, table-row@8"
class="ma-4"
/>
<!-- 空状态 -->
<v-empty-state
v-else-if="!orders.length"
icon="mdi-package-variant"
title="暂无订单"
text="当前筛选条件下没有找到订单记录"
>
<template #actions>
<v-btn variant="outlined" @click="resetFilters">
清除筛选
</v-btn>
<v-btn color="primary" @click="fetchData">
刷新
</v-btn>
</template>
</v-empty-state>
<!-- 数据表格 -->
<v-data-table-server
v-else
v-model:items-per-page="pagination.pageSize"
v-model:page="pagination.page"
:headers="headers"
:items="orders"
:items-length="pagination.total"
:loading="loading"
fixed-header
hover
@update:options="onOptionsChange"
>
<!-- 订单号 - 可点击 -->
<template #item.orderNo="{ item }">
<a
href="#"
class="text-primary text-decoration-none"
@click.prevent="viewDetail(item)"
>
{{ item.orderNo }}
</a>
</template>
<!-- 客户名 - 截断 + Tooltip -->
<template #item.customerName="{ value }">
<v-tooltip :text="value" location="top">
<template #activator="{ props }">
<span
v-bind="props"
class="text-truncate d-inline-block"
style="max-width: 120px"
>
{{ value }}
</span>
</template>
</v-tooltip>
</template>
<!-- 金额 - 格式化 -->
<template #item.amount="{ value }">
<span class="font-weight-medium">
¥{{ formatNumber(value) }}
</span>
</template>
<!-- 状态 - Chip -->
<template #item.status="{ value }">
<v-chip
:color="getStatusColor(value)"
size="small"
variant="tonal"
>
{{ getStatusText(value) }}
</v-chip>
</template>
<!-- 备注 - 多行截断 -->
<template #item.remark="{ value }">
<div v-if="value" class="remark-cell">
<v-tooltip :text="value" location="top" max-width="300">
<template #activator="{ props }">
<span v-bind="props" class="line-clamp-2">
{{ value }}
</span>
</template>
</v-tooltip>
</div>
<span v-else class="text-grey">-</span>
</template>
<!-- 操作列 -->
<template #item.actions="{ item }">
<v-btn
icon="mdi-eye"
size="small"
variant="text"
@click="viewDetail(item)"
/>
<v-btn
icon="mdi-pencil"
size="small"
variant="text"
:disabled="item.status === 'completed'"
@click="editOrder(item)"
/>
<v-menu>
<template #activator="{ props }">
<v-btn
v-bind="props"
icon="mdi-dots-vertical"
size="small"
variant="text"
/>
</template>
<v-list density="compact">
<v-list-item @click="copyOrderNo(item)">
<template #prepend>
<v-icon size="small">mdi-content-copy</v-icon>
</template>
<v-list-item-title>复制订单号</v-list-item-title>
</v-list-item>
<v-list-item
:disabled="item.status !== 'pending'"
@click="cancelOrder(item)"
>
<template #prepend>
<v-icon size="small" color="error">mdi-cancel</v-icon>
</template>
<v-list-item-title class="text-error">取消订单</v-list-item-title>
</v-list-item>
</v-list>
</v-menu>
</template>
<!-- 空数据插槽 -->
<template #no-data>
<v-empty-state
icon="mdi-database-off"
title="暂无数据"
text="请尝试调整筛选条件"
/>
</template>
</v-data-table-server>
</div>
<!-- 底部统计栏 -->
<v-sheet class="flex-shrink-0 pa-2 border-t d-flex align-center justify-space-between">
<span class="text-body-2 text-grey">
{{ pagination.total }} 条记录
</span>
<span class="text-body-2">
已选 <strong>{{ selectedCount }}</strong>
<v-btn
v-if="selectedCount > 0"
variant="text"
size="small"
color="primary"
@click="batchAction"
>
批量操作
</v-btn>
</span>
</v-sheet>
</div>
</template>
<script setup lang="ts">
import { ref, reactive, onMounted } from 'vue'
import { useDebounceFn } from '@vueuse/core'
import type { Order, OrderStatus } from '@/types/order'
import { orderApi } from '@/api/modules/order'
import { useSnackbar } from '@/composables/useSnackbar'
// Composables
const snackbar = useSnackbar()
// State
const loading = ref(false)
const exporting = ref(false)
const showDatePicker = ref(false)
const orders = ref<Order[]>([])
const selectedCount = ref(0) // 选中行数(需配合表格 show-select 与行选择事件维护)
const filters = reactive({
search: '',
status: null as OrderStatus | null,
dateRange: '',
})
const pagination = reactive({
page: 1,
pageSize: 20,
total: 0,
})
// Table Headers
const headers = [
{ title: '订单号', key: 'orderNo', width: 160 },
{ title: '客户名称', key: 'customerName', width: 150 },
{ title: '金额', key: 'amount', width: 120, align: 'end' as const },
{ title: '状态', key: 'status', width: 100 },
{ title: '备注', key: 'remark', width: 200 },
{ title: '创建时间', key: 'createdAt', width: 170 },
{ title: '操作', key: 'actions', width: 140, sortable: false },
]
const statusOptions = [
{ title: '全部', value: null },
{ title: '待处理', value: 'pending' },
{ title: '处理中', value: 'processing' },
{ title: '已完成', value: 'completed' },
{ title: '已取消', value: 'cancelled' },
]
// Methods
async function fetchData() {
loading.value = true
try {
const result = await orderApi.getPage({
page: pagination.page,
pageSize: pagination.pageSize,
search: filters.search || undefined,
status: filters.status || undefined,
})
orders.value = result.list
pagination.total = result.total
} catch (error) {
console.error('Failed to fetch orders:', error)
} finally {
loading.value = false
}
}
const debouncedFetch = useDebounceFn(fetchData, 300)
function onOptionsChange(options: { page: number; itemsPerPage: number }) {
// v-data-table-server 挂载时也会触发一次 update:options参数未变化时跳过避免与 onMounted 重复请求
if (options.page === pagination.page && options.itemsPerPage === pagination.pageSize) return
pagination.page = options.page
pagination.pageSize = options.itemsPerPage
fetchData()
}
function resetFilters() {
filters.search = ''
filters.status = null
filters.dateRange = ''
pagination.page = 1
fetchData()
}
async function exportData() {
exporting.value = true
try {
const blob = await orderApi.export(filters)
const url = URL.createObjectURL(blob)
const a = document.createElement('a')
a.href = url
a.download = `orders_${Date.now()}.xlsx`
a.click()
URL.revokeObjectURL(url)
snackbar.success('导出成功')
} catch (error) {
console.error('导出失败:', error)
snackbar.error('导出失败')
} finally {
exporting.value = false
}
}
function viewDetail(item: Order) {
// Navigate to detail page
}
function editOrder(item: Order) {
// Open edit dialog
}
function copyOrderNo(item: Order) {
navigator.clipboard.writeText(item.orderNo)
snackbar.success('订单号已复制')
}
function cancelOrder(item: Order) {
// Show confirm dialog
}
function batchAction() {
// Show batch action menu
}
// Helpers
function formatNumber(value: number): string {
return value.toLocaleString('zh-CN', { minimumFractionDigits: 2, maximumFractionDigits: 2 })
}
function getStatusColor(status: OrderStatus): string {
const colors: Record<OrderStatus, string> = {
pending: 'warning',
processing: 'info',
completed: 'success',
cancelled: 'grey',
}
return colors[status] || 'grey'
}
function getStatusText(status: OrderStatus): string {
const texts: Record<OrderStatus, string> = {
pending: '待处理',
processing: '处理中',
completed: '已完成',
cancelled: '已取消',
}
return texts[status] || status
}
onMounted(fetchData)
</script>
<style scoped>
.sticky-filter {
position: sticky;
top: 0;
z-index: 1;
}
.remark-cell {
max-width: 180px;
}
.line-clamp-2 {
display: -webkit-box;
-webkit-line-clamp: 2;
-webkit-box-orient: vertical;
overflow: hidden;
}
</style>


@@ -0,0 +1,155 @@
<template>
<v-container fluid class="d-flex flex-column h-100 pa-0">
<!-- 固定工具栏 -->
<v-toolbar density="compact" class="flex-shrink-0">
<v-toolbar-title>用户管理</v-toolbar-title>
<v-spacer />
<v-btn icon="mdi-refresh" :loading="loading" @click="fetchData" />
<v-btn color="primary" prepend-icon="mdi-plus" @click="openCreate">
新建
</v-btn>
</v-toolbar>
<!-- 筛选区域 -->
<v-sheet class="flex-shrink-0 pa-4">
<v-row dense>
<v-col cols="12" md="3">
<v-text-field
v-model="filters.search"
label="搜索"
prepend-inner-icon="mdi-magnify"
clearable
hide-details
/>
</v-col>
<v-col cols="12" md="3">
<v-select
v-model="filters.status"
:items="statusOptions"
label="状态"
clearable
hide-details
/>
</v-col>
</v-row>
</v-sheet>
<v-divider />
<!-- 可滚动内容区 -->
<div class="flex-grow-1 overflow-y-auto" style="min-height: 0">
<v-skeleton-loader v-if="loading && !users.length" type="table-heading, table-row@10" />
<v-empty-state
v-else-if="!users.length"
icon="mdi-account-off"
title="暂无用户"
text="点击新建按钮添加第一个用户"
>
<template #actions>
<v-btn color="primary" @click="openCreate">新建用户</v-btn>
</template>
</v-empty-state>
<v-data-table
v-else
:headers="headers"
:items="users"
fixed-header
hover
>
<template #item.name="{ value }">
<v-tooltip :text="value" location="top">
<template #activator="{ props }">
<span
v-bind="props"
class="text-truncate d-inline-block"
style="max-width: 150px"
>
{{ value }}
</span>
</template>
</v-tooltip>
</template>
<template #item.status="{ value }">
<v-chip
:color="value === 'active' ? 'success' : 'grey'"
size="small"
>
{{ value === 'active' ? '活跃' : '禁用' }}
</v-chip>
</template>
<template #item.actions="{ item }">
<v-btn
icon="mdi-pencil"
size="small"
variant="text"
@click="edit(item)"
/>
<v-btn
icon="mdi-delete"
size="small"
variant="text"
color="error"
@click="remove(item)"
/>
</template>
</v-data-table>
</div>
</v-container>
</template>
<script setup lang="ts">
import { ref, reactive, onMounted } from 'vue'
import type { User } from '@/types/user'
import { userApi } from '@/api/modules/user'
// State
const loading = ref(false)
const users = ref<User[]>([])
const filters = reactive({
search: '',
status: null as string | null,
})
// Table config
const headers = [
{ title: '姓名', key: 'name', width: 200 },
{ title: '邮箱', key: 'email' },
{ title: '状态', key: 'status', width: 120 },
{ title: '创建时间', key: 'createdAt', width: 180 },
{ title: '操作', key: 'actions', width: 120, sortable: false },
]
const statusOptions = [
{ title: '全部', value: null },
{ title: '活跃', value: 'active' },
{ title: '禁用', value: 'disabled' },
]
// Methods
async function fetchData() {
loading.value = true
try {
users.value = await userApi.getList(filters)
} finally {
loading.value = false
}
}
function openCreate() {
// TODO: Open create dialog
}
function edit(item: User) {
// TODO: Open edit dialog
}
function remove(item: User) {
// TODO: Confirm and delete
}
onMounted(fetchData)
</script>


@@ -0,0 +1,238 @@
# API 客户端模式
## 标准响应类型
```typescript
// @/types/api.d.ts
export interface ApiResponse<T = unknown> {
code: number
status: number
timestamp: string
data: T
message?: string
error?: string
}
export interface PageParams {
page: number
pageSize: number
sort?: string
order?: 'asc' | 'desc'
}
export interface PageResult<T> {
list: T[]
total: number
page: number
pageSize: number
}
export enum ApiErrorCode {
Success = 0,
ServerError = 10001,
ParamError = 10002,
Unauthorized = 10003,
Forbidden = 10004,
NotFound = 10005,
Timeout = 10006,
ValidationFail = 10007,
BusinessError = 20001,
}
```
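拦截器里除了按 HTTP 状态码提示,通常还需要把业务错误码映射成用户可读文案。下面是一个脱离框架的最小示意(为可独立运行,枚举在此重复一份;具体文案只是示例,并非固定约定):

```typescript
// 业务错误码(与上文定义一致,为保持自包含在此重复)
enum ApiErrorCode {
  Success = 0,
  ServerError = 10001,
  ParamError = 10002,
  Unauthorized = 10003,
  Forbidden = 10004,
  NotFound = 10005,
  Timeout = 10006,
  ValidationFail = 10007,
  BusinessError = 20001,
}

// 业务码 → 用户可读文案;未覆盖的码回退到通用提示
const errorMessages: Partial<Record<ApiErrorCode, string>> = {
  [ApiErrorCode.ServerError]: '服务器错误,请稍后重试',
  [ApiErrorCode.ParamError]: '请求参数有误',
  [ApiErrorCode.Unauthorized]: '登录已过期,请重新登录',
  [ApiErrorCode.Forbidden]: '无权访问',
  [ApiErrorCode.NotFound]: '请求的资源不存在',
  [ApiErrorCode.Timeout]: '请求超时,请检查网络',
  [ApiErrorCode.ValidationFail]: '数据校验未通过',
  [ApiErrorCode.BusinessError]: '操作失败',
}

function messageFor(code: number, fallback = '操作失败'): string {
  return errorMessages[code as ApiErrorCode] ?? fallback
}
```

在响应拦截器的"业务失败"分支中,可用 `snackbar.error(messageFor(code))` 代替直接透传服务端 message便于统一文案口径。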
## Axios 实例
```typescript
// @/api/index.ts
import axios from 'axios'
import { setupInterceptors } from './interceptors'
const request = axios.create({
baseURL: import.meta.env.VITE_API_BASE_URL,
timeout: 15000,
headers: {
'Content-Type': 'application/json',
},
})
setupInterceptors(request)
export default request
```
## 响应拦截器
```typescript
// @/api/interceptors.ts
import type { AxiosInstance, AxiosResponse } from 'axios'
import { useSnackbar } from '@/composables/useSnackbar'
import { useAuthStore } from '@/store/auth'
import router from '@/router'
import { ApiErrorCode, type ApiResponse } from '@/types/api'
export function setupInterceptors(instance: AxiosInstance) {
// 请求拦截器
instance.interceptors.request.use(
(config) => {
const authStore = useAuthStore()
if (authStore.token) {
config.headers.Authorization = `Bearer ${authStore.token}`
}
return config
},
(error) => Promise.reject(error)
)
// 响应拦截器
instance.interceptors.response.use(
(response: AxiosResponse<ApiResponse>) => {
const { code, data, message } = response.data
// 业务成功
if (code === ApiErrorCode.Success) {
return data
}
// 业务失败
const snackbar = useSnackbar()
snackbar.error(message || '操作失败')
return Promise.reject(new Error(message || '操作失败'))
},
(error) => {
const snackbar = useSnackbar()
const status = error.response?.status
switch (status) {
case 401:
useAuthStore().logout()
router.push('/login')
snackbar.error('登录已过期,请重新登录')
break
case 403:
snackbar.error('无权访问')
break
case 404:
snackbar.error('请求的资源不存在')
break
case 500:
snackbar.error('服务器错误,请稍后重试')
break
default:
if (error.code === 'ECONNABORTED') {
snackbar.error('请求超时,请检查网络')
} else if (!error.response) {
snackbar.error('网络连接失败')
}
}
return Promise.reject(error)
}
)
}
```
## API 模块模板
```typescript
// @/api/modules/[domain].ts
import request from '@/api'
import type { PageParams, PageResult } from '@/types/api'
import type { Entity, CreateDto, UpdateDto } from '@/types/[domain]'
export const entityApi = {
// 分页列表
getPage: (params: PageParams) =>
request.get<PageResult<Entity>>('/entities', { params }),
// 详情
getById: (id: string) =>
request.get<Entity>(`/entities/${id}`),
// 新增
create: (data: CreateDto) =>
request.post<Entity>('/entities', data),
// 更新
update: (id: string, data: UpdateDto) =>
request.put<Entity>(`/entities/${id}`, data),
// 删除
remove: (id: string) =>
request.delete<void>(`/entities/${id}`),
}
```
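需要注意:响应拦截器已经把 `ApiResponse<T>` 解包成了 `data`,而 axios 自带的泛型签名仍然声明返回 `AxiosResponse<T>`,两者并不一致。常见做法是在 `request` 外再包一层类型正确的 helper。下面是一个脱离 axios 的最小类型示意(`rawGet`、`demo` 均为演示用的假设函数,用固定数据模拟"拦截器解包后直接返回 data"的运行时行为):

```typescript
// 响应包装(与上文 ApiResponse 结构一致,为保持自包含在此简化重复)
interface ApiResponse<T> {
  code: number
  data: T
  message?: string
}

// 假设的底层请求:真实项目中是 axios 实例 + 拦截器,这里用固定数据模拟
async function rawGet(_url: string): Promise<unknown> {
  const envelope: ApiResponse<{ id: string; name: string }> = {
    code: 0,
    data: { id: '1', name: 'Alice' },
  }
  // 模拟拦截器行为:只把 data 返回给调用方
  return envelope.data
}

// 类型正确的包装:调用方拿到的就是 T而不是 AxiosResponse<T>
async function get<T>(url: string): Promise<T> {
  return (await rawGet(url)) as T
}

interface User {
  id: string
  name: string
}

// 使用:编译期类型与运行时返回值保持一致
async function demo(): Promise<User> {
  const user = await get<User>('/users/1')
  return user
}
```

真实项目中也可以用 TypeScript 声明合并覆写 `AxiosInstance` 的方法签名达到同样效果,取舍在于是否愿意改动全局类型。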
## 请求取消处理
```typescript
// composables/useCancelableRequest.ts
import { onUnmounted } from 'vue'
import type { AxiosRequestConfig } from 'axios'
export function useCancelableRequest() {
const controller = new AbortController()
onUnmounted(() => {
controller.abort()
})
function withCancel<T>(
requestFn: (config?: AxiosRequestConfig) => Promise<T>
): Promise<T> {
return requestFn({ signal: controller.signal })
}
return { withCancel, abort: () => controller.abort() }
}
```
## 请求重试
```typescript
// utils/retryRequest.ts
export async function retryRequest<T>(
fn: () => Promise<T>,
options: { retries?: number; delay?: number } = {}
): Promise<T> {
const { retries = 3, delay = 1000 } = options
for (let attempt = 0; attempt < retries; attempt++) {
try {
return await fn()
} catch (error) {
if (attempt === retries - 1) throw error
await new Promise((resolve) => setTimeout(resolve, delay * (attempt + 1)))
}
}
throw new Error('Max retries exceeded')
}
```
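下面是 `retryRequest` 的一个可独立运行的用法示意(为保持自包含,函数体在此重复一份;`flaky` 是演示用的假设函数,前两次调用失败、第三次成功;`delay` 调小以便演示):

```typescript
async function retryRequest<T>(
  fn: () => Promise<T>,
  options: { retries?: number; delay?: number } = {}
): Promise<T> {
  const { retries = 3, delay = 1000 } = options
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await fn()
    } catch (error) {
      if (attempt === retries - 1) throw error
      // 线性退避attempt 0 → delay * 1attempt 1 → delay * 2 ...
      await new Promise((resolve) => setTimeout(resolve, delay * (attempt + 1)))
    }
  }
  throw new Error('Max retries exceeded')
}

// 演示:前两次抛错,第三次返回结果
let calls = 0
async function flaky(): Promise<string> {
  calls++
  if (calls < 3) throw new Error(`attempt ${calls} failed`)
  return 'ok'
}

const result = retryRequest(flaky, { retries: 3, delay: 10 })
```

注意只对幂等请求(如 GET启用重试对写操作重试可能造成重复提交。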
## 并发请求处理
```typescript
// 使用 Promise.all 并发请求
async function fetchDashboardData() {
const [users, orders, stats] = await Promise.all([
userApi.getList(),
orderApi.getRecent(),
statsApi.getSummary(),
])
return { users, orders, stats }
}
// 使用 Promise.allSettled 处理部分失败
async function fetchWithFallback() {
const results = await Promise.allSettled([
userApi.getList(),
orderApi.getList(),
])
return results.map((result) =>
result.status === 'fulfilled' ? result.value : []
)
}
```


@@ -0,0 +1,182 @@
# 布局模式参考
## 标准页面骨架
```vue
<template>
<v-container fluid class="d-flex flex-column h-100 pa-0">
<!-- 固定头部 -->
<v-toolbar density="compact" class="flex-shrink-0">
<v-toolbar-title>页面标题</v-toolbar-title>
<v-spacer />
<v-btn icon="mdi-refresh" @click="refresh" />
</v-toolbar>
<!-- 可滚动内容区 -->
<div class="flex-grow-1 overflow-y-auto pa-4" style="min-height: 0">
<slot />
</div>
<!-- 固定底部可选 -->
<v-footer app class="flex-shrink-0">
<v-btn block color="primary">操作</v-btn>
</v-footer>
</v-container>
</template>
```
## Flexbox 滚动陷阱解决方案
```css
/* 问题:子元素撑破父容器 */
.parent {
display: flex;
flex-direction: column;
height: 100%;
}
.content {
flex-grow: 1;
/* 必须添加以下任一属性 */
min-height: 0; /* 推荐 */
/* 或 */
overflow: hidden;
}
```
## 粘性筛选栏
```vue
<template>
<div class="flex-grow-1 overflow-y-auto" style="min-height: 0">
<!-- 粘性筛选区 -->
<div class="sticky-top bg-surface pa-4" style="z-index: 1">
<v-row>
<v-col cols="4">
<v-text-field v-model="search" label="搜索" />
</v-col>
<v-col cols="4">
<v-select v-model="status" :items="statusOptions" label="状态" />
</v-col>
</v-row>
</div>
<!-- 列表内容 -->
<v-list>...</v-list>
</div>
</template>
<style scoped>
.sticky-top {
position: sticky;
top: 0;
}
</style>
```
## 分栏布局(侧边栏 + 主内容)
```vue
<template>
<div class="d-flex h-100">
<!-- 固定宽度侧边栏 -->
<v-navigation-drawer permanent width="280">
<v-list nav>...</v-list>
</v-navigation-drawer>
<!-- 自适应主内容 -->
<div class="flex-grow-1 d-flex flex-column" style="min-width: 0">
<v-toolbar>主内容头部</v-toolbar>
<div class="flex-grow-1 overflow-y-auto pa-4" style="min-height: 0">
主内容区域
</div>
</div>
</div>
</template>
```
## 双栏详情布局
```vue
<template>
<div class="d-flex flex-column h-100">
<v-toolbar density="compact" class="flex-shrink-0">
<v-btn icon="mdi-arrow-left" @click="goBack" />
<v-toolbar-title>详情</v-toolbar-title>
</v-toolbar>
<div class="flex-grow-1 d-flex" style="min-height: 0">
<!-- 左侧主信息 -->
<div class="flex-grow-1 overflow-y-auto pa-4" style="min-width: 0">
<v-card>...</v-card>
</div>
<!-- 右侧边栏 -->
<v-sheet width="320" class="flex-shrink-0 overflow-y-auto border-s">
<v-list>相关信息</v-list>
</v-sheet>
</div>
</div>
</template>
```
## Tab 切换布局
```vue
<template>
<div class="d-flex flex-column h-100">
<v-tabs v-model="activeTab" class="flex-shrink-0">
<v-tab value="info">基本信息</v-tab>
<v-tab value="logs">操作日志</v-tab>
<v-tab value="settings">设置</v-tab>
</v-tabs>
<v-divider />
<v-tabs-window v-model="activeTab" class="flex-grow-1" style="min-height: 0">
<v-tabs-window-item value="info" class="h-100 overflow-y-auto">
<!-- 内容 -->
</v-tabs-window-item>
<v-tabs-window-item value="logs" class="h-100 overflow-y-auto">
<!-- 内容 -->
</v-tabs-window-item>
<v-tabs-window-item value="settings" class="h-100 overflow-y-auto">
<!-- 内容 -->
</v-tabs-window-item>
</v-tabs-window>
</div>
</template>
```
## 响应式断点处理
```vue
<script setup lang="ts">
import { computed } from 'vue'
import { useDisplay } from 'vuetify'
const { mobile, mdAndUp, lgAndUp } = useDisplay()
// 根据屏幕尺寸调整列数
const gridCols = computed(() => {
if (lgAndUp.value) return 4
if (mdAndUp.value) return 3
return 2
})
</script>
<template>
<v-row>
<v-col v-for="item in items" :key="item.id" :cols="12 / gridCols">
<v-card>...</v-card>
</v-col>
</v-row>
<!-- 移动端特殊处理 -->
<v-bottom-navigation v-if="mobile" grow>
<v-btn value="home">
<v-icon>mdi-home</v-icon>
<span>首页</span>
</v-btn>
</v-bottom-navigation>
</template>
```


@@ -0,0 +1,142 @@
# TypeScript 严格规范
## 类型定义位置
```
src/types/
├── api.d.ts # ApiResponse, PageParams, etc.
├── user.d.ts # User domain types
├── order.d.ts # Order domain types
└── common.d.ts # Shared utilities
```
## 禁止 `any` 的替代方案
| 场景 | 错误 | 正确 |
|------|------|------|
| 未知对象 | `any` | `Record<string, unknown>` |
| 动态数组 | `any[]` | `unknown[]` + type guard |
| 回调函数 | `(x: any) => any` | 泛型 `<T>(x: T) => T` |
| 第三方库缺失类型 | `any` | 创建 `.d.ts` 声明文件 |
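表中"`unknown` + 类型守卫"的组合可以落地为下面的最小示例(`RawItem`、`parseItems` 均为演示用的假设名称):

```typescript
// 未知结构的对象:用 Record<string, unknown> 代替 any取值后先收窄再使用
function pickString(obj: Record<string, unknown>, key: string): string | null {
  const v = obj[key]
  return typeof v === 'string' ? v : null
}

// 动态数组unknown[] + 类型守卫过滤出合法元素
interface RawItem {
  id: string
}

function isRawItem(v: unknown): v is RawItem {
  return typeof v === 'object' && v !== null && typeof (v as RawItem).id === 'string'
}

function parseItems(input: unknown[]): RawItem[] {
  return input.filter(isRawItem)
}
```

这样非法元素在运行时被过滤,而不是靠 `any` 掩盖类型错误。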
## Props 与 Emits 类型化
```typescript
// Props
interface Props {
user: User
mode?: 'view' | 'edit'
onSave?: (data: User) => void
}
const props = withDefaults(defineProps<Props>(), {
mode: 'view',
})
// Emits
interface Emits {
(e: 'update', value: User): void
(e: 'cancel'): void
}
const emit = defineEmits<Emits>()
```
## 泛型组合式函数
```typescript
// composables/useFetch.ts
import { ref, onMounted, type Ref } from 'vue'
export function useFetch<T>(
fetcher: () => Promise<T>,
options?: { immediate?: boolean }
) {
const data = ref<T | null>(null) as Ref<T | null>
const loading = ref(false)
const error = ref<Error | null>(null)
async function execute() {
loading.value = true
error.value = null
try {
data.value = await fetcher()
} catch (e) {
error.value = e instanceof Error ? e : new Error(String(e))
} finally {
loading.value = false
}
}
if (options?.immediate !== false) {
onMounted(execute)
}
return { data, loading, error, execute }
}
```
## 类型守卫
```typescript
function isUser(obj: unknown): obj is User {
return (
typeof obj === 'object' &&
obj !== null &&
'id' in obj &&
'name' in obj
)
}
// 使用
const data: unknown = await fetchData()
if (isUser(data)) {
console.log(data.name) // TypeScript 知道这是 User
}
```
## 联合类型与字面量类型
```typescript
// 状态枚举替代方案
type UserStatus = 'active' | 'disabled' | 'pending'
// 带类型的事件处理
type TableAction =
| { type: 'edit'; payload: User }
| { type: 'delete'; payload: string }
| { type: 'view'; payload: User }
function handleAction(action: TableAction) {
switch (action.type) {
case 'edit':
openEditDialog(action.payload) // payload 是 User
break
case 'delete':
confirmDelete(action.payload) // payload 是 string (id)
break
}
}
```
## 工具类型使用
```typescript
// Partial - 所有属性可选
type UpdateUserDto = Partial<User>
// Pick - 选择部分属性
type UserPreview = Pick<User, 'id' | 'name' | 'avatar'>
// Omit - 排除部分属性
type CreateUserDto = Omit<User, 'id' | 'createdAt' | 'updatedAt'>
// Required - 所有属性必填
type CompleteUser = Required<User>
// Record - 键值映射
type UserMap = Record<string, User>
// 自定义工具类型
type Nullable<T> = T | null
// 注:约束位置的 any 不参与类型推导,是此类工具类型的惯用写法
type AsyncReturnType<T extends (...args: any) => Promise<any>> =
T extends (...args: any) => Promise<infer R> ? R : never
```


@@ -0,0 +1,345 @@
# UI 交互规范
## 数据表格
```vue
<template>
<v-data-table
:headers="headers"
:items="items"
:loading="loading"
fixed-header
height="calc(100vh - 200px)"
>
<!-- 截断文本 + Tooltip -->
<template #item.description="{ value }">
<v-tooltip :text="value" location="top">
<template #activator="{ props }">
<span
v-bind="props"
class="text-truncate d-inline-block"
style="max-width: 200px"
>
{{ value }}
</span>
</template>
</v-tooltip>
</template>
<!-- 空状态 -->
<template #no-data>
<v-empty-state
icon="mdi-database-off"
title="暂无数据"
text="请尝试调整筛选条件或新建记录"
>
<template #actions>
<v-btn color="primary" @click="refresh">刷新</v-btn>
</template>
</v-empty-state>
</template>
</v-data-table>
</template>
```
## 虚拟滚动列表
```vue
<template>
<v-virtual-scroll :items="largeList" height="400" item-height="64">
<template #default="{ item }">
<v-list-item :title="item.name" :subtitle="item.description">
<template #append>
<v-btn icon="mdi-chevron-right" variant="text" />
</template>
</v-list-item>
</template>
</v-virtual-scroll>
</template>
```
## 骨架屏加载
```vue
<template>
<div>
<!-- 表格骨架 -->
<v-skeleton-loader v-if="loading" type="table-heading, table-row@5" />
<!-- 卡片骨架 -->
<v-skeleton-loader v-if="loading" type="card" />
<!-- 列表骨架 -->
<v-skeleton-loader v-if="loading" type="list-item-avatar-two-line@3" />
<!-- 文章骨架 -->
<v-skeleton-loader v-if="loading" type="article" />
<!-- 自定义组合骨架 -->
<v-skeleton-loader
v-if="loading"
type="heading, list-item-two-line@3, actions"
/>
<!-- 实际内容 -->
<template v-else>...</template>
</div>
</template>
```
## 空状态组件
```vue
<template>
<v-empty-state
:icon="icon"
:title="title"
:text="description"
>
<template #actions>
<v-btn v-if="showRefresh" variant="outlined" @click="$emit('refresh')">
刷新
</v-btn>
<v-btn v-if="showCreate" color="primary" @click="$emit('create')">
新建
</v-btn>
</template>
</v-empty-state>
</template>
<script setup lang="ts">
interface Props {
icon?: string
title?: string
description?: string
showRefresh?: boolean
showCreate?: boolean
}
withDefaults(defineProps<Props>(), {
icon: 'mdi-folder-open-outline',
title: '暂无数据',
description: '当前没有可显示的内容',
showRefresh: true,
showCreate: false,
})
defineEmits<{
(e: 'refresh'): void
(e: 'create'): void
}>()
</script>
```
## 多行文本截断
```vue
<template>
<div class="line-clamp-container">
<p :class="{ 'line-clamp-2': !expanded }">{{ longText }}</p>
<v-btn
v-if="needsExpand"
variant="text"
size="small"
@click="expanded = !expanded"
>
{{ expanded ? '收起' : '展开' }}
</v-btn>
</div>
</template>
<style scoped>
.line-clamp-2 {
display: -webkit-box;
-webkit-line-clamp: 2;
-webkit-box-orient: vertical;
overflow: hidden;
}
</style>
```
## 确认对话框
```vue
<template>
<v-dialog v-model="dialog" max-width="400" persistent>
<v-card>
<v-card-title class="d-flex align-center">
<v-icon :color="iconColor" class="mr-2">{{ icon }}</v-icon>
{{ title }}
</v-card-title>
<v-card-text>{{ message }}</v-card-text>
<v-card-actions>
<v-spacer />
<v-btn variant="text" @click="cancel">取消</v-btn>
<v-btn :color="confirmColor" :loading="loading" @click="confirm">
{{ confirmText }}
</v-btn>
</v-card-actions>
</v-card>
</v-dialog>
</template>
<script setup lang="ts">
import { ref } from 'vue'
interface Props {
title?: string
message: string
icon?: string
iconColor?: string
confirmText?: string
confirmColor?: string
}
withDefaults(defineProps<Props>(), {
title: '确认操作',
icon: 'mdi-alert-circle-outline',
iconColor: 'warning',
confirmText: '确认',
confirmColor: 'primary',
})
const dialog = defineModel<boolean>({ default: false })
const loading = ref(false)
const emit = defineEmits<{
(e: 'confirm'): void
(e: 'cancel'): void
}>()
function confirm() {
emit('confirm')
}
function cancel() {
dialog.value = false
emit('cancel')
}
</script>
```
## 表单验证
```vue
<template>
<v-form ref="formRef" v-model="valid" @submit.prevent="submit">
<v-text-field
v-model="form.name"
:rules="rules.name"
label="姓名"
required
/>
<v-text-field
v-model="form.email"
:rules="rules.email"
label="邮箱"
type="email"
/>
<v-btn type="submit" :disabled="!valid" :loading="loading">
提交
</v-btn>
</v-form>
</template>
<script setup lang="ts">
import { ref, reactive } from 'vue'
import type { VForm } from 'vuetify/components'
const formRef = ref<VForm | null>(null)
const valid = ref(false)
const loading = ref(false)
const form = reactive({
name: '',
email: '',
})
const rules = {
name: [
(v: string) => !!v || '姓名不能为空',
(v: string) => v.length <= 20 || '姓名不能超过20个字符',
],
email: [
(v: string) => !!v || '邮箱不能为空',
(v: string) => /.+@.+\..+/.test(v) || '请输入有效的邮箱地址',
],
}
async function submit() {
const { valid: ok } = await formRef.value!.validate()
if (!ok) return
loading.value = true
try {
await api.submit(form) // api 为示意,替换为实际的接口模块
} finally {
loading.value = false
}
}
</script>
```
## Snackbar 全局通知
```typescript
// composables/useSnackbar.ts
import { ref } from 'vue'
interface SnackbarOptions {
text: string
color?: string
timeout?: number
}
const snackbar = ref<SnackbarOptions & { show: boolean }>({
show: false,
text: '',
color: 'success',
timeout: 3000,
})
export function useSnackbar() {
function show(options: SnackbarOptions) {
snackbar.value = { ...snackbar.value, ...options, show: true }
}
function success(text: string) {
show({ text, color: 'success' })
}
function error(text: string) {
show({ text, color: 'error', timeout: 5000 })
}
function warning(text: string) {
show({ text, color: 'warning' })
}
function info(text: string) {
show({ text, color: 'info' })
}
return { snackbar, show, success, error, warning, info }
}
```
```vue
<!-- App.vue 中使用 -->
<template>
<v-app>
<router-view />
<v-snackbar
v-model="snackbar.show"
:color="snackbar.color"
:timeout="snackbar.timeout"
>
{{ snackbar.text }}
</v-snackbar>
</v-app>
</template>
<script setup lang="ts">
import { useSnackbar } from '@/composables/useSnackbar'
const { snackbar } = useSnackbar()
</script>
```


@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
@@ -0,0 +1,479 @@
---
name: skill-creator
description: Create new skills, modify and improve existing skills, and measure skill performance. Use when users want to create a skill from scratch, update or optimize an existing skill, run evals to test a skill, benchmark skill performance with variance analysis, or optimize a skill's description for better triggering accuracy.
---
# Skill Creator
A skill for creating new skills and iteratively improving them.
At a high level, the process of creating a skill goes like this:
- Decide what you want the skill to do and roughly how it should do it
- Write a draft of the skill
- Create a few test prompts and run claude-with-access-to-the-skill on them
- Help the user evaluate the results both qualitatively and quantitatively
- While the runs happen in the background, draft some quantitative evals if there aren't any (if some already exist, you can use them as is or modify them if something needs to change). Then explain them to the user (or, if they already existed, explain the existing ones)
- Use the `eval-viewer/generate_review.py` script to show the user the results for them to look at, and also let them look at the quantitative metrics
- Rewrite the skill based on feedback from the user's evaluation of the results (and also if there are any glaring flaws that become apparent from the quantitative benchmarks)
- Repeat until you're satisfied
- Expand the test set and try again at larger scale
Your job when using this skill is to figure out where the user is in this process and then jump in and help them progress through these stages. So for instance, maybe they're like "I want to make a skill for X". You can help narrow down what they mean, write a draft, write the test cases, figure out how they want to evaluate, run all the prompts, and repeat.
On the other hand, maybe they already have a draft of the skill. In this case you can go straight to the eval/iterate part of the loop.
Of course, you should always be flexible and if the user is like "I don't need to run a bunch of evaluations, just vibe with me", you can do that instead.
Then after the skill is done (but again, the order is flexible), you can also run the skill description improver, which we have a whole separate script for, to optimize the triggering of the skill.
Cool? Cool.
## Communicating with the user
The skill creator is likely to be used by people across a wide range of familiarity with coding jargon. If you haven't heard (and how could you have, since it started only very recently), there's a trend now where the power of Claude is inspiring plumbers to open up their terminals, and parents and grandparents to google "how to install npm". On the other hand, the bulk of users are probably fairly computer-literate.
So please pay attention to context cues to understand how to phrase your communication! In the default case, just to give you some idea:
- "evaluation" and "benchmark" are borderline, but OK
- for "JSON" and "assertion" you want to see serious cues from the user that they know what those things are before using them without explaining them
It's OK to briefly explain terms with a short definition if you're unsure the user will get them.
---
## Creating a skill
### Capture Intent
Start by understanding the user's intent. The current conversation might already contain a workflow the user wants to capture (e.g., they say "turn this into a skill"). If so, extract answers from the conversation history first — the tools used, the sequence of steps, corrections the user made, input/output formats observed. The user may need to fill the gaps, and should confirm before proceeding to the next step.
1. What should this skill enable Claude to do?
2. When should this skill trigger? (what user phrases/contexts)
3. What's the expected output format?
4. Should we set up test cases to verify the skill works? Skills with objectively verifiable outputs (file transforms, data extraction, code generation, fixed workflow steps) benefit from test cases. Skills with subjective outputs (writing style, art) often don't need them. Suggest the appropriate default based on the skill type, but let the user decide.
### Interview and Research
Proactively ask questions about edge cases, input/output formats, example files, success criteria, and dependencies. Wait to write test prompts until you've got this part ironed out.
Check available MCPs - if useful for research (searching docs, finding similar skills, looking up best practices), research in parallel via subagents if available, otherwise inline. Come prepared with context to reduce burden on the user.
### Write the SKILL.md
Based on the user interview, fill in these components:
- **name**: Skill identifier
- **description**: When to trigger, what it does. This is the primary triggering mechanism - include both what the skill does AND specific contexts for when to use it. All "when to use" info goes here, not in the body. Note: currently Claude has a tendency to "undertrigger" skills -- to not use them when they'd be useful. To combat this, please make the skill descriptions a little bit "pushy". So for instance, instead of "How to build a simple fast dashboard to display internal Anthropic data.", you might write "How to build a simple fast dashboard to display internal Anthropic data. Make sure to use this skill whenever the user mentions dashboards, data visualization, internal metrics, or wants to display any kind of company data, even if they don't explicitly ask for a 'dashboard.'"
- **compatibility**: Required tools, dependencies (optional, rarely needed)
- **the rest of the skill :)**
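To make the components concrete, a minimal frontmatter for a hypothetical skill (the name and description below are invented purely for illustration) might look like:

```yaml
---
name: pdf-form-filler
description: Fill out PDF forms from structured data. Use this skill whenever the user mentions PDF forms, filling in official documents, or adding data to a fillable PDF, even if they don't say "form" explicitly.
---
```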
### Skill Writing Guide
#### Anatomy of a Skill
```
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter (name, description required)
│ └── Markdown instructions
└── Bundled Resources (optional)
├── scripts/ - Executable code for deterministic/repetitive tasks
├── references/ - Docs loaded into context as needed
└── assets/ - Files used in output (templates, icons, fonts)
```
#### Progressive Disclosure
Skills use a three-level loading system:
1. **Metadata** (name + description) - Always in context (~100 words)
2. **SKILL.md body** - In context whenever skill triggers (<500 lines ideal)
3. **Bundled resources** - As needed (unlimited, scripts can execute without loading)
These word counts are approximate and you can feel free to go longer if needed.
**Key patterns:**
- Keep SKILL.md under 500 lines; if you're approaching this limit, add an additional layer of hierarchy along with clear pointers about where the model using the skill should go next to follow up.
- Reference files clearly from SKILL.md with guidance on when to read them
- For large reference files (>300 lines), include a table of contents
**Domain organization**: When a skill supports multiple domains/frameworks, organize by variant:
```
cloud-deploy/
├── SKILL.md (workflow + selection)
└── references/
├── aws.md
├── gcp.md
└── azure.md
```
Claude reads only the relevant reference file.
#### Principle of Lack of Surprise
This goes without saying, but skills must not contain malware, exploit code, or any content that could compromise system security. If a skill's contents were described to the user, its intent should not surprise them. Don't go along with requests to create misleading skills or skills designed to facilitate unauthorized access, data exfiltration, or other malicious activities. Things like a "roleplay as an XYZ" are OK though.
#### Writing Patterns
Prefer using the imperative form in instructions.
**Defining output formats** - You can do it like this:
```markdown
## Report structure
ALWAYS use this exact template:
# [Title]
## Executive summary
## Key findings
## Recommendations
```
**Examples pattern** - It's useful to include examples. You can format them like this (but if "Input" and "Output" are in the examples you might want to deviate a little):
```markdown
## Commit message format
**Example 1:**
Input: Added user authentication with JWT tokens
Output: feat(auth): implement JWT-based authentication
```
### Writing Style
Try to explain to the model why things are important in lieu of heavy-handed musty MUSTs. Use theory of mind and try to make the skill general and not super-narrow to specific examples. Start by writing a draft and then look at it with fresh eyes and improve it.
### Test Cases
After writing the skill draft, come up with 2-3 realistic test prompts — the kind of thing a real user would actually say. Share them with the user: [you don't have to use this exact language] "Here are a few test cases I'd like to try. Do these look right, or do you want to add more?" Then run them.
Save test cases to `evals/evals.json`. Don't write assertions yet — just the prompts. You'll draft assertions in the next step while the runs are in progress.
```json
{
"skill_name": "example-skill",
"evals": [
{
"id": 1,
"prompt": "User's task prompt",
"expected_output": "Description of expected result",
"files": []
}
]
}
```
See `references/schemas.md` for the full schema (including the `assertions` field, which you'll add later).
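A quick way to sanity-check the file before spawning any runs is a short validation pass. This sketch assumes only the shape shown above; any richer schema details live in `references/schemas.md`:

```python
import json

def validate_evals(path):
    """Lightweight sanity check of evals.json before spawning any runs."""
    with open(path) as f:
        data = json.load(f)
    # Top-level shape: a skill name plus a non-empty list of evals
    assert "skill_name" in data, "missing skill_name"
    assert isinstance(data.get("evals"), list) and data["evals"], "evals must be a non-empty list"
    # Each eval needs the four fields from the example above
    for ev in data["evals"]:
        for field in ("id", "prompt", "expected_output", "files"):
            assert field in ev, f"eval {ev.get('id', '?')} missing {field!r}"
    return len(data["evals"])
```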
## Running and evaluating test cases
This section is one continuous sequence — don't stop partway through. Do NOT use `/skill-test` or any other testing skill.
Put results in `<skill-name>-workspace/` as a sibling to the skill directory. Within the workspace, organize results by iteration (`iteration-1/`, `iteration-2/`, etc.) and within that, each test case gets a directory (`eval-0/`, `eval-1/`, etc.). Don't create all of this upfront — just create directories as you go.
### Step 1: Spawn all runs (with-skill AND baseline) in the same turn
For each test case, spawn two subagents in the same turn — one with the skill, one without. This is important: don't spawn the with-skill runs first and then come back for baselines later. Launch everything at once so it all finishes around the same time.
**With-skill run:**
```
Execute this task:
- Skill path: <path-to-skill>
- Task: <eval prompt>
- Input files: <eval files if any, or "none">
- Save outputs to: <workspace>/iteration-<N>/eval-<ID>/with_skill/outputs/
- Outputs to save: <what the user cares about — e.g., "the .docx file", "the final CSV">
```
**Baseline run** (same prompt, but the baseline depends on context):
- **Creating a new skill**: no skill at all. Same prompt, no skill path, save to `without_skill/outputs/`.
- **Improving an existing skill**: the old version. Before editing, snapshot the skill (`cp -r <skill-path> <workspace>/skill-snapshot/`), then point the baseline subagent at the snapshot. Save to `old_skill/outputs/`.
Write an `eval_metadata.json` for each test case (assertions can be empty for now). Give each eval a descriptive name based on what it's testing — not just "eval-0". Use this name for the directory too. If this iteration uses new or modified eval prompts, create these files for each new eval directory — don't assume they carry over from previous iterations.
```json
{
"eval_id": 0,
"eval_name": "descriptive-name-here",
"prompt": "The user's task prompt",
"assertions": []
}
```
### Step 2: While runs are in progress, draft assertions
Don't just wait for the runs to finish — you can use this time productively. Draft quantitative assertions for each test case and explain them to the user. If assertions already exist in `evals/evals.json`, review them and explain what they check.
Good assertions are objectively verifiable and have descriptive names — they should read clearly in the benchmark viewer so someone glancing at the results immediately understands what each one checks. Subjective skills (writing style, design quality) are better evaluated qualitatively — don't force assertions onto things that need human judgment.
Update the `eval_metadata.json` files and `evals/evals.json` with the assertions once drafted. Also explain to the user what they'll see in the viewer — both the qualitative outputs and the quantitative benchmark.
### Step 3: As runs complete, capture timing data
When each subagent task completes, you receive a notification containing `total_tokens` and `duration_ms`. Save this data immediately to `timing.json` in the run directory:
```json
{
"total_tokens": 84852,
"duration_ms": 23332,
"total_duration_seconds": 23.3
}
```
This is the only opportunity to capture this data — it comes through the task notification and isn't persisted elsewhere. Process each notification as it arrives rather than trying to batch them.
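As a sketch, turning a task notification into `timing.json` might look like this (the exact shape of the notification dict is an assumption based on the fields named above):

```python
import json
from pathlib import Path

def save_timing(notification, run_dir):
    """Persist token/duration data from a completed task notification.

    Assumes the notification carries total_tokens and duration_ms."""
    timing = {
        "total_tokens": notification["total_tokens"],
        "duration_ms": notification["duration_ms"],
        # Derived convenience field, rounded to one decimal place
        "total_duration_seconds": round(notification["duration_ms"] / 1000, 1),
    }
    path = Path(run_dir) / "timing.json"
    path.write_text(json.dumps(timing, indent=2))
    return timing
```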
### Step 4: Grade, aggregate, and launch the viewer
Once all runs are done:
1. **Grade each run** — spawn a grader subagent (or grade inline) that reads `agents/grader.md` and evaluates each assertion against the outputs. Save results to `grading.json` in each run directory. The grading.json expectations array must use the fields `text`, `passed`, and `evidence` (not `name`/`met`/`details` or other variants) — the viewer depends on these exact field names. For assertions that can be checked programmatically, write and run a script rather than eyeballing it — scripts are faster, more reliable, and can be reused across iterations.
2. **Aggregate into benchmark** — run the aggregation script from the skill-creator directory:
```bash
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
```
This produces `benchmark.json` and `benchmark.md` with pass_rate, time, and tokens for each configuration, with mean ± stddev and the delta. If generating benchmark.json manually, see `references/schemas.md` for the exact schema the viewer expects.
Put each with_skill version before its baseline counterpart.
3. **Do an analyst pass** — read the benchmark data and surface patterns the aggregate stats might hide. See `agents/analyzer.md` (the "Analyzing Benchmark Results" section) for what to look for — things like assertions that always pass regardless of skill (non-discriminating), high-variance evals (possibly flaky), and time/token tradeoffs.
4. **Launch the viewer** with both qualitative outputs and quantitative data:
```bash
nohup python <skill-creator-path>/eval-viewer/generate_review.py \
<workspace>/iteration-N \
--skill-name "my-skill" \
--benchmark <workspace>/iteration-N/benchmark.json \
> /dev/null 2>&1 &
VIEWER_PID=$!
```
For iteration 2+, also pass `--previous-workspace <workspace>/iteration-<N-1>`.
**Cowork / headless environments:** If `webbrowser.open()` is not available or the environment has no display, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Feedback will be downloaded as a `feedback.json` file when the user clicks "Submit All Reviews". After download, copy `feedback.json` into the workspace directory for the next iteration to pick up.
Note: please use generate_review.py to create the viewer; there's no need to write custom HTML.
5. **Tell the user** something like: "I've opened the results in your browser. There are two tabs — 'Outputs' lets you click through each test case and leave feedback, 'Benchmark' shows the quantitative comparison. When you're done, come back here and let me know."
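If you ever need to generate `benchmark.json` by hand, the mean ± stddev and delta numbers from step 2 boil down to something like this sketch (the aggregation script's actual internals may differ):

```python
from statistics import mean, stdev

def summarize(pass_rates):
    """Aggregate per-eval pass rates into the mean +/- stddev shown in benchmark.md."""
    m = mean(pass_rates)
    # stdev needs at least two data points; report 0.0 for a single eval
    s = stdev(pass_rates) if len(pass_rates) > 1 else 0.0
    return {"mean": m, "stddev": s}

def delta(with_skill, baseline):
    """Delta between the with-skill and baseline configurations."""
    return summarize(with_skill)["mean"] - summarize(baseline)["mean"]
```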
### What the user sees in the viewer
The "Outputs" tab shows one test case at a time:
- **Prompt**: the task that was given
- **Output**: the files the skill produced, rendered inline where possible
- **Previous Output** (iteration 2+): collapsed section showing last iteration's output
- **Formal Grades** (if grading was run): collapsed section showing assertion pass/fail
- **Feedback**: a textbox that auto-saves as they type
- **Previous Feedback** (iteration 2+): their comments from last time, shown below the textbox
The "Benchmark" tab shows the stats summary: pass rates, timing, and token usage for each configuration, with per-eval breakdowns and analyst observations.
Navigation is via prev/next buttons or arrow keys. When done, they click "Submit All Reviews" which saves all feedback to `feedback.json`.
### Step 5: Read the feedback
When the user tells you they're done, read `feedback.json`:
```json
{
"reviews": [
{"run_id": "eval-0-with_skill", "feedback": "the chart is missing axis labels", "timestamp": "..."},
{"run_id": "eval-1-with_skill", "feedback": "", "timestamp": "..."},
{"run_id": "eval-2-with_skill", "feedback": "perfect, love this", "timestamp": "..."}
],
"status": "complete"
}
```
Empty feedback means the user thought it was fine. Focus your improvements on the test cases where the user had specific complaints.
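Triaging the feedback file can be sketched like this, assuming the shape shown above:

```python
import json

def actionable_feedback(path):
    """Return {run_id: feedback} for runs where the user left a non-empty comment.

    Empty feedback means the user thought the output was fine, so it's dropped."""
    with open(path) as f:
        data = json.load(f)
    return {
        r["run_id"]: r["feedback"].strip()
        for r in data.get("reviews", [])
        if r.get("feedback", "").strip()
    }
```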
Kill the viewer server when you're done with it:
```bash
kill $VIEWER_PID 2>/dev/null
```
---
## Improving the skill
This is the heart of the loop. You've run the test cases, the user has reviewed the results, and now you need to make the skill better based on their feedback.
### How to think about improvements
1. **Generalize from the feedback.** The big picture thing that's happening here is that we're trying to create skills that can be used a million times (maybe literally, maybe even more who knows) across many different prompts. Here you and the user are iterating on only a few examples over and over again because it helps move faster. The user knows these examples in and out and it's quick for them to assess new outputs. But if the skill you and the user are codeveloping works only for those examples, it's useless. Rather than put in fiddly overfitty changes, or oppressively constrictive MUSTs, if there's some stubborn issue, you might try branching out and using different metaphors, or recommending different patterns of working. It's relatively cheap to try and maybe you'll land on something great.
2. **Keep the prompt lean.** Remove things that aren't pulling their weight. Make sure to read the transcripts, not just the final outputs — if it looks like the skill is making the model waste a bunch of time doing things that are unproductive, you can try getting rid of the parts of the skill that are making it do that and seeing what happens.
3. **Explain the why.** Try hard to explain the **why** behind everything you're asking the model to do. Today's LLMs are *smart*. They have good theory of mind and when given a good harness can go beyond rote instructions and really make things happen. Even if the feedback from the user is terse or frustrated, try to actually understand the task and why the user is writing what they wrote, and what they actually wrote, and then transmit this understanding into the instructions. If you find yourself writing ALWAYS or NEVER in all caps, or using super rigid structures, that's a yellow flag — if possible, reframe and explain the reasoning so that the model understands why the thing you're asking for is important. That's a more humane, powerful, and effective approach.
4. **Look for repeated work across test cases.** Read the transcripts from the test runs and notice if the subagents all independently wrote similar helper scripts or took the same multi-step approach to something. If all 3 test cases resulted in the subagent writing a `create_docx.py` or a `build_chart.py`, that's a strong signal the skill should bundle that script. Write it once, put it in `scripts/`, and tell the skill to use it. This saves every future invocation from reinventing the wheel.
This task is pretty important (we are trying to create billions a year in economic value here!) and your thinking time is not the blocker; take your time and really mull things over. I'd suggest writing a draft revision and then looking at it anew and making improvements. Really do your best to get into the head of the user and understand what they want and need.
### The iteration loop
After improving the skill:
1. Apply your improvements to the skill
2. Rerun all test cases into a new `iteration-<N+1>/` directory, including baseline runs. If you're creating a new skill, the baseline is always `without_skill` (no skill) — that stays the same across iterations. If you're improving an existing skill, use your judgment on what makes sense as the baseline: the original version the user came in with, or the previous iteration.
3. Launch the reviewer with `--previous-workspace` pointing at the previous iteration
4. Wait for the user to review and tell you they're done
5. Read the new feedback, improve again, repeat
Keep going until:
- The user says they're happy
- The feedback is all empty (everything looks good)
- You're not making meaningful progress
---
## Advanced: Blind comparison
For situations where you want a more rigorous comparison between two versions of a skill (e.g., the user asks "is the new version actually better?"), there's a blind comparison system. Read `agents/comparator.md` and `agents/analyzer.md` for the details. The basic idea is: give two outputs to an independent agent without telling it which is which, and let it judge quality. Then analyze why the winner won.
This is optional, requires subagents, and most users won't need it. The human review loop is usually sufficient.
---
## Description Optimization
The description field in SKILL.md frontmatter is the primary mechanism that determines whether Claude invokes a skill. After creating or improving a skill, offer to optimize the description for better triggering accuracy.
### Step 1: Generate trigger eval queries
Create 20 eval queries — a mix of should-trigger and should-not-trigger. Save as JSON:
```json
[
{"query": "the user prompt", "should_trigger": true},
{"query": "another prompt", "should_trigger": false}
]
```
The queries must be realistic and something a Claude Code or Claude.ai user would actually type. Not abstract requests, but requests that are concrete and specific and have a good amount of detail. For instance, file paths, personal context about the user's job or situation, column names and values, company names, URLs. A little bit of backstory. Some might be in lowercase or contain abbreviations or typos or casual speech. Use a mix of different lengths, and focus on edge cases rather than making them clear-cut (the user will get a chance to sign off on them).
Bad: `"Format this data"`, `"Extract text from PDF"`, `"Create a chart"`
Good: `"ok so my boss just sent me this xlsx file (its in my downloads, called something like 'Q4 sales final FINAL v2.xlsx') and she wants me to add a column that shows the profit margin as a percentage. The revenue is in column C and costs are in column D i think"`
For the **should-trigger** queries (8-10), think about coverage. You want different phrasings of the same intent — some formal, some casual. Include cases where the user doesn't explicitly name the skill or file type but clearly needs it. Throw in some uncommon use cases and cases where this skill competes with another but should win.
For the **should-not-trigger** queries (8-10), the most valuable ones are the near-misses — queries that share keywords or concepts with the skill but actually need something different. Think adjacent domains, ambiguous phrasing where a naive keyword match would trigger but shouldn't, and cases where the query touches on something the skill does but in a context where another tool is more appropriate.
The key thing to avoid: don't make should-not-trigger queries obviously irrelevant. "Write a fibonacci function" as a negative test for a PDF skill is too easy — it doesn't test anything. The negative cases should be genuinely tricky.
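The balance guidelines above can be checked mechanically before showing the set to the user. This sketch assumes the JSON shape from Step 1 and the 8-10 counts suggested above:

```python
import json

def check_eval_balance(path, lo=8, hi=10):
    """Verify the trigger eval set has a reasonable positive/negative split."""
    with open(path) as f:
        evals = json.load(f)
    pos = sum(1 for e in evals if e["should_trigger"])
    neg = len(evals) - pos
    # Both sides should land in the suggested 8-10 range
    ok = lo <= pos <= hi and lo <= neg <= hi
    return {"positive": pos, "negative": neg, "balanced": ok}
```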
### Step 2: Review with user
Present the eval set to the user for review using the HTML template:
1. Read the template from `assets/eval_review.html`
2. Replace the placeholders:
- `__EVAL_DATA_PLACEHOLDER__` → the JSON array of eval items (no quotes around it — it's a JS variable assignment)
- `__SKILL_NAME_PLACEHOLDER__` → the skill's name
- `__SKILL_DESCRIPTION_PLACEHOLDER__` → the skill's current description
3. Write to a temp file (e.g., `/tmp/eval_review_<skill-name>.html`) and open it: `open /tmp/eval_review_<skill-name>.html`
4. The user can edit queries, toggle should-trigger, add/remove entries, then click "Export Eval Set"
5. The file downloads to `~/Downloads/eval_set.json` — check the Downloads folder for the most recent version in case there are multiple (e.g., `eval_set (1).json`)
This step matters — bad eval queries lead to bad descriptions.
### Step 3: Run the optimization loop
Tell the user: "This will take some time — I'll run the optimization loop in the background and check on it periodically."
Save the eval set to the workspace, then run in the background:
```bash
python -m scripts.run_loop \
--eval-set <path-to-trigger-eval.json> \
--skill-path <path-to-skill> \
--model <model-id-powering-this-session> \
--max-iterations 5 \
--verbose
```
Use the model ID from your system prompt (the one powering the current session) so the triggering test matches what the user actually experiences.
While it runs, periodically tail the output to give the user updates on which iteration it's on and what the scores look like.
This handles the full optimization loop automatically. It splits the eval set into 60% train and 40% held-out test, evaluates the current description (running each query 3 times to get a reliable trigger rate), then calls Claude with extended thinking to propose improvements based on what failed. It re-evaluates each new description on both train and test, iterating up to 5 times. When it's done, it opens an HTML report in the browser showing the results per iteration and returns JSON with `best_description` — selected by test score rather than train score to avoid overfitting.
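To make those mechanics concrete, here is a rough sketch of the split and the majority-vote scoring the loop performs; the function names are illustrative, not the actual scripts' API:

```python
import random

def split_evals(evals, train_frac=0.6, seed=0):
    """Shuffle and split the eval set into train and held-out test portions."""
    items = list(evals)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def trigger_accuracy(evals, did_trigger, runs=3):
    """Score a description: each query runs `runs` times and counts as
    correct when the majority outcome matches should_trigger."""
    correct = 0
    for e in evals:
        hits = sum(1 for _ in range(runs) if did_trigger(e["query"]))
        triggered = hits * 2 > runs  # majority vote over the runs
        correct += triggered == e["should_trigger"]
    return correct / len(evals)
```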
### How skill triggering works
Understanding the triggering mechanism helps design better eval queries. Skills appear in Claude's `available_skills` list with their name + description, and Claude decides whether to consult a skill based on that description. The important thing to know is that Claude only consults skills for tasks it can't easily handle on its own — simple, one-step queries like "read this PDF" may not trigger a skill even if the description matches perfectly, because Claude can handle them directly with basic tools. Complex, multi-step, or specialized queries reliably trigger skills when the description matches.
This means your eval queries should be substantive enough that Claude would actually benefit from consulting a skill. Simple queries like "read file X" are poor test cases — they won't trigger skills regardless of description quality.
### Step 4: Apply the result
Take `best_description` from the JSON output and update the skill's SKILL.md frontmatter. Show the user before/after and report the scores.
---
### Package and Present (only if `present_files` tool is available)
Check whether you have access to the `present_files` tool. If you don't, skip this step. If you do, package the skill and present the .skill file to the user:
```bash
python -m scripts.package_skill <path/to/skill-folder>
```
After packaging, direct the user to the resulting `.skill` file path so they can install it.
---
## Claude.ai-specific instructions
In Claude.ai, the core workflow is the same (draft → test → review → improve → repeat), but because Claude.ai doesn't have subagents, some mechanics change. Here's what to adapt:
**Running test cases**: No subagents means no parallel execution. For each test case, read the skill's SKILL.md, then follow its instructions to accomplish the test prompt yourself. Do them one at a time. This is less rigorous than independent subagents (you wrote the skill and you're also running it, so you have full context), but it's a useful sanity check — and the human review step compensates. Skip the baseline runs — just use the skill to complete the task as requested.
**Reviewing results**: If you can't open a browser (e.g., Claude.ai's VM has no display, or you're on a remote server), skip the browser reviewer entirely. Instead, present results directly in the conversation. For each test case, show the prompt and the output. If the output is a file the user needs to see (like a .docx or .xlsx), save it to the filesystem and tell them where it is so they can download and inspect it. Ask for feedback inline: "How does this look? Anything you'd change?"
**Benchmarking**: Skip the quantitative benchmarking — it relies on baseline comparisons which aren't meaningful without subagents. Focus on qualitative feedback from the user.
**The iteration loop**: Same as before — improve the skill, rerun the test cases, ask for feedback — just without the browser reviewer in the middle. You can still organize results into iteration directories on the filesystem if you have one.
**Description optimization**: This section requires the `claude` CLI tool (specifically `claude -p`) which is only available in Claude Code. Skip it if you're on Claude.ai.
**Blind comparison**: Requires subagents. Skip it.
**Packaging**: The `package_skill.py` script works anywhere with Python and a filesystem. On Claude.ai, you can run it and the user can download the resulting `.skill` file.
---
## Cowork-Specific Instructions
If you're in Cowork, the main things to know are:
- You have subagents, so the main workflow (spawn test cases in parallel, run baselines, grade, etc.) all works. (However, if you run into severe problems with timeouts, it's OK to run the test prompts in series rather than parallel.)
- You don't have a browser or display, so when generating the eval viewer, use `--static <output_path>` to write a standalone HTML file instead of starting a server. Then proffer a link that the user can click to open the HTML in their browser.
- For whatever reason, the Cowork setup seems to discourage Claude from generating the eval viewer after running the tests, so just to reiterate: whether you're in Cowork or in Claude Code, after running tests, always generate the eval viewer with `generate_review.py` (not your own boutique HTML) so the human can review examples before you revise the skill and attempt corrections yourself. Sorry in advance but I'm gonna go all caps here: GENERATE THE EVAL VIEWER *BEFORE* evaluating outputs yourself. You want to get them in front of the human ASAP!
- Feedback works differently: since there's no running server, the viewer's "Submit All Reviews" button will download `feedback.json` as a file. You can then read it from there (you may have to request access first).
- Packaging works — `package_skill.py` just needs Python and a filesystem.
- Description optimization (`run_loop.py` / `run_eval.py`) should work in Cowork just fine since it uses `claude -p` via subprocess, not a browser, but please save it until you've fully finished making the skill and the user agrees it's in good shape.
---
## Reference files
The agents/ directory contains instructions for specialized subagents. Read them when you need to spawn the relevant subagent.
- `agents/grader.md` — How to evaluate assertions against outputs
- `agents/comparator.md` — How to do blind A/B comparison between two outputs
- `agents/analyzer.md` — How to analyze why one version beat another
The references/ directory has additional documentation:
- `references/schemas.md` — JSON structures for evals.json, grading.json, etc.
---
Repeating one more time the core loop here for emphasis:
- Figure out what the skill is about
- Draft or edit the skill
- Run claude-with-access-to-the-skill on test prompts
- With the user, evaluate the outputs:
- Create benchmark.json and run `eval-viewer/generate_review.py` to help the user review them
- Run quantitative evals
- Repeat until you and the user are satisfied
- Package the final skill and return it to the user.
Please add steps to your TodoList, if you have such a thing, to make sure you don't forget. If you're in Cowork, please specifically put "Create evals JSON and run `eval-viewer/generate_review.py` so human can review test cases" in your TodoList to make sure it happens.
Good luck!

@@ -0,0 +1,274 @@
# Post-hoc Analyzer Agent
Analyze blind comparison results to understand WHY the winner won and generate improvement suggestions.
## Role
After the blind comparator determines a winner, the Post-hoc Analyzer "unblinds" the results by examining the skills and transcripts. The goal is to extract actionable insights: what made the winner better, and how can the loser be improved?
## Inputs
You receive these parameters in your prompt:
- **winner**: "A" or "B" (from blind comparison)
- **winner_skill_path**: Path to the skill that produced the winning output
- **winner_transcript_path**: Path to the execution transcript for the winner
- **loser_skill_path**: Path to the skill that produced the losing output
- **loser_transcript_path**: Path to the execution transcript for the loser
- **comparison_result_path**: Path to the blind comparator's output JSON
- **output_path**: Where to save the analysis results
## Process
### Step 1: Read Comparison Result
1. Read the blind comparator's output at comparison_result_path
2. Note the winning side (A or B), the reasoning, and any scores
3. Understand what the comparator valued in the winning output
### Step 2: Read Both Skills
1. Read the winner skill's SKILL.md and key referenced files
2. Read the loser skill's SKILL.md and key referenced files
3. Identify structural differences:
- Instructions clarity and specificity
- Script/tool usage patterns
- Example coverage
- Edge case handling
### Step 3: Read Both Transcripts
1. Read the winner's transcript
2. Read the loser's transcript
3. Compare execution patterns:
- How closely did each follow their skill's instructions?
- What tools were used differently?
- Where did the loser diverge from optimal behavior?
- Did either encounter errors or make recovery attempts?
### Step 4: Analyze Instruction Following
For each transcript, evaluate:
- Did the agent follow the skill's explicit instructions?
- Did the agent use the skill's provided tools/scripts?
- Were there missed opportunities to leverage skill content?
- Did the agent add unnecessary steps not in the skill?
Score instruction following 1-10 and note specific issues.
### Step 5: Identify Winner Strengths
Determine what made the winner better:
- Clearer instructions that led to better behavior?
- Better scripts/tools that produced better output?
- More comprehensive examples that guided edge cases?
- Better error handling guidance?
Be specific. Quote from skills/transcripts where relevant.
### Step 6: Identify Loser Weaknesses
Determine what held the loser back:
- Ambiguous instructions that led to suboptimal choices?
- Missing tools/scripts that forced workarounds?
- Gaps in edge case coverage?
- Poor error handling that caused failures?
### Step 7: Generate Improvement Suggestions
Based on the analysis, produce actionable suggestions for improving the loser skill:
- Specific instruction changes to make
- Tools/scripts to add or modify
- Examples to include
- Edge cases to address
Prioritize by impact. Focus on changes that would have changed the outcome.
### Step 8: Write Analysis Results
Save structured analysis to `{output_path}`.
## Output Format
Write a JSON file with this structure:
```json
{
"comparison_summary": {
"winner": "A",
"winner_skill": "path/to/winner/skill",
"loser_skill": "path/to/loser/skill",
"comparator_reasoning": "Brief summary of why comparator chose winner"
},
"winner_strengths": [
"Clear step-by-step instructions for handling multi-page documents",
"Included validation script that caught formatting errors",
"Explicit guidance on fallback behavior when OCR fails"
],
"loser_weaknesses": [
"Vague instruction 'process the document appropriately' led to inconsistent behavior",
"No script for validation, agent had to improvise and made errors",
"No guidance on OCR failure, agent gave up instead of trying alternatives"
],
"instruction_following": {
"winner": {
"score": 9,
"issues": [
"Minor: skipped optional logging step"
]
},
"loser": {
"score": 6,
"issues": [
"Did not use the skill's formatting template",
"Invented own approach instead of following step 3",
"Missed the 'always validate output' instruction"
]
}
},
"improvement_suggestions": [
{
"priority": "high",
"category": "instructions",
"suggestion": "Replace 'process the document appropriately' with explicit steps: 1) Extract text, 2) Identify sections, 3) Format per template",
"expected_impact": "Would eliminate ambiguity that caused inconsistent behavior"
},
{
"priority": "high",
"category": "tools",
"suggestion": "Add validate_output.py script similar to winner skill's validation approach",
"expected_impact": "Would catch formatting errors before final output"
},
{
"priority": "medium",
"category": "error_handling",
"suggestion": "Add fallback instructions: 'If OCR fails, try: 1) different resolution, 2) image preprocessing, 3) manual extraction'",
"expected_impact": "Would prevent early failure on difficult documents"
}
],
"transcript_insights": {
"winner_execution_pattern": "Read skill -> Followed 5-step process -> Used validation script -> Fixed 2 issues -> Produced output",
"loser_execution_pattern": "Read skill -> Unclear on approach -> Tried 3 different methods -> No validation -> Output had errors"
}
}
```
## Guidelines
- **Be specific**: Quote from skills and transcripts, don't just say "instructions were unclear"
- **Be actionable**: Suggestions should be concrete changes, not vague advice
- **Focus on skill improvements**: The goal is to improve the losing skill, not critique the agent
- **Prioritize by impact**: Which changes would most likely have changed the outcome?
- **Consider causation**: Did the skill weakness actually cause the worse output, or is it incidental?
- **Stay objective**: Analyze what happened, don't editorialize
- **Think about generalization**: Would this improvement help on other evals too?
## Categories for Suggestions
Use these categories to organize improvement suggestions:
| Category | Description |
|----------|-------------|
| `instructions` | Changes to the skill's prose instructions |
| `tools` | Scripts, templates, or utilities to add/modify |
| `examples` | Example inputs/outputs to include |
| `error_handling` | Guidance for handling failures |
| `structure` | Reorganization of skill content |
| `references` | External docs or resources to add |
## Priority Levels
- **high**: Would likely change the outcome of this comparison
- **medium**: Would improve quality but may not change win/loss
- **low**: Nice to have, marginal improvement
---
# Analyzing Benchmark Results
When analyzing benchmark results, the analyzer's purpose is to **surface patterns and anomalies** across multiple runs, not suggest skill improvements.
## Role
Review all benchmark run results and generate freeform notes that help the user understand skill performance. Focus on patterns that wouldn't be visible from aggregate metrics alone.
## Inputs
You receive these parameters in your prompt:
- **benchmark_data_path**: Path to the in-progress benchmark.json with all run results
- **skill_path**: Path to the skill being benchmarked
- **output_path**: Where to save the notes (as JSON array of strings)
## Process
### Step 1: Read Benchmark Data
1. Read the benchmark.json containing all run results
2. Note the configurations tested (with_skill, without_skill)
3. Understand the run_summary aggregates already calculated
### Step 2: Analyze Per-Assertion Patterns
For each expectation across all runs:
- Does it **always pass** in both configurations? (may not differentiate skill value)
- Does it **always fail** in both configurations? (may be broken or beyond capability)
- Does it **always pass with skill but fail without**? (skill clearly adds value here)
- Does it **always fail with skill but pass without**? (skill may be hurting)
- Is it **highly variable**? (flaky expectation or non-deterministic behavior)
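The per-assertion patterns above can be computed mechanically before writing notes. A minimal sketch, assuming each expectation's results are collected as one list of booleans per configuration (the function name and list representation are illustrative, not part of the benchmark schema):

```python
def classify_assertion(with_skill: list[bool], without_skill: list[bool]) -> str:
    """Classify one expectation's behavior across all runs of both configurations."""
    def pass_rate(runs: list[bool]) -> float:
        return sum(runs) / len(runs) if runs else 0.0

    w, wo = pass_rate(with_skill), pass_rate(without_skill)
    if w == 1.0 and wo == 1.0:
        return "always-passes"      # may not differentiate skill value
    if w == 0.0 and wo == 0.0:
        return "always-fails"       # broken expectation or beyond capability
    if w == 1.0 and wo == 0.0:
        return "skill-adds-value"
    if w == 0.0 and wo == 1.0:
        return "skill-may-hurt"
    return "variable"               # flaky expectation or non-deterministic behavior
```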
### Step 3: Analyze Cross-Eval Patterns
Look for patterns across evals:
- Are certain eval types consistently harder/easier?
- Do some evals show high variance while others are stable?
- Are there surprising results that contradict expectations?
### Step 4: Analyze Metrics Patterns
Look at time_seconds, tokens, tool_calls:
- Does the skill significantly increase execution time?
- Is there high variance in resource usage?
- Are there outlier runs that skew the aggregates?
### Step 5: Generate Notes
Write freeform observations as a list of strings. Each note should:
- State a specific observation
- Be grounded in the data (not speculation)
- Help the user understand something the aggregate metrics don't show
Examples:
- "Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value"
- "Eval 3 shows high variance (50% ± 40%) - run 2 had an unusual failure that may be flaky"
- "Without-skill runs consistently fail on table extraction expectations (0% pass rate)"
- "Skill adds 13s average execution time but improves pass rate by 50%"
- "Token usage is 80% higher with skill, primarily due to script output parsing"
- "All 3 without-skill runs for eval 1 produced empty output"
### Step 6: Write Notes
Save notes to `{output_path}` as a JSON array of strings:
```json
[
"Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value",
"Eval 3 shows high variance (50% ± 40%) - run 2 had an unusual failure",
"Without-skill runs consistently fail on table extraction expectations",
"Skill adds 13s average execution time but improves pass rate by 50%"
]
```
## Guidelines
**DO:**
- Report what you observe in the data
- Be specific about which evals, expectations, or runs you're referring to
- Note patterns that aggregate metrics would hide
- Provide context that helps interpret the numbers
**DO NOT:**
- Suggest improvements to the skill (that's for the improvement step, not benchmarking)
- Make subjective quality judgments ("the output was good/bad")
- Speculate about causes without evidence
- Repeat information already in the run_summary aggregates

@@ -0,0 +1,202 @@
# Blind Comparator Agent
Compare two outputs WITHOUT knowing which skill produced them.
## Role
The Blind Comparator judges which output better accomplishes the eval task. You receive two outputs labeled A and B, but you do NOT know which skill produced which. This prevents bias toward a particular skill or approach.
Your judgment is based purely on output quality and task completion.
## Inputs
You receive these parameters in your prompt:
- **output_a_path**: Path to the first output file or directory
- **output_b_path**: Path to the second output file or directory
- **eval_prompt**: The original task/prompt that was executed
- **expectations**: List of expectations to check (optional - may be empty)
## Process
### Step 1: Read Both Outputs
1. Examine output A (file or directory)
2. Examine output B (file or directory)
3. Note the type, structure, and content of each
4. If outputs are directories, examine all relevant files inside
### Step 2: Understand the Task
1. Read the eval_prompt carefully
2. Identify what the task requires:
- What should be produced?
- What qualities matter (accuracy, completeness, format)?
- What would distinguish a good output from a poor one?
### Step 3: Generate Evaluation Rubric
Based on the task, generate a rubric with two dimensions:
**Content Rubric** (what the output contains):
| Criterion | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
|-----------|----------|----------------|---------------|
| Correctness | Major errors | Minor errors | Fully correct |
| Completeness | Missing key elements | Mostly complete | All elements present |
| Accuracy | Significant inaccuracies | Minor inaccuracies | Accurate throughout |
**Structure Rubric** (how the output is organized):
| Criterion | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
|-----------|----------|----------------|---------------|
| Organization | Disorganized | Reasonably organized | Clear, logical structure |
| Formatting | Inconsistent/broken | Mostly consistent | Professional, polished |
| Usability | Difficult to use | Usable with effort | Easy to use |
Adapt criteria to the specific task. For example:
- PDF form → "Field alignment", "Text readability", "Data placement"
- Document → "Section structure", "Heading hierarchy", "Paragraph flow"
- Data output → "Schema correctness", "Data types", "Completeness"
### Step 4: Evaluate Each Output Against the Rubric
For each output (A and B):
1. **Score each criterion** on the rubric (1-5 scale)
2. **Calculate dimension totals**: Content score, Structure score
3. **Calculate overall score**: Average of dimension scores, scaled to 1-10
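The score arithmetic above can be sketched in a few lines. This is a minimal illustration consistent with the numbers in the Output Format example (the helper name and dict shapes are assumptions):

```python
def score_output(content: dict[str, int], structure: dict[str, int]) -> dict[str, float]:
    """content/structure map criterion name -> 1-5 score for one output."""
    content_score = round(sum(content.values()) / len(content), 1)
    structure_score = round(sum(structure.values()) / len(structure), 1)
    # Average the two dimension scores (each 1-5), then double to scale to 1-10.
    overall = round((content_score + structure_score) / 2 * 2, 1)
    return {
        "content_score": content_score,
        "structure_score": structure_score,
        "overall_score": overall,
    }
```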
### Step 5: Check Assertions (if provided)
If expectations are provided:
1. Check each expectation against output A
2. Check each expectation against output B
3. Count pass rates for each output
4. Use expectation scores as secondary evidence (not the primary decision factor)
### Step 6: Determine the Winner
Compare A and B based on (in priority order):
1. **Primary**: Overall rubric score (content + structure)
2. **Secondary**: Assertion pass rates (if applicable)
3. **Tiebreaker**: If truly equal, declare a TIE
Be decisive - ties should be rare. One output is usually better, even if marginally.
### Step 7: Write Comparison Results
Save results to a JSON file at the path specified (or `comparison.json` if not specified).
## Output Format
Write a JSON file with this structure:
```json
{
"winner": "A",
"reasoning": "Output A provides a complete solution with proper formatting and all required fields. Output B is missing the date field and has formatting inconsistencies.",
"rubric": {
"A": {
"content": {
"correctness": 5,
"completeness": 5,
"accuracy": 4
},
"structure": {
"organization": 4,
"formatting": 5,
"usability": 4
},
"content_score": 4.7,
"structure_score": 4.3,
"overall_score": 9.0
},
"B": {
"content": {
"correctness": 3,
"completeness": 2,
"accuracy": 3
},
"structure": {
"organization": 3,
"formatting": 2,
"usability": 3
},
"content_score": 2.7,
"structure_score": 2.7,
"overall_score": 5.4
}
},
"output_quality": {
"A": {
"score": 9,
"strengths": ["Complete solution", "Well-formatted", "All fields present"],
"weaknesses": ["Minor style inconsistency in header"]
},
"B": {
"score": 5,
"strengths": ["Readable output", "Correct basic structure"],
"weaknesses": ["Missing date field", "Formatting inconsistencies", "Partial data extraction"]
}
},
"expectation_results": {
"A": {
"passed": 4,
"total": 5,
"pass_rate": 0.80,
"details": [
{"text": "Output includes name", "passed": true},
{"text": "Output includes date", "passed": true},
{"text": "Format is PDF", "passed": true},
{"text": "Contains signature", "passed": false},
{"text": "Readable text", "passed": true}
]
},
"B": {
"passed": 3,
"total": 5,
"pass_rate": 0.60,
"details": [
{"text": "Output includes name", "passed": true},
{"text": "Output includes date", "passed": false},
{"text": "Format is PDF", "passed": true},
{"text": "Contains signature", "passed": false},
{"text": "Readable text", "passed": true}
]
}
}
}
```
If no expectations were provided, omit the `expectation_results` field entirely.
## Field Descriptions
- **winner**: "A", "B", or "TIE"
- **reasoning**: Clear explanation of why the winner was chosen (or why it's a tie)
- **rubric**: Structured rubric evaluation for each output
- **content**: Scores for content criteria (correctness, completeness, accuracy)
- **structure**: Scores for structure criteria (organization, formatting, usability)
- **content_score**: Average of content criteria (1-5)
- **structure_score**: Average of structure criteria (1-5)
- **overall_score**: Combined score scaled to 1-10
- **output_quality**: Summary quality assessment
- **score**: 1-10 rating (should match rubric overall_score)
- **strengths**: List of positive aspects
- **weaknesses**: List of issues or shortcomings
- **expectation_results**: (Only if expectations provided)
- **passed**: Number of expectations that passed
- **total**: Total number of expectations
- **pass_rate**: Fraction passed (0.0 to 1.0)
- **details**: Individual expectation results
## Guidelines
- **Stay blind**: DO NOT try to infer which skill produced which output. Judge purely on output quality.
- **Be specific**: Cite specific examples when explaining strengths and weaknesses.
- **Be decisive**: Choose a winner unless outputs are genuinely equivalent.
- **Output quality first**: Assertion scores are secondary to overall task completion.
- **Be objective**: Don't favor outputs based on style preferences; focus on correctness and completeness.
- **Explain your reasoning**: The reasoning field should make it clear why you chose the winner.
- **Handle edge cases**: If both outputs fail, pick the one that fails less badly. If both are excellent, pick the one that's marginally better.

@@ -0,0 +1,223 @@
# Grader Agent
Evaluate expectations against an execution transcript and outputs.
## Role
The Grader reviews a transcript and output files, then determines whether each expectation passes or fails. Provide clear evidence for each judgment.
You have two jobs: grade the outputs, and critique the evals themselves. A passing grade on a weak assertion is worse than useless — it creates false confidence. When you notice an assertion that's trivially satisfied, or an important outcome that no assertion checks, say so.
## Inputs
You receive these parameters in your prompt:
- **expectations**: List of expectations to evaluate (strings)
- **transcript_path**: Path to the execution transcript (markdown file)
- **outputs_dir**: Directory containing output files from execution
## Process
### Step 1: Read the Transcript
1. Read the transcript file completely
2. Note the eval prompt, execution steps, and final result
3. Identify any issues or errors documented
### Step 2: Examine Output Files
1. List files in outputs_dir
2. Read/examine each file relevant to the expectations. If outputs aren't plain text, use the inspection tools provided in your prompt — don't rely solely on what the transcript says the executor produced.
3. Note contents, structure, and quality
### Step 3: Evaluate Each Assertion
For each expectation:
1. **Search for evidence** in the transcript and outputs
2. **Determine verdict**:
- **PASS**: Clear evidence the expectation is true AND the evidence reflects genuine task completion, not just surface-level compliance
- **FAIL**: No evidence, or evidence contradicts the expectation, or the evidence is superficial (e.g., correct filename but empty/wrong content)
3. **Cite the evidence**: Quote the specific text or describe what you found
### Step 4: Extract and Verify Claims
Beyond the predefined expectations, extract implicit claims from the outputs and verify them:
1. **Extract claims** from the transcript and outputs:
- Factual statements ("The form has 12 fields")
- Process claims ("Used pypdf to fill the form")
- Quality claims ("All fields were filled correctly")
2. **Verify each claim**:
- **Factual claims**: Can be checked against the outputs or external sources
- **Process claims**: Can be verified from the transcript
- **Quality claims**: Evaluate whether the claim is justified
3. **Flag unverifiable claims**: Note claims that cannot be verified with available information
This catches issues that predefined expectations might miss.
### Step 5: Read User Notes
If `{outputs_dir}/user_notes.md` exists:
1. Read it and note any uncertainties or issues flagged by the executor
2. Include relevant concerns in the grading output
3. These may reveal problems even when expectations pass
### Step 6: Critique the Evals
After grading, consider whether the evals themselves could be improved. Only surface suggestions when there's a clear gap.
Good suggestions test meaningful outcomes — assertions that are hard to satisfy without actually doing the work correctly. Think about what makes an assertion *discriminating*: it passes when the skill genuinely succeeds and fails when it doesn't.
Suggestions worth raising:
- An assertion that passed but would also pass for a clearly wrong output (e.g., checking filename existence but not file content)
- An important outcome you observed — good or bad — that no assertion covers at all
- An assertion that can't actually be verified from the available outputs
Keep the bar high. The goal is to flag things the eval author would say "good catch" about, not to nitpick every assertion.
### Step 7: Write Grading Results
Save results to `{outputs_dir}/../grading.json` (sibling to outputs_dir).
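The `summary` block of grading.json can be derived directly from the graded expectations; a sketch (the helper name is an assumption):

```python
def summarize(expectations: list[dict]) -> dict:
    """Build the summary block from graded expectation dicts ({"text", "passed", "evidence"})."""
    passed = sum(1 for e in expectations if e["passed"])
    total = len(expectations)
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        "pass_rate": round(passed / total, 2) if total else 0.0,
    }
```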
## Grading Criteria
**PASS when**:
- The transcript or outputs clearly demonstrate the expectation is true
- Specific evidence can be cited
- The evidence reflects genuine substance, not just surface compliance (e.g., a file exists AND contains correct content, not just the right filename)
**FAIL when**:
- No evidence found for the expectation
- Evidence contradicts the expectation
- The expectation cannot be verified from available information
- The evidence is superficial — the assertion is technically satisfied but the underlying task outcome is wrong or incomplete
- The output appears to meet the assertion by coincidence rather than by actually doing the work
**When uncertain**: The burden of proof to pass is on the expectation.
### Step 8: Read Executor Metrics and Timing
1. If `{outputs_dir}/metrics.json` exists, read it and include its contents in the grading output
2. If `{outputs_dir}/../timing.json` exists, read it and include the timing data
Gather these before finalizing grading.json in Step 7 so they appear in the output.
## Output Format
Write a JSON file with this structure:
```json
{
"expectations": [
{
"text": "The output includes the name 'John Smith'",
"passed": true,
"evidence": "Found in transcript Step 3: 'Extracted names: John Smith, Sarah Johnson'"
},
{
"text": "The spreadsheet has a SUM formula in cell B10",
"passed": false,
"evidence": "No spreadsheet was created. The output was a text file."
},
{
"text": "The assistant used the skill's OCR script",
"passed": true,
"evidence": "Transcript Step 2 shows: 'Tool: Bash - python ocr_script.py image.png'"
}
],
"summary": {
"passed": 2,
"failed": 1,
"total": 3,
"pass_rate": 0.67
},
"execution_metrics": {
"tool_calls": {
"Read": 5,
"Write": 2,
"Bash": 8
},
"total_tool_calls": 15,
"total_steps": 6,
"errors_encountered": 0,
"output_chars": 12450,
"transcript_chars": 3200
},
"timing": {
"executor_duration_seconds": 165.0,
"grader_duration_seconds": 26.0,
"total_duration_seconds": 191.0
},
"claims": [
{
"claim": "The form has 12 fillable fields",
"type": "factual",
"verified": true,
"evidence": "Counted 12 fields in field_info.json"
},
{
"claim": "All required fields were populated",
"type": "quality",
"verified": false,
"evidence": "Reference section was left blank despite data being available"
}
],
"user_notes_summary": {
"uncertainties": ["Used 2023 data, may be stale"],
"needs_review": [],
"workarounds": ["Fell back to text overlay for non-fillable fields"]
},
"eval_feedback": {
"suggestions": [
{
"assertion": "The output includes the name 'John Smith'",
"reason": "A hallucinated document that mentions the name would also pass — consider checking it appears as the primary contact with matching phone and email from the input"
},
{
"reason": "No assertion checks whether the extracted phone numbers match the input — I observed incorrect numbers in the output that went uncaught"
}
],
"overall": "Assertions check presence but not correctness. Consider adding content verification."
}
}
```
## Field Descriptions
- **expectations**: Array of graded expectations
- **text**: The original expectation text
- **passed**: Boolean - true if expectation passes
- **evidence**: Specific quote or description supporting the verdict
- **summary**: Aggregate statistics
- **passed**: Count of passed expectations
- **failed**: Count of failed expectations
- **total**: Total expectations evaluated
- **pass_rate**: Fraction passed (0.0 to 1.0)
- **execution_metrics**: Copied from executor's metrics.json (if available)
- **output_chars**: Total character count of output files (proxy for tokens)
- **transcript_chars**: Character count of transcript
- **timing**: Wall clock timing from timing.json (if available)
- **executor_duration_seconds**: Time spent in executor subagent
- **total_duration_seconds**: Total elapsed time for the run
- **claims**: Extracted and verified claims from the output
- **claim**: The statement being verified
- **type**: "factual", "process", or "quality"
- **verified**: Boolean - whether the claim holds
- **evidence**: Supporting or contradicting evidence
- **user_notes_summary**: Issues flagged by the executor
- **uncertainties**: Things the executor wasn't sure about
- **needs_review**: Items requiring human attention
- **workarounds**: Places where the skill didn't work as expected
- **eval_feedback**: Improvement suggestions for the evals (only when warranted)
- **suggestions**: List of concrete suggestions, each with a `reason` and optionally an `assertion` it relates to
- **overall**: Brief assessment — can be "No suggestions, evals look solid" if nothing to flag
## Guidelines
- **Be objective**: Base verdicts on evidence, not assumptions
- **Be specific**: Quote the exact text that supports your verdict
- **Be thorough**: Check both transcript and output files
- **Be consistent**: Apply the same standard to each expectation
- **Explain failures**: Make it clear why evidence was insufficient
- **No partial credit**: Each expectation is pass or fail, not partial

@@ -0,0 +1,146 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Eval Set Review - __SKILL_NAME_PLACEHOLDER__</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: 'Lora', Georgia, serif; background: #faf9f5; padding: 2rem; color: #141413; }
h1 { font-family: 'Poppins', sans-serif; margin-bottom: 0.5rem; font-size: 1.5rem; }
.description { color: #b0aea5; margin-bottom: 1.5rem; font-style: italic; max-width: 900px; }
.controls { margin-bottom: 1rem; display: flex; gap: 0.5rem; }
.btn { font-family: 'Poppins', sans-serif; padding: 0.5rem 1rem; border: none; border-radius: 6px; cursor: pointer; font-size: 0.875rem; font-weight: 500; }
.btn-add { background: #6a9bcc; color: white; }
.btn-add:hover { background: #5889b8; }
.btn-export { background: #d97757; color: white; }
.btn-export:hover { background: #c4613f; }
table { width: 100%; max-width: 1100px; border-collapse: collapse; background: white; border-radius: 6px; overflow: hidden; box-shadow: 0 1px 3px rgba(0,0,0,0.08); }
th { font-family: 'Poppins', sans-serif; background: #141413; color: #faf9f5; padding: 0.75rem 1rem; text-align: left; font-size: 0.875rem; }
td { padding: 0.75rem 1rem; border-bottom: 1px solid #e8e6dc; vertical-align: top; }
tr:nth-child(even) td { background: #faf9f5; }
tr:hover td { background: #f3f1ea; }
.section-header td { background: #e8e6dc; font-family: 'Poppins', sans-serif; font-weight: 500; font-size: 0.8rem; color: #141413; text-transform: uppercase; letter-spacing: 0.05em; }
.query-input { width: 100%; padding: 0.4rem; border: 1px solid #e8e6dc; border-radius: 4px; font-size: 0.875rem; font-family: 'Lora', Georgia, serif; resize: vertical; min-height: 60px; }
.query-input:focus { outline: none; border-color: #d97757; box-shadow: 0 0 0 2px rgba(217,119,87,0.15); }
.toggle { position: relative; display: inline-block; width: 44px; height: 24px; }
.toggle input { opacity: 0; width: 0; height: 0; }
.toggle .slider { position: absolute; inset: 0; background: #b0aea5; border-radius: 24px; cursor: pointer; transition: 0.2s; }
.toggle .slider::before { content: ""; position: absolute; width: 18px; height: 18px; left: 3px; bottom: 3px; background: white; border-radius: 50%; transition: 0.2s; }
.toggle input:checked + .slider { background: #d97757; }
.toggle input:checked + .slider::before { transform: translateX(20px); }
.btn-delete { background: #c44; color: white; padding: 0.3rem 0.6rem; border: none; border-radius: 4px; cursor: pointer; font-size: 0.75rem; font-family: 'Poppins', sans-serif; }
.btn-delete:hover { background: #a33; }
.summary { margin-top: 1rem; color: #b0aea5; font-size: 0.875rem; }
</style>
</head>
<body>
<h1>Eval Set Review: <span id="skill-name">__SKILL_NAME_PLACEHOLDER__</span></h1>
<p class="description">Current description: <span id="skill-desc">__SKILL_DESCRIPTION_PLACEHOLDER__</span></p>
<div class="controls">
<button class="btn btn-add" onclick="addRow()">+ Add Query</button>
<button class="btn btn-export" onclick="exportEvalSet()">Export Eval Set</button>
</div>
<table>
<thead>
<tr>
<th style="width:65%">Query</th>
<th style="width:18%">Should Trigger</th>
<th style="width:10%">Actions</th>
</tr>
</thead>
<tbody id="eval-body"></tbody>
</table>
<p class="summary" id="summary"></p>
<script>
const EVAL_DATA = __EVAL_DATA_PLACEHOLDER__;
let evalItems = [...EVAL_DATA];
function render() {
const tbody = document.getElementById('eval-body');
tbody.innerHTML = '';
// Sort: should-trigger first, then should-not-trigger
const sorted = evalItems
.map((item, origIdx) => ({ ...item, origIdx }))
.sort((a, b) => (b.should_trigger ? 1 : 0) - (a.should_trigger ? 1 : 0));
let lastGroup = null;
sorted.forEach(item => {
const group = item.should_trigger ? 'trigger' : 'no-trigger';
if (group !== lastGroup) {
const headerRow = document.createElement('tr');
headerRow.className = 'section-header';
headerRow.innerHTML = `<td colspan="3">${item.should_trigger ? 'Should Trigger' : 'Should NOT Trigger'}</td>`;
tbody.appendChild(headerRow);
lastGroup = group;
}
const idx = item.origIdx;
const tr = document.createElement('tr');
tr.innerHTML = `
<td><textarea class="query-input" onchange="updateQuery(${idx}, this.value)">${escapeHtml(item.query)}</textarea></td>
<td>
<label class="toggle">
<input type="checkbox" ${item.should_trigger ? 'checked' : ''} onchange="updateTrigger(${idx}, this.checked)">
<span class="slider"></span>
</label>
<span style="margin-left:8px;font-size:0.8rem;color:#b0aea5">${item.should_trigger ? 'Yes' : 'No'}</span>
</td>
<td><button class="btn-delete" onclick="deleteRow(${idx})">Delete</button></td>
`;
tbody.appendChild(tr);
});
updateSummary();
}
function escapeHtml(text) {
const div = document.createElement('div');
div.textContent = text;
return div.innerHTML;
}
function updateQuery(idx, value) { evalItems[idx].query = value; updateSummary(); }
function updateTrigger(idx, value) { evalItems[idx].should_trigger = value; render(); }
function deleteRow(idx) { evalItems.splice(idx, 1); render(); }
function addRow() {
  evalItems.push({ query: '', should_trigger: true });
  render();
  // New rows default to should_trigger, so after the stable sort the new row
  // is the last textarea in the "Should Trigger" group — not necessarily the
  // last textarea on the page when "Should NOT Trigger" rows exist.
  const triggerCount = evalItems.filter(i => i.should_trigger).length;
  const inputs = document.querySelectorAll('.query-input');
  inputs[triggerCount - 1].focus();
}
function updateSummary() {
const trigger = evalItems.filter(i => i.should_trigger).length;
const noTrigger = evalItems.filter(i => !i.should_trigger).length;
document.getElementById('summary').textContent =
`${evalItems.length} queries total: ${trigger} should trigger, ${noTrigger} should not trigger`;
}
function exportEvalSet() {
const valid = evalItems.filter(i => i.query.trim() !== '');
const data = valid.map(i => ({ query: i.query.trim(), should_trigger: i.should_trigger }));
const blob = new Blob([JSON.stringify(data, null, 2)], { type: 'application/json' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'eval_set.json';
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
}
render();
</script>
</body>
</html>


@@ -0,0 +1,471 @@
#!/usr/bin/env python3
"""Generate and serve a review page for eval results.
Reads the workspace directory, discovers runs (directories with outputs/),
embeds all output data into a self-contained HTML page, and serves it via
a tiny HTTP server. Feedback auto-saves to feedback.json in the workspace.
Usage:
python generate_review.py <workspace-path> [--port PORT] [--skill-name NAME]
python generate_review.py <workspace-path> --previous-feedback /path/to/old/feedback.json
No dependencies beyond the Python stdlib are required.
"""
import argparse
import base64
import json
import mimetypes
import os
import re
import signal
import subprocess
import sys
import time
import webbrowser
from functools import partial
from http.server import HTTPServer, BaseHTTPRequestHandler
from pathlib import Path
# Files to exclude from output listings
METADATA_FILES = {"transcript.md", "user_notes.md", "metrics.json"}
# Extensions we render as inline text
TEXT_EXTENSIONS = {
".txt", ".md", ".json", ".csv", ".py", ".js", ".ts", ".tsx", ".jsx",
".yaml", ".yml", ".xml", ".html", ".css", ".sh", ".rb", ".go", ".rs",
".java", ".c", ".cpp", ".h", ".hpp", ".sql", ".r", ".toml",
}
# Extensions we render as inline images
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}
# MIME type overrides for common types
MIME_OVERRIDES = {
".svg": "image/svg+xml",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
def get_mime_type(path: Path) -> str:
ext = path.suffix.lower()
if ext in MIME_OVERRIDES:
return MIME_OVERRIDES[ext]
mime, _ = mimetypes.guess_type(str(path))
return mime or "application/octet-stream"
def find_runs(workspace: Path) -> list[dict]:
"""Recursively find directories that contain an outputs/ subdirectory."""
runs: list[dict] = []
_find_runs_recursive(workspace, workspace, runs)
    # eval_id may be None (not just missing), so map it to +inf explicitly;
    # a bare .get() default would still return None and break the tuple sort.
    runs.sort(key=lambda r: (r["eval_id"] if r.get("eval_id") is not None else float("inf"), r["id"]))
return runs
def _find_runs_recursive(root: Path, current: Path, runs: list[dict]) -> None:
if not current.is_dir():
return
outputs_dir = current / "outputs"
if outputs_dir.is_dir():
run = build_run(root, current)
if run:
runs.append(run)
return
skip = {"node_modules", ".git", "__pycache__", "skill", "inputs"}
for child in sorted(current.iterdir()):
if child.is_dir() and child.name not in skip:
_find_runs_recursive(root, child, runs)
def build_run(root: Path, run_dir: Path) -> dict | None:
"""Build a run dict with prompt, outputs, and grading data."""
prompt = ""
eval_id = None
# Try eval_metadata.json
for candidate in [run_dir / "eval_metadata.json", run_dir.parent / "eval_metadata.json"]:
if candidate.exists():
try:
metadata = json.loads(candidate.read_text())
prompt = metadata.get("prompt", "")
eval_id = metadata.get("eval_id")
except (json.JSONDecodeError, OSError):
pass
if prompt:
break
# Fall back to transcript.md
if not prompt:
for candidate in [run_dir / "transcript.md", run_dir / "outputs" / "transcript.md"]:
if candidate.exists():
try:
text = candidate.read_text()
match = re.search(r"## Eval Prompt\n\n([\s\S]*?)(?=\n##|$)", text)
if match:
prompt = match.group(1).strip()
except OSError:
pass
if prompt:
break
if not prompt:
prompt = "(No prompt found)"
run_id = str(run_dir.relative_to(root)).replace("/", "-").replace("\\", "-")
# Collect output files
outputs_dir = run_dir / "outputs"
output_files: list[dict] = []
if outputs_dir.is_dir():
for f in sorted(outputs_dir.iterdir()):
if f.is_file() and f.name not in METADATA_FILES:
output_files.append(embed_file(f))
# Load grading if present
grading = None
for candidate in [run_dir / "grading.json", run_dir.parent / "grading.json"]:
if candidate.exists():
try:
grading = json.loads(candidate.read_text())
except (json.JSONDecodeError, OSError):
pass
if grading:
break
return {
"id": run_id,
"prompt": prompt,
"eval_id": eval_id,
"outputs": output_files,
"grading": grading,
}
def embed_file(path: Path) -> dict:
"""Read a file and return an embedded representation."""
ext = path.suffix.lower()
mime = get_mime_type(path)
if ext in TEXT_EXTENSIONS:
try:
content = path.read_text(errors="replace")
except OSError:
content = "(Error reading file)"
return {
"name": path.name,
"type": "text",
"content": content,
}
elif ext in IMAGE_EXTENSIONS:
try:
raw = path.read_bytes()
b64 = base64.b64encode(raw).decode("ascii")
except OSError:
return {"name": path.name, "type": "error", "content": "(Error reading file)"}
return {
"name": path.name,
"type": "image",
"mime": mime,
"data_uri": f"data:{mime};base64,{b64}",
}
elif ext == ".pdf":
try:
raw = path.read_bytes()
b64 = base64.b64encode(raw).decode("ascii")
except OSError:
return {"name": path.name, "type": "error", "content": "(Error reading file)"}
return {
"name": path.name,
"type": "pdf",
"data_uri": f"data:{mime};base64,{b64}",
}
elif ext == ".xlsx":
try:
raw = path.read_bytes()
b64 = base64.b64encode(raw).decode("ascii")
except OSError:
return {"name": path.name, "type": "error", "content": "(Error reading file)"}
return {
"name": path.name,
"type": "xlsx",
"data_b64": b64,
}
else:
# Binary / unknown — base64 download link
try:
raw = path.read_bytes()
b64 = base64.b64encode(raw).decode("ascii")
except OSError:
return {"name": path.name, "type": "error", "content": "(Error reading file)"}
return {
"name": path.name,
"type": "binary",
"mime": mime,
"data_uri": f"data:{mime};base64,{b64}",
}
def load_previous_iteration(workspace: Path) -> dict[str, dict]:
"""Load previous iteration's feedback and outputs.
Returns a map of run_id -> {"feedback": str, "outputs": list[dict]}.
"""
result: dict[str, dict] = {}
# Load feedback
feedback_map: dict[str, str] = {}
feedback_path = workspace / "feedback.json"
if feedback_path.exists():
try:
data = json.loads(feedback_path.read_text())
feedback_map = {
r["run_id"]: r["feedback"]
for r in data.get("reviews", [])
if r.get("feedback", "").strip()
}
except (json.JSONDecodeError, OSError, KeyError):
pass
# Load runs (to get outputs)
prev_runs = find_runs(workspace)
for run in prev_runs:
result[run["id"]] = {
"feedback": feedback_map.get(run["id"], ""),
"outputs": run.get("outputs", []),
}
# Also add feedback for run_ids that had feedback but no matching run
for run_id, fb in feedback_map.items():
if run_id not in result:
result[run_id] = {"feedback": fb, "outputs": []}
return result
def generate_html(
runs: list[dict],
skill_name: str,
previous: dict[str, dict] | None = None,
benchmark: dict | None = None,
) -> str:
"""Generate the complete standalone HTML page with embedded data."""
template_path = Path(__file__).parent / "viewer.html"
template = template_path.read_text()
# Build previous_feedback and previous_outputs maps for the template
previous_feedback: dict[str, str] = {}
previous_outputs: dict[str, list[dict]] = {}
if previous:
for run_id, data in previous.items():
if data.get("feedback"):
previous_feedback[run_id] = data["feedback"]
if data.get("outputs"):
previous_outputs[run_id] = data["outputs"]
embedded = {
"skill_name": skill_name,
"runs": runs,
"previous_feedback": previous_feedback,
"previous_outputs": previous_outputs,
}
if benchmark:
embedded["benchmark"] = benchmark
data_json = json.dumps(embedded)
return template.replace("/*__EMBEDDED_DATA__*/", f"const EMBEDDED_DATA = {data_json};")
# ---------------------------------------------------------------------------
# HTTP server (stdlib only, zero dependencies)
# ---------------------------------------------------------------------------
def _kill_port(port: int) -> None:
"""Kill any process listening on the given port."""
try:
result = subprocess.run(
["lsof", "-ti", f":{port}"],
capture_output=True, text=True, timeout=5,
)
for pid_str in result.stdout.strip().split("\n"):
if pid_str.strip():
try:
os.kill(int(pid_str.strip()), signal.SIGTERM)
except (ProcessLookupError, ValueError):
pass
if result.stdout.strip():
time.sleep(0.5)
except subprocess.TimeoutExpired:
pass
except FileNotFoundError:
print("Note: lsof not found, cannot check if port is in use", file=sys.stderr)
class ReviewHandler(BaseHTTPRequestHandler):
"""Serves the review HTML and handles feedback saves.
Regenerates the HTML on each page load so that refreshing the browser
picks up new eval outputs without restarting the server.
"""
def __init__(
self,
workspace: Path,
skill_name: str,
feedback_path: Path,
previous: dict[str, dict],
benchmark_path: Path | None,
*args,
**kwargs,
):
self.workspace = workspace
self.skill_name = skill_name
self.feedback_path = feedback_path
self.previous = previous
self.benchmark_path = benchmark_path
super().__init__(*args, **kwargs)
def do_GET(self) -> None:
if self.path == "/" or self.path == "/index.html":
# Regenerate HTML on each request (re-scans workspace for new outputs)
runs = find_runs(self.workspace)
benchmark = None
if self.benchmark_path and self.benchmark_path.exists():
try:
benchmark = json.loads(self.benchmark_path.read_text())
except (json.JSONDecodeError, OSError):
pass
html = generate_html(runs, self.skill_name, self.previous, benchmark)
content = html.encode("utf-8")
self.send_response(200)
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Content-Length", str(len(content)))
self.end_headers()
self.wfile.write(content)
elif self.path == "/api/feedback":
data = b"{}"
if self.feedback_path.exists():
data = self.feedback_path.read_bytes()
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(data)))
self.end_headers()
self.wfile.write(data)
else:
self.send_error(404)
def do_POST(self) -> None:
if self.path == "/api/feedback":
length = int(self.headers.get("Content-Length", 0))
body = self.rfile.read(length)
try:
data = json.loads(body)
if not isinstance(data, dict) or "reviews" not in data:
raise ValueError("Expected JSON object with 'reviews' key")
self.feedback_path.write_text(json.dumps(data, indent=2) + "\n")
resp = b'{"ok":true}'
self.send_response(200)
except (json.JSONDecodeError, OSError, ValueError) as e:
resp = json.dumps({"error": str(e)}).encode()
self.send_response(500)
self.send_header("Content-Type", "application/json")
self.send_header("Content-Length", str(len(resp)))
self.end_headers()
self.wfile.write(resp)
else:
self.send_error(404)
def log_message(self, format: str, *args: object) -> None:
# Suppress request logging to keep terminal clean
pass
def main() -> None:
parser = argparse.ArgumentParser(description="Generate and serve eval review")
parser.add_argument("workspace", type=Path, help="Path to workspace directory")
parser.add_argument("--port", "-p", type=int, default=3117, help="Server port (default: 3117)")
parser.add_argument("--skill-name", "-n", type=str, default=None, help="Skill name for header")
parser.add_argument(
"--previous-workspace", type=Path, default=None,
help="Path to previous iteration's workspace (shows old outputs and feedback as context)",
)
parser.add_argument(
"--benchmark", type=Path, default=None,
help="Path to benchmark.json to show in the Benchmark tab",
)
parser.add_argument(
"--static", "-s", type=Path, default=None,
help="Write standalone HTML to this path instead of starting a server",
)
args = parser.parse_args()
workspace = args.workspace.resolve()
if not workspace.is_dir():
print(f"Error: {workspace} is not a directory", file=sys.stderr)
sys.exit(1)
runs = find_runs(workspace)
if not runs:
print(f"No runs found in {workspace}", file=sys.stderr)
sys.exit(1)
skill_name = args.skill_name or workspace.name.replace("-workspace", "")
feedback_path = workspace / "feedback.json"
previous: dict[str, dict] = {}
if args.previous_workspace:
previous = load_previous_iteration(args.previous_workspace.resolve())
benchmark_path = args.benchmark.resolve() if args.benchmark else None
benchmark = None
if benchmark_path and benchmark_path.exists():
try:
benchmark = json.loads(benchmark_path.read_text())
except (json.JSONDecodeError, OSError):
pass
if args.static:
html = generate_html(runs, skill_name, previous, benchmark)
args.static.parent.mkdir(parents=True, exist_ok=True)
args.static.write_text(html)
print(f"\n Static viewer written to: {args.static}\n")
sys.exit(0)
# Kill any existing process on the target port
port = args.port
_kill_port(port)
handler = partial(ReviewHandler, workspace, skill_name, feedback_path, previous, benchmark_path)
try:
server = HTTPServer(("127.0.0.1", port), handler)
except OSError:
# Port still in use after kill attempt — find a free one
server = HTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
url = f"http://localhost:{port}"
print(f"\n Eval Viewer")
print(f" ─────────────────────────────────")
print(f" URL: {url}")
print(f" Workspace: {workspace}")
print(f" Feedback: {feedback_path}")
if previous:
print(f" Previous: {args.previous_workspace} ({len(previous)} runs)")
if benchmark_path:
print(f" Benchmark: {benchmark_path}")
print(f"\n Press Ctrl+C to stop.\n")
webbrowser.open(url)
try:
server.serve_forever()
except KeyboardInterrupt:
print("\nStopped.")
server.server_close()
if __name__ == "__main__":
main()


@@ -0,0 +1,430 @@
# JSON Schemas
This document defines the JSON schemas used by skill-creator.
---
## evals.json
Defines the evals for a skill. Located at `evals/evals.json` within the skill directory.
```json
{
"skill_name": "example-skill",
"evals": [
{
"id": 1,
"prompt": "User's example prompt",
"expected_output": "Description of expected result",
"files": ["evals/files/sample1.pdf"],
"expectations": [
"The output includes X",
"The skill used script Y"
]
}
]
}
```
**Fields:**
- `skill_name`: Name matching the skill's frontmatter
- `evals[].id`: Unique integer identifier
- `evals[].prompt`: The task to execute
- `evals[].expected_output`: Human-readable description of success
- `evals[].files`: Optional list of input file paths (relative to skill root)
- `evals[].expectations`: List of verifiable statements
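The schema is informal, but the required fields above can be checked with a small sketch (only the fields listed here are validated; anything stricter is an assumption):

```python
def validate_evals(data: dict) -> list[str]:
    """Return a list of problems found in an evals.json dict (empty list = valid)."""
    problems = []
    if "skill_name" not in data:
        problems.append("missing skill_name")
    seen_ids = set()
    for i, ev in enumerate(data.get("evals", [])):
        # files is optional; the other fields are required
        for field in ("id", "prompt", "expected_output", "expectations"):
            if field not in ev:
                problems.append(f"evals[{i}] missing {field}")
        eid = ev.get("id")
        if eid in seen_ids:
            problems.append(f"evals[{i}] duplicate id {eid}")
        seen_ids.add(eid)
    return problems
```

Running this before kicking off eval runs catches missing prompts or duplicate IDs early, instead of mid-run.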
---
## history.json
Tracks version progression in Improve mode. Located at workspace root.
```json
{
"started_at": "2026-01-15T10:30:00Z",
"skill_name": "pdf",
"current_best": "v2",
"iterations": [
{
"version": "v0",
"parent": null,
"expectation_pass_rate": 0.65,
"grading_result": "baseline",
"is_current_best": false
},
{
"version": "v1",
"parent": "v0",
"expectation_pass_rate": 0.75,
"grading_result": "won",
"is_current_best": false
},
{
"version": "v2",
"parent": "v1",
"expectation_pass_rate": 0.85,
"grading_result": "won",
"is_current_best": true
}
]
}
```
**Fields:**
- `started_at`: ISO timestamp of when improvement started
- `skill_name`: Name of the skill being improved
- `current_best`: Version identifier of the best performer
- `iterations[].version`: Version identifier (v0, v1, ...)
- `iterations[].parent`: Parent version this was derived from
- `iterations[].expectation_pass_rate`: Pass rate from grading
- `iterations[].grading_result`: "baseline", "won", "lost", or "tie"
- `iterations[].is_current_best`: Whether this is the current best version
---
## grading.json
Output from the grader agent. Located at `<run-dir>/grading.json`.
```json
{
"expectations": [
{
"text": "The output includes the name 'John Smith'",
"passed": true,
"evidence": "Found in transcript Step 3: 'Extracted names: John Smith, Sarah Johnson'"
},
{
"text": "The spreadsheet has a SUM formula in cell B10",
"passed": false,
"evidence": "No spreadsheet was created. The output was a text file."
}
],
  "summary": {
    "passed": 1,
    "failed": 1,
    "total": 2,
    "pass_rate": 0.5
  },
"execution_metrics": {
"tool_calls": {
"Read": 5,
"Write": 2,
"Bash": 8
},
"total_tool_calls": 15,
"total_steps": 6,
"errors_encountered": 0,
"output_chars": 12450,
"transcript_chars": 3200
},
"timing": {
"executor_duration_seconds": 165.0,
"grader_duration_seconds": 26.0,
"total_duration_seconds": 191.0
},
"claims": [
{
"claim": "The form has 12 fillable fields",
"type": "factual",
"verified": true,
"evidence": "Counted 12 fields in field_info.json"
}
],
"user_notes_summary": {
"uncertainties": ["Used 2023 data, may be stale"],
"needs_review": [],
"workarounds": ["Fell back to text overlay for non-fillable fields"]
},
"eval_feedback": {
"suggestions": [
{
"assertion": "The output includes the name 'John Smith'",
"reason": "A hallucinated document that mentions the name would also pass"
}
],
"overall": "Assertions check presence but not correctness."
}
}
```
**Fields:**
- `expectations[]`: Graded expectations with evidence
- `summary`: Aggregate pass/fail counts
- `execution_metrics`: Tool usage and output size (from executor's metrics.json)
- `timing`: Wall clock timing (from timing.json)
- `claims`: Extracted and verified claims from the output
- `user_notes_summary`: Issues flagged by the executor
- `eval_feedback`: (optional) Improvement suggestions for the evals, only present when the grader identifies issues worth raising
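The `summary` block is fully derivable from the graded expectations. A minimal sketch (the two-decimal rounding of `pass_rate` is an assumption based on the examples in this document):

```python
def summarize(expectations: list[dict]) -> dict:
    """Aggregate graded expectations into the summary block of grading.json."""
    passed = sum(1 for e in expectations if e.get("passed"))
    total = len(expectations)
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        # Guard against an empty expectations list to avoid ZeroDivisionError
        "pass_rate": round(passed / total, 2) if total else 0.0,
    }
```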
---
## metrics.json
Output from the executor agent. Located at `<run-dir>/outputs/metrics.json`.
```json
{
"tool_calls": {
"Read": 5,
"Write": 2,
"Bash": 8,
"Edit": 1,
"Glob": 2,
"Grep": 0
},
"total_tool_calls": 18,
"total_steps": 6,
"files_created": ["filled_form.pdf", "field_values.json"],
"errors_encountered": 0,
"output_chars": 12450,
"transcript_chars": 3200
}
```
**Fields:**
- `tool_calls`: Count per tool type
- `total_tool_calls`: Sum of all tool calls
- `total_steps`: Number of major execution steps
- `files_created`: List of output files created
- `errors_encountered`: Number of errors during execution
- `output_chars`: Total character count of output files
- `transcript_chars`: Character count of transcript
---
## timing.json
Wall clock timing for a run. Located at `<run-dir>/timing.json`.
**How to capture:** When a subagent task completes, the task notification includes `total_tokens` and `duration_ms`. Save these immediately — they are not persisted anywhere else and cannot be recovered after the fact.
```json
{
"total_tokens": 84852,
"duration_ms": 23332,
"total_duration_seconds": 23.3,
"executor_start": "2026-01-15T10:30:00Z",
"executor_end": "2026-01-15T10:32:45Z",
"executor_duration_seconds": 165.0,
"grader_start": "2026-01-15T10:32:46Z",
"grader_end": "2026-01-15T10:33:12Z",
"grader_duration_seconds": 26.0
}
```
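Since the notification values cannot be recovered later, it helps to write them to disk the moment they arrive. A minimal sketch (the derived `total_duration_seconds` and its one-decimal rounding are assumptions):

```python
import json
from pathlib import Path

def save_timing(run_dir: Path, total_tokens: int, duration_ms: int) -> dict:
    """Persist subagent timing to <run-dir>/timing.json as soon as it arrives."""
    timing = {
        "total_tokens": total_tokens,
        "duration_ms": duration_ms,
        # Derived convenience field, rounded to one decimal place
        "total_duration_seconds": round(duration_ms / 1000, 1),
    }
    (run_dir / "timing.json").write_text(json.dumps(timing, indent=2) + "\n")
    return timing
```

The per-agent start/end timestamps shown above would be filled in separately, as each subagent completes.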
---
## benchmark.json
Output from Benchmark mode. Located at `benchmarks/<timestamp>/benchmark.json`.
```json
{
"metadata": {
"skill_name": "pdf",
"skill_path": "/path/to/pdf",
"executor_model": "claude-sonnet-4-20250514",
"analyzer_model": "most-capable-model",
"timestamp": "2026-01-15T10:30:00Z",
"evals_run": [1, 2, 3],
"runs_per_configuration": 3
},
"runs": [
{
"eval_id": 1,
"eval_name": "Ocean",
"configuration": "with_skill",
"run_number": 1,
"result": {
"pass_rate": 0.85,
"passed": 6,
"failed": 1,
"total": 7,
"time_seconds": 42.5,
"tokens": 3800,
"tool_calls": 18,
"errors": 0
},
"expectations": [
{"text": "...", "passed": true, "evidence": "..."}
],
"notes": [
"Used 2023 data, may be stale",
"Fell back to text overlay for non-fillable fields"
]
}
],
"run_summary": {
"with_skill": {
"pass_rate": {"mean": 0.85, "stddev": 0.05, "min": 0.80, "max": 0.90},
"time_seconds": {"mean": 45.0, "stddev": 12.0, "min": 32.0, "max": 58.0},
"tokens": {"mean": 3800, "stddev": 400, "min": 3200, "max": 4100}
},
"without_skill": {
"pass_rate": {"mean": 0.35, "stddev": 0.08, "min": 0.28, "max": 0.45},
"time_seconds": {"mean": 32.0, "stddev": 8.0, "min": 24.0, "max": 42.0},
"tokens": {"mean": 2100, "stddev": 300, "min": 1800, "max": 2500}
},
"delta": {
"pass_rate": "+0.50",
"time_seconds": "+13.0",
"tokens": "+1700"
}
},
"notes": [
"Assertion 'Output is a PDF file' passes 100% in both configurations - may not differentiate skill value",
"Eval 3 shows high variance (50% ± 40%) - may be flaky or model-dependent",
"Without-skill runs consistently fail on table extraction expectations",
"Skill adds 13s average execution time but improves pass rate by 50%"
]
}
```
**Fields:**
- `metadata`: Information about the benchmark run
- `skill_name`: Name of the skill
- `timestamp`: When the benchmark was run
- `evals_run`: List of eval names or IDs
- `runs_per_configuration`: Number of runs per config (e.g. 3)
- `runs[]`: Individual run results
- `eval_id`: Numeric eval identifier
- `eval_name`: Human-readable eval name (used as section header in the viewer)
- `configuration`: Must be `"with_skill"` or `"without_skill"` (the viewer uses this exact string for grouping and color coding)
- `run_number`: Integer run number (1, 2, 3...)
- `result`: Nested object with `pass_rate`, `passed`, `total`, `time_seconds`, `tokens`, `errors`
- `run_summary`: Statistical aggregates per configuration
- `with_skill` / `without_skill`: Each contains `pass_rate`, `time_seconds`, `tokens` objects with `mean` and `stddev` fields
- `delta`: Difference strings like `"+0.50"`, `"+13.0"`, `"+1700"`
- `notes`: Freeform observations from the analyzer
**Important:** The viewer reads these field names exactly. Using `config` instead of `configuration`, or putting `pass_rate` at the top level of a run instead of nested under `result`, will cause the viewer to show empty/zero values. Always reference this schema when generating benchmark.json manually.
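Because the viewer depends on these exact names, a small pre-flight check can catch the two pitfalls called out above before opening the viewer; this sketch checks only those, not the full schema:

```python
def check_benchmark_runs(benchmark: dict) -> list[str]:
    """Flag run entries that the viewer would render as empty/zero values."""
    problems = []
    for i, run in enumerate(benchmark.get("runs", [])):
        # The viewer groups and color-codes on these exact strings
        if run.get("configuration") not in ("with_skill", "without_skill"):
            problems.append(f"runs[{i}]: configuration must be with_skill/without_skill")
        # pass_rate must be nested under result, not at the run's top level
        if "result" not in run or "pass_rate" not in run.get("result", {}):
            problems.append(f"runs[{i}]: pass_rate must be nested under result")
    return problems
```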
---
## comparison.json
Output from blind comparator. Located at `<grading-dir>/comparison-N.json`.
```json
{
"winner": "A",
"reasoning": "Output A provides a complete solution with proper formatting and all required fields. Output B is missing the date field and has formatting inconsistencies.",
"rubric": {
"A": {
"content": {
"correctness": 5,
"completeness": 5,
"accuracy": 4
},
"structure": {
"organization": 4,
"formatting": 5,
"usability": 4
},
"content_score": 4.7,
"structure_score": 4.3,
"overall_score": 9.0
},
"B": {
"content": {
"correctness": 3,
"completeness": 2,
"accuracy": 3
},
"structure": {
"organization": 3,
"formatting": 2,
"usability": 3
},
"content_score": 2.7,
"structure_score": 2.7,
"overall_score": 5.4
}
},
"output_quality": {
"A": {
"score": 9,
"strengths": ["Complete solution", "Well-formatted", "All fields present"],
"weaknesses": ["Minor style inconsistency in header"]
},
"B": {
"score": 5,
"strengths": ["Readable output", "Correct basic structure"],
"weaknesses": ["Missing date field", "Formatting inconsistencies", "Partial data extraction"]
}
},
"expectation_results": {
"A": {
"passed": 4,
"total": 5,
"pass_rate": 0.80,
"details": [
{"text": "Output includes name", "passed": true}
]
},
"B": {
"passed": 3,
"total": 5,
"pass_rate": 0.60,
"details": [
{"text": "Output includes name", "passed": true}
]
}
}
}
```
---
## analysis.json
Output from post-hoc analyzer. Located at `<grading-dir>/analysis.json`.
```json
{
"comparison_summary": {
"winner": "A",
"winner_skill": "path/to/winner/skill",
"loser_skill": "path/to/loser/skill",
"comparator_reasoning": "Brief summary of why comparator chose winner"
},
"winner_strengths": [
"Clear step-by-step instructions for handling multi-page documents",
"Included validation script that caught formatting errors"
],
"loser_weaknesses": [
"Vague instruction 'process the document appropriately' led to inconsistent behavior",
"No script for validation, agent had to improvise"
],
"instruction_following": {
"winner": {
"score": 9,
"issues": ["Minor: skipped optional logging step"]
},
"loser": {
"score": 6,
"issues": [
"Did not use the skill's formatting template",
"Invented own approach instead of following step 3"
]
}
},
"improvement_suggestions": [
{
"priority": "high",
"category": "instructions",
"suggestion": "Replace 'process the document appropriately' with explicit steps",
"expected_impact": "Would eliminate ambiguity that caused inconsistent behavior"
}
],
"transcript_insights": {
"winner_execution_pattern": "Read skill -> Followed 5-step process -> Used validation script",
"loser_execution_pattern": "Read skill -> Unclear on approach -> Tried 3 different methods"
}
}
```


@@ -0,0 +1,401 @@
#!/usr/bin/env python3
"""
Aggregate individual run results into benchmark summary statistics.
Reads grading.json files from run directories and produces:
- run_summary with mean, stddev, min, max for each metric
- delta between with_skill and without_skill configurations
Usage:
python aggregate_benchmark.py <benchmark_dir>
Example:
python aggregate_benchmark.py benchmarks/2026-01-15T10-30-00/
The script supports two directory layouts:
Workspace layout (from skill-creator iterations):
<benchmark_dir>/
└── eval-N/
├── with_skill/
│ ├── run-1/grading.json
│ └── run-2/grading.json
└── without_skill/
├── run-1/grading.json
└── run-2/grading.json
Legacy layout (with runs/ subdirectory):
<benchmark_dir>/
└── runs/
└── eval-N/
├── with_skill/
│ └── run-1/grading.json
└── without_skill/
└── run-1/grading.json
"""
import argparse
import json
import math
import sys
from datetime import datetime, timezone
from pathlib import Path
def calculate_stats(values: list[float]) -> dict:
"""Calculate mean, stddev, min, max for a list of values."""
if not values:
return {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0}
n = len(values)
mean = sum(values) / n
if n > 1:
variance = sum((x - mean) ** 2 for x in values) / (n - 1)
stddev = math.sqrt(variance)
else:
stddev = 0.0
return {
"mean": round(mean, 4),
"stddev": round(stddev, 4),
"min": round(min(values), 4),
"max": round(max(values), 4)
}
def load_run_results(benchmark_dir: Path) -> dict:
"""
Load all run results from a benchmark directory.
Returns dict keyed by config name (e.g. "with_skill"/"without_skill",
or "new_skill"/"old_skill"), each containing a list of run results.
"""
# Support both layouts: eval dirs directly under benchmark_dir, or under runs/
runs_dir = benchmark_dir / "runs"
if runs_dir.exists():
search_dir = runs_dir
elif list(benchmark_dir.glob("eval-*")):
search_dir = benchmark_dir
else:
print(f"No eval directories found in {benchmark_dir} or {benchmark_dir / 'runs'}")
return {}
results: dict[str, list] = {}
for eval_idx, eval_dir in enumerate(sorted(search_dir.glob("eval-*"))):
metadata_path = eval_dir / "eval_metadata.json"
if metadata_path.exists():
try:
with open(metadata_path) as mf:
eval_id = json.load(mf).get("eval_id", eval_idx)
except (json.JSONDecodeError, OSError):
eval_id = eval_idx
else:
try:
eval_id = int(eval_dir.name.split("-")[1])
except ValueError:
eval_id = eval_idx
# Discover config directories dynamically rather than hardcoding names
for config_dir in sorted(eval_dir.iterdir()):
if not config_dir.is_dir():
continue
# Skip non-config directories (inputs, outputs, etc.)
if not list(config_dir.glob("run-*")):
continue
config = config_dir.name
if config not in results:
results[config] = []
for run_dir in sorted(config_dir.glob("run-*")):
run_number = int(run_dir.name.split("-")[1])
grading_file = run_dir / "grading.json"
if not grading_file.exists():
print(f"Warning: grading.json not found in {run_dir}")
continue
try:
with open(grading_file) as f:
grading = json.load(f)
except json.JSONDecodeError as e:
print(f"Warning: Invalid JSON in {grading_file}: {e}")
continue
# Extract metrics
result = {
"eval_id": eval_id,
"run_number": run_number,
"pass_rate": grading.get("summary", {}).get("pass_rate", 0.0),
"passed": grading.get("summary", {}).get("passed", 0),
"failed": grading.get("summary", {}).get("failed", 0),
"total": grading.get("summary", {}).get("total", 0),
}
# Extract timing — check grading.json first, then sibling timing.json
timing = grading.get("timing", {})
result["time_seconds"] = timing.get("total_duration_seconds", 0.0)
timing_file = run_dir / "timing.json"
if result["time_seconds"] == 0.0 and timing_file.exists():
try:
with open(timing_file) as tf:
timing_data = json.load(tf)
result["time_seconds"] = timing_data.get("total_duration_seconds", 0.0)
result["tokens"] = timing_data.get("total_tokens", 0)
except json.JSONDecodeError:
pass
# Extract metrics if available
metrics = grading.get("execution_metrics", {})
result["tool_calls"] = metrics.get("total_tool_calls", 0)
if not result.get("tokens"):
result["tokens"] = metrics.get("output_chars", 0)
result["errors"] = metrics.get("errors_encountered", 0)
# Extract expectations — viewer requires fields: text, passed, evidence
raw_expectations = grading.get("expectations", [])
for exp in raw_expectations:
if "text" not in exp or "passed" not in exp:
print(f"Warning: expectation in {grading_file} missing required fields (text, passed, evidence): {exp}")
result["expectations"] = raw_expectations
# Extract notes from user_notes_summary
notes_summary = grading.get("user_notes_summary", {})
notes = []
notes.extend(notes_summary.get("uncertainties", []))
notes.extend(notes_summary.get("needs_review", []))
notes.extend(notes_summary.get("workarounds", []))
result["notes"] = notes
results[config].append(result)
return results
def aggregate_results(results: dict) -> dict:
"""
Aggregate run results into summary statistics.
Returns run_summary with stats for each configuration and delta.
"""
run_summary = {}
configs = list(results.keys())
for config in configs:
runs = results.get(config, [])
if not runs:
run_summary[config] = {
"pass_rate": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
"time_seconds": {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0},
"tokens": {"mean": 0, "stddev": 0, "min": 0, "max": 0}
}
continue
pass_rates = [r["pass_rate"] for r in runs]
times = [r["time_seconds"] for r in runs]
tokens = [r.get("tokens", 0) for r in runs]
run_summary[config] = {
"pass_rate": calculate_stats(pass_rates),
"time_seconds": calculate_stats(times),
"tokens": calculate_stats(tokens)
}
    # Calculate delta between the first two configs. A delta only makes sense
    # when there is a baseline to compare against, so skip it for a single config
    # (otherwise the "delta" would just echo the primary config's means).
    if len(configs) >= 2:
        primary = run_summary.get(configs[0], {})
        baseline = run_summary.get(configs[1], {})
        delta_pass_rate = primary.get("pass_rate", {}).get("mean", 0) - baseline.get("pass_rate", {}).get("mean", 0)
        delta_time = primary.get("time_seconds", {}).get("mean", 0) - baseline.get("time_seconds", {}).get("mean", 0)
        delta_tokens = primary.get("tokens", {}).get("mean", 0) - baseline.get("tokens", {}).get("mean", 0)
        run_summary["delta"] = {
            "pass_rate": f"{delta_pass_rate:+.2f}",
            "time_seconds": f"{delta_time:+.1f}",
            "tokens": f"{delta_tokens:+.0f}"
        }
return run_summary
def generate_benchmark(benchmark_dir: Path, skill_name: str = "", skill_path: str = "") -> dict:
"""
Generate complete benchmark.json from run results.
"""
results = load_run_results(benchmark_dir)
run_summary = aggregate_results(results)
# Build runs array for benchmark.json
runs = []
for config in results:
for result in results[config]:
runs.append({
"eval_id": result["eval_id"],
"configuration": config,
"run_number": result["run_number"],
"result": {
"pass_rate": result["pass_rate"],
"passed": result["passed"],
"failed": result["failed"],
"total": result["total"],
"time_seconds": result["time_seconds"],
"tokens": result.get("tokens", 0),
"tool_calls": result.get("tool_calls", 0),
"errors": result.get("errors", 0)
},
"expectations": result["expectations"],
"notes": result["notes"]
})
# Determine eval IDs from results
eval_ids = sorted(set(
r["eval_id"]
for runs_list in results.values()
for r in runs_list
))
benchmark = {
"metadata": {
"skill_name": skill_name or "<skill-name>",
"skill_path": skill_path or "<path/to/skill>",
"executor_model": "<model-name>",
"analyzer_model": "<model-name>",
"timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"evals_run": eval_ids,
"runs_per_configuration": max((len(rs) for rs in results.values()), default=3)
},
"runs": runs,
"run_summary": run_summary,
"notes": [] # To be filled by analyzer
}
return benchmark
def generate_markdown(benchmark: dict) -> str:
"""Generate human-readable benchmark.md from benchmark data."""
metadata = benchmark["metadata"]
run_summary = benchmark["run_summary"]
# Determine config names (excluding "delta")
configs = [k for k in run_summary if k != "delta"]
config_a = configs[0] if len(configs) >= 1 else "config_a"
config_b = configs[1] if len(configs) >= 2 else "config_b"
label_a = config_a.replace("_", " ").title()
label_b = config_b.replace("_", " ").title()
lines = [
f"# Skill Benchmark: {metadata['skill_name']}",
"",
f"**Model**: {metadata['executor_model']}",
f"**Date**: {metadata['timestamp']}",
f"**Evals**: {', '.join(map(str, metadata['evals_run']))} ({metadata['runs_per_configuration']} runs each per configuration)",
"",
"## Summary",
"",
f"| Metric | {label_a} | {label_b} | Delta |",
"|--------|------------|---------------|-------|",
]
a_summary = run_summary.get(config_a, {})
b_summary = run_summary.get(config_b, {})
delta = run_summary.get("delta", {})
# Format pass rate
a_pr = a_summary.get("pass_rate", {})
b_pr = b_summary.get("pass_rate", {})
lines.append(f"| Pass Rate | {a_pr.get('mean', 0)*100:.0f}% ± {a_pr.get('stddev', 0)*100:.0f}% | {b_pr.get('mean', 0)*100:.0f}% ± {b_pr.get('stddev', 0)*100:.0f}% | {delta.get('pass_rate', '')} |")
# Format time
a_time = a_summary.get("time_seconds", {})
b_time = b_summary.get("time_seconds", {})
lines.append(f"| Time | {a_time.get('mean', 0):.1f}s ± {a_time.get('stddev', 0):.1f}s | {b_time.get('mean', 0):.1f}s ± {b_time.get('stddev', 0):.1f}s | {delta.get('time_seconds', '')}s |")
# Format tokens
a_tokens = a_summary.get("tokens", {})
b_tokens = b_summary.get("tokens", {})
lines.append(f"| Tokens | {a_tokens.get('mean', 0):.0f} ± {a_tokens.get('stddev', 0):.0f} | {b_tokens.get('mean', 0):.0f} ± {b_tokens.get('stddev', 0):.0f} | {delta.get('tokens', '')} |")
# Notes section
if benchmark.get("notes"):
lines.extend([
"",
"## Notes",
""
])
for note in benchmark["notes"]:
lines.append(f"- {note}")
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(
description="Aggregate benchmark run results into summary statistics"
)
parser.add_argument(
"benchmark_dir",
type=Path,
help="Path to the benchmark directory"
)
parser.add_argument(
"--skill-name",
default="",
help="Name of the skill being benchmarked"
)
parser.add_argument(
"--skill-path",
default="",
help="Path to the skill being benchmarked"
)
parser.add_argument(
"--output", "-o",
type=Path,
help="Output path for benchmark.json (default: <benchmark_dir>/benchmark.json)"
)
args = parser.parse_args()
if not args.benchmark_dir.exists():
print(f"Directory not found: {args.benchmark_dir}")
sys.exit(1)
# Generate benchmark
benchmark = generate_benchmark(args.benchmark_dir, args.skill_name, args.skill_path)
# Determine output paths
output_json = args.output or (args.benchmark_dir / "benchmark.json")
output_md = output_json.with_suffix(".md")
# Write benchmark.json
with open(output_json, "w") as f:
json.dump(benchmark, f, indent=2)
print(f"Generated: {output_json}")
# Write benchmark.md
markdown = generate_markdown(benchmark)
with open(output_md, "w") as f:
f.write(markdown)
print(f"Generated: {output_md}")
# Print summary
run_summary = benchmark["run_summary"]
configs = [k for k in run_summary if k != "delta"]
delta = run_summary.get("delta", {})
print(f"\nSummary:")
for config in configs:
pr = run_summary[config]["pass_rate"]["mean"]
label = config.replace("_", " ").title()
print(f" {label}: {pr*100:.1f}% pass rate")
print(f" Delta: {delta.get('pass_rate', '')}")
if __name__ == "__main__":
main()
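One dependency note: `calculate_stats` is defined elsewhere in this file. Judging from the zero-valued fallback dict in `aggregate_results`, it returns `mean`/`stddev`/`min`/`max` for a list of per-run values; a minimal sketch under that assumption (not the author's actual implementation) could look like:

```python
import statistics

def calculate_stats(values: list[float]) -> dict:
    """Summarize per-run values; shape matches the empty-case fallback above."""
    if not values:
        return {"mean": 0.0, "stddev": 0.0, "min": 0.0, "max": 0.0}
    return {
        "mean": statistics.mean(values),
        # statistics.stdev needs at least two points; a single run has no spread.
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

print(calculate_stats([0.8, 1.0, 0.9]))
```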


@@ -0,0 +1,326 @@
#!/usr/bin/env python3
"""Generate an HTML report from run_loop.py output.
Takes the JSON output from run_loop.py and generates a visual HTML report
showing each description attempt with check/x for each test case.
Distinguishes between train and test queries.
"""
import argparse
import html
import json
import sys
from pathlib import Path
def generate_html(data: dict, auto_refresh: bool = False, skill_name: str = "") -> str:
"""Generate HTML report from loop output data. If auto_refresh is True, adds a meta refresh tag."""
history = data.get("history", [])
holdout = data.get("holdout", 0)
title_prefix = html.escape(skill_name + " \u2014 ") if skill_name else ""
# Get all unique queries from train and test sets, with should_trigger info
train_queries: list[dict] = []
test_queries: list[dict] = []
if history:
for r in history[0].get("train_results", history[0].get("results", [])):
train_queries.append({"query": r["query"], "should_trigger": r.get("should_trigger", True)})
if history[0].get("test_results"):
for r in history[0].get("test_results", []):
test_queries.append({"query": r["query"], "should_trigger": r.get("should_trigger", True)})
refresh_tag = ' <meta http-equiv="refresh" content="5">\n' if auto_refresh else ""
html_parts = ["""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
""" + refresh_tag + """ <title>""" + title_prefix + """Skill Description Optimization</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Poppins:wght@500;600&family=Lora:wght@400;500&display=swap" rel="stylesheet">
<style>
body {
font-family: 'Lora', Georgia, serif;
max-width: 100%;
margin: 0 auto;
padding: 20px;
background: #faf9f5;
color: #141413;
}
h1 { font-family: 'Poppins', sans-serif; color: #141413; }
.explainer {
background: white;
padding: 15px;
border-radius: 6px;
margin-bottom: 20px;
border: 1px solid #e8e6dc;
color: #b0aea5;
font-size: 0.875rem;
line-height: 1.6;
}
.summary {
background: white;
padding: 15px;
border-radius: 6px;
margin-bottom: 20px;
border: 1px solid #e8e6dc;
}
.summary p { margin: 5px 0; }
.best { color: #788c5d; font-weight: bold; }
.table-container {
overflow-x: auto;
width: 100%;
}
table {
border-collapse: collapse;
background: white;
border: 1px solid #e8e6dc;
border-radius: 6px;
font-size: 12px;
min-width: 100%;
}
th, td {
padding: 8px;
text-align: left;
border: 1px solid #e8e6dc;
white-space: normal;
word-wrap: break-word;
}
th {
font-family: 'Poppins', sans-serif;
background: #141413;
color: #faf9f5;
font-weight: 500;
}
th.test-col {
background: #6a9bcc;
}
th.query-col { min-width: 200px; }
td.description {
font-family: monospace;
font-size: 11px;
word-wrap: break-word;
max-width: 400px;
}
td.result {
text-align: center;
font-size: 16px;
min-width: 40px;
}
td.test-result {
background: #f0f6fc;
}
.pass { color: #788c5d; }
.fail { color: #c44; }
.rate {
font-size: 9px;
color: #b0aea5;
display: block;
}
tr:hover { background: #faf9f5; }
.score {
display: inline-block;
padding: 2px 6px;
border-radius: 4px;
font-weight: bold;
font-size: 11px;
}
.score-good { background: #eef2e8; color: #788c5d; }
.score-ok { background: #fef3c7; color: #d97706; }
.score-bad { background: #fceaea; color: #c44; }
.train-label { color: #b0aea5; font-size: 10px; }
.test-label { color: #6a9bcc; font-size: 10px; font-weight: bold; }
.best-row { background: #f5f8f2; }
th.positive-col { border-bottom: 3px solid #788c5d; }
th.negative-col { border-bottom: 3px solid #c44; }
th.test-col.positive-col { border-bottom: 3px solid #788c5d; }
th.test-col.negative-col { border-bottom: 3px solid #c44; }
.legend { font-family: 'Poppins', sans-serif; display: flex; gap: 20px; margin-bottom: 10px; font-size: 13px; align-items: center; }
.legend-item { display: flex; align-items: center; gap: 6px; }
.legend-swatch { width: 16px; height: 16px; border-radius: 3px; display: inline-block; }
.swatch-positive { background: #141413; border-bottom: 3px solid #788c5d; }
.swatch-negative { background: #141413; border-bottom: 3px solid #c44; }
.swatch-test { background: #6a9bcc; }
.swatch-train { background: #141413; }
</style>
</head>
<body>
<h1>""" + title_prefix + """Skill Description Optimization</h1>
<div class="explainer">
<strong>Optimizing your skill's description.</strong> This page updates automatically as Claude tests different versions of your skill's description. Each row is an iteration — a new description attempt. The columns show test queries: green checkmarks mean the skill triggered correctly (or correctly didn't trigger), red crosses mean it got it wrong. The "Train" score shows performance on queries used to improve the description; the "Test" score shows performance on held-out queries the optimizer hasn't seen. When it's done, Claude will apply the best-performing description to your skill.
</div>
"""]
# Summary section
best_test_score = data.get('best_test_score')
html_parts.append(f"""
<div class="summary">
<p><strong>Original:</strong> {html.escape(data.get('original_description', 'N/A'))}</p>
<p class="best"><strong>Best:</strong> {html.escape(data.get('best_description', 'N/A'))}</p>
<p><strong>Best Score:</strong> {data.get('best_score', 'N/A')} {'(test)' if best_test_score is not None else '(train)'}</p>
<p><strong>Iterations:</strong> {data.get('iterations_run', 0)} | <strong>Train:</strong> {data.get('train_size', '?')} | <strong>Test:</strong> {data.get('test_size', '?')}</p>
</div>
""")
# Legend
html_parts.append("""
<div class="legend">
<span style="font-weight:600">Query columns:</span>
<span class="legend-item"><span class="legend-swatch swatch-positive"></span> Should trigger</span>
<span class="legend-item"><span class="legend-swatch swatch-negative"></span> Should NOT trigger</span>
<span class="legend-item"><span class="legend-swatch swatch-train"></span> Train</span>
<span class="legend-item"><span class="legend-swatch swatch-test"></span> Test</span>
</div>
""")
# Table header
html_parts.append("""
<div class="table-container">
<table>
<thead>
<tr>
<th>Iter</th>
<th>Train</th>
<th>Test</th>
<th class="query-col">Description</th>
""")
# Add column headers for train queries
for qinfo in train_queries:
polarity = "positive-col" if qinfo["should_trigger"] else "negative-col"
html_parts.append(f' <th class="{polarity}">{html.escape(qinfo["query"])}</th>\n')
# Add column headers for test queries (different color)
for qinfo in test_queries:
polarity = "positive-col" if qinfo["should_trigger"] else "negative-col"
html_parts.append(f' <th class="test-col {polarity}">{html.escape(qinfo["query"])}</th>\n')
html_parts.append(""" </tr>
</thead>
<tbody>
""")
# Find best iteration for highlighting (guard against an empty history)
if not history:
best_iter = None
elif test_queries:
best_iter = max(history, key=lambda h: h.get("test_passed") or 0).get("iteration")
else:
best_iter = max(history, key=lambda h: h.get("train_passed", h.get("passed", 0))).get("iteration")
# Add rows for each iteration
for h in history:
iteration = h.get("iteration", "?")
description = h.get("description", "")
train_results = h.get("train_results", h.get("results", []))
test_results = h.get("test_results") or []
# Create lookups for results by query
train_by_query = {r["query"]: r for r in train_results}
test_by_query = {r["query"]: r for r in test_results} if test_results else {}
# Compute aggregate correct/total runs across all retries
def aggregate_runs(results: list[dict]) -> tuple[int, int]:
correct = 0
total = 0
for r in results:
runs = r.get("runs", 0)
triggers = r.get("triggers", 0)
total += runs
if r.get("should_trigger", True):
correct += triggers
else:
correct += runs - triggers
return correct, total
train_correct, train_runs = aggregate_runs(train_results)
test_correct, test_runs = aggregate_runs(test_results)
# Determine score classes
def score_class(correct: int, total: int) -> str:
if total > 0:
ratio = correct / total
if ratio >= 0.8:
return "score-good"
elif ratio >= 0.5:
return "score-ok"
return "score-bad"
train_class = score_class(train_correct, train_runs)
test_class = score_class(test_correct, test_runs)
row_class = "best-row" if iteration == best_iter else ""
html_parts.append(f""" <tr class="{row_class}">
<td>{iteration}</td>
<td><span class="score {train_class}">{train_correct}/{train_runs}</span></td>
<td><span class="score {test_class}">{test_correct}/{test_runs}</span></td>
<td class="description">{html.escape(description)}</td>
""")
# Add result for each train query
for qinfo in train_queries:
r = train_by_query.get(qinfo["query"], {})
did_pass = r.get("pass", False)
triggers = r.get("triggers", 0)
runs = r.get("runs", 0)
icon = "✓" if did_pass else "✗"
css_class = "pass" if did_pass else "fail"
html_parts.append(f' <td class="result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
# Add result for each test query (with different background)
for qinfo in test_queries:
r = test_by_query.get(qinfo["query"], {})
did_pass = r.get("pass", False)
triggers = r.get("triggers", 0)
runs = r.get("runs", 0)
icon = "✓" if did_pass else "✗"
css_class = "pass" if did_pass else "fail"
html_parts.append(f' <td class="result test-result {css_class}">{icon}<span class="rate">{triggers}/{runs}</span></td>\n')
html_parts.append(" </tr>\n")
html_parts.append(""" </tbody>
</table>
</div>
""")
html_parts.append("""
</body>
</html>
""")
return "".join(html_parts)
def main():
parser = argparse.ArgumentParser(description="Generate HTML report from run_loop output")
parser.add_argument("input", help="Path to JSON output from run_loop.py (or - for stdin)")
parser.add_argument("-o", "--output", default=None, help="Output HTML file (default: stdout)")
parser.add_argument("--skill-name", default="", help="Skill name to include in the report title")
args = parser.parse_args()
if args.input == "-":
data = json.load(sys.stdin)
else:
data = json.loads(Path(args.input).read_text())
html_output = generate_html(data, skill_name=args.skill_name)
if args.output:
Path(args.output).write_text(html_output)
print(f"Report written to {args.output}", file=sys.stderr)
else:
print(html_output)
if __name__ == "__main__":
main()
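The scoring rule buried in `aggregate_runs` above is worth restating: for a should-NOT-trigger query, every run that did not trigger counts as correct. A self-contained restatement of the same logic:

```python
def aggregate_runs(results: list[dict]) -> tuple[int, int]:
    # Same rule as in the report: non-triggers on negative queries are correct.
    correct = 0
    total = 0
    for r in results:
        runs = r.get("runs", 0)
        triggers = r.get("triggers", 0)
        total += runs
        if r.get("should_trigger", True):
            correct += triggers
        else:
            correct += runs - triggers
    return correct, total

results = [
    {"query": "positive case", "should_trigger": True, "runs": 3, "triggers": 2},
    {"query": "negative case", "should_trigger": False, "runs": 3, "triggers": 1},
]
print(aggregate_runs(results))  # → (4, 6)
```

Two correct triggers plus two correct non-triggers out of six total runs.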


@@ -0,0 +1,248 @@
#!/usr/bin/env python3
"""Improve a skill description based on eval results.
Takes eval results (from run_eval.py) and generates an improved description
using Claude with extended thinking.
"""
import argparse
import json
import re
import sys
from pathlib import Path
import anthropic
from scripts.utils import parse_skill_md
def improve_description(
client: anthropic.Anthropic,
skill_name: str,
skill_content: str,
current_description: str,
eval_results: dict,
history: list[dict],
model: str,
test_results: dict | None = None,
log_dir: Path | None = None,
iteration: int | None = None,
) -> str:
"""Call Claude to improve the description based on eval results."""
failed_triggers = [
r for r in eval_results["results"]
if r["should_trigger"] and not r["pass"]
]
false_triggers = [
r for r in eval_results["results"]
if not r["should_trigger"] and not r["pass"]
]
# Build scores summary
train_score = f"{eval_results['summary']['passed']}/{eval_results['summary']['total']}"
if test_results:
test_score = f"{test_results['summary']['passed']}/{test_results['summary']['total']}"
scores_summary = f"Train: {train_score}, Test: {test_score}"
else:
scores_summary = f"Train: {train_score}"
prompt = f"""You are optimizing a skill description for a Claude Code skill called "{skill_name}". A "skill" is sort of like a prompt, but with progressive disclosure -- there's a title and description that Claude sees when deciding whether to use the skill, and then if it does use the skill, it reads the .md file which has lots more details and potentially links to other resources in the skill folder like helper files and scripts and additional documentation or examples.
The description appears in Claude's "available_skills" list. When a user sends a query, Claude decides whether to invoke the skill based solely on the title and on this description. Your goal is to write a description that triggers for relevant queries, and doesn't trigger for irrelevant ones.
Here's the current description:
<current_description>
"{current_description}"
</current_description>
Current scores ({scores_summary}):
<scores_summary>
"""
if failed_triggers:
prompt += "FAILED TO TRIGGER (should have triggered but didn't):\n"
for r in failed_triggers:
prompt += f' - "{r["query"]}" (triggered {r["triggers"]}/{r["runs"]} times)\n'
prompt += "\n"
if false_triggers:
prompt += "FALSE TRIGGERS (triggered but shouldn't have):\n"
for r in false_triggers:
prompt += f' - "{r["query"]}" (triggered {r["triggers"]}/{r["runs"]} times)\n'
prompt += "\n"
if history:
prompt += "PREVIOUS ATTEMPTS (do NOT repeat these — try something structurally different):\n\n"
for h in history:
train_s = f"{h.get('train_passed', h.get('passed', 0))}/{h.get('train_total', h.get('total', 0))}"
test_s = f"{h.get('test_passed', '?')}/{h.get('test_total', '?')}" if h.get('test_passed') is not None else None
score_str = f"train={train_s}" + (f", test={test_s}" if test_s else "")
prompt += f'<attempt {score_str}>\n'
prompt += f'Description: "{h["description"]}"\n'
if "results" in h:
prompt += "Train results:\n"
for r in h["results"]:
status = "PASS" if r["pass"] else "FAIL"
prompt += f' [{status}] "{r["query"][:80]}" (triggered {r["triggers"]}/{r["runs"]})\n'
if h.get("note"):
prompt += f'Note: {h["note"]}\n'
prompt += "</attempt>\n\n"
prompt += f"""</scores_summary>
Skill content (for context on what the skill does):
<skill_content>
{skill_content}
</skill_content>
Based on the failures, write a new and improved description that is more likely to trigger correctly. When I say "based on the failures", it's a bit of a tricky line to walk because we don't want to overfit to the specific cases you're seeing. So what I DON'T want you to do is produce an ever-expanding list of specific queries that this skill should or shouldn't trigger for. Instead, try to generalize from the failures to broader categories of user intent and situations where this skill would be useful or not useful. The reason for this is twofold:
1. Avoid overfitting
2. The list might get loooong and it's injected into ALL queries and there might be a lot of skills, so we don't want to blow too much space on any given description.
Concretely, your description should not be more than about 100-200 words, even if that comes at the cost of accuracy.
Here are some tips that we've found to work well in writing these descriptions:
- The skill should be phrased in the imperative -- "Use this skill for" rather than "this skill does"
- The skill description should focus on the user's intent, what they are trying to achieve, vs. the implementation details of how the skill works.
- The description competes with other skills for Claude's attention — make it distinctive and immediately recognizable.
- If you're getting lots of failures after repeated attempts, change things up. Try different sentence structures or wordings.
I'd encourage you to be creative and mix up the style in different iterations since you'll have multiple opportunities to try different approaches and we'll just grab the highest-scoring one at the end.
Please respond with only the new description text in <new_description> tags, nothing else."""
response = client.messages.create(
model=model,
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000,
},
messages=[{"role": "user", "content": prompt}],
)
# Extract thinking and text from response
thinking_text = ""
text = ""
for block in response.content:
if block.type == "thinking":
thinking_text = block.thinking
elif block.type == "text":
text = block.text
# Parse out the <new_description> tags
match = re.search(r"<new_description>(.*?)</new_description>", text, re.DOTALL)
description = match.group(1).strip().strip('"') if match else text.strip().strip('"')
# Log the transcript
transcript: dict = {
"iteration": iteration,
"prompt": prompt,
"thinking": thinking_text,
"response": text,
"parsed_description": description,
"char_count": len(description),
"over_limit": len(description) > 1024,
}
# If over 1024 chars, ask the model to shorten it
if len(description) > 1024:
shorten_prompt = f"Your description is {len(description)} characters, which exceeds the hard 1024 character limit. Please rewrite it to be under 1024 characters while preserving the most important trigger words and intent coverage. Respond with only the new description in <new_description> tags."
shorten_response = client.messages.create(
model=model,
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000,
},
messages=[
{"role": "user", "content": prompt},
{"role": "assistant", "content": text},
{"role": "user", "content": shorten_prompt},
],
)
shorten_thinking = ""
shorten_text = ""
for block in shorten_response.content:
if block.type == "thinking":
shorten_thinking = block.thinking
elif block.type == "text":
shorten_text = block.text
match = re.search(r"<new_description>(.*?)</new_description>", shorten_text, re.DOTALL)
shortened = match.group(1).strip().strip('"') if match else shorten_text.strip().strip('"')
transcript["rewrite_prompt"] = shorten_prompt
transcript["rewrite_thinking"] = shorten_thinking
transcript["rewrite_response"] = shorten_text
transcript["rewrite_description"] = shortened
transcript["rewrite_char_count"] = len(shortened)
description = shortened
transcript["final_description"] = description
if log_dir:
log_dir.mkdir(parents=True, exist_ok=True)
log_file = log_dir / f"improve_iter_{iteration or 'unknown'}.json"
log_file.write_text(json.dumps(transcript, indent=2))
return description
def main():
parser = argparse.ArgumentParser(description="Improve a skill description based on eval results")
parser.add_argument("--eval-results", required=True, help="Path to eval results JSON (from run_eval.py)")
parser.add_argument("--skill-path", required=True, help="Path to skill directory")
parser.add_argument("--history", default=None, help="Path to history JSON (previous attempts)")
parser.add_argument("--model", required=True, help="Model for improvement")
parser.add_argument("--verbose", action="store_true", help="Print thinking to stderr")
args = parser.parse_args()
skill_path = Path(args.skill_path)
if not (skill_path / "SKILL.md").exists():
print(f"Error: No SKILL.md found at {skill_path}", file=sys.stderr)
sys.exit(1)
eval_results = json.loads(Path(args.eval_results).read_text())
history = []
if args.history:
history = json.loads(Path(args.history).read_text())
name, _, content = parse_skill_md(skill_path)
current_description = eval_results["description"]
if args.verbose:
print(f"Current: {current_description}", file=sys.stderr)
print(f"Score: {eval_results['summary']['passed']}/{eval_results['summary']['total']}", file=sys.stderr)
client = anthropic.Anthropic()
new_description = improve_description(
client=client,
skill_name=name,
skill_content=content,
current_description=current_description,
eval_results=eval_results,
history=history,
model=args.model,
)
if args.verbose:
print(f"Improved: {new_description}", file=sys.stderr)
# Output as JSON with both the new description and updated history
output = {
"description": new_description,
"history": history + [{
"description": current_description,
"passed": eval_results["summary"]["passed"],
"failed": eval_results["summary"]["failed"],
"total": eval_results["summary"]["total"],
"results": eval_results["results"],
}],
}
print(json.dumps(output, indent=2))
if __name__ == "__main__":
main()
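The `<new_description>` extraction above falls back to the whole reply when the tags are missing; both paths are easy to exercise standalone:

```python
import re

def extract_description(text: str) -> str:
    # Mirrors the parsing above: prefer tagged content, else the full reply.
    match = re.search(r"<new_description>(.*?)</new_description>", text, re.DOTALL)
    return match.group(1).strip().strip('"') if match else text.strip().strip('"')

tagged = 'Some reasoning.\n<new_description>\n"Use this skill to package things."\n</new_description>'
print(extract_description(tagged))  # → Use this skill to package things.
print(extract_description('"Bare reply without tags."'))  # → Bare reply without tags.
```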


@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Skill Packager - Creates a distributable .skill file of a skill folder
Usage:
python utils/package_skill.py <path/to/skill-folder> [output-directory]
Example:
python utils/package_skill.py skills/public/my-skill
python utils/package_skill.py skills/public/my-skill ./dist
"""
import fnmatch
import sys
import zipfile
from pathlib import Path
from scripts.quick_validate import validate_skill
# Patterns to exclude when packaging skills.
EXCLUDE_DIRS = {"__pycache__", "node_modules"}
EXCLUDE_GLOBS = {"*.pyc"}
EXCLUDE_FILES = {".DS_Store"}
# Directories excluded only at the skill root (not when nested deeper).
ROOT_EXCLUDE_DIRS = {"evals"}
def should_exclude(rel_path: Path) -> bool:
"""Check if a path should be excluded from packaging."""
parts = rel_path.parts
if any(part in EXCLUDE_DIRS for part in parts):
return True
# rel_path is relative to skill_path.parent, so parts[0] is the skill
# folder name and parts[1] (if present) is the first subdir.
if len(parts) > 1 and parts[1] in ROOT_EXCLUDE_DIRS:
return True
name = rel_path.name
if name in EXCLUDE_FILES:
return True
return any(fnmatch.fnmatch(name, pat) for pat in EXCLUDE_GLOBS)
def package_skill(skill_path, output_dir=None):
"""
Package a skill folder into a .skill file.
Args:
skill_path: Path to the skill folder
output_dir: Optional output directory for the .skill file (defaults to current directory)
Returns:
Path to the created .skill file, or None if error
"""
skill_path = Path(skill_path).resolve()
# Validate skill folder exists
if not skill_path.exists():
print(f"❌ Error: Skill folder not found: {skill_path}")
return None
if not skill_path.is_dir():
print(f"❌ Error: Path is not a directory: {skill_path}")
return None
# Validate SKILL.md exists
skill_md = skill_path / "SKILL.md"
if not skill_md.exists():
print(f"❌ Error: SKILL.md not found in {skill_path}")
return None
# Run validation before packaging
print("🔍 Validating skill...")
valid, message = validate_skill(skill_path)
if not valid:
print(f"❌ Validation failed: {message}")
print(" Please fix the validation errors before packaging.")
return None
print(f"{message}\n")
# Determine output location
skill_name = skill_path.name
if output_dir:
output_path = Path(output_dir).resolve()
output_path.mkdir(parents=True, exist_ok=True)
else:
output_path = Path.cwd()
skill_filename = output_path / f"{skill_name}.skill"
# Create the .skill file (zip format)
try:
with zipfile.ZipFile(skill_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
# Walk through the skill directory, excluding build artifacts
for file_path in skill_path.rglob('*'):
if not file_path.is_file():
continue
arcname = file_path.relative_to(skill_path.parent)
if should_exclude(arcname):
print(f" Skipped: {arcname}")
continue
zipf.write(file_path, arcname)
print(f" Added: {arcname}")
print(f"\n✅ Successfully packaged skill to: {skill_filename}")
return skill_filename
except Exception as e:
print(f"❌ Error creating .skill file: {e}")
return None
def main():
if len(sys.argv) < 2:
print("Usage: python utils/package_skill.py <path/to/skill-folder> [output-directory]")
print("\nExample:")
print(" python utils/package_skill.py skills/public/my-skill")
print(" python utils/package_skill.py skills/public/my-skill ./dist")
sys.exit(1)
skill_path = sys.argv[1]
output_dir = sys.argv[2] if len(sys.argv) > 2 else None
print(f"📦 Packaging skill: {skill_path}")
if output_dir:
print(f" Output directory: {output_dir}")
print()
result = package_skill(skill_path, output_dir)
if result:
sys.exit(0)
else:
sys.exit(1)
if __name__ == "__main__":
main()
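The root-only `evals` exclusion above hinges on archive paths being relative to the skill's parent, so `parts[0]` is the skill folder itself and `parts[1]` is its first subdirectory. A standalone copy of the rules (constants duplicated here so the sketch runs on its own) shows the difference:

```python
import fnmatch
from pathlib import PurePosixPath

EXCLUDE_DIRS = {"__pycache__", "node_modules"}
EXCLUDE_GLOBS = {"*.pyc"}
EXCLUDE_FILES = {".DS_Store"}
ROOT_EXCLUDE_DIRS = {"evals"}

def should_exclude(rel_path: PurePosixPath) -> bool:
    parts = rel_path.parts
    if any(part in EXCLUDE_DIRS for part in parts):
        return True
    # parts[1] is the first directory inside the skill folder.
    if len(parts) > 1 and parts[1] in ROOT_EXCLUDE_DIRS:
        return True
    if rel_path.name in EXCLUDE_FILES:
        return True
    return any(fnmatch.fnmatch(rel_path.name, pat) for pat in EXCLUDE_GLOBS)

print(should_exclude(PurePosixPath("my-skill/evals/case.json")))      # → True
print(should_exclude(PurePosixPath("my-skill/docs/evals/notes.md")))  # → False
```

A root-level `evals/` directory is stripped from the package, while a nested `evals/` deeper in the tree survives.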


@@ -0,0 +1,103 @@
#!/usr/bin/env python3
"""
Quick validation script for skills - minimal version
"""
import sys
import re
import yaml
from pathlib import Path
def validate_skill(skill_path):
"""Basic validation of a skill"""
skill_path = Path(skill_path)
# Check SKILL.md exists
skill_md = skill_path / 'SKILL.md'
if not skill_md.exists():
return False, "SKILL.md not found"
# Read and validate frontmatter
content = skill_md.read_text()
if not content.startswith('---'):
return False, "No YAML frontmatter found"
# Extract frontmatter
match = re.match(r'^---\n(.*?)\n---', content, re.DOTALL)
if not match:
return False, "Invalid frontmatter format"
frontmatter_text = match.group(1)
# Parse YAML frontmatter
try:
frontmatter = yaml.safe_load(frontmatter_text)
if not isinstance(frontmatter, dict):
return False, "Frontmatter must be a YAML dictionary"
except yaml.YAMLError as e:
return False, f"Invalid YAML in frontmatter: {e}"
# Define allowed properties
ALLOWED_PROPERTIES = {'name', 'description', 'license', 'allowed-tools', 'metadata', 'compatibility'}
# Check for unexpected properties (excluding nested keys under metadata)
unexpected_keys = set(frontmatter.keys()) - ALLOWED_PROPERTIES
if unexpected_keys:
return False, (
f"Unexpected key(s) in SKILL.md frontmatter: {', '.join(sorted(unexpected_keys))}. "
f"Allowed properties are: {', '.join(sorted(ALLOWED_PROPERTIES))}"
)
# Check required fields
if 'name' not in frontmatter:
return False, "Missing 'name' in frontmatter"
if 'description' not in frontmatter:
return False, "Missing 'description' in frontmatter"
# Extract name for validation
name = frontmatter.get('name', '')
if not isinstance(name, str):
return False, f"Name must be a string, got {type(name).__name__}"
name = name.strip()
if name:
# Check naming convention (kebab-case: lowercase with hyphens)
if not re.match(r'^[a-z0-9-]+$', name):
return False, f"Name '{name}' should be kebab-case (lowercase letters, digits, and hyphens only)"
if name.startswith('-') or name.endswith('-') or '--' in name:
return False, f"Name '{name}' cannot start/end with hyphen or contain consecutive hyphens"
# Check name length (max 64 characters per spec)
if len(name) > 64:
return False, f"Name is too long ({len(name)} characters). Maximum is 64 characters."
# Extract and validate description
description = frontmatter.get('description', '')
if not isinstance(description, str):
return False, f"Description must be a string, got {type(description).__name__}"
description = description.strip()
if description:
# Check for angle brackets
if '<' in description or '>' in description:
return False, "Description cannot contain angle brackets (< or >)"
# Check description length (max 1024 characters per spec)
if len(description) > 1024:
return False, f"Description is too long ({len(description)} characters). Maximum is 1024 characters."
# Validate compatibility field if present (optional)
compatibility = frontmatter.get('compatibility', '')
if compatibility:
if not isinstance(compatibility, str):
return False, f"Compatibility must be a string, got {type(compatibility).__name__}"
if len(compatibility) > 500:
return False, f"Compatibility is too long ({len(compatibility)} characters). Maximum is 500 characters."
return True, "Skill is valid!"
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python quick_validate.py <skill_directory>")
sys.exit(1)
valid, message = validate_skill(sys.argv[1])
print(message)
sys.exit(0 if valid else 1)
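The name rules enforced above (kebab-case, no leading/trailing or doubled hyphens, at most 64 characters) condense into a small standalone predicate. This is a sketch mirroring the checks in `validate_skill`, not part of the script itself:

```python
import re

def name_ok(name: str) -> bool:
    """Mirror of validate_skill's three name checks (illustration only)."""
    if not re.match(r'^[a-z0-9-]+$', name):  # lowercase letters, digits, hyphens
        return False
    if name.startswith('-') or name.endswith('-') or '--' in name:
        return False  # hyphens may not lead, trail, or repeat
    return len(name) <= 64  # spec maximum

print(name_ok("backend-go-gin-gorm"))  # True
print(name_ok("bad--name"))            # False
```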


@@ -0,0 +1,310 @@
#!/usr/bin/env python3
"""Run trigger evaluation for a skill description.
Tests whether a skill's description causes Claude to trigger (read the skill)
for a set of queries. Outputs results as JSON.
"""
import argparse
import json
import os
import select
import subprocess
import sys
import time
import uuid
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
from scripts.utils import parse_skill_md
def find_project_root() -> Path:
"""Find the project root by walking up from cwd looking for .claude/.
Mimics how Claude Code discovers its project root, so the command file
we create ends up where claude -p will look for it.
"""
current = Path.cwd()
for parent in [current, *current.parents]:
if (parent / ".claude").is_dir():
return parent
return current
def run_single_query(
query: str,
skill_name: str,
skill_description: str,
timeout: int,
project_root: str,
model: str | None = None,
) -> bool:
"""Run a single query and return whether the skill was triggered.
Creates a command file in .claude/commands/ so it appears in Claude's
available_skills list, then runs `claude -p` with the raw query.
Uses --include-partial-messages to detect triggering early from
stream events (content_block_start) rather than waiting for the
full assistant message, which only arrives after tool execution.
"""
unique_id = uuid.uuid4().hex[:8]
clean_name = f"{skill_name}-skill-{unique_id}"
project_commands_dir = Path(project_root) / ".claude" / "commands"
command_file = project_commands_dir / f"{clean_name}.md"
try:
project_commands_dir.mkdir(parents=True, exist_ok=True)
# Use YAML block scalar to avoid breaking on quotes in description
indented_desc = "\n ".join(skill_description.split("\n"))
command_content = (
f"---\n"
f"description: |\n"
f" {indented_desc}\n"
f"---\n\n"
f"# {skill_name}\n\n"
f"This skill handles: {skill_description}\n"
)
command_file.write_text(command_content)
cmd = [
"claude",
"-p", query,
"--output-format", "stream-json",
"--verbose",
"--include-partial-messages",
]
if model:
cmd.extend(["--model", model])
# Remove CLAUDECODE env var to allow nesting claude -p inside a
# Claude Code session. The guard is for interactive terminal conflicts;
# programmatic subprocess usage is safe.
env = {k: v for k, v in os.environ.items() if k != "CLAUDECODE"}
process = subprocess.Popen(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.DEVNULL,
cwd=project_root,
env=env,
)
triggered = False
start_time = time.time()
buffer = ""
# Track state for stream event detection
pending_tool_name = None
accumulated_json = ""
try:
            exited = False
            while time.time() - start_time < timeout:
                if exited:
                    break
                if process.poll() is not None:
                    remaining = process.stdout.read()
                    if remaining:
                        buffer += remaining.decode("utf-8", errors="replace")
                    # Process has exited: drain any complete lines still in
                    # the buffer below, then leave on the next iteration.
                    exited = True
                else:
                    ready, _, _ = select.select([process.stdout], [], [], 1.0)
                    if not ready:
                        continue
                    chunk = os.read(process.stdout.fileno(), 8192)
                    if not chunk:
                        exited = True
                    else:
                        buffer += chunk.decode("utf-8", errors="replace")
while "\n" in buffer:
line, buffer = buffer.split("\n", 1)
line = line.strip()
if not line:
continue
try:
event = json.loads(line)
except json.JSONDecodeError:
continue
# Early detection via stream events
if event.get("type") == "stream_event":
se = event.get("event", {})
se_type = se.get("type", "")
if se_type == "content_block_start":
cb = se.get("content_block", {})
if cb.get("type") == "tool_use":
tool_name = cb.get("name", "")
if tool_name in ("Skill", "Read"):
pending_tool_name = tool_name
accumulated_json = ""
else:
return False
elif se_type == "content_block_delta" and pending_tool_name:
delta = se.get("delta", {})
if delta.get("type") == "input_json_delta":
accumulated_json += delta.get("partial_json", "")
if clean_name in accumulated_json:
return True
elif se_type in ("content_block_stop", "message_stop"):
if pending_tool_name:
return clean_name in accumulated_json
if se_type == "message_stop":
return False
# Fallback: full assistant message
elif event.get("type") == "assistant":
message = event.get("message", {})
for content_item in message.get("content", []):
if content_item.get("type") != "tool_use":
continue
tool_name = content_item.get("name", "")
tool_input = content_item.get("input", {})
if tool_name == "Skill" and clean_name in tool_input.get("skill", ""):
triggered = True
elif tool_name == "Read" and clean_name in tool_input.get("file_path", ""):
triggered = True
return triggered
elif event.get("type") == "result":
return triggered
finally:
# Clean up process on any exit path (return, exception, timeout)
if process.poll() is None:
process.kill()
process.wait()
return triggered
finally:
if command_file.exists():
command_file.unlink()
def run_eval(
eval_set: list[dict],
skill_name: str,
description: str,
num_workers: int,
timeout: int,
project_root: Path,
runs_per_query: int = 1,
trigger_threshold: float = 0.5,
model: str | None = None,
) -> dict:
"""Run the full eval set and return results."""
results = []
with ProcessPoolExecutor(max_workers=num_workers) as executor:
future_to_info = {}
for item in eval_set:
for run_idx in range(runs_per_query):
future = executor.submit(
run_single_query,
item["query"],
skill_name,
description,
timeout,
str(project_root),
model,
)
future_to_info[future] = (item, run_idx)
query_triggers: dict[str, list[bool]] = {}
query_items: dict[str, dict] = {}
for future in as_completed(future_to_info):
item, _ = future_to_info[future]
query = item["query"]
query_items[query] = item
if query not in query_triggers:
query_triggers[query] = []
try:
query_triggers[query].append(future.result())
except Exception as e:
print(f"Warning: query failed: {e}", file=sys.stderr)
query_triggers[query].append(False)
for query, triggers in query_triggers.items():
item = query_items[query]
trigger_rate = sum(triggers) / len(triggers)
should_trigger = item["should_trigger"]
if should_trigger:
did_pass = trigger_rate >= trigger_threshold
else:
did_pass = trigger_rate < trigger_threshold
results.append({
"query": query,
"should_trigger": should_trigger,
"trigger_rate": trigger_rate,
"triggers": sum(triggers),
"runs": len(triggers),
"pass": did_pass,
})
passed = sum(1 for r in results if r["pass"])
total = len(results)
return {
"skill_name": skill_name,
"description": description,
"results": results,
"summary": {
"total": total,
"passed": passed,
"failed": total - passed,
},
}
def main():
parser = argparse.ArgumentParser(description="Run trigger evaluation for a skill description")
parser.add_argument("--eval-set", required=True, help="Path to eval set JSON file")
parser.add_argument("--skill-path", required=True, help="Path to skill directory")
parser.add_argument("--description", default=None, help="Override description to test")
parser.add_argument("--num-workers", type=int, default=10, help="Number of parallel workers")
parser.add_argument("--timeout", type=int, default=30, help="Timeout per query in seconds")
parser.add_argument("--runs-per-query", type=int, default=3, help="Number of runs per query")
parser.add_argument("--trigger-threshold", type=float, default=0.5, help="Trigger rate threshold")
parser.add_argument("--model", default=None, help="Model to use for claude -p (default: user's configured model)")
parser.add_argument("--verbose", action="store_true", help="Print progress to stderr")
args = parser.parse_args()
eval_set = json.loads(Path(args.eval_set).read_text())
skill_path = Path(args.skill_path)
if not (skill_path / "SKILL.md").exists():
print(f"Error: No SKILL.md found at {skill_path}", file=sys.stderr)
sys.exit(1)
name, original_description, content = parse_skill_md(skill_path)
description = args.description or original_description
project_root = find_project_root()
if args.verbose:
print(f"Evaluating: {description}", file=sys.stderr)
output = run_eval(
eval_set=eval_set,
skill_name=name,
description=description,
num_workers=args.num_workers,
timeout=args.timeout,
project_root=project_root,
runs_per_query=args.runs_per_query,
trigger_threshold=args.trigger_threshold,
model=args.model,
)
if args.verbose:
summary = output["summary"]
print(f"Results: {summary['passed']}/{summary['total']} passed", file=sys.stderr)
for r in output["results"]:
status = "PASS" if r["pass"] else "FAIL"
rate_str = f"{r['triggers']}/{r['runs']}"
print(f" [{status}] rate={rate_str} expected={r['should_trigger']}: {r['query'][:70]}", file=sys.stderr)
print(json.dumps(output, indent=2))
if __name__ == "__main__":
main()
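The per-query pass rule applied in `run_eval` — the trigger rate must reach the threshold for positive queries and stay below it for negative ones — reduces to a small predicate. A sketch for illustration:

```python
def query_passes(triggers: list[bool], should_trigger: bool,
                 threshold: float = 0.5) -> bool:
    """Mirror of run_eval's pass rule: positive queries must reach the
    threshold; negative queries must stay below it."""
    rate = sum(triggers) / len(triggers)
    return rate >= threshold if should_trigger else rate < threshold

# 2 of 3 runs triggered: passes as a positive query, fails as a negative one
print(query_passes([True, True, False], True))   # True
print(query_passes([True, True, False], False))  # False
```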


@@ -0,0 +1,332 @@
#!/usr/bin/env python3
"""Run the eval + improve loop until all pass or max iterations reached.
Combines run_eval.py and improve_description.py in a loop, tracking history
and returning the best description found. Supports train/test split to prevent
overfitting.
"""
import argparse
import json
import random
import sys
import tempfile
import time
import webbrowser
from pathlib import Path
import anthropic
from scripts.generate_report import generate_html
from scripts.improve_description import improve_description
from scripts.run_eval import find_project_root, run_eval
from scripts.utils import parse_skill_md
def split_eval_set(eval_set: list[dict], holdout: float, seed: int = 42) -> tuple[list[dict], list[dict]]:
"""Split eval set into train and test sets, stratified by should_trigger."""
random.seed(seed)
# Separate by should_trigger
trigger = [e for e in eval_set if e["should_trigger"]]
no_trigger = [e for e in eval_set if not e["should_trigger"]]
# Shuffle each group
random.shuffle(trigger)
random.shuffle(no_trigger)
# Calculate split points
n_trigger_test = max(1, int(len(trigger) * holdout))
n_no_trigger_test = max(1, int(len(no_trigger) * holdout))
# Split
test_set = trigger[:n_trigger_test] + no_trigger[:n_no_trigger_test]
train_set = trigger[n_trigger_test:] + no_trigger[n_no_trigger_test:]
return train_set, test_set
def run_loop(
eval_set: list[dict],
skill_path: Path,
description_override: str | None,
num_workers: int,
timeout: int,
max_iterations: int,
runs_per_query: int,
trigger_threshold: float,
holdout: float,
model: str,
verbose: bool,
live_report_path: Path | None = None,
log_dir: Path | None = None,
) -> dict:
"""Run the eval + improvement loop."""
project_root = find_project_root()
name, original_description, content = parse_skill_md(skill_path)
current_description = description_override or original_description
# Split into train/test if holdout > 0
if holdout > 0:
train_set, test_set = split_eval_set(eval_set, holdout)
if verbose:
print(f"Split: {len(train_set)} train, {len(test_set)} test (holdout={holdout})", file=sys.stderr)
else:
train_set = eval_set
test_set = []
client = anthropic.Anthropic()
history = []
exit_reason = "unknown"
for iteration in range(1, max_iterations + 1):
if verbose:
print(f"\n{'='*60}", file=sys.stderr)
print(f"Iteration {iteration}/{max_iterations}", file=sys.stderr)
print(f"Description: {current_description}", file=sys.stderr)
print(f"{'='*60}", file=sys.stderr)
# Evaluate train + test together in one batch for parallelism
all_queries = train_set + test_set
t0 = time.time()
all_results = run_eval(
eval_set=all_queries,
skill_name=name,
description=current_description,
num_workers=num_workers,
timeout=timeout,
project_root=project_root,
runs_per_query=runs_per_query,
trigger_threshold=trigger_threshold,
model=model,
)
eval_elapsed = time.time() - t0
# Split results back into train/test by matching queries
train_queries_set = {q["query"] for q in train_set}
train_result_list = [r for r in all_results["results"] if r["query"] in train_queries_set]
test_result_list = [r for r in all_results["results"] if r["query"] not in train_queries_set]
train_passed = sum(1 for r in train_result_list if r["pass"])
train_total = len(train_result_list)
train_summary = {"passed": train_passed, "failed": train_total - train_passed, "total": train_total}
train_results = {"results": train_result_list, "summary": train_summary}
if test_set:
test_passed = sum(1 for r in test_result_list if r["pass"])
test_total = len(test_result_list)
test_summary = {"passed": test_passed, "failed": test_total - test_passed, "total": test_total}
test_results = {"results": test_result_list, "summary": test_summary}
else:
test_results = None
test_summary = None
history.append({
"iteration": iteration,
"description": current_description,
"train_passed": train_summary["passed"],
"train_failed": train_summary["failed"],
"train_total": train_summary["total"],
"train_results": train_results["results"],
"test_passed": test_summary["passed"] if test_summary else None,
"test_failed": test_summary["failed"] if test_summary else None,
"test_total": test_summary["total"] if test_summary else None,
"test_results": test_results["results"] if test_results else None,
# For backward compat with report generator
"passed": train_summary["passed"],
"failed": train_summary["failed"],
"total": train_summary["total"],
"results": train_results["results"],
})
# Write live report if path provided
if live_report_path:
partial_output = {
"original_description": original_description,
"best_description": current_description,
"best_score": "in progress",
"iterations_run": len(history),
"holdout": holdout,
"train_size": len(train_set),
"test_size": len(test_set),
"history": history,
}
live_report_path.write_text(generate_html(partial_output, auto_refresh=True, skill_name=name))
if verbose:
def print_eval_stats(label, results, elapsed):
pos = [r for r in results if r["should_trigger"]]
neg = [r for r in results if not r["should_trigger"]]
tp = sum(r["triggers"] for r in pos)
pos_runs = sum(r["runs"] for r in pos)
fn = pos_runs - tp
fp = sum(r["triggers"] for r in neg)
neg_runs = sum(r["runs"] for r in neg)
tn = neg_runs - fp
total = tp + tn + fp + fn
precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
accuracy = (tp + tn) / total if total > 0 else 0.0
print(f"{label}: {tp+tn}/{total} correct, precision={precision:.0%} recall={recall:.0%} accuracy={accuracy:.0%} ({elapsed:.1f}s)", file=sys.stderr)
for r in results:
status = "PASS" if r["pass"] else "FAIL"
rate_str = f"{r['triggers']}/{r['runs']}"
print(f" [{status}] rate={rate_str} expected={r['should_trigger']}: {r['query'][:60]}", file=sys.stderr)
print_eval_stats("Train", train_results["results"], eval_elapsed)
if test_summary:
print_eval_stats("Test ", test_results["results"], 0)
if train_summary["failed"] == 0:
exit_reason = f"all_passed (iteration {iteration})"
if verbose:
print(f"\nAll train queries passed on iteration {iteration}!", file=sys.stderr)
break
if iteration == max_iterations:
exit_reason = f"max_iterations ({max_iterations})"
if verbose:
print(f"\nMax iterations reached ({max_iterations}).", file=sys.stderr)
break
# Improve the description based on train results
if verbose:
print(f"\nImproving description...", file=sys.stderr)
t0 = time.time()
# Strip test scores from history so improvement model can't see them
blinded_history = [
{k: v for k, v in h.items() if not k.startswith("test_")}
for h in history
]
new_description = improve_description(
client=client,
skill_name=name,
skill_content=content,
current_description=current_description,
eval_results=train_results,
history=blinded_history,
model=model,
log_dir=log_dir,
iteration=iteration,
)
improve_elapsed = time.time() - t0
if verbose:
print(f"Proposed ({improve_elapsed:.1f}s): {new_description}", file=sys.stderr)
current_description = new_description
# Find the best iteration by TEST score (or train if no test set)
if test_set:
best = max(history, key=lambda h: h["test_passed"] or 0)
best_score = f"{best['test_passed']}/{best['test_total']}"
else:
best = max(history, key=lambda h: h["train_passed"])
best_score = f"{best['train_passed']}/{best['train_total']}"
if verbose:
print(f"\nExit reason: {exit_reason}", file=sys.stderr)
print(f"Best score: {best_score} (iteration {best['iteration']})", file=sys.stderr)
return {
"exit_reason": exit_reason,
"original_description": original_description,
"best_description": best["description"],
"best_score": best_score,
"best_train_score": f"{best['train_passed']}/{best['train_total']}",
"best_test_score": f"{best['test_passed']}/{best['test_total']}" if test_set else None,
"final_description": current_description,
"iterations_run": len(history),
"holdout": holdout,
"train_size": len(train_set),
"test_size": len(test_set),
"history": history,
}
def main():
parser = argparse.ArgumentParser(description="Run eval + improve loop")
parser.add_argument("--eval-set", required=True, help="Path to eval set JSON file")
parser.add_argument("--skill-path", required=True, help="Path to skill directory")
parser.add_argument("--description", default=None, help="Override starting description")
parser.add_argument("--num-workers", type=int, default=10, help="Number of parallel workers")
parser.add_argument("--timeout", type=int, default=30, help="Timeout per query in seconds")
parser.add_argument("--max-iterations", type=int, default=5, help="Max improvement iterations")
parser.add_argument("--runs-per-query", type=int, default=3, help="Number of runs per query")
parser.add_argument("--trigger-threshold", type=float, default=0.5, help="Trigger rate threshold")
parser.add_argument("--holdout", type=float, default=0.4, help="Fraction of eval set to hold out for testing (0 to disable)")
parser.add_argument("--model", required=True, help="Model for improvement")
parser.add_argument("--verbose", action="store_true", help="Print progress to stderr")
parser.add_argument("--report", default="auto", help="Generate HTML report at this path (default: 'auto' for temp file, 'none' to disable)")
parser.add_argument("--results-dir", default=None, help="Save all outputs (results.json, report.html, log.txt) to a timestamped subdirectory here")
args = parser.parse_args()
eval_set = json.loads(Path(args.eval_set).read_text())
skill_path = Path(args.skill_path)
if not (skill_path / "SKILL.md").exists():
print(f"Error: No SKILL.md found at {skill_path}", file=sys.stderr)
sys.exit(1)
name, _, _ = parse_skill_md(skill_path)
# Set up live report path
if args.report != "none":
if args.report == "auto":
timestamp = time.strftime("%Y%m%d_%H%M%S")
live_report_path = Path(tempfile.gettempdir()) / f"skill_description_report_{skill_path.name}_{timestamp}.html"
else:
live_report_path = Path(args.report)
# Open the report immediately so the user can watch
live_report_path.write_text("<html><body><h1>Starting optimization loop...</h1><meta http-equiv='refresh' content='5'></body></html>")
webbrowser.open(str(live_report_path))
else:
live_report_path = None
# Determine output directory (create before run_loop so logs can be written)
if args.results_dir:
timestamp = time.strftime("%Y-%m-%d_%H%M%S")
results_dir = Path(args.results_dir) / timestamp
results_dir.mkdir(parents=True, exist_ok=True)
else:
results_dir = None
log_dir = results_dir / "logs" if results_dir else None
output = run_loop(
eval_set=eval_set,
skill_path=skill_path,
description_override=args.description,
num_workers=args.num_workers,
timeout=args.timeout,
max_iterations=args.max_iterations,
runs_per_query=args.runs_per_query,
trigger_threshold=args.trigger_threshold,
holdout=args.holdout,
model=args.model,
verbose=args.verbose,
live_report_path=live_report_path,
log_dir=log_dir,
)
# Save JSON output
json_output = json.dumps(output, indent=2)
print(json_output)
if results_dir:
(results_dir / "results.json").write_text(json_output)
# Write final HTML report (without auto-refresh)
if live_report_path:
live_report_path.write_text(generate_html(output, auto_refresh=False, skill_name=name))
print(f"\nReport: {live_report_path}", file=sys.stderr)
if results_dir and live_report_path:
(results_dir / "report.html").write_text(generate_html(output, auto_refresh=False, skill_name=name))
if results_dir:
print(f"Results saved to: {results_dir}", file=sys.stderr)
if __name__ == "__main__":
main()
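The confusion-matrix arithmetic inside `print_eval_stats` — counting each run of a positive query as a potential true positive and each run of a negative query as a potential true negative — can be isolated as a pure function. A sketch using the same field names as the result dicts above:

```python
def eval_stats(results: list[dict]) -> tuple[float, float, float]:
    """Compute (precision, recall, accuracy) the way print_eval_stats does."""
    pos = [r for r in results if r["should_trigger"]]
    neg = [r for r in results if not r["should_trigger"]]
    tp = sum(r["triggers"] for r in pos)          # triggered when it should
    fn = sum(r["runs"] for r in pos) - tp         # missed triggers
    fp = sum(r["triggers"] for r in neg)          # triggered when it shouldn't
    tn = sum(r["runs"] for r in neg) - fp         # correct non-triggers
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    accuracy = (tp + tn) / total if total > 0 else 0.0
    return precision, recall, accuracy

sample = [
    {"should_trigger": True,  "triggers": 3, "runs": 3},
    {"should_trigger": False, "triggers": 1, "runs": 3},
]
print(eval_stats(sample))  # (0.75, 1.0, 0.8333...)
```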


@@ -0,0 +1,47 @@
"""Shared utilities for skill-creator scripts."""
from pathlib import Path
def parse_skill_md(skill_path: Path) -> tuple[str, str, str]:
"""Parse a SKILL.md file, returning (name, description, full_content)."""
content = (skill_path / "SKILL.md").read_text()
lines = content.split("\n")
if lines[0].strip() != "---":
raise ValueError("SKILL.md missing frontmatter (no opening ---)")
end_idx = None
for i, line in enumerate(lines[1:], start=1):
if line.strip() == "---":
end_idx = i
break
if end_idx is None:
raise ValueError("SKILL.md missing frontmatter (no closing ---)")
name = ""
description = ""
frontmatter_lines = lines[1:end_idx]
i = 0
while i < len(frontmatter_lines):
line = frontmatter_lines[i]
if line.startswith("name:"):
name = line[len("name:"):].strip().strip('"').strip("'")
elif line.startswith("description:"):
value = line[len("description:"):].strip()
# Handle YAML multiline indicators (>, |, >-, |-)
if value in (">", "|", ">-", "|-"):
continuation_lines: list[str] = []
i += 1
while i < len(frontmatter_lines) and (frontmatter_lines[i].startswith(" ") or frontmatter_lines[i].startswith("\t")):
continuation_lines.append(frontmatter_lines[i].strip())
i += 1
description = " ".join(continuation_lines)
continue
else:
description = value.strip('"').strip("'")
i += 1
return name, description, content
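`parse_skill_md`'s handling of YAML block scalars (`>`, `|`, `>-`, `|-`) joins the indented continuation lines into a single space-separated string. A self-contained sketch of just that branch, run against a hypothetical frontmatter:

```python
def read_description(frontmatter_lines: list[str]) -> str:
    """Mirror of parse_skill_md's description branch (sketch only)."""
    i = 0
    while i < len(frontmatter_lines):
        line = frontmatter_lines[i]
        if line.startswith("description:"):
            value = line[len("description:"):].strip()
            if value in (">", "|", ">-", "|-"):
                parts: list[str] = []
                i += 1
                # collect the indented continuation lines
                while i < len(frontmatter_lines) and frontmatter_lines[i][:1] in (" ", "\t"):
                    parts.append(frontmatter_lines[i].strip())
                    i += 1
                return " ".join(parts)
            return value.strip('"').strip("'")
        i += 1
    return ""

fm = ["name: demo-skill", "description: >", "  first line", "  second line"]
print(read_description(fm))  # first line second line
```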