CmiiDeploy/85-20260617-江西环境整理/2-节点驱逐-完整解决方案.md
2026-07-01 16:30:30 +08:00


Node Eviction - Complete Solution

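I. Environment Overview

Cluster node information

| Node IP | Role | Operating system | Status | Handling |
|---|---|---|---|---|
| 10.20.1.130 | controlplane, etcd, worker | openEuler 20.03 (LTS-SP3) | Retain | Master node, left untouched |
| 10.20.1.133 | worker | openEuler 20.03 (LTS-SP3) | Retain | Middleware scheduling target |
| 10.20.1.134 | worker | openEuler 20.03 (LTS-SP3) | Retain | Middleware scheduling target (mysql) |
| 10.20.1.141 | worker | BigCloud Enterprise Linux For Euler | Retain | Middleware scheduling target |
| 10.20.1.142 | worker | BigCloud Enterprise Linux For Euler | Decommission | Decommission in one month |
| 10.20.1.144 | worker | BigCloud Enterprise Linux For Euler | Decommission | Decommission in one month |
| 10.20.1.145 | worker | BigCloud Enterprise Linux For Euler | Decommission | Decommission in one month |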

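Requirements breakdown

  1. Decommission nodes 10.20.1.142, 10.20.1.144, and 10.20.1.145 (to be carried out in one month).
  2. Deployments in the jxyd namespace share the lifecycle of the decommissioned nodes, so they are deleted outright when the nodes go.
  3. Middleware in the jxyd namespace must be kept and rescheduled onto 10.20.1.133, 10.20.1.134, and 10.20.1.141.
  4. Resource adjustment: Deployments that exceed the limits are downsized uniformly, and their JVM parameters are adjusted to match.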

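II. Workflow Overview

┌─────────────────────────────────────────────────────────────────┐
│  Phase 1: Information collection and backup (run now)           │
│  - Export all deployment/statefulset information                │
│  - Record the current Pod distribution                          │
│  - Back up all YAML                                             │
├─────────────────────────────────────────────────────────────────┤
│  Phase 2: Resource adjustment (run now)                         │
│  - Scan jxyd business deployment resources (not middleware)     │
│  - Downsize over-limit resources uniformly                      │
│  - Update the CUST_JAVA_OPTS environment variable to match      │
├─────────────────────────────────────────────────────────────────┤
│  Phase 3: Middleware and business migration (run now)           │
│  - Label the retained nodes and the nodes to be decommissioned  │
│  - Schedule middleware onto the retained nodes and business     │
│    Deployments onto the nodes to be decommissioned              │
│  - Verify that workloads run normally on their target nodes     │
├─────────────────────────────────────────────────────────────────┤
│  Phase 4: Node decommissioning (run in one month)               │
│  - Delete the business deployments in the jxyd namespace        │
│  - cordon + drain the nodes                                     │
│  - Remove the nodes from the cluster                            │
└─────────────────────────────────────────────────────────────────┘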

III. Phase 1: Information Collection and Backup

1.1 Backup script phase1_backup.sh

#!/bin/bash
# ============================================================
# Phase 1: Information collection and backup
# Run on the master node (10.20.1.130)
# ============================================================

BACKUP_DIR="/root/wdd/backup_260617/$(date +%Y%m%d_%H%M%S)"
NAMESPACE="jxyd"
mkdir -p "${BACKUP_DIR}"

echo "=========================================="
echo "  Phase 1: Information collection and backup"
echo "  Backup directory: ${BACKUP_DIR}"
echo "=========================================="

# 1. Export node information
echo "[1/8] Exporting node information..."
kubectl get nodes -o wide --show-labels > "${BACKUP_DIR}/nodes_info.txt"

# 2. Export all resources in the namespace
#    ("kubectl get all" omits ConfigMaps, Secrets, and Ingresses; steps 5 and 7 cover them)
echo "[2/8] Exporting all resource YAML in the ${NAMESPACE} namespace..."
kubectl get all -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/all_resources.yaml"

# 3. Export each deployment separately
echo "[3/8] Exporting each Deployment YAML..."
mkdir -p "${BACKUP_DIR}/deployments"
for dep in $(kubectl get deployments -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get deployment "${dep}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/deployments/${dep}.yaml"
    echo "  backed up deployment: ${dep}"
done

# 4. Export each statefulset separately (middleware usually runs as a statefulset)
echo "[4/8] Exporting each StatefulSet YAML..."
mkdir -p "${BACKUP_DIR}/statefulsets"
for sts in $(kubectl get statefulsets -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
    kubectl get statefulset "${sts}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/statefulsets/${sts}.yaml"
    echo "  backed up statefulset: ${sts}"
done

# 5. Export ConfigMaps and Secrets
echo "[5/8] Exporting ConfigMaps and Secrets..."
mkdir -p "${BACKUP_DIR}/configmaps" "${BACKUP_DIR}/secrets"
for cm in $(kubectl get configmaps -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get configmap "${cm}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/configmaps/${cm}.yaml"
done
for sec in $(kubectl get secrets -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get secret "${sec}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/secrets/${sec}.yaml"
done

# 6. Export the Pod distribution (key point: record which Pods run on the nodes to be decommissioned)
echo "[6/8] Exporting Pod distribution..."
echo "=== Node placement of all Pods ===" > "${BACKUP_DIR}/pod_distribution.txt"
kubectl get pods -n "${NAMESPACE}" -o wide >> "${BACKUP_DIR}/pod_distribution.txt"

echo "" >> "${BACKUP_DIR}/pod_distribution.txt"
echo "--- Pods on the nodes to be decommissioned ---" >> "${BACKUP_DIR}/pod_distribution.txt"
for NODE in 10.20.1.142 10.20.1.144 10.20.1.145; do
    echo "" >> "${BACKUP_DIR}/pod_distribution.txt"
    echo "=== Pods on node ${NODE} ===" >> "${BACKUP_DIR}/pod_distribution.txt"
    kubectl get pods -n "${NAMESPACE}" -o wide --field-selector "spec.nodeName=${NODE}" >> "${BACKUP_DIR}/pod_distribution.txt"
done

# 7. Export Services and Ingresses
echo "[7/8] Exporting Services and Ingresses..."
mkdir -p "${BACKUP_DIR}/services" "${BACKUP_DIR}/ingresses"
for svc in $(kubectl get svc -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}'); do
    kubectl get svc "${svc}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/services/${svc}.yaml"
done
for ing in $(kubectl get ingress -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
    kubectl get ingress "${ing}" -n "${NAMESPACE}" -o yaml > "${BACKUP_DIR}/ingresses/${ing}.yaml"
done

# 8. Generate a resource usage summary (kubectl top needs metrics-server; errors are suppressed)
echo "[8/8] Generating resource usage summary..."
echo "=== Resource usage summary ===" | tee "${BACKUP_DIR}/resource_summary.txt"
kubectl top nodes 2>/dev/null | tee -a "${BACKUP_DIR}/resource_summary.txt"
echo "" | tee -a "${BACKUP_DIR}/resource_summary.txt"
kubectl top pods -n "${NAMESPACE}" 2>/dev/null | tee -a "${BACKUP_DIR}/resource_summary.txt"

echo ""
echo "=========================================="
echo "  Backup complete! Directory: ${BACKUP_DIR}"
echo "=========================================="
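
The backup script redirects each `kubectl get` into a file, so a failed command leaves a zero-byte file behind rather than stopping the run. One way to confirm the backup is complete is to scan the backup directory for empty files. The following is a minimal sketch under that assumption; the helper name `find_empty_backup_files` is hypothetical, and the demo runs against a throwaway directory rather than the real backup path.

```python
import os
import tempfile


def find_empty_backup_files(backup_dir):
    """Return sorted relative paths of zero-byte files under backup_dir."""
    empty = []
    for root, _dirs, files in os.walk(backup_dir):
        for fname in files:
            path = os.path.join(root, fname)
            if os.path.getsize(path) == 0:
                empty.append(os.path.relpath(path, backup_dir))
    return sorted(empty)


if __name__ == "__main__":
    # Demo on a temporary directory that mimics the backup layout.
    with tempfile.TemporaryDirectory() as demo_dir:
        os.makedirs(os.path.join(demo_dir, "deployments"))
        with open(os.path.join(demo_dir, "deployments", "ok.yaml"), "w") as f:
            f.write("kind: Deployment\n")
        # An empty file stands in for a failed export.
        open(os.path.join(demo_dir, "deployments", "broken.yaml"), "w").close()
        print(find_empty_backup_files(demo_dir))
```

Pointing the function at the real `BACKUP_DIR` printed by the script would list any export that needs to be repeated.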

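IV. Phase 2: Resource Check and Adjustment

2.1 Python script phase2_resource_adjust.py

This script:

  • Scans every Deployment in the jxyd namespace. Middleware has no resource limits and runs only as StatefulSets, so Phase 2 does not touch it.
  • Detects containers whose resources exceed the limits (cpu: 2 / memory: 2Gi).
  • Runs kubectl patch automatically to downsize each over-limit Deployment.
  • Updates the CUST_JAVA_OPTS environment variable to match.
  • Deletes Deployments whose replica count is 0 (in --apply mode only).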
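
The memory limit and the JVM flags in CUST_JAVA_OPTS interact: `-Xmx` bounds only the Java heap, while the container limit also has to cover metaspace, thread stacks, and other non-heap memory. A minimal sketch for checking the gap between the two, using the target values from this plan, might look like this. The helper names `xmx_bytes` and `heap_headroom_mib` are hypothetical, and how much headroom is enough depends on the application.

```python
import re

MIB = 1024 ** 2


def xmx_bytes(java_opts):
    """Extract the -Xmx value from a JVM option string, in bytes (None if absent)."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not match:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    return value * {"": 1, "k": 1024, "m": MIB, "g": 1024 ** 3}[unit]


def heap_headroom_mib(java_opts, limit_bytes):
    """Memory left between the container limit and the maximum heap, in MiB."""
    heap = xmx_bytes(java_opts)
    if heap is None:
        return None
    return (limit_bytes - heap) / MIB


if __name__ == "__main__":
    opts = "-Xms500m -Xmx2000m -Dlog4j2.formatMsgNoLookups=true"
    limit = 2 * 1024 ** 3  # memory limit of 2Gi
    print(f"headroom: {heap_headroom_mib(opts, limit):.0f} MiB")  # headroom: 48 MiB
```

With a 2Gi limit and `-Xmx2000m`, only 48 MiB remain for everything outside the heap, which is worth keeping in mind when watching for OOM restarts after the downsizing.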
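#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Phase 2: Resource check and adjustment
What it does:
  1. Scans the resource settings of every Deployment in the jxyd namespace
  2. Identifies over-limit resources (above limits cpu:2 / memory:2Gi)
  3. Patches them down automatically and updates CUST_JAVA_OPTS
  4. Produces a change report

Usage:
  python3 phase2_resource_adjust.py --dry-run     # preview only, no changes
  python3 phase2_resource_adjust.py --apply        # apply the changes
"""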

import json
import subprocess
import sys
import re
import argparse
from datetime import datetime


# ============================================================
# Configuration
# ============================================================
NAMESPACE = "jxyd"

# Target resource specification
TARGET_RESOURCES = {
    "limits": {"cpu": "2", "memory": "2Gi"},
    "requests": {"cpu": "1", "memory": "500Mi"}
}

# Target JVM options
TARGET_JAVA_OPTS = "-Xms500m -Xmx2000m -Dlog4j2.formatMsgNoLookups=true"

# ============================================================
# Helper functions
# ============================================================

def run_kubectl(args, capture=True):
    """Run a kubectl command; return its stdout, or None on failure"""
    cmd = ["kubectl"] + args
    result = subprocess.run(cmd, capture_output=capture, text=True)
    if result.returncode != 0:
        print(f"  [ERROR] command failed: {' '.join(cmd)}")
        print(f"  stderr: {result.stderr}")
        return None
    return result.stdout


def parse_memory(mem_str):
    """Convert a Kubernetes memory quantity string to bytes"""
    if not mem_str:
        return 0
    mem_str = str(mem_str)
    units = {
        'Ki': 1024, 'Mi': 1024**2, 'Gi': 1024**3, 'Ti': 1024**4,
        'K': 1000, 'M': 1000**2, 'G': 1000**3, 'T': 1000**4,
        'k': 1000, 'm': 0.001  # millibytes (unusual but valid)
    }
    # Try two-character suffixes first so that "Mi" is not mistaken for "M"-style units
    for suffix, multiplier in sorted(units.items(), key=lambda x: -len(x[0])):
        if mem_str.endswith(suffix):
            try:
                return float(mem_str[:-len(suffix)]) * multiplier
            except ValueError:
                return 0
    try:
        return float(mem_str)  # plain number, unit is bytes
    except ValueError:
        return 0


def parse_cpu(cpu_str):
    """Convert a CPU quantity string to a number of cores (float)"""
    if not cpu_str:
        return 0.0
    cpu_str = str(cpu_str)
    if cpu_str.endswith('m'):
        return float(cpu_str[:-1]) / 1000.0
    return float(cpu_str)


def is_resource_over_limit(resources):
    """Check whether the limits exceed the target specification.

    A container with no limits set is read as "0" and therefore
    counts as NOT over the limit, so it is left unpatched.
    """
    limits = resources.get("limits", {})
    cpu_limit = limits.get("cpu", "0")
    mem_limit = limits.get("memory", "0")
    
    cpu_over = parse_cpu(cpu_limit) > parse_cpu(TARGET_RESOURCES["limits"]["cpu"])
    mem_over = parse_memory(mem_limit) > parse_memory(TARGET_RESOURCES["limits"]["memory"])
    
    return cpu_over or mem_over


# ============================================================
# Main logic
# ============================================================

def get_deployments():
    """Fetch all Deployments in the namespace"""
    output = run_kubectl([
        "get", "deployments", "-n", NAMESPACE,
        "-o", "json"
    ])
    if not output:
        return []
    data = json.loads(output)
    return data.get("items", [])


def analyze_and_patch(workloads, kind, dry_run=True):
    """Analyze workloads and patch over-limit resources; workloads with 0 replicas are deleted"""
    report = {
        "business": [],
        "patched": [],
        "skipped": [],
        "errors": [],
        "deleted": []
    }
    
    for wl in workloads:
        name = wl["metadata"]["name"]
        replicas = wl["spec"].get("replicas", 1)
        
        if replicas == 0:
            print(f"\n  [DELETE] {kind}/{name} (0 replicas)")
            if dry_run:
                print("    [DRY-RUN] deletion skipped")
                report["skipped"].append({"name": name, "reason": "replicas=0"})
            else:
                result = run_kubectl([
                    "delete", kind.lower(), name,
                    "-n", NAMESPACE
                ])
                if result is not None:
                    print("    [OK] deleted")
                    report["deleted"].append(name)
                else:
                    print("    [FAILED] deletion failed")
                    report["errors"].append({"name": name, "reason": "delete failed"})
            continue
            
        spec = wl["spec"]["template"]["spec"]
        containers = spec.get("containers", [])
        
        category = "business"
        
        for idx, container in enumerate(containers):
            container_name = container.get("name", f"container-{idx}")
            resources = container.get("resources", {})
            limits = resources.get("limits", {})
            requests = resources.get("requests", {})
            
            # Extract the current CUST_JAVA_OPTS value
            current_java_opts = ""
            env_list = container.get("env", [])
            for env in env_list:
                if env.get("name") == "CUST_JAVA_OPTS":
                    current_java_opts = env.get("value", "")
                    break
            
            item = {
                "name": name,
                "kind": kind,
                "container": container_name,
                "container_index": idx,
                "category": category,
                "current_limits_cpu": limits.get("cpu", "unset"),
                "current_limits_memory": limits.get("memory", "unset"),
                "current_requests_cpu": requests.get("cpu", "unset"),
                "current_requests_memory": requests.get("memory", "unset"),
                "current_java_opts": current_java_opts,
                "over_limit": is_resource_over_limit(resources),
            }
            
            report[category].append(item)
            
            # Only containers that are over the limit get their resources adjusted
            if item["over_limit"]:
                print(f"\n  [OVER LIMIT] {kind}/{name} (container: {container_name})")
                print(f"    current: limits={{cpu:{limits.get('cpu', 'N/A')}, memory:{limits.get('memory', 'N/A')}}}")
                print(f"    target:  limits={{cpu:{TARGET_RESOURCES['limits']['cpu']}, memory:{TARGET_RESOURCES['limits']['memory']}}}")
                
                if dry_run:
                    print(f"    [DRY-RUN] patch skipped")
                    report["skipped"].append(item)
                else:
                    # Container patch; a strategic merge patch matches
                    # entries of the containers list by container name
                    container_patch = {
                        "name": container_name,
                        "resources": TARGET_RESOURCES
                    }
                    
                    # Update CUST_JAVA_OPTS as well, if the container defines it
                    if current_java_opts:
                        # kubectl set env would be shorter; a strategic merge patch
                        # is used here so resources and env change in one rollout
                        new_env = []
                        for env in env_list:
                            if env.get("name") == "CUST_JAVA_OPTS":
                                new_env.append({
                                    "name": "CUST_JAVA_OPTS",
                                    "value": TARGET_JAVA_OPTS
                                })
                            else:
                                new_env.append(env)
                        container_patch["env"] = new_env
                    
                    # Build the patch document around the single container entry
                    patch = {"spec": {"template": {"spec": {"containers": [container_patch]}}}}
                    patch_json = json.dumps(patch)
                    
                    result = run_kubectl([
                        "patch", kind.lower(), name,
                        "-n", NAMESPACE,
                        "--type", "strategic",
                        "-p", patch_json
                    ])
                    
                    if result is not None:
                        print(f"    [OK] patched {kind}/{name}")
                        report["patched"].append(item)
                    else:
                        print(f"    [FAILED] patching {kind}/{name} failed")
                        report["errors"].append(item)
            else:
                # Not over the limit, but CUST_JAVA_OPTS may still need to be aligned.
                # NOTE: this sets TARGET_JAVA_OPTS even when the container's memory
                # limit is below the target limit; review such containers in the
                # --dry-run output before running --apply.
                if current_java_opts and current_java_opts != TARGET_JAVA_OPTS:
                    print(f"\n  [JAVA_OPTS MISMATCH] {kind}/{name} (container: {container_name})")
                    print(f"    current: {current_java_opts}")
                    print(f"    target:  {TARGET_JAVA_OPTS}")
                    
                    if not dry_run:
                        result = run_kubectl([
                            "set", "env",
                            f"{kind.lower()}/{name}",
                            "-n", NAMESPACE,
                            f"CUST_JAVA_OPTS={TARGET_JAVA_OPTS}",
                            "-c", container_name
                        ])
                        if result is not None:
                            print(f"    [OK] JAVA_OPTS updated")
                        else:
                            print(f"    [FAILED] updating JAVA_OPTS failed")
    
    return report


def print_summary(report, kind):
    """Print the summary report"""
    print(f"\n{'='*60}")
    print(f"  {kind} resource scan report")
    print(f"{'='*60}")
    
    print(f"\n  🚀 Business Deployments ({len(report['business'])} containers):")
    for item in report["business"]:
        over = "⚠️ over limit" if item["over_limit"] else "✅ ok"
        print(f"    - {item['name']}/{item['container']}: "
              f"limits(cpu={item['current_limits_cpu']}, mem={item['current_limits_memory']}) "
              f"requests(cpu={item['current_requests_cpu']}, mem={item['current_requests_memory']}) "
              f"{over}")
    
    if report.get("deleted"):
        print(f"\n  🗑️ Deleted (0 replicas): {len(report['deleted'])}")
    if report["patched"]:
        print(f"\n  🔧 Patched: {len(report['patched'])}")
    if report["skipped"]:
        print(f"  ⏭️  Skipped (DRY-RUN): {len(report['skipped'])}")
    if report["errors"]:
        print(f"  ❌ Failed: {len(report['errors'])}")


def main():
    parser = argparse.ArgumentParser(description="Phase 2: resource check and adjustment for the jxyd namespace")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--dry-run", action="store_true", help="preview only, make no changes")
    group.add_argument("--apply", action="store_true", help="apply the changes")
    args = parser.parse_args()
    
    dry_run = args.dry_run
    
    print(f"\n{'#'*60}")
    print(f"  Phase 2: Resource check and adjustment")
    print(f"  Namespace: {NAMESPACE}")
    print(f"  Mode: {'DRY-RUN (preview only)' if dry_run else '⚡ APPLY (changes will be made)'}")
    print(f"  Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"{'#'*60}")
    
    if not dry_run:
        confirm = input("\n  ⚠️  Apply the resource adjustment? (yes/no): ")
        if confirm.lower() != "yes":
            print("  Cancelled.")
            return
    
    # Scan Deployments (business workloads)
    print(f"\n{'='*60}")
    print("  Scanning Deployments...")
    print(f"{'='*60}")
    deployments = get_deployments()
    dep_report = analyze_and_patch(deployments, "Deployment", dry_run)
    print_summary(dep_report, "Deployment")
    
    # Write the change log (the directory is created by the Phase 1 backup script)
    log_file = f"/root/wdd/backup_260617/phase2_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    log_data = {
        "timestamp": datetime.now().isoformat(),
        "mode": "dry-run" if dry_run else "apply",
        "deployment_report": {
            "business_count": len(dep_report["business"]),
            "patched_count": len(dep_report["patched"]),
            "skipped_count": len(dep_report["skipped"]),
            "deleted_count": len(dep_report["deleted"]),
            "error_count": len(dep_report["errors"]),
        }
    }
    
    try:
        with open(log_file, 'w') as f:
            json.dump(log_data, f, indent=2, ensure_ascii=False)
        print(f"\n  📄 Report saved to: {log_file}")
    except Exception as e:
        print(f"\n  [WARNING] could not save the report file: {e}")
    
    print(f"\n{'#'*60}")
    print(f"  Phase 2 complete!")
    print(f"{'#'*60}")


if __name__ == "__main__":
    main()

V. Phase 3: Middleware Migration and Business Scheduling

3.1 Label the retained nodes and the nodes to be decommissioned

#!/bin/bash
# ============================================================
# Phase 3.1: Label the retained nodes and the nodes to be decommissioned
# ============================================================

echo "=== Labelling retained nodes: jxyd-middleware=true ==="

# Retained nodes (middleware is scheduled here)
kubectl label node 10.20.1.133 jxyd-middleware=true --overwrite
kubectl label node 10.20.1.134 jxyd-middleware=true --overwrite
kubectl label node 10.20.1.141 jxyd-middleware=true --overwrite

echo ""
echo ""
echo "=== Labelling nodes to be decommissioned: jxyd-business=true ==="
# Nodes to be decommissioned (business workloads are scheduled here)
kubectl label node 10.20.1.142 jxyd-business=true --overwrite
kubectl label node 10.20.1.144 jxyd-business=true --overwrite
kubectl label node 10.20.1.145 jxyd-business=true --overwrite

echo ""
echo "=== Verifying labels ==="
kubectl get nodes -l jxyd-middleware=true -o wide
kubectl get nodes -l jxyd-business=true -o wide
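
The two `kubectl get nodes -l ...` commands print the labelled nodes for visual inspection. The same check can be automated by comparing the labelled nodes against the node plan. The sketch below assumes the node list has already been loaded from `kubectl get nodes -o json` and that node names equal the IPs used in the label commands; the function name `check_node_labels` is hypothetical.

```python
# Expected label assignment, taken from the node plan in this document.
EXPECTED_LABELS = {
    "jxyd-middleware": {"10.20.1.133", "10.20.1.134", "10.20.1.141"},
    "jxyd-business": {"10.20.1.142", "10.20.1.144", "10.20.1.145"},
}


def check_node_labels(node_list, expected=EXPECTED_LABELS):
    """Compare labelled nodes with the expected sets; return a list of problems."""
    problems = []
    for label_key, wanted in expected.items():
        actual = {
            node["metadata"]["name"]
            for node in node_list.get("items", [])
            if node["metadata"].get("labels", {}).get(label_key) == "true"
        }
        for name in sorted(wanted - actual):
            problems.append(f"{name} is missing label {label_key}=true")
        for name in sorted(actual - wanted):
            problems.append(f"{name} unexpectedly carries label {label_key}=true")
    return problems


if __name__ == "__main__":
    # Hand-written sample in the shape of `kubectl get nodes -o json`.
    sample = {"items": [
        {"metadata": {"name": "10.20.1.133", "labels": {"jxyd-middleware": "true"}}},
        {"metadata": {"name": "10.20.1.142", "labels": {"jxyd-middleware": "true"}}},
    ]}
    for problem in check_node_labels(sample):
        print(problem)
```

An empty result means both label sets match the plan exactly, including the case where a node was labelled by mistake.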

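3.2 Scheduling and migration script phase3_migrate_workloads.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Phase 3: Workload migration and scheduling
What it does:
  1. Identifies middleware StatefulSets and business Deployments in the jxyd namespace
  2. Adds a nodeSelector to middleware so it runs on the retained nodes (middleware using hostPath is skipped)
  3. Adds a nodeSelector to business Deployments so they run on the nodes to be decommissioned
  4. Verifies the scheduling and migration results

Usage:
  python3 phase3_migrate_workloads.py --list             # list the migration plan
  python3 phase3_migrate_workloads.py --dry-run           # preview the changes
  python3 phase3_migrate_workloads.py --apply             # run the migration
  python3 phase3_migrate_workloads.py --verify            # verify the migration results
"""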

import json
import subprocess
import sys
import argparse
from datetime import datetime


NAMESPACE = "jxyd"

# Node labels used as scheduling targets
TARGET_NODE_SELECTOR = {"jxyd-middleware": "true"}
BUSINESS_NODE_SELECTOR = {"jxyd-business": "true"}

# Name keywords that identify middleware
MIDDLEWARE_KEYWORDS = [
    "mysql", "redis", "rabbitmq", "kafka", "zookeeper", "elasticsearch",
    "nacos", "minio", "mongo", "postgres", "nginx",
    "sentinel", "rocketmq", "emqx", "mqtt", "influxdb", "grafana",
    "prometheus", "xxl-job", "seata"
]

# Nodes to be decommissioned
EVICT_NODES = ["10.20.1.142", "10.20.1.144", "10.20.1.145"]


def run_kubectl(args):
    """Run a kubectl command; return its stdout, or None on failure"""
    cmd = ["kubectl"] + args
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"  [ERROR] {' '.join(cmd)}: {result.stderr.strip()}")
        return None
    return result.stdout


def is_middleware(name):
    name_lower = name.lower()
    return any(kw in name_lower for kw in MIDDLEWARE_KEYWORDS)


def get_workloads(kind):
    output = run_kubectl(["get", kind, "-n", NAMESPACE, "-o", "json"])
    if not output:
        return []
    return json.loads(output).get("items", [])


def list_workloads():
    """List the workload migration and scheduling plan"""
    print(f"\n{'='*60}")
    print(f"  jxyd namespace - migration and scheduling plan")
    print(f"{'='*60}")
    
    # StatefulSets (middleware); names without a keyword match are not listed
    sts_workloads = get_workloads("statefulsets")
    mw_list = [w for w in sts_workloads if is_middleware(w["metadata"]["name"])]
    print(f"\n  📦 Middleware StatefulSets (scheduled onto retained nodes, {len(mw_list)} total):")
    for w in mw_list:
        name = w["metadata"]["name"]
        replicas = w["spec"].get("replicas", 1)
        node_selector = w["spec"]["template"]["spec"].get("nodeSelector", {})
        volumes = w["spec"]["template"]["spec"].get("volumes", [])
        has_host_path = any("hostPath" in v for v in volumes)
        host_path_flag = "[uses hostPath, cannot be migrated]" if has_host_path else ""
        print(f"    ✅ {name}  (replicas={replicas}, nodeSelector={node_selector}) {host_path_flag}")

    # Deployments (business workloads)
    dep_workloads = get_workloads("deployments")
    print(f"\n  🚀 Business Deployments (scheduled onto nodes to be decommissioned, {len(dep_workloads)} total):")
    for w in dep_workloads:
        name = w["metadata"]["name"]
        replicas = w["spec"].get("replicas", 1)
        node_selector = w["spec"]["template"]["spec"].get("nodeSelector", {})
        if replicas == 0:
            print(f"    - {name}  (replicas=0, not scheduled)")
        else:
            print(f"    ✅ {name}  (replicas={replicas}, nodeSelector={node_selector})")
    
    print(f"\n  💡 Tip: if the classification looks wrong, review the configuration manually.")


def migrate_workloads(dry_run=True):
    """Apply the workload migration and scheduling changes"""
    print(f"\n{'='*60}")
    print(f"  Workload scheduling - {'DRY-RUN' if dry_run else 'APPLY'}")
    print(f"{'='*60}")
    
    patched = 0
    errors = 0
    
    def apply_patch(kind, w, target_ns, is_middleware_check=False):
        nonlocal patched, errors
        name = w["metadata"]["name"]
        replicas = w["spec"].get("replicas", 1)
        current_ns = w["spec"]["template"]["spec"].get("nodeSelector", {})
        
        if replicas == 0:
            return  # ignore workloads with 0 replicas
            
        if is_middleware_check:
            volumes = w["spec"]["template"]["spec"].get("volumes", [])
            has_host_path = any("hostPath" in v for v in volumes)
            if has_host_path:
                print(f"  [SKIP] {kind}/{name} mounts a hostPath volume and cannot be migrated")
                return
                
        # Check whether every target selector entry is already present
        if all(current_ns.get(k) == v for k, v in target_ns.items()):
            print(f"  [SKIP] {kind}/{name} already has the target nodeSelector")
            return
            
        print(f"  [SCHEDULE] {kind}/{name}")
        print(f"    current nodeSelector: {current_ns}")
        print(f"    target nodeSelector:  {target_ns}")
        
        if dry_run:
            print(f"    [DRY-RUN] skipped")
            return
            
        # Keep existing selector entries and add the target ones
        merged_selector = {**current_ns, **target_ns}
        patch = {"spec": {"template": {"spec": {"nodeSelector": merged_selector}}}}
        result = run_kubectl([
            "patch", kind, name, "-n", NAMESPACE, "--type", "strategic", "-p", json.dumps(patch)
        ])
        
        if result is not None:
            print(f"    [OK] patched")
            patched += 1
        else:
            print(f"    [FAILED]")
            errors += 1

    # Middleware
    sts_workloads = get_workloads("statefulsets")
    for w in [w for w in sts_workloads if is_middleware(w["metadata"]["name"])]:
        apply_patch("statefulset", w, TARGET_NODE_SELECTOR, True)
        
    # Business workloads
    dep_workloads = get_workloads("deployments")
    for w in dep_workloads:
        apply_patch("deployment", w, BUSINESS_NODE_SELECTOR, False)
        
    print(f"\n  Summary: {patched} scheduled, {errors} failed")


def verify_migration():
    """Verify the migration and scheduling results"""
    print(f"\n{'='*60}")
    print(f"  Scheduling verification")
    print(f"{'='*60}")
    
    issues = []

    def pods_of(w):
        """Find a workload's Pods via its own label selector; fall back to a name-prefix match"""
        name = w["metadata"]["name"]
        match_labels = w["spec"].get("selector", {}).get("matchLabels", {})
        if match_labels:
            selector = ",".join(f"{k}={v}" for k, v in match_labels.items())
            output = run_kubectl(["get", "pods", "-n", NAMESPACE, "-l", selector, "-o", "json"])
            return json.loads(output).get("items", []) if output else []
        output = run_kubectl(["get", "pods", "-n", NAMESPACE, "-o", "json"])
        if not output:
            return []
        return [p for p in json.loads(output).get("items", [])
                if p["metadata"]["name"].startswith(f"{name}-")]
    
    def check_workloads(kind, workloads, target_label_key, target_nodes_list, is_middleware_check=False):
        for w in workloads:
            name = w["metadata"]["name"]
            replicas = w["spec"].get("replicas", 1)
            if replicas == 0:
                continue
                
            ns = w["spec"]["template"]["spec"].get("nodeSelector", {})
            if is_middleware_check:
                volumes = w["spec"]["template"]["spec"].get("volumes", [])
                if any("hostPath" in v for v in volumes):
                    continue
                    
            if ns.get(target_label_key) != "true":
                issues.append(f"  ⚠️  {kind}/{name} has no {target_label_key} nodeSelector")
                
            for pod in pods_of(w):
                pod_name = pod["metadata"]["name"]
                node = pod["spec"].get("nodeName", "unknown")
                phase = pod["status"].get("phase", "Unknown")
                
                # Business Pods must run on one of the target nodes
                if target_nodes_list is not None and node not in target_nodes_list:
                    issues.append(f"  ⚠️  Pod {pod_name} is misplaced: on node {node}, expected one of {target_nodes_list}")
                # Middleware Pods must not run on a node to be decommissioned
                elif target_nodes_list is None and node in EVICT_NODES:
                    issues.append(f"  ⚠️  Pod {pod_name} is still on node {node}, which is to be decommissioned")
                # A correctly placed Pod that is not Running is still a problem
                elif phase != "Running":
                    issues.append(f"  ⚠️  Pod {pod_name} on node {node} is in phase {phase}, not Running")
                else:
                    print(f"  ✅ {pod_name} -> node {node} (phase: {phase})")

    # 1. Middleware must be on retained nodes (not on nodes to be decommissioned)
    sts_workloads = get_workloads("statefulsets")
    mw_list = [w for w in sts_workloads if is_middleware(w["metadata"]["name"])]
    check_workloads("statefulset", mw_list, "jxyd-middleware", None, True)
    
    # 2. Business workloads must be on the nodes to be decommissioned
    dep_workloads = get_workloads("deployments")
    check_workloads("deployment", dep_workloads, "jxyd-business", EVICT_NODES, False)

    if issues:
        print(f"\n  ⚠️  Found {len(issues)} issue(s):")
        for issue in issues:
            print(issue)
    else:
        print(f"\n  ✅ All workloads are scheduled on their target nodes!")


def main():
    parser = argparse.ArgumentParser(description="Phase 3: workload migration and scheduling")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--list", action="store_true", help="list the migration plan")
    group.add_argument("--dry-run", action="store_true", help="preview the changes")
    group.add_argument("--apply", action="store_true", help="run the migration")
    group.add_argument("--verify", action="store_true", help="verify the migration results")
    args = parser.parse_args()
    
    if args.list:
        list_workloads()
    elif args.dry_run:
        migrate_workloads(dry_run=True)
    elif args.apply:
        confirm = input("  ⚠️  Run the migration and scheduling? (yes/no): ")
        if confirm.lower() == "yes":
            migrate_workloads(dry_run=False)
        else:
            print("  Cancelled.")
    elif args.verify:
        verify_migration()


if __name__ == "__main__":
    main()
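
The script classifies workloads purely by name: a StatefulSet is handled only if its name contains a keyword, and every Deployment is treated as business regardless of its name. A small audit of the names that fall outside those assumptions can support the manual review. The sketch below is a standalone illustration; the shortened keyword list and all workload names in the demo are hypothetical.

```python
# Shortened copy of the keyword list, for demonstration only.
DEMO_KEYWORDS = ["mysql", "redis", "nacos", "nginx"]


def audit_classification(statefulset_names, deployment_names, keywords=DEMO_KEYWORDS):
    """Report names the keyword-based classification may handle unexpectedly."""
    def is_middleware(name):
        return any(kw in name.lower() for kw in keywords)

    return {
        # StatefulSets without a keyword match would be skipped by the migration
        "unmatched_statefulsets": sorted(
            n for n in statefulset_names if not is_middleware(n)
        ),
        # Deployments that look like middleware would still be scheduled as business
        "keyword_deployments": sorted(
            n for n in deployment_names if is_middleware(n)
        ),
    }


if __name__ == "__main__":
    result = audit_classification(
        statefulset_names=["helm-mysql", "helm-redis", "demo-tsdb"],
        deployment_names=["demo-gateway", "demo-nginx-proxy"],
    )
    print(result)
```

Any name in either list is a candidate for extending `MIDDLEWARE_KEYWORDS` or for handling by hand before running `--apply`.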

VI. Phase 4: Node Decommissioning (run in one month)

4.1 Decommissioning script phase4_evict_nodes.sh

#!/bin/bash
# ============================================================
# Phase 4: Node decommissioning (run in one month)
# 
# ⚠️  This script performs irreversible operations. Confirm that:
#   1. The Phase 2 resource adjustment is complete
#   2. The Phase 3 middleware migration is complete and verified
#   3. The required approvals have been obtained
# ============================================================

# pipefail is needed because most commands are piped into tee;
# without it, set -e would only see tee's exit status
set -eo pipefail

NAMESPACE="jxyd"
EVICT_NODES=("10.20.1.142" "10.20.1.144" "10.20.1.145")
LOG_DIR="/root/wdd/backup_260617/phase4_$(date +%Y%m%d_%H%M%S)"
mkdir -p "${LOG_DIR}"

echo "=========================================="
echo "  Phase 4: Node decommissioning"
echo "  Time: $(date '+%Y-%m-%d %H:%M:%S')"
echo "  Logs: ${LOG_DIR}"
echo "=========================================="

# ============================================================
# Step 1: Final confirmation
# ============================================================
echo ""
echo "⚠️  The following operations are about to run:"
echo "  1. Delete the business Deployments in the ${NAMESPACE} namespace"
echo "  2. Cordon nodes: ${EVICT_NODES[*]}"
echo "  3. Drain nodes: ${EVICT_NODES[*]}"
echo "  4. Delete nodes from the cluster: ${EVICT_NODES[*]}"
echo ""
read -r -p "Proceed? (type YES to continue): " CONFIRM
if [ "${CONFIRM}" != "YES" ]; then
    echo "Cancelled."
    exit 0
fi

# ============================================================
# Step 2: Final backup
# ============================================================
echo ""
echo "=== Step 2: Final backup ==="
kubectl get all -n "${NAMESPACE}" -o yaml > "${LOG_DIR}/final_backup_all.yaml"
kubectl get pods -n "${NAMESPACE}" -o wide > "${LOG_DIR}/final_pod_distribution.txt"
echo "  Backup complete"

# ============================================================
# Step 3: Delete the jxyd business Deployments
# (Middleware exists only as StatefulSets, so every Deployment is a business workload)
# ============================================================
echo ""
echo "=== Step 3: Delete jxyd business Deployments ==="

# Fetch all deployments
ALL_DEPS=$(kubectl get deployments -n "${NAMESPACE}" -o jsonpath='{.items[*].metadata.name}')

for dep in ${ALL_DEPS}; do
    echo "  [delete-business] ${dep}"
    kubectl delete deployment "${dep}" -n "${NAMESPACE}" --grace-period=30 2>&1 | tee -a "${LOG_DIR}/delete_deployments.log"
done

# ============================================================
# Step 4: Cordon the nodes (mark them unschedulable)
# ============================================================
echo ""
echo "=== Step 4: Cordon nodes ==="
for NODE in "${EVICT_NODES[@]}"; do
    echo "  Cordon ${NODE}..."
    kubectl cordon "${NODE}" 2>&1 | tee -a "${LOG_DIR}/cordon.log"
done

echo ""
kubectl get nodes -o wide | tee "${LOG_DIR}/nodes_after_cordon.txt"

# ============================================================
# Step 5: Drain the nodes (evict all Pods)
# A failed or timed-out drain stops the script here (set -eo pipefail)
# ============================================================
echo ""
echo "=== Step 5: Drain nodes ==="
for NODE in "${EVICT_NODES[@]}"; do
    echo "  Drain ${NODE}..."
    kubectl drain "${NODE}" \
        --ignore-daemonsets \
        --delete-emptydir-data \
        --force \
        --grace-period=60 \
        --timeout=300s \
        2>&1 | tee -a "${LOG_DIR}/drain_${NODE}.log"
    
    echo "  Waiting 30 seconds for Pods to finish moving..."
    sleep 30
done

# ============================================================
# Step 6: Verify that all Pods have left the nodes
# ============================================================
echo ""
echo "=== Step 6: Verify Pod migration ==="
for NODE in "${EVICT_NODES[@]}"; do
    # grep exits non-zero when it selects nothing, hence "|| true"
    REMAINING=$(kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE}" --no-headers 2>/dev/null | grep -vc "kube-system" || true)
    if [ "${REMAINING}" -gt 0 ]; then
        echo "  ⚠️  Node ${NODE} still has ${REMAINING} non-system Pod(s):"
        kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE}" -o wide | grep -v "kube-system" || true
    else
        echo "  ✅ All user Pods have left node ${NODE}"
    fi
done

# ============================================================
# Step 7: Delete the nodes from the cluster
# ============================================================
echo ""
echo "=== Step 7: Delete nodes from the cluster ==="
read -r -p "Delete the nodes from the cluster? (type YES to continue): " CONFIRM2
if [ "${CONFIRM2}" == "YES" ]; then
    for NODE in "${EVICT_NODES[@]}"; do
        echo "  Deleting node ${NODE}..."
        kubectl delete node "${NODE}" 2>&1 | tee -a "${LOG_DIR}/delete_nodes.log"
    done
else
    echo "  Node deletion skipped. Run it manually later: kubectl delete node <node>"
fi

# ============================================================
# Step 8: Final verification
# ============================================================
echo ""
echo "=== Step 8: Final verification ==="
echo "--- Cluster node status ---"
kubectl get nodes -o wide | tee "${LOG_DIR}/nodes_final.txt"
echo ""
echo "--- Pod status in the ${NAMESPACE} namespace ---"
kubectl get pods -n "${NAMESPACE}" -o wide | tee "${LOG_DIR}/pods_final.txt"

echo ""
echo "=========================================="
echo "  Phase 4 complete!"
echo "  Log directory: ${LOG_DIR}"
echo "=========================================="
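
Step 6 decides which Pods still matter by filtering out the kube-system namespace. Because `kubectl drain --ignore-daemonsets` leaves DaemonSet-managed Pods in place in every namespace, an alternative is to filter by owner instead. The following is a minimal sketch, assuming the Pod list has been loaded from `kubectl get pods --all-namespaces --field-selector spec.nodeName=<node> -o json`; the function name and the sample Pods are hypothetical.

```python
def non_daemonset_pods(pod_list):
    """Return (namespace, name) for Pods that are not owned by a DaemonSet."""
    remaining = []
    for pod in pod_list.get("items", []):
        owners = pod["metadata"].get("ownerReferences", [])
        if any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # drain with --ignore-daemonsets leaves these in place
        remaining.append((pod["metadata"]["namespace"], pod["metadata"]["name"]))
    return remaining


if __name__ == "__main__":
    # Hand-written sample in the shape of `kubectl get pods -o json`.
    sample = {"items": [
        {"metadata": {"namespace": "monitoring", "name": "node-exporter-abc",
                      "ownerReferences": [{"kind": "DaemonSet", "name": "node-exporter"}]}},
        {"metadata": {"namespace": "jxyd", "name": "demo-app-0",
                      "ownerReferences": [{"kind": "StatefulSet", "name": "demo-app"}]}},
    ]}
    print(non_daemonset_pods(sample))
```

A node whose result is empty has nothing left that the drain was expected to move.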

VII. Quick Command Reference

For running individual steps by hand:

# ============================================================
# Common one-off commands (run as needed)
# ============================================================

# 1. Show the resource settings of every deployment in jxyd
kubectl get deploy -n jxyd -o custom-columns=\
NAME:.metadata.name,\
CPU_LIMIT:.spec.template.spec.containers[0].resources.limits.cpu,\
MEM_LIMIT:.spec.template.spec.containers[0].resources.limits.memory,\
CPU_REQ:.spec.template.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.template.spec.containers[0].resources.requests.memory

# 2. Show the full resource settings of one deployment
kubectl get deploy <NAME> -n jxyd -o jsonpath='{.spec.template.spec.containers[0].resources}' | python3 -m json.tool

# 3. Patch the resources of a single deployment by hand
kubectl patch deploy <NAME> -n jxyd --type='strategic' -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<CONTAINER_NAME>",
          "resources": {
            "limits": {"cpu": "2", "memory": "2Gi"},
            "requests": {"cpu": "1", "memory": "500Mi"}
          },
          "env": [{
            "name": "CUST_JAVA_OPTS",
            "value": "-Xms500m -Xmx2000m -Dlog4j2.formatMsgNoLookups=true"
          }]
        }]
      }
    }
  }
}'

# 4. Show which Pods run on the nodes to be decommissioned
for node in 10.20.1.142 10.20.1.144 10.20.1.145; do
    echo "=== $node ==="
    kubectl get pods -n jxyd --field-selector spec.nodeName=$node -o wide
done

# 5. Set the nodeSelector on a single StatefulSet
kubectl patch statefulset <NAME> -n jxyd --type='strategic' -p '{"spec":{"template":{"spec":{"nodeSelector":{"jxyd-middleware":"true"}}}}}'

# 5.1 Set the nodeSelector on a single business Deployment
kubectl patch deploy <NAME> -n jxyd --type='strategic' -p '{"spec":{"template":{"spec":{"nodeSelector":{"jxyd-business":"true"}}}}}'

# 6. Quick cordon (mark unschedulable)
kubectl cordon 10.20.1.142
kubectl cordon 10.20.1.144
kubectl cordon 10.20.1.145

# 7. Quick drain (evict and empty the node)
kubectl drain 10.20.1.142 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60
kubectl drain 10.20.1.144 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60
kubectl drain 10.20.1.145 --ignore-daemonsets --delete-emptydir-data --force --grace-period=60

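VIII. Notes and Risk Warnings

⚠️ Key risks

| Risk | Description | Mitigation |
|---|---|---|
| Missed middleware | Automatic detection relies on name keywords and may miss workloads | Review manually with --list first |
| PVC/PV data loss | A StatefulSet's persistent data may be bound to a local PV on a specific node | Check the PV type in advance; local-storage volumes require manual data migration |
| Downsizing affects services | Reducing CPU/memory may cause OOM kills or degraded performance | Monitor for 1-2 days after downsizing and watch for Pod restarts |
| Strategic merge patch overwriting env | When patching containers, pay attention to how env entries are merged | The script handles this, but verify the result after patching |
| Drain timeout | Pods may be forcibly terminated | grace-period=60 and timeout=300s are set |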
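
On the env merge behaviour: in a strategic merge patch, the `containers` and `env` lists are merged by their `name` key rather than replaced wholesale, so a patch that lists only CUST_JAVA_OPTS leaves the other variables in place. The sketch below is a simplified local model of that merge-by-key behaviour, not the API server's implementation; element ordering and nested merging in the real API may differ, and the variable names other than CUST_JAVA_OPTS are hypothetical.

```python
def merge_by_name(current, patch):
    """Merge two lists of {"name": ...} dicts by their "name" key.

    Entries with a matching name are updated, unmatched current entries
    are kept, and entries with new names are appended.
    """
    merged = [dict(item) for item in current]
    index = {item["name"]: i for i, item in enumerate(merged)}
    for item in patch:
        if item["name"] in index:
            merged[index[item["name"]]].update(item)
        else:
            index[item["name"]] = len(merged)
            merged.append(dict(item))
    return merged


if __name__ == "__main__":
    current_env = [
        {"name": "TZ", "value": "Asia/Shanghai"},
        {"name": "CUST_JAVA_OPTS", "value": "-Xms2g -Xmx4g"},
    ]
    patch_env = [
        {"name": "CUST_JAVA_OPTS",
         "value": "-Xms500m -Xmx2000m -Dlog4j2.formatMsgNoLookups=true"},
    ]
    for entry in merge_by_name(current_env, patch_env):
        print(entry)
```

The model only shows why the single-entry patch in the quick command reference does not wipe the rest of the env list; checking the live object with `kubectl get deploy <NAME> -o yaml` after patching remains the authoritative verification.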

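Execution checklist

□ Phase 1: Run the backup script and confirm the backup is complete
□ Phase 2: Preview the resource adjustment with --dry-run first
□ Phase 2: Run the resource adjustment with --apply (business Deployments only)
□ Phase 2: Monitor for 1-2 days and confirm the services are stable
□ Phase 3: Label the retained nodes and the nodes to be decommissioned
□ Phase 3: Review the migration and scheduling plan with --list first (and confirm hostPath usage)
□ Phase 3: Manually confirm the middleware detection result; if anything is missing, update MIDDLEWARE_KEYWORDS
□ Phase 3: Preview the migration and scheduling with --dry-run
□ Phase 3: Run the migration and scheduling with --apply
□ Phase 3: Verify the migration and scheduling result with --verify
□ Phase 3: Monitor the middleware services for normal operation
□ ---- Wait one month ----
□ Phase 4: Run the final decommissioning script
□ Phase 4: Verify that the cluster is healthy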
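
The hostPath confirmation in the checklist covers volumes declared directly in the Pod template. Persistent data reached through a PVC can also be tied to a node, via a local or hostPath PersistentVolume or a PV with node affinity. A minimal sketch for listing such PVs, assuming the input has been loaded from `kubectl get pv -o json`, could look like this; the function name and the sample PV names are hypothetical.

```python
def node_pinned_pvs(pv_list):
    """Return (pv_name, reason) for PersistentVolumes tied to a specific node."""
    pinned = []
    for pv in pv_list.get("items", []):
        spec = pv.get("spec", {})
        name = pv["metadata"]["name"]
        if "local" in spec:
            pinned.append((name, "local volume"))
        elif "hostPath" in spec:
            pinned.append((name, "hostPath volume"))
        elif spec.get("nodeAffinity"):
            pinned.append((name, "nodeAffinity set"))
    return pinned


if __name__ == "__main__":
    # Hand-written sample in the shape of `kubectl get pv -o json`.
    sample = {"items": [
        {"metadata": {"name": "pv-demo-local"},
         "spec": {"local": {"path": "/data/demo"}, "nodeAffinity": {"required": {}}}},
        {"metadata": {"name": "pv-demo-nfs"},
         "spec": {"nfs": {"server": "nfs.example.internal", "path": "/export"}}},
    ]}
    for name, reason in node_pinned_pvs(sample):
        print(f"{name}: {reason}")
```

Any PV reported here that backs a middleware StatefulSet needs its data moved by hand before the nodeSelector change can take effect safely.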