# The Complete SQL Guide: From Data Definition to Complex Queries

## Preface

SQL is more than a collection of syntax rules; it is the language we use to talk to data. This guide starts with database design and works step by step through each area of SQL. Most examples use SQLite syntax (`datetime('now')`, `last_insert_rowid()`, `PRAGMA`), and the comments point out where other database systems differ.

## Sample Database Design

Before we start, let's build a complete business scenario: an **online education platform database**. This case runs through the entire guide.

```mermaid
erDiagram
    USERS ||--o{ ENROLLMENTS : "enrolls"
    COURSES ||--o{ ENROLLMENTS : "enrolled_in"
    COURSES ||--o{ LESSONS : "contains"
    USERS ||--o{ USER_PROGRESS : "tracks"
    LESSONS ||--o{ USER_PROGRESS : "progressed_in"
    CATEGORIES ||--o{ COURSES : "categorizes"

    USERS {
        int user_id PK
        varchar name
        varchar email
        date birth_date
        enum gender
        datetime created_at
        datetime updated_at
        boolean is_active
    }
    CATEGORIES {
        int category_id PK
        varchar name
        text description
        datetime created_at
    }
    COURSES {
        int course_id PK
        varchar title
        text description
        int category_id FK
        decimal price
        int duration_hours
        enum difficulty_level
        datetime created_at
        boolean is_published
    }
    LESSONS {
        int lesson_id PK
        int course_id FK
        varchar title
        text content
        int duration_minutes
        int sequence_number
        datetime created_at
    }
    ENROLLMENTS {
        int enrollment_id PK
        int user_id FK
        int course_id FK
        datetime enrollment_date
        decimal amount_paid
        enum status
    }
    USER_PROGRESS {
        int progress_id PK
        int user_id FK
        int lesson_id FK
        datetime completed_at
        int time_spent_minutes
    }
```

***

# 1. Data Definition Language (DDL) - The Foundation of the Data World

> **Core idea**: DDL is like an architect's blueprint. It defines the "skeleton" and the "rules" of your data. Every CREATE, ALTER and DROP shapes the structure and constraints of the data.

## 1.1 CREATE - Creating Data Structures

### Creating a Database

```sql
-- Create the database (MySQL syntax; only some database systems need this step.
-- SQLite needs none: opening a database file creates it)
CREATE DATABASE online_education_platform
    CHARACTER SET utf8mb4
    COLLATE utf8mb4_unicode_ci;
```

### Creating Tables

```sql
-- 1.
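-- Users table: shows full column definitions and constraints
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY AUTOINCREMENT,   -- primary key, auto-increment
    name       VARCHAR(100) NOT NULL,               -- NOT NULL constraint
    email      VARCHAR(255) NOT NULL UNIQUE,        -- UNIQUE constraint (backed by an implicit index)
    birth_date DATE,                                -- nullable
    gender     TEXT DEFAULT 'other'
               CHECK (gender IN ('male', 'female', 'other')),  -- SQLite has no ENUM; MySQL would use ENUM(...)
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,  -- defaults to the current time
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,  -- MySQL can add ON UPDATE CURRENT_TIMESTAMP;
                                                    -- in SQLite, set it in the UPDATE or in a trigger
    is_active  BOOLEAN DEFAULT true
    -- The "at least 13 years old" rule depends on the current date. SQLite rejects
    -- date('now') inside a CHECK constraint, so enforce it in the application or a trigger.
);
-- SQLite declares secondary indexes outside CREATE TABLE (MySQL allows INDEX inside it)
CREATE INDEX idx_created_at ON users(created_at);

-- 2. Course categories table
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name        VARCHAR(50) NOT NULL UNIQUE,
    description TEXT,
    created_at  DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- 3. Courses table: shows a foreign key constraint
CREATE TABLE courses (
    course_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    title            VARCHAR(200) NOT NULL,
    description      TEXT,
    category_id      INTEGER NOT NULL,
    price            DECIMAL(10,2) DEFAULT 0.00,
    duration_hours   INTEGER DEFAULT 0,
    difficulty_level TEXT DEFAULT 'beginner'
                     CHECK (difficulty_level IN ('beginner', 'intermediate', 'advanced')),
    created_at       DATETIME DEFAULT CURRENT_TIMESTAMP,
    is_published     BOOLEAN DEFAULT false,
    -- Foreign key (SQLite enforces it only after PRAGMA foreign_keys = ON)
    FOREIGN KEY (category_id) REFERENCES categories(category_id)
        ON DELETE RESTRICT ON UPDATE CASCADE,
    -- Check constraints
    CONSTRAINT chk_price CHECK (price >= 0),
    CONSTRAINT chk_duration CHECK (duration_hours >= 0)
);
CREATE INDEX idx_category ON courses(category_id);
CREATE INDEX idx_price ON courses(price);
CREATE INDEX idx_difficulty ON courses(difficulty_level);

-- 4-6. Lessons, enrollments and progress tables, following the ER diagram above
CREATE TABLE lessons (
    lesson_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    course_id        INTEGER NOT NULL REFERENCES courses(course_id),
    title            VARCHAR(200) NOT NULL,
    content          TEXT,
    duration_minutes INTEGER,
    sequence_number  INTEGER,
    created_at       DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE enrollments (
    enrollment_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id         INTEGER NOT NULL REFERENCES users(user_id),
    course_id       INTEGER NOT NULL REFERENCES courses(course_id),
    enrollment_date DATETIME DEFAULT CURRENT_TIMESTAMP,
    amount_paid     DECIMAL(10,2) DEFAULT 0.00,
    status          TEXT DEFAULT 'active'     -- values used in this guide: 'active', 'completed'
);
CREATE TABLE user_progress (
    progress_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id            INTEGER NOT NULL REFERENCES users(user_id),
    lesson_id          INTEGER NOT NULL REFERENCES lessons(lesson_id),
    completed_at       DATETIME,              -- NULL until the lesson is finished
    time_spent_minutes INTEGER
);
```

### Creating Indexes - The Secret Weapon of Performance Tuning

```sql
-- Single-column index
CREATE INDEX idx_users_name ON users(name);

-- Composite index (column order matters!)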
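CREATE INDEX idx_course_category_price ON courses(category_id, price);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (supported by SQLite and PostgreSQL)
CREATE INDEX idx_active_users ON users(name) WHERE is_active = true;
```

### Creating Views - A Logical Window onto the Data

```sql
-- View: course statistics for active users
CREATE VIEW active_user_course_stats AS
SELECT
    u.user_id,
    u.name,
    u.email,
    COUNT(e.course_id) AS enrolled_courses,
    SUM(e.amount_paid) AS total_spent,
    AVG(c.price)       AS avg_course_price
FROM users u
LEFT JOIN enrollments e ON u.user_id = e.user_id
LEFT JOIN courses c ON e.course_id = c.course_id
WHERE u.is_active = true
GROUP BY u.user_id, u.name, u.email;
```

## 1.2 ALTER - Modifying Existing Structures

```sql
-- Add new columns
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
ALTER TABLE users ADD COLUMN profile_picture_url TEXT;

-- Change a column definition (PostgreSQL syntax; SQLite cannot alter an
-- existing column and needs a table rebuild instead)
ALTER TABLE courses ALTER COLUMN title SET NOT NULL;      -- make the column NOT NULL
ALTER TABLE courses ALTER COLUMN description TYPE TEXT;   -- change the data type

-- Add a constraint (PostgreSQL syntax; not supported by SQLite)
ALTER TABLE users ADD CONSTRAINT chk_phone_format
    CHECK (phone IS NULL OR length(phone) >= 10);

-- Drop a constraint (PostgreSQL syntax; not supported by SQLite)
ALTER TABLE users DROP CONSTRAINT chk_phone_format;

-- Rename a column
ALTER TABLE users RENAME COLUMN phone TO phone_number;

-- Rename a table
ALTER TABLE users RENAME TO platform_users;
ALTER TABLE platform_users RENAME TO users;  -- and back again
```

## 1.3 DROP - Removing Structures

```sql
-- Drop an index
DROP INDEX idx_users_name;

-- Drop a view
DROP VIEW active_user_course_stats;

-- Drop a table (mind the foreign keys)
-- DROP TABLE courses;  -- fails while rows in other tables still reference it

-- Drop a database (use with great care!)
-- DROP DATABASE online_education_platform;
```

## 1.4 TRUNCATE - Emptying a Table

```sql
-- Quickly empty a table while keeping its structure (MySQL/PostgreSQL)
TRUNCATE TABLE user_progress;  -- usually faster than DELETE FROM user_progress
-- Notes: TRUNCATE does not fire row-level DELETE triggers. MySQL commits it implicitly,
-- so it cannot be rolled back; PostgreSQL can roll it back inside a transaction.
-- SQLite has no TRUNCATE; use DELETE FROM user_progress; instead.
```

**💡 DDL Best Practices**

1. **Naming conventions**: plural table names, snake_case column names
2. **Constraint design**: prefer database constraints over application-level checks
3. **Index strategy**: create indexes based on query patterns and avoid over-indexing
4. **Version control**: every DDL change should have a migration script

***

# 2. Data Manipulation Language (DML) - Managing the Data Lifecycle

> **Core idea**: DML is the lifeline of your data. INSERT brings data to life, UPDATE lets it grow, and DELETE lays it to rest. Every operation has to account for performance, consistency and business logic.

## 2.1 INSERT - The Birth of Data

### Basic Inserts

```sql
-- 1.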
-- Insert complete rows
INSERT INTO categories (name, description)
VALUES ('编程开发', '包含各种编程语言和开发技术的课程');

INSERT INTO categories (name, description) VALUES
    ('数据科学', '数据分析、机器学习、统计学相关课程'),
    ('设计创意', '平面设计、UI/UX、插画等创意类课程'),
    ('商业管理', '市场营销、项目管理、创业等商业课程');

-- 2. Insert users
INSERT INTO users (name, email, birth_date, gender) VALUES
    ('张三', 'zhang.san@email.com', '1995-05-15', 'male'),
    ('李四', 'li.si@email.com', '1992-08-22', 'female'),
    ('王五', 'wang.wu@email.com', '1988-12-03', 'male'),
    ('赵六', 'zhao.liu@email.com', '1990-03-18', 'female');

-- 3. Insert courses
INSERT INTO courses (title, description, category_id, price, duration_hours, difficulty_level, is_published) VALUES
    ('Python从入门到精通', '零基础学Python,包含基础语法、数据结构、面向对象编程', 1, 199.00, 40, 'beginner', true),
    ('数据分析实战', '使用Python进行数据分析,包含pandas、numpy、matplotlib', 2, 299.00, 30, 'intermediate', true),
    ('UI设计基础教程', 'Figma和Sketch工具使用,界面设计原则', 3, 159.00, 25, 'beginner', true);
```

### Advanced Insert Techniques

```sql
-- 1. Insert from a query (INSERT ... SELECT)
--    Run this after the lessons sample data from section 4.1 has been loaded.
INSERT INTO user_progress (user_id, lesson_id, completed_at, time_spent_minutes)
SELECT
    u.user_id,
    l.lesson_id,
    -- Random moment within the past 30 days. The parentheses matter:
    -- || binds tighter than %, so the modulo must be computed first.
    datetime('now', '-' || abs(random() % 30) || ' days'),
    30 + abs(random() % 61)          -- random duration of 30-90 minutes
FROM users u
CROSS JOIN lessons l
WHERE u.user_id <= 2 AND l.lesson_id <= 3;  -- keep the data volume small

-- 2. Handle conflicts on insert (UPSERT; ON CONFLICT is available in SQLite 3.24+ and PostgreSQL)
INSERT INTO users (email, name, updated_at)
VALUES ('zhang.san@email.com', '张三(更新)', datetime('now'))
ON CONFLICT(email) DO UPDATE SET
    name = excluded.name,
    updated_at = excluded.updated_at;

-- 3. Multi-row insert: one statement instead of five
INSERT INTO enrollments (user_id, course_id, enrollment_date, amount_paid, status) VALUES
    (1, 1, datetime('now'), 199.00, 'active'),
    (1, 2, datetime('now'), 299.00, 'active'),
    (2, 1, datetime('now'), 199.00, 'active'),
    (3, 3, datetime('now'), 159.00, 'completed'),
    (4, 1, datetime('now'), 199.00, 'active');
```

## 2.2 UPDATE - The Growth of Data

### Basic Updates

```sql
-- 1. Single-table update
UPDATE users
SET name = '张三丰', updated_at = datetime('now')
WHERE email = 'zhang.san@email.com';

-- 2.
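-- Conditional update
UPDATE courses
SET price = price * 0.8          -- 20% off published courses in category 1
WHERE category_id = 1 AND is_published = true;

-- 3. Conditional update with CASE
UPDATE courses
SET difficulty_level = CASE
    WHEN duration_hours < 20 THEN 'beginner'
    WHEN duration_hours BETWEEN 20 AND 40 THEN 'intermediate'
    ELSE 'advanced'
END
WHERE difficulty_level IS NULL;
```

### Advanced Update Techniques

```sql
-- 1. Update driven by a subquery:
--    deactivate users with no enrollment in the last 365 days
UPDATE users
SET is_active = false
WHERE NOT EXISTS (
    SELECT 1
    FROM enrollments e
    WHERE e.user_id = users.user_id
      AND e.enrollment_date >= date('now', '-365 days')
);

-- 2. Update with a join (UPDATE ... FROM; SQLite 3.33+ and PostgreSQL).
--    The WHERE clause must tie the joined table back to the row being updated,
--    otherwise every course would be changed.
UPDATE courses
SET price = courses.price * 1.1
FROM categories cat
WHERE courses.category_id = cat.category_id
  AND cat.name = '编程开发'
  AND courses.created_at < date('now', '-180 days');
```

## 2.3 DELETE - Cleaning Up Data

### Basic Deletes

```sql
-- 1. Conditional delete
DELETE FROM user_progress
WHERE completed_at < date('now', '-365 days');

-- 2. Delete using a subquery
DELETE FROM enrollments
WHERE course_id IN (
    SELECT course_id
    FROM courses
    WHERE is_published = false
      AND created_at < date('now', '-90 days')
);
```

### Safe Deletion Strategy

```sql
-- 1. Soft delete (recommended in production)
-- Add a deleted_at column instead of physically removing rows
ALTER TABLE users ADD COLUMN deleted_at DATETIME NULL;

-- Soft-delete a user
UPDATE users
SET deleted_at = datetime('now'), is_active = false
WHERE user_id = 1;

-- Exclude soft-deleted rows when querying
SELECT * FROM users WHERE deleted_at IS NULL;
```

**⚠️ Common DML Pitfalls and Best Practices**

```mermaid
flowchart TD
    A[DML operation] --> B{Transaction needed?}
    B -->|Yes| C[BEGIN TRANSACTION]
    B -->|No| D[Execute directly]
    C --> E[Execute DML statements]
    E --> F{Check result}
    F -->|Success| G[COMMIT]
    F -->|Failure| H[ROLLBACK]
    D --> I[Write log]
    G --> I
    H --> I
```

1. **Batch operations**: use multi-row or batched statements when handling large volumes of data
2. **Index cost**: every INSERT, UPDATE and DELETE also has to maintain the table's indexes
3. **Foreign keys**: deletes can be blocked by foreign key constraints
4. **Backups**: back up before running important data operations

***

# 3. Transaction Control Language (TCL) - The Guardian of Data Consistency

> **Core idea**: TCL protects data integrity. It guarantees that a unit of work either succeeds completely or fails completely, and never leaves the data in an inconsistent intermediate state.

## 3.1 ACID Properties in Depth

```mermaid
mindmap
  root((ACID))
    Atomicity
      All succeed or all fail
      Indivisible unit of work
    Consistency
      Integrity constraints hold
      Business rules are satisfied
    Isolation
      Concurrent transactions are independent
      Controlled by the isolation level
    Durability
      Committed data is stored permanently
      Survives system failures
```

## 3.2 Basic Transaction Control

### The Transaction Lifecycle

```sql
-- 1.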
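-- Explicit transaction control
BEGIN TRANSACTION;  -- start the transaction

-- User registration logic
INSERT INTO users (name, email, birth_date, gender)
VALUES ('新用户', 'newuser@email.com', '1995-01-01', 'male');

-- Initialize progress rows for the new user. The user is looked up by the unique
-- email here; application code would normally read last_insert_rowid() once,
-- right after the INSERT above, and reuse that value.
INSERT INTO user_progress (user_id, lesson_id, completed_at)
SELECT u.user_id, l.lesson_id, NULL
FROM users u
CROSS JOIN lessons l
WHERE u.email = 'newuser@email.com'
  AND l.course_id = 1;

-- If everything went well
COMMIT;  -- commit the transaction

-- If an error occurred
-- ROLLBACK;  -- roll the transaction back
```

### Savepoints - Checkpoints Inside a Transaction

```sql
BEGIN TRANSACTION;

-- Create a savepoint
SAVEPOINT user_creation;

INSERT INTO users (name, email, birth_date, gender)
VALUES ('测试用户1', 'test1@email.com', '1990-01-01', 'male');

INSERT INTO users (name, email, birth_date, gender)
VALUES ('测试用户2', 'test2@email.com', '1991-01-01', 'female');

-- Create another savepoint
SAVEPOINT course_enrollment;

-- Suppose an error happens here: user 999 does not exist, so the insert
-- fails when foreign keys are enforced
INSERT INTO enrollments (user_id, course_id, enrollment_date, amount_paid)
VALUES (999, 1, datetime('now'), 199.00);

-- Roll back to the savepoint; the two test users above are kept
ROLLBACK TO SAVEPOINT course_enrollment;

-- Release the savepoint
RELEASE SAVEPOINT course_enrollment;

COMMIT;
```

## 3.3 Isolation Levels

```sql
-- How the isolation level is set depends on the database system.
-- SQLite transactions are serializable by default.

-- 1. Read Uncommitted
PRAGMA read_uncommitted = true;  -- SQLite: only has an effect between connections in shared-cache mode

-- 2. Read Committed: the default in PostgreSQL, Oracle and SQL Server
--    Prevents dirty reads; non-repeatable reads are still possible

-- 3. Repeatable Read: the default in MySQL (InnoDB)
--    Prevents dirty reads and non-repeatable reads; phantom reads are still possible

-- 4. Serializable: the SQLite default
--    Prevents all three anomalies, at the cost of the lowest concurrency
```

### Demonstrating a Concurrency Problem

```sql
-- Scenario: a non-repeatable read under Read Committed

-- Session 1:
BEGIN TRANSACTION;
SELECT price FROM courses WHERE course_id = 1;  -- suppose this returns 199.00
-- Pause here and let session 2 run

-- Session 2:
BEGIN TRANSACTION;
UPDATE courses SET price = 299.00 WHERE course_id = 1;
COMMIT;

-- Back in session 1:
SELECT price FROM courses WHERE course_id = 1;
-- Under Read Committed this can now return 299.00 (a non-repeatable read).
-- SQLite's serializable isolation does not show this anomaly.
COMMIT;
```

## 3.4 Transactions in Real Business Scenarios

### Scenario 1: The Complete User Registration Flow

```sql
BEGIN TRANSACTION;

-- 1. Create the user account
INSERT INTO users (name, email, birth_date, gender)
VALUES ('李明', 'liming@email.com', '1992-06-15', 'male');

-- 2.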
-- Record the registration source (assuming such a table exists)
INSERT INTO user_activity_log (user_id, activity_type, created_at)
VALUES (last_insert_rowid(), 'registration', datetime('now'));

-- 3. Grant the new user a free welcome course.
--    last_insert_rowid() now refers to the log row inserted above,
--    so the user is looked up by email instead.
INSERT INTO enrollments (user_id, course_id, enrollment_date, amount_paid, status)
VALUES (
    (SELECT user_id FROM users WHERE email = 'liming@email.com'),
    1, datetime('now'), 0.00, 'active'
);

COMMIT;
```

### Scenario 2: A Course Purchase Transaction

```sql
-- A more involved course purchase transaction
BEGIN TRANSACTION;

SAVEPOINT before_purchase;

-- 1. Check that the course can be purchased
-- 2. Check that the user has not already purchased it
-- 3. Create the order record
INSERT INTO enrollments (user_id, course_id, enrollment_date, amount_paid, status)
VALUES (1, 2, datetime('now'), 299.00, 'active');

-- 4. Update the user's bookkeeping columns (assuming such columns exist)
UPDATE users
SET updated_at = datetime('now')
WHERE user_id = 1;

-- 5. Record the payment log
-- INSERT INTO payment_log (user_id, amount, payment_method, transaction_id)
-- VALUES (1, 299.00, 'credit_card', 'TXN123456789');

-- If everything went well
COMMIT;

-- If an error occurred
-- ROLLBACK TO SAVEPOINT before_purchase;
```

**🔒 TCL Best Practices**

```mermaid
flowchart LR
    A[Start business operation] --> B[BEGIN TRANSACTION]
    B --> C[Run critical operations]
    C --> D{Succeeded?}
    D -->|Yes| E[COMMIT]
    D -->|No| F[ROLLBACK]
    E --> G[Log success]
    F --> H[Log error]
    G --> I[End]
    H --> I
```

1. **Keep transactions small**: wrap only the operations that must succeed or fail together
2. **Avoid long transactions**: they hold locks longer, which increases blocking and deadlock risk
3. **Use savepoints sensibly**: they give fine-grained control inside complex business logic
4. **Error handling**: always have a rollback plan

***

# 4. Data Query Language (DQL) - The Complete Guide to SELECT

> **Core idea**: SELECT is more than a query; it reflects how you think about data. Once you understand the logical processing order, you can write queries that are both efficient and correct.

## 4.1 SQL Logical Processing Order - The Engine's Point of View

```mermaid
flowchart TD
    A[FROM clause] --> B[JOIN]
    B --> C[WHERE row filter]
    C --> D[GROUP BY grouping]
    D --> E[HAVING group filter]
    E --> F[SELECT column list]
    F --> G[DISTINCT deduplication]
    G --> H[ORDER BY sorting]
    H --> I[LIMIT/OFFSET]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#bbf,stroke:#333,stroke-width:2px
```

Let's create the sample data first, then work through each clause in this order:

```sql
-- Complete the sample data
INSERT INTO lessons (course_id, title, content, duration_minutes, sequence_number) VALUES
    (1, 'Python基础语法', 'Python的基本语法介绍', 45, 1),
    (1, '数据类型和变量', 'Python的数据类型详解', 38, 2),
    (1, '控制流程', '条件语句和循环', 52, 3),
    (1, '函数和模块', '函数定义和模块使用', 41, 4),
    (2, '数据导入导出', 'pandas读取各种格式数据', 35, 1),
    (2, '数据清洗', '处理缺失值和异常值', 48, 2),
    (2, '数据可视化', 'matplotlib和seaborn使用', 55, 3),
    (3, 'Figma基础', 'Figma界面和基本操作', 30, 1),
    (3, '组件设计', '创建可重用的设计组件', 42, 2);
```

## 4.2 The FROM Clause - Establishing the Data Source

### Basic Table Queries

```sql
-- 1. Single-table query
SELECT * FROM users;

-- 2. Using a table alias
SELECT u.name, u.email, u.created_at
FROM users u
WHERE u.is_active = true;

-- 3. A subquery as the data source
SELECT course_stats.*
FROM (
    SELECT
        course_id,
        COUNT(*) AS enrollment_count,
        AVG(amount_paid) AS avg_paid
    FROM enrollments
    WHERE status = 'active'
    GROUP BY course_id
) course_stats
WHERE course_stats.enrollment_count > 1;
```

## 4.3 JOIN - The Bridge Between Related Data

### JOIN Types

```mermaid
graph LR
    A[LEFT TABLE]
    B[RIGHT TABLE]
    subgraph "INNER JOIN"
        C[Only matching rows]
    end
    subgraph "LEFT JOIN"
        D[All rows from the left table<br/>plus matching rows from the right]
    end
    subgraph "RIGHT JOIN"
        E[All rows from the right table<br/>plus matching rows from the left]
    end
    subgraph "FULL OUTER JOIN"
        F[All rows from both tables]
    end
```

```sql
-- 1. INNER JOIN (the most common)
SELECT
    u.name AS student_name,
    c.title AS course_title,
    e.enrollment_date,
    e.amount_paid
FROM users u
INNER JOIN enrollments e ON u.user_id = e.user_id
INNER JOIN courses c ON e.course_id = c.course_id
WHERE u.is_active = true;

-- 2. LEFT JOIN (all users, including those with no enrollments)
SELECT
    u.name,
    u.email,
    COUNT(e.course_id) AS course_count,
    COALESCE(SUM(e.amount_paid), 0) AS total_spent
FROM users u
LEFT JOIN enrollments e ON u.user_id = e.user_id
GROUP BY u.user_id, u.name, u.email;

-- 3. A more complex multi-table join
SELECT
    u.name AS student_name,
    c.title AS course_title,
    cat.name AS category_name,
    COUNT(up.completed_at) AS lessons_completed,  -- progress rows with a NULL completed_at are not finished yet
    COUNT(l.lesson_id) AS total_lessons,
    ROUND(
        COUNT(up.completed_at) * 100.0 / NULLIF(COUNT(l.lesson_id), 0), 2
    ) AS completion_percentage                     -- NULLIF avoids division by zero for courses without lessons
FROM users u
JOIN enrollments e ON u.user_id = e.user_id
JOIN courses c ON e.course_id = c.course_id
JOIN categories cat ON c.category_id = cat.category_id
LEFT JOIN lessons l ON c.course_id = l.course_id
LEFT JOIN user_progress up ON u.user_id = up.user_id AND l.lesson_id = up.lesson_id
WHERE e.status = 'active'
GROUP BY u.user_id, c.course_id, u.name, c.title, cat.name
ORDER BY completion_percentage DESC;
```

### Self-Joins - One Table in Two Roles

```sql
-- Find the other students enrolled in the same courses
SELECT DISTINCT
    u1.name AS student_name,   -- avoids the alias current_user, a reserved word in some databases
    u2.name AS classmate,
    c.title AS course_title
FROM users u1
JOIN enrollments e1 ON u1.user_id = e1.user_id
JOIN enrollments e2 ON e1.course_id = e2.course_id
JOIN users u2 ON e2.user_id = u2.user_id
JOIN courses c ON e1.course_id = c.course_id
WHERE u1.user_id != u2.user_id  -- exclude the student themselves
  AND u1.name = '张三'
ORDER BY c.title, u2.name;
```

## 4.4 The WHERE Clause - Filtering Data Precisely

### Basic Filter Conditions

```sql
-- 1. Basic comparison operators
SELECT * FROM courses
WHERE price > 200
  AND difficulty_level = 'intermediate'
  AND is_published = true;

-- 2. Range queries
SELECT * FROM users
WHERE birth_date BETWEEN '1990-01-01' AND '1999-12-31'
  AND created_at >= datetime('now', '-30 days');

-- 3.
-- Pattern matching
SELECT * FROM courses
WHERE title LIKE '%Python%'   -- contains "Python"
   OR title LIKE '数据%';      -- starts with "数据"

-- 4. List matching
SELECT * FROM courses
WHERE category_id IN (1, 2)   -- programming and data science
  AND difficulty_level NOT IN ('advanced');
```

### Advanced Filtering Techniques

```sql
-- 1. Handling NULL values (the phone column was renamed to phone_number in section 1.2)
SELECT * FROM users
WHERE birth_date IS NOT NULL
  AND (phone_number IS NULL OR length(phone_number) < 11);

-- 2. Regular expressions (where supported: MySQL has REGEXP built in;
--    SQLite accepts the operator only after a regexp function is registered)
SELECT * FROM users
WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$';

-- 3. Date and time functions
SELECT * FROM enrollments
WHERE date(enrollment_date) = date('now')         -- enrolled today
   OR strftime('%w', enrollment_date) = '0';      -- enrolled on a Sunday

-- 4. Filtering with a subquery
SELECT * FROM users
WHERE user_id IN (
    SELECT DISTINCT user_id
    FROM enrollments
    WHERE amount_paid > 200
);
```

## 4.5 GROUP BY and Aggregate Functions - The Art of Summarizing Data

### Basic Grouping and Aggregation

```sql
-- 1. Enrollment statistics per course
SELECT
    c.title,
    COUNT(*) AS enrollment_count,
    AVG(e.amount_paid) AS avg_payment,
    MIN(e.amount_paid) AS min_payment,
    MAX(e.amount_paid) AS max_payment,
    SUM(e.amount_paid) AS total_revenue
FROM courses c
JOIN enrollments e ON c.course_id = e.course_id
GROUP BY c.course_id, c.title
ORDER BY enrollment_count DESC;

-- 2.
-- Enrollment trend by year and month
SELECT
    strftime('%Y-%m', enrollment_date) AS month,
    COUNT(*) AS enrollments,
    SUM(amount_paid) AS monthly_revenue,
    COUNT(DISTINCT user_id) AS unique_students
FROM enrollments
GROUP BY strftime('%Y-%m', enrollment_date)
ORDER BY month;
```

### The HAVING Clause - Filtering After Grouping

```sql
-- Find popular courses (more than one enrollment)
SELECT
    c.title,
    COUNT(*) AS enrollment_count,
    AVG(e.amount_paid) AS avg_payment
FROM courses c
JOIN enrollments e ON c.course_id = e.course_id
GROUP BY c.course_id, c.title
HAVING COUNT(*) > 1              -- standard SQL does not allow the alias enrollment_count here
   AND AVG(e.amount_paid) > 150  -- (SQLite and MySQL accept aliases in HAVING as an extension)
ORDER BY enrollment_count DESC;

-- A more complex HAVING condition, written with the aggregates so it stays portable
SELECT
    cat.name AS category,
    COUNT(DISTINCT c.course_id) AS course_count,
    COUNT(DISTINCT e.user_id) AS student_count,
    SUM(e.amount_paid) AS category_revenue
FROM categories cat
JOIN courses c ON cat.category_id = c.category_id
JOIN enrollments e ON c.course_id = e.course_id
GROUP BY cat.category_id, cat.name
HAVING COUNT(DISTINCT c.course_id) >= 1
   AND SUM(e.amount_paid) > 200
   AND COUNT(DISTINCT e.user_id) > 1;
```

## 4.6 Advanced Query Techniques

### Window Functions - A Power Tool for Data Analysis

```sql
-- 1. Ranking functions
SELECT
    u.name,
    c.title,
    e.amount_paid,
    ROW_NUMBER() OVER (ORDER BY e.amount_paid DESC) AS payment_rank,
    RANK()       OVER (ORDER BY e.amount_paid DESC) AS payment_rank_with_ties,
    DENSE_RANK() OVER (ORDER BY e.amount_paid DESC) AS dense_payment_rank
FROM users u
JOIN enrollments e ON u.user_id = e.user_id
JOIN courses c ON e.course_id = c.course_id;

-- 2. Partitioned window functions
SELECT
    u.name,
    c.title,
    cat.name AS category,
    e.amount_paid,
    AVG(e.amount_paid) OVER (PARTITION BY cat.category_id) AS category_avg_payment,
    e.amount_paid - AVG(e.amount_paid) OVER (PARTITION BY cat.category_id) AS payment_diff_from_avg
FROM users u
JOIN enrollments e ON u.user_id = e.user_id
JOIN courses c ON e.course_id = c.course_id
JOIN categories cat ON c.category_id = cat.category_id;

-- 3.
-- Running calculations
SELECT
    enrollment_date,
    amount_paid,
    SUM(amount_paid) OVER (ORDER BY enrollment_date) AS running_total,
    LAG(amount_paid, 1)  OVER (ORDER BY enrollment_date) AS previous_payment,
    LEAD(amount_paid, 1) OVER (ORDER BY enrollment_date) AS next_payment
FROM enrollments
ORDER BY enrollment_date;
```

### Common Table Expressions (CTE)

```sql
-- 1. Basic CTE
WITH course_stats AS (
    SELECT
        c.course_id,
        c.title,
        COUNT(e.enrollment_id) AS enrollment_count,
        AVG(e.amount_paid) AS avg_payment
    FROM courses c
    LEFT JOIN enrollments e ON c.course_id = e.course_id
    GROUP BY c.course_id, c.title
),
popular_courses AS (
    SELECT * FROM course_stats WHERE enrollment_count >= 2
)
SELECT title, enrollment_count, ROUND(avg_payment, 2) AS avg_payment
FROM popular_courses
ORDER BY enrollment_count DESC;

-- 2. Recursive CTE (a simple course recommendation)
WITH RECURSIVE course_recommendations AS (
    -- Base case: courses the user enrolled in directly
    SELECT u.user_id, u.name, c.course_id, c.title, 1 AS level
    FROM users u
    JOIN enrollments e ON u.user_id = e.user_id
    JOIN courses c ON e.course_id = c.course_id
    WHERE u.name = '张三'

    UNION ALL

    -- Recursive step: other courses in the same category.
    -- The recursive table may be referenced only once, in the FROM clause, so
    -- courses that are already listed are collapsed in the final SELECT
    -- rather than excluded with a NOT IN subquery here.
    SELECT cr.user_id, cr.name, c2.course_id, c2.title, cr.level + 1
    FROM course_recommendations cr
    JOIN courses c1 ON cr.course_id = c1.course_id
    JOIN courses c2 ON c1.category_id = c2.category_id
                   AND c2.course_id <> c1.course_id
    WHERE cr.level < 2
)
SELECT name, title, MIN(level) AS level
FROM course_recommendations
GROUP BY name, title
ORDER BY level, title;
```

### The Art of Subqueries

```sql
-- 1. Scalar subqueries
SELECT
    u.name,
    u.email,
    (SELECT COUNT(*) FROM enrollments e WHERE e.user_id = u.user_id) AS course_count,
    (SELECT MAX(amount_paid) FROM enrollments e WHERE e.user_id = u.user_id) AS max_payment
FROM users u;

-- 2. Existence checks
SELECT u.name, u.email
FROM users u
WHERE EXISTS (
    SELECT 1
    FROM enrollments e
    WHERE e.user_id = u.user_id AND e.amount_paid > 200
);

-- 3.
-- A more complex correlated subquery
SELECT
    c.title,
    c.price,
    (SELECT COUNT(*) FROM enrollments e WHERE e.course_id = c.course_id) AS enrollment_count,
    CASE
        WHEN (SELECT COUNT(*) FROM enrollments e WHERE e.course_id = c.course_id) > 2 THEN 'popular'
        WHEN (SELECT COUNT(*) FROM enrollments e WHERE e.course_id = c.course_id) > 0 THEN 'regular'
        ELSE 'unpopular'
    END AS popularity
FROM courses c
WHERE c.is_published = true;
```

## 4.7 Set Operations

```sql
-- 1. UNION - combine result sets
SELECT name AS person_name, 'Student' AS role
FROM users
WHERE user_id IN (SELECT DISTINCT user_id FROM enrollments)
UNION
SELECT title AS person_name, 'Course' AS role
FROM courses
WHERE is_published = true;

-- 2. INTERSECT - intersection (users enrolled in both course 1 and course 2)
SELECT user_id FROM enrollments WHERE course_id = 1
INTERSECT
SELECT user_id FROM enrollments WHERE course_id = 2;

-- 3. EXCEPT - difference (active users with no enrollments)
SELECT user_id FROM users WHERE is_active = true
EXCEPT
SELECT DISTINCT user_id FROM enrollments;
```

## 4.8 ORDER BY and LIMIT - Sorting and Limiting Results

```sql
-- 1. Sorting by multiple columns
SELECT u.name, c.title, e.amount_paid, e.enrollment_date
FROM users u
JOIN enrollments e ON u.user_id = e.user_id
JOIN courses c ON e.course_id = c.course_id
ORDER BY
    e.amount_paid DESC,      -- amount paid, descending
    e.enrollment_date ASC,   -- ties broken by enrollment time, ascending
    u.name;                  -- finally by name

-- 2. Sorting by an expression
--    Subtracting two date strings in SQLite only compares the leading years,
--    so the age in days is computed with julianday() instead.
SELECT
    u.name,
    u.birth_date,
    CASE
        WHEN julianday('now') - julianday(u.birth_date) > 365.25 * 30 THEN '30+'
        WHEN julianday('now') - julianday(u.birth_date) > 365.25 * 25 THEN '25-30'
        ELSE 'under 25'
    END AS age_group
FROM users u
ORDER BY
    CASE
        WHEN julianday('now') - julianday(u.birth_date) > 365.25 * 30 THEN 1
        WHEN julianday('now') - julianday(u.birth_date) > 365.25 * 25 THEN 2
        ELSE 3
    END,
    u.name;

-- 3.
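-- Pagination
SELECT u.name, c.title, e.amount_paid
FROM users u
JOIN enrollments e ON u.user_id = e.user_id
JOIN courses c ON e.course_id = c.course_id
ORDER BY e.enrollment_date DESC
LIMIT 10 OFFSET 0;  -- first page, 10 rows per page

-- A more elaborate pagination example
WITH numbered_results AS (
    SELECT
        u.name,
        c.title,
        e.amount_paid,
        ROW_NUMBER() OVER (ORDER BY e.enrollment_date DESC) AS row_num
    FROM users u
    JOIN enrollments e ON u.user_id = e.user_id
    JOIN courses c ON e.course_id = c.course_id
)
SELECT name, title, amount_paid
FROM numbered_results
WHERE row_num BETWEEN 11 AND 20;  -- second page
```

## 📊 Query Performance Optimization

### Analyzing the Execution Plan

```sql
-- Show the query execution plan (SQLite syntax; other systems use EXPLAIN)
EXPLAIN QUERY PLAN
SELECT u.name, COUNT(e.course_id) AS course_count
FROM users u
LEFT JOIN enrollments e ON u.user_id = e.user_id
WHERE u.is_active = true
GROUP BY u.user_id, u.name;
```

### Optimization Suggestions

```mermaid
flowchart TD
    A[Query optimization] --> B[Index optimization]
    A --> C[Query rewriting]
    A --> D[Data volume control]
    B --> E[Single-column indexes]
    B --> F[Composite indexes]
    B --> G[Covering indexes]
    C --> H[Avoid SELECT *]
    C --> I[Use subqueries sensibly]
    C --> J[Optimize JOIN order]
    D --> K[Use LIMIT]
    D --> L[Paginate]
    D --> M[Filter early]
```

**🎯 SELECT Best Practices Summary**

1. **Understand the processing order**: think in terms of FROM→WHERE→GROUP BY→HAVING→SELECT→ORDER BY
2. **Use indexes sensibly**: create indexes that match your WHERE and JOIN conditions
3. **Avoid N+1 queries**: use a JOIN instead of querying in a loop
4. **Control the result set size**: use LIMIT and pagination
5. **Choose the right join type**: pick INNER JOIN or LEFT JOIN based on the business requirement
6. **Make good use of window functions**: for complex analysis, prefer window functions over self-joins

***

# Summary

In this guide we started from the foundation of data definition, followed the lifecycle of data manipulation, covered the essentials of transaction control, and finally explored querying in depth. Every SQL statement is more than code; it is the language we use to talk to data.

Good SQL has to be correct, and also efficient, readable and maintainable. In real work, always consider the performance impact, data integrity and the completeness of the business logic.

There is always more SQL to learn, but with these core concepts and techniques you are equipped to handle complex data problems. Keep practicing and keep optimizing, and SQL will become the most capable tool in your data toolbox.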
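
As a closing illustration of the registration transaction from section 3.4, here is a minimal sketch of how application code might drive it, using Python's standard `sqlite3` module. The trimmed schema and the helper names `open_demo_db` and `register_user` are assumptions made for this example and are not part of the guide's schema. The sketch shows two points made earlier: read the generated key once, right after the `INSERT`, and let a failure anywhere in the unit of work roll back everything.

```python
import sqlite3

# Trimmed, hypothetical versions of the guide's tables: just enough columns for the demo.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE courses (
    course_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title     TEXT NOT NULL
);
CREATE TABLE enrollments (
    enrollment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    course_id     INTEGER NOT NULL REFERENCES courses(course_id),
    amount_paid   REAL DEFAULT 0,
    status        TEXT DEFAULT 'active'
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one sample course (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO courses (title) VALUES ('Python从入门到精通')")
    conn.commit()
    return conn


def register_user(conn: sqlite3.Connection, name: str, email: str, welcome_course_id: int) -> int:
    """Insert a user plus a free welcome enrollment as one atomic unit."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )
        user_id = cur.lastrowid  # read the new id once, right after the INSERT
        conn.execute(
            "INSERT INTO enrollments (user_id, course_id, amount_paid) VALUES (?, ?, 0.0)",
            (user_id, welcome_course_id),
        )
    return user_id


if __name__ == "__main__":
    conn = open_demo_db()
    print("new user id:", register_user(conn, "李明", "liming@email.com", 1))
    try:
        register_user(conn, "Ghost", "ghost@email.com", 999)  # course 999 does not exist
    except sqlite3.IntegrityError as exc:
        print("rolled back:", exc)
    print("users stored:", conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Using the connection as a context manager keeps the commit and rollback decision in one place, which mirrors the COMMIT/ROLLBACK branches in the TCL flowchart. A production version would likely add logging and more specific error handling.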