通用表表達式實現(xiàn) UPDATE/DELETE LIMIT

作者: 不剪發(fā)的Tony老師
畢業(yè)于北京航空航天大學,十多年數(shù)據(jù)庫管理與開發(fā)經(jīng)驗,目前在一家全球性的金融公司從事數(shù)據(jù)庫架構(gòu)設(shè)計。CSDN學院簽約講師以及GitChat專欄作者。csdn上的博客收藏于以下地址:https://tonydong.blog.csdn.net



文章目錄

        使用場景
        通用表表達式
        隨機更新或者刪除
        總結(jié)

    ??少而好學,如日出之陽;壯而好學,如日中之光;志而好學,如炳燭之光。——劉向

大家好!我是只談技術(shù)不剪發(fā)的 Tony 老師。

在 SQL 查詢語句中,我們可以使用 FETCH(標準語法)、LIMIT(MySQL、PostgreSQL、SQLite)、TOP(Microsoft SQL Server)、ROWNUM(Oracle)限定返回結(jié)果的數(shù)量,從而實現(xiàn) Top-N 以及分頁查詢等功能。同時,MySQL 和 SQLite 還支持在 UPDATE 和 DELETE 語句中使用 LIMIT 子句限定更新和刪除的行數(shù)。但是這種方法不適用于其他數(shù)據(jù)庫,因此本文介紹一種更加通用的實現(xiàn)方法。

如果覺得文章有用,歡迎評論??、點贊??、推薦??
使用場景

在介紹具體方法之前,我們先來了解一下這種語法的使用場景。簡單來說,它的主要作用是將大型事務分解為多個小的事務,包括:

    批量更新和刪除,減少鎖資源的爭用,限制對系統(tǒng)的影響。
    減少數(shù)據(jù)文件和重做日志磁盤的使用。
    分區(qū)之間的批量數(shù)據(jù)遷移。

通用表表達式

我們首先創(chuàng)建一個測試表和示例數(shù)據(jù):

CREATE TABLE big_table (id INT PRIMARY KEY, v VARCHAR(10));

WITH RECURSIVE d(n) AS (
  SELECT 1
  UNION ALL
  SELECT n + 1 FROM d WHERE n < 100000
)
INSERT INTO big_table
SELECT n, concat('id:', n) FROM d;

SELECT * FROM big_table ORDER BY id LIMIT 10;
id|v    |
--+-----+
 1|id:1 |
 2|id:2 |
 3|id:3 |
 4|id:4 |
 5|id:5 |
 6|id:6 |
 7|id:7 |
 8|id:8 |
 9|id:9 |
10|id:10|   

   
我們使用遞歸通用表表達式(WITH 子句)為示例表 big_table 生成了 100000 條記錄。

    ??關(guān)于通用表表達式的語法和使用案例,可以參考這篇文章和這篇文章。

由于 SELECT 語句支持查詢結(jié)果的限制,我們可以將一個這樣的查詢語句的結(jié)果用于 DELETE 或者 UPDATE 語句,作為刪除或者更新的過濾條件。使用通用表表達式實現(xiàn)的語法如下:

WITH deleted_rows AS (
  SELECT id
  FROM big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE id IN (SELECT id FROM deleted_rows);

    

我們首先使用 CTE 構(gòu)造了一個臨時表 deleted_rows,其中包含了想要刪除的數(shù)據(jù)。然后在 DELETE 語句中使用子查詢引用了這個臨時表作為判斷條件。我們可以查看一下以上語句的執(zhí)行計劃(以 PostgreSQL 為例):

EXPLAIN
WITH deleted_rows AS (
  SELECT id
  FROM big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE id IN (SELECT id FROM deleted_rows);

QUERY PLAN                                                                                               |
---------------------------------------------------------------------------------------------------------+
Delete on big_table  (cost=0.57..83.48 rows=0 width=0)                                                   |
  ->  Nested Loop  (cost=0.57..83.48 rows=10 width=34)                                                   |
        ->  HashAggregate  (cost=0.28..0.38 rows=10 width=32)                                            |
              Group Key: deleted_rows.id                                                                 |
              ->  Subquery Scan on deleted_rows  (cost=0.00..0.25 rows=10 width=32)                      |
                    ->  Limit  (cost=0.00..0.15 rows=10 width=4)                                         |
                          ->  Seq Scan on big_table big_table_1  (cost=0.00..1541.00 rows=100000 width=4)|
        ->  Index Scan using big_table_pkey on big_table  (cost=0.29..8.31 rows=1 width=10)              |
              Index Cond: (id = deleted_rows.id)                                                         |

   

CTE 的實現(xiàn)采用了全表順序掃描(Seq Scan on big_table),因為我們只需要返回任意的 10 條記錄。我們也可以使用關(guān)聯(lián)子查詢替代 IN 子句:

EXPLAIN
WITH deleted_rows AS (
  SELECT id
  FROM big_table
  LIMIT 10
)
DELETE FROM big_table
WHERE EXISTS (SELECT * FROM deleted_rows d WHERE d.id = big_table.id);

QUERY PLAN                                                                                               |
---------------------------------------------------------------------------------------------------------+
Delete on big_table  (cost=0.57..83.48 rows=0 width=0)                                                   |
  ->  Nested Loop  (cost=0.57..83.48 rows=10 width=34)                                                   |
        ->  HashAggregate  (cost=0.28..0.38 rows=10 width=32)                                            |
              Group Key: d.id                                                                            |
              ->  Subquery Scan on d  (cost=0.00..0.25 rows=10 width=32)                                 |
                    ->  Limit  (cost=0.00..0.15 rows=10 width=4)                                         |
                          ->  Seq Scan on big_table big_table_1  (cost=0.00..1541.00 rows=100000 width=4)|
        ->  Index Scan using big_table_pkey on big_table  (cost=0.29..8.31 rows=1 width=10)              |
              Index Cond: (id = d.id)                                                                    |

   

兩者的查詢計劃一樣。

然后我們再為 CTE 增加一個基于 id 的排序操作:

EXPLAIN
WITH deleted_rows AS (
  SELECT id
  FROM big_table
  ORDER BY id
  LIMIT 10
)
DELETE FROM big_table
WHERE id IN (SELECT id FROM deleted_rows);

QUERY PLAN                                                                                                                           |
-------------------------------------------------------------------------------------------------------------------------------------+
Delete on big_table  (cost=0.97..83.88 rows=0 width=0)                                                                               |
  ->  Nested Loop  (cost=0.97..83.88 rows=10 width=34)                                                                               |
        ->  HashAggregate  (cost=0.68..0.78 rows=10 width=32)                                                                        |
              Group Key: deleted_rows.id                                                                                             |
              ->  Subquery Scan on deleted_rows  (cost=0.29..0.65 rows=10 width=32)                                                  |
                    ->  Limit  (cost=0.29..0.55 rows=10 width=4)                                                                     |
                          ->  Index Only Scan using big_table_pkey on big_table big_table_1  (cost=0.29..2604.29 rows=100000 width=4)|
        ->  Index Scan using big_table_pkey on big_table  (cost=0.29..8.31 rows=1 width=10)                                          |
              Index Cond: (id = deleted_rows.id)                                                                                     |

 

此時 CTE 的實現(xiàn)采用了索引掃描(Index Only Scan using big_table_pkey on big_table)。

對于 UPDATE 語句,我們同樣可以使用這種方式模擬 LIMIT 子句。例如:

EXPLAIN ANALYZE
WITH updated_rows AS (
  SELECT id
  FROM big_table
  ORDER BY id
  LIMIT 10
)
UPDATE big_table
SET v = 'new value'
WHERE EXISTS (SELECT * FROM updated_rows d WHERE d.id = big_table.id);

QUERY PLAN                                                                                                                                                                      |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Update on big_table  (cost=0.97..83.88 rows=0 width=0) (actual time=0.586..0.587 rows=0 loops=1)                                                                                |
  ->  Nested Loop  (cost=0.97..83.88 rows=10 width=72) (actual time=0.056..0.078 rows=10 loops=1)                                                                               |
        ->  HashAggregate  (cost=0.68..0.78 rows=10 width=32) (actual time=0.044..0.048 rows=10 loops=1)                                                                        |
              Group Key: d.id                                                                                                                                                   |
              Batches: 1  Memory Usage: 24kB                                                                                                                                    |
              ->  Subquery Scan on d  (cost=0.29..0.65 rows=10 width=32) (actual time=0.034..0.038 rows=10 loops=1)                                                             |
                    ->  Limit  (cost=0.29..0.55 rows=10 width=4) (actual time=0.019..0.021 rows=10 loops=1)                                                                     |
                          ->  Index Only Scan using big_table_pkey on big_table big_table_1  (cost=0.29..2604.29 rows=100000 width=4) (actual time=0.018..0.019 rows=10 loops=1)|
                                Heap Fetches: 0                                                                                                                                 |
        ->  Index Scan using big_table_pkey on big_table  (cost=0.29..8.31 rows=1 width=10) (actual time=0.002..0.002 rows=1 loops=10)                                          |
              Index Cond: (id = d.id)                                                                                                                                           |
Planning Time: 0.232 ms                                                                                                                                                         |
Execution Time: 0.641 ms                                                                                                                                                        |

SELECT COUNT(*)
FROM big_table
WHERE v = 'new value';
count|
-----+
   10|

 
    

隨機更新或者刪除

如果我們想要更新或者刪除隨機的 N 條記錄,可以在 CTE 中使用 ORDER BY RANDOM(),不過可能會影響操作的性能,因為查詢需要對表中的全表數(shù)據(jù)進行排序。例如:

EXPLAIN ANALYZE
WITH updated_rows AS (
  SELECT id
  FROM big_table
  ORDER BY random()
  LIMIT 10
)
UPDATE big_table
SET v = 'new value'
WHERE EXISTS (SELECT * FROM updated_rows d WHERE d.id = big_table.id);

QUERY PLAN                                                                                                                                      |
------------------------------------------------------------------------------------------------------------------------------------------------+
Update on big_table  (cost=3952.51..4035.42 rows=0 width=0) (actual time=49.711..49.713 rows=0 loops=1)                                         |
  CTE updated_rows                                                                                                                              |
    ->  Limit  (cost=3951.96..3951.99 rows=10 width=12) (actual time=49.438..49.441 rows=10 loops=1)                                            |
          ->  Sort  (cost=3951.96..4201.96 rows=100000 width=12) (actual time=49.433..49.434 rows=10 loops=1)                                   |
                Sort Key: (random())                                                                                                            |
                Sort Method: top-N heapsort  Memory: 25kB                                                                                       |
                ->  Seq Scan on big_table big_table_1  (cost=0.00..1791.00 rows=100000 width=12) (actual time=0.034..29.596 rows=100000 loops=1)|
  ->  Nested Loop  (cost=0.52..83.43 rows=10 width=72) (actual time=49.493..49.534 rows=10 loops=1)                                             |
        ->  HashAggregate  (cost=0.23..0.33 rows=10 width=32) (actual time=49.462..49.465 rows=10 loops=1)                                      |
              Group Key: d.id                                                                                                                   |
              Batches: 1  Memory Usage: 24kB                                                                                                    |
              ->  CTE Scan on updated_rows d  (cost=0.00..0.20 rows=10 width=32) (actual time=49.446..49.453 rows=10 loops=1)                   |
        ->  Index Scan using big_table_pkey on big_table  (cost=0.29..8.31 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=10)          |
              Index Cond: (id = d.id)                                                                                                           |
Planning Time: 0.333 ms                                                                                                                         |
Execution Time: 49.820 ms                                                                                                                       |

    

此時,數(shù)據(jù)庫需要遍歷所有的數(shù)據(jù)并創(chuàng)建隨機數(shù),我們的執(zhí)行時間從 0.641 ms 上升到了 49.820 ms。

當我們基于非索引字段排序時的情況也是如此,例如:

EXPLAIN ANALYZE
WITH updated_rows AS (
  SELECT id
  FROM big_table
  ORDER BY v
  LIMIT 10
)
UPDATE big_table
SET v = 'new value'
WHERE EXISTS (SELECT * FROM updated_rows d WHERE d.id = big_table.id);

QUERY PLAN                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------+
Update on big_table  (cost=3702.21..5505.83 rows=0 width=0) (actual time=88.642..88.647 rows=0 loops=1)                                                        |
  ->  Hash Semi Join  (cost=3702.21..5505.83 rows=10 width=72) (actual time=19.764..41.442 rows=10 loops=1)                                                    |
        Hash Cond: (big_table.id = d.id)                                                                                                                       |
        ->  Seq Scan on big_table  (cost=0.00..1541.00 rows=100000 width=10) (actual time=0.044..13.495 rows=100000 loops=1)                                   |
        ->  Hash  (cost=3702.09..3702.09 rows=10 width=32) (actual time=19.652..19.656 rows=10 loops=1)                                                        |
              Buckets: 1024  Batches: 1  Memory Usage: 9kB                                                                                                     |
              ->  Subquery Scan on d  (cost=3701.96..3702.09 rows=10 width=32) (actual time=19.629..19.641 rows=10 loops=1)                                    |
                    ->  Limit  (cost=3701.96..3701.99 rows=10 width=12) (actual time=19.602..19.605 rows=10 loops=1)                                           |
                          ->  Sort  (cost=3701.96..3951.96 rows=100000 width=12) (actual time=19.601..19.602 rows=10 loops=1)                                  |
                                Sort Key: big_table_1.v                                                                                                        |
                                Sort Method: top-N heapsort  Memory: 25kB                                                                                      |
                                ->  Seq Scan on big_table big_table_1  (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.020..8.034 rows=100000 loops=1)|
Planning Time: 0.232 ms                                                                                                                                        |
Execution Time: 88.712 ms                                                                                                                                      |

    為了性能考慮,我們應該基于索引字段執(zhí)行這種操作。
總結(jié)

本文介紹了一種使用 CTE 實現(xiàn) UPDATE LIMIT 以及 DELETE LIMIT 語句的通用方法,它可以用于各種數(shù)據(jù)庫管理系統(tǒng)。