国产视频网,亚洲午夜福利院在线观看

MySQL 8.0 新特性之哈希連接（Hash Join）

作者: 不剪發(fā)的Tony老師
畢業(yè)于北京航空航天大學，十多年數(shù)據(jù)庫管理與開發(fā)經(jīng)驗，目前在一家全球性的金融公司從事數(shù)據(jù)庫架構設計。CSDN學院簽約講師以及GitChat專欄作者。csdn上的博客收藏于以下地址：https://tonydong.blog.csdn.net

MySQL 開發(fā)組于 2019 年 10 月 14 日正式發(fā)布了 MySQL 8.0.18 GA 版本，帶來了一些新特性和增強功能。其中最引人注目的莫過于多表連接查詢支持 hash join 方式了。我們先來看看官方的描述：

MySQL 實現(xiàn)了用于內連接查詢的 hash join 方式。例如，從 MySQL 8.0.18 開始以下查詢可以使用 hash join 進行連接查詢：

SELECT *
    FROM t1
    JOIN t2
        ON t1.c1=t2.c1;

Hash join 不需要索引的支持。大多數(shù)情況下，hash join 比之前的 Block Nested-Loop 算法在沒有索引時的等值連接更加高效。使用以下語句創(chuàng)建三張測試表：

CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);

使用EXPLAIN FORMAT=TREE命令可以看到執(zhí)行計劃中的 hash join，例如：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->         ON t1.c1=t2.c1\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
    -> Table scan on t2 (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1 (cost=0.35 rows=1)

必須使用 EXPLAIN 命令的 FORMAT=TREE 選項才能看到節(jié)點中的 hash join。另外，EXPLAIN ANALYZE命令也可以顯示 hash join 的使用信息。這也是該版本新增的一個功能。

多個表之間使用等值連接的的查詢也會進行這種優(yōu)化。例如以下查詢：

SELECT *
    FROM t1
    JOIN t2
        ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    JOIN t3
        ON (t2.c1 = t3.c1);

在以上示例中，任何其他非等值連接的條件將會在連接操作之后作為過濾器使用?？梢酝ㄟ^EXPLAIN FORMAT=TREE命令的輸出進行查看：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->         ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)
    ->     JOIN t3
    ->         ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (t3.c1 = t1.c1) (cost=1.05 rows=1)
    -> Table scan on t3 (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 < t2.c2) (cost=0.70 rows=1)
            -> Inner hash join (t2.c1 = t1.c1) (cost=0.70 rows=1)
                -> Table scan on t2 (cost=0.35 rows=1)
                -> Hash
                    -> Table scan on t1 (cost=0.35 rows=1)

從以上輸出同樣可以看出，包含多個等值連接條件的查詢也可以（會）使用多個 hash join 連接。

但是，如果任何連接語句（ON）中沒有使用等值連接條件，將不會采用 hash join 連接方式。例如：

mysql> EXPLAIN FORMAT=TREE
    ->     SELECT *
    ->         FROM t1
    ->         JOIN t2
    ->             ON (t1.c1 = t2.c1)
    ->         JOIN t3
    ->             ON (t2.c1 < t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

此時，將會采用性能更慢的 block nested loop 連接算法。這與 MySQL 8.0.18 之前版本中沒有索引時的情況一樣：

mysql> EXPLAIN
    ->     SELECT *
    ->         FROM t1
    ->         JOIN t2
    ->             ON (t1.c1 = t2.c1)
    ->         JOIN t3
    ->             ON (t2.c1 < t3.c1)\G
*************************** 1. row ***************************
           id: 1
select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
select_type: SIMPLE
        table: t2
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)
*************************** 3. row ***************************
           id: 1
select_type: SIMPLE
        table: t3
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

Hash join 連接同樣適用于不指定查詢條件時的笛卡爾積（Cartesian product），例如：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT *
    ->     FROM t1
    ->     JOIN t2
    ->     WHERE t1.c2 > 50\G
*************************** 1. row ***************************
EXPLAIN: -> Inner hash join (cost=0.70 rows=1)
    -> Table scan on t2 (cost=0.35 rows=1)
    -> Hash
        -> Filter: (t1.c2 > 50) (cost=0.35 rows=1)
            -> Table scan on t1 (cost=0.35 rows=1)

默認配置時，MySQL 所有可能的情況下都會使用 hash join。同時提供了兩種控制是否使用 hash join 的方法：

在全局或者會話級別設置服務器系統(tǒng)變量 optimizer_switch 中的 hash_join=on 或者 hash_join=off 選項。默認為 hash_join=on。
在語句級別為特定的連接指定優(yōu)化器提示 HASH_JOIN 或者 NO_HASH_JOIN。

可以通過系統(tǒng)變量 join_buffer_size 控制 hash join 允許使用的內存數(shù)量；hash join 不會使用超過該變量設置的內存數(shù)量。如果 hash join 所需的內存超過該閾值，MySQL 將會在磁盤中執(zhí)行操作。需要注意的是，如果 hash join 無法在內存中完成，并且打開的文件數(shù)量超過系統(tǒng)變量 open_files_limit 的值，連接操作可能會失敗。為了解決這個問題，可以使用以下方法之一：

增加 join_buffer_size 的值，確保 hash join 可以在內存中完成。
增加 open_files_limit 的值。

接下來我們比較一下 hash join 和 block nested loop 的性能，首先分別為 t1、t2 和 t3 生成 1000000 條記錄：

set join_buffer_size=2097152000;

SET @@cte_max_recursion_depth = 99999999;

INSERT INTO t1
-- INSERT INTO t2
-- INSERT INTO t3
WITH RECURSIVE t AS (
SELECT 1 AS c1, 1 AS c2
UNION ALL
SELECT t.c1 + 1, t.c1 * 2
FROM t
WHERE t.c1 < 1000000
)
SELECT *
FROM t;

沒有索引情況下的 hash join：

mysql> EXPLAIN ANALYZE
    -> SELECT COUNT(*)
    ->   FROM t1
    ->   JOIN t2
    ->     ON (t1.c1 = t2.c1)
    ->   JOIN t3
    ->     ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=22993.098..22993.099 rows=1 loops=1)
    -> Inner hash join (t3.c1 = t1.c1) (cost=9952535443663536.00 rows=9952435908880402) (actual time=14489.176..21737.032 rows=1000000 loops=1)
        -> Table scan on t3 (cost=0.00 rows=998412) (actual time=0.103..3973.892 rows=1000000 loops=1)
        -> Hash
            -> Inner hash join (t2.c1 = t1.c1) (cost=99682753413.67 rows=99682653660) (actual time=5663.592..12236.984 rows=1000000 loops=1)
                -> Table scan on t2 (cost=0.01 rows=998412) (actual time=0.067..3364.105 rows=1000000 loops=1)
                -> Hash
                    -> Table scan on t1 (cost=100539.40 rows=998412) (actual time=0.133..3395.799 rows=1000000 loops=1)

1 row in set (23.22 sec)

mysql> SELECT COUNT(*)
    ->   FROM t1
    ->   JOIN t2
    ->     ON (t1.c1 = t2.c1)
    ->   JOIN t3
    ->     ON (t2.c1 = t3.c1);
+----------+
| COUNT(*) |
+----------+
| 1000000 |
+----------+
1 row in set (12.98 sec)

實際運行花費了 12.98 秒。這個時候如果使用 block nested loop：

mysql> EXPLAIN FORMAT=TREE
    -> SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)
    ->   FROM t1
    ->   JOIN t2
    ->     ON (t1.c1 = t2.c1)
    ->   JOIN t3
    ->     ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: <not executable by iterator executor>

1 row in set (0.00 sec)

SELECT /*+ NO_HASH_JOIN(t1, t2, t3) */ COUNT(*)
FROM t1
JOIN t2
ON (t1.c1 = t2.c1)
JOIN t3
ON (t2.c1 = t3.c1);

EXPLAIN 顯示無法使用 hash join。查詢跑了幾十分鐘也沒有出結果，其中一個 CPU 使用率到了 100%；因為一直在執(zhí)行嵌套循環(huán)（1000000 的 3 次方）。

再看有索引時的 block nested loop 方法，增加索引：

mysql> CREATE index idx1 ON t1(c1);
Query OK, 0 rows affected (7.39 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> CREATE index idx2 ON t2(c1);
Query OK, 0 rows affected (6.77 sec)
Records: 0 Duplicates: 0 Warnings: 0

mysql> CREATE index idx3 ON t3(c1);
Query OK, 0 rows affected (7.23 sec)
Records: 0 Duplicates: 0 Warnings: 0

查看執(zhí)行計劃并運行相同的查詢語句：

mysql> EXPLAIN ANALYZE
    -> SELECT COUNT(*)
    ->   FROM t1
    ->   JOIN t2
    ->     ON (t1.c1 = t2.c1)
    ->   JOIN t3
    ->     ON (t2.c1 = t3.c1)\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (actual time=47684.034..47684.035 rows=1 loops=1)
    -> Nested loop inner join (cost=2295573.22 rows=998412) (actual time=0.116..46363.599 rows=1000000 loops=1)
        -> Nested loop inner join (cost=1198056.31 rows=998412) (actual time=0.087..25788.696 rows=1000000 loops=1)
            -> Filter: (t1.c1 is not null) (cost=100539.40 rows=998412) (actual time=0.050..5557.847 rows=1000000 loops=1)
                -> Index scan on t1 using idx1 (cost=100539.40 rows=998412) (actual time=0.043..3253.769 rows=1000000 loops=1)
            -> Index lookup on t2 using idx2 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)
        -> Index lookup on t3 using idx3 (c1=t1.c1) (cost=1.00 rows=1) (actual time=0.012..0.015 rows=1 loops=1000000)

1 row in set (47.68 sec)

mysql> SELECT COUNT(*)
    ->   FROM t1
    ->   JOIN t2
    ->     ON (t1.c1 = t2.c1)
    ->   JOIN t3
    ->     ON (t2.c1 = t3.c1);
+----------+
| COUNT(*) |
+----------+
| 1000000 |
+----------+
1 row in set (19.56 sec)

實際運行花費了 19.56 秒。所以在我們這個場景中的測試結果如下：

再增加一個 Oracle 12c 中無索引時 hash join 結果：1.282 s。

再增加一個 PostgreSQL 11.5 中無索引時 hash join 結果：6.234 s。

再增加一個 SQL 2017 中無索引時 hash join 結果：5.207 s。

在线午夜精品自拍小视频_无码av无码专区线_亚洲无码精品人妻_人人澡欧美一区

數(shù)據(jù)庫