Oracle 模擬部分索引提高查詢性能

作者: 不剪發(fā)的Tony老師
畢業(yè)于北京航空航天大學(xué),十多年數(shù)據(jù)庫管理與開發(fā)經(jīng)驗(yàn),目前在一家全球性的金融公司從事數(shù)據(jù)庫架構(gòu)設(shè)計。CSDN學(xué)院簽約講師以及GitChat專欄作者。csdn上的博客收藏于以下地址:https://tonydong.blog.csdn.net

 

文章目錄

        索引與空值
        模擬部分索引
        實(shí)現(xiàn)業(yè)務(wù)約束
        總結(jié)

大家好,我是只談技術(shù)不剪發(fā)的 Tony 老師。

在 Oracle 數(shù)據(jù)庫中,如果索引字段的取值全部為 NULL,索引中不會包含該數(shù)據(jù)行。利用這一特性,我們可以創(chuàng)建一個只索引滿足特定條件的數(shù)據(jù)行的部分索引(partial index)。由于部分索引不需要對表中的全部數(shù)據(jù)進(jìn)行索引,因此索引會更小,在特定場景下通過部分索引查找數(shù)據(jù)時性能會更好。本文就來介紹一下如何在 Oracle 中模擬部分索引。

如果覺得文章有用,歡迎評論??、點(diǎn)贊??、推薦??

    ??Oracle 12c 中允許為分區(qū)表的部分分區(qū)創(chuàng)建全局索引和局部索引,也屬于一種部分索引技術(shù),具體可以參考官方文檔。

索引與空值

EMPLOYEE 員工表的 MANAGER 字段上存在一個索引 IDX_EMP_MANAGER(示例表),當(dāng)我們使用該字段作為條件進(jìn)行查詢時可以利用索引:

EXPLAIN PLAN FOR
SELECT *
FROM employee
WHERE manager = 5;

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                                      |
-------------------------------------------------------------------------------------------------------|
Plan hash value: 3813740982                                                                            |
                                                                                                       |
-------------------------------------------------------------------------------------------------------|
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     ||
-------------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT                    |                 |     4 |   224 |     2   (0)| 00:00:01 ||
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEE        |     4 |   224 |     2   (0)| 00:00:01 ||
|*  2 |   INDEX RANGE SCAN                  | IDX_EMP_MANAGER |     4 |       |     1   (0)| 00:00:01 ||
-------------------------------------------------------------------------------------------------------|
                                                                                                       |
Predicate Information (identified by operation id):                                                    |
---------------------------------------------------                                                    |
                                                                                                       |
   2 - access("MANAGER"=5)                                                                             |

 

執(zhí)行計劃顯示該語句使用了索引范圍掃描(INDEX RANGE SCAN)。但是如果想要查詢沒有 MANAGER 的員工時,無法利用索引進(jìn)行優(yōu)化:

EXPLAIN PLAN FOR
SELECT *
FROM employee
WHERE manager IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                             |
------------------------------------------------------------------------------|
Plan hash value: 2119105728                                                   |
                                                                              |
------------------------------------------------------------------------------|
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     ||
------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT  |          |     1 |    56 |     3   (0)| 00:00:01 ||
|*  1 |  TABLE ACCESS FULL| EMPLOYEE |     1 |    56 |     3   (0)| 00:00:01 ||
------------------------------------------------------------------------------|
                                                                              |
Predicate Information (identified by operation id):                           |
---------------------------------------------------                           |
                                                                              |
   1 - filter("MANAGER" IS NULL)                                              |

 

執(zhí)行計劃顯示該語句使用了全表掃描(TABLE ACCESS FULL),因?yàn)樗饕?IDX_EMP_MANAGER 中沒有相關(guān)的數(shù)據(jù)。

對于復(fù)合索引(多列索引),如果至少有一個索引字段的值不為空,就會創(chuàng)建相應(yīng)的索引項(xiàng)。我們創(chuàng)建一個基于 DEPT_ID 和 MANAGER 字段的復(fù)合索引:

CREATE INDEX IDX_EMP_DEPT_MANAGER ON EMPLOYEE(DEPT_ID, MANAGER);

 

由于 DEPT_ID 字段不會為空,即使 MANAGER 為空的數(shù)據(jù)也會生成相應(yīng)的索引項(xiàng)。再次執(zhí)行上面的查詢語句:

EXPLAIN PLAN FOR
SELECT *
FROM employee
WHERE manager IS NULL;

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                                           |
------------------------------------------------------------------------------------------------------------|
Plan hash value: 4053329608                                                                                 |
                                                                                                            |
------------------------------------------------------------------------------------------------------------|
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     ||
------------------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT                    |                      |     1 |    56 |     2   (0)| 00:00:01 ||
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEE             |     1 |    56 |     2   (0)| 00:00:01 ||
|*  2 |   INDEX SKIP SCAN                   | IDX_EMP_DEPT_MANAGER |     1 |       |     1   (0)| 00:00:01 ||
------------------------------------------------------------------------------------------------------------|
                                                                                                            |
Predicate Information (identified by operation id):                                                         |
---------------------------------------------------                                                         |
                                                                                                            |
   2 - access("MANAGER" IS NULL)                                                                            |
       filter("MANAGER" IS NULL)                                                                            |

 

執(zhí)行計劃顯示該語句使用了索引跳躍掃描(INDEX SKIP SCAN)。為了進(jìn)一步優(yōu)化該查詢,可以將復(fù)合索引中的字段交換順序,也就是將 MANAGER 放在最左側(cè):

DROP INDEX IDX_EMP_DEPT_MANAGER;
CREATE INDEX IDX_EMP_DEPT_MANAGER ON EMPLOYEE(MANAGER, DEPT_ID);

 

然后再次查看執(zhí)行計劃:

PLAN_TABLE_OUTPUT                                                                                           |
------------------------------------------------------------------------------------------------------------|
Plan hash value: 3287566901                                                                                 |
                                                                                                            |
------------------------------------------------------------------------------------------------------------|
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     ||
------------------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT                    |                      |     1 |    56 |     2   (0)| 00:00:01 ||
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEE             |     1 |    56 |     2   (0)| 00:00:01 ||
|*  2 |   INDEX RANGE SCAN                  | IDX_EMP_DEPT_MANAGER |     1 |       |     1   (0)| 00:00:01 ||
------------------------------------------------------------------------------------------------------------|
                                                                                                            |
Predicate Information (identified by operation id):                                                         |
---------------------------------------------------                                                         |
                                                                                                            |
   2 - access("MANAGER" IS NULL)                                                                            |

 

執(zhí)行計劃顯示該語句使用了索引范圍掃描(INDEX RANGE SCAN)。索引中的 DEPT_ID 可以是其他任何非空字段,確保 MANAGER 字段為空的數(shù)據(jù)存在索引。我們甚至可以使用一個常量創(chuàng)建該索引:

DROP INDEX IDX_EMP_DEPT_MANAGER;
CREATE INDEX IDX_EMP_DEPT_MANAGER ON EMPLOYEE(MANAGER, 0);

 

以上索引實(shí)際上是一個函數(shù)索引,函數(shù)的值是常量 0。這種方式也可以實(shí)現(xiàn)對 NULL 值的索引。
模擬部分索引

利用 Oracle 對索引字段中 NULL 值的處理方式,我們可以模擬一個部分索引。例如,對于以下訂單表 orders:

CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  customer_id INTEGER,
  status VARCHAR(10)
);

INSERT INTO orders (id, customer_id, status)
WITH t(id, customer_id, status) AS (
  SELECT 1,
         floor(dbms_random.value * 10000),
         CASE floor(dbms_random.value * 100)
           WHEN 0 THEN 'pending'
           WHEN 1 THEN 'shipped'
           ELSE 'completed'
         END
  FROM dual
  UNION ALL
  SELECT id + 1,
         floor(dbms_random.value * 10000),
         CASE floor(dbms_random.value * 100)
           WHEN 0 THEN 'pending'
           WHEN 1 THEN 'shipped'
           ELSE 'completed'
         END
  FROM t WHERE id < 1000000
)
SELECT id, customer_id, status
FROM t;

SELECT count(*) FROM orders;
COUNT(*)|
--------|
 1000000|

 

orders 表中總共有 1000000 個訂單,其中絕大部的訂單都處于完成狀態(tài)。通常我們只需要針對某個用戶未完成的訂單進(jìn)行查詢跟蹤,因此可以創(chuàng)建一個基于用戶編號和狀態(tài)的部分索引:

CREATE INDEX full_idx ON orders (customer_id, status);

BEGIN
 DBMS_STATS.GATHER_TABLE_STATS (
 OWNNAME => 'TONY',
 TABNAME => 'ORDERS'
 );
END;

 

然后查看以下語句的執(zhí)行計劃:

EXPLAIN PLAN FOR
SELECT *
FROM orders
WHERE customer_id = 6666
AND status != 'completed';

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                               |
------------------------------------------------------------------------------------------------|
Plan hash value: 2146694866                                                                     |
                                                                                                |
------------------------------------------------------------------------------------------------|
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     ||
------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT                    |          |     2 |    38 |     5   (0)| 00:00:01 ||
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS   |     2 |    38 |     5   (0)| 00:00:01 ||
|*  2 |   INDEX RANGE SCAN                  | FULL_IDX |     2 |       |     3   (0)| 00:00:01 ||
------------------------------------------------------------------------------------------------|
                                                                                                |
Predicate Information (identified by operation id):                                             |
---------------------------------------------------                                             |
                                                                                                |
   2 - access("CUSTOMER_ID"=6666)                                                               |
       filter("STATUS"<>'completed')                                                            |

 

執(zhí)行計劃輸出結(jié)果顯示,該語句使用了索引范圍掃描(INDEX RANGE SCAN)。我們可以查看一下索引 FULL_IDX 占用的空間大小:

SELECT sum(bytes)/1024/1024 AS MB
FROM user_segments
WHERE segment_type ='INDEX'
AND segment_name = 'FULL_IDX';

MB|
--|
28|

 

下面我們再創(chuàng)建一個部分索引,只包含未完成的訂單數(shù)據(jù),從而減少索引的數(shù)據(jù)量。為此需要先創(chuàng)建一個函數(shù):

CREATE OR REPLACE
FUNCTION f_order_status(status VARCHAR2, customer_id INTEGER)
RETURN INTEGER
DETERMINISTIC
AS
BEGIN
   IF status != 'completed' THEN
      RETURN customer_id;
   ELSE
      RETURN NULL;
   END IF;
END;

 

如果輸入?yún)?shù) status 不等于 completed,直接返回輸入?yún)?shù) customer_id;否則,返回 NULL。另外,作為索引使用的函數(shù)必須指定 DETERMINISTIC 屬性。

然后我們創(chuàng)建一個函數(shù)索引:

CREATE INDEX partial_idx ON orders (f_order_status(status, customer_id));

BEGIN
 DBMS_STATS.GATHER_TABLE_STATS (
 OWNNAME => 'TONY',
 TABNAME => 'ORDERS'
 );
END;

 

索引 partial_idx 中只包含狀態(tài)未完成的訂單,而且只包含 customer_id 字段。然后再次查看執(zhí)行計劃,需要修改一下查詢語句的寫法:

EXPLAIN PLAN FOR
SELECT *
FROM orders
WHERE f_order_status(status, customer_id) = 6666;

SELECT * FROM TABLE(DBMS_XPLAN.display);
PLAN_TABLE_OUTPUT                                                                                  |
---------------------------------------------------------------------------------------------------|
Plan hash value: 2791331943                                                                        |
                                                                                                   |
---------------------------------------------------------------------------------------------------|
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     ||
---------------------------------------------------------------------------------------------------|
|   0 | SELECT STATEMENT                    |             |     2 |    40 |     4   (0)| 00:00:01 ||
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS      |     2 |    40 |     4   (0)| 00:00:01 ||
|*  2 |   INDEX RANGE SCAN                  | PARTIAL_IDX |     2 |       |     1   (0)| 00:00:01 ||
---------------------------------------------------------------------------------------------------|
                                                                                                   |
Predicate Information (identified by operation id):                                                |
---------------------------------------------------                                                |
                                                                                                   |
   2 - access("TONY"."F_ORDER_STATUS"("STATUS","CUSTOMER_ID")=6666)                                |

 

查詢條件中使用了 f_order_status 函數(shù)返回指定用戶未完成的訂單。輸出結(jié)果顯示執(zhí)行計劃選擇了索引 PARTIAL_IDX,而不是 FULL_IDX;因?yàn)檫@樣性能更好,只需要掃描更少的索引記錄就可以得到結(jié)果。

我們同樣可以查看一下索引 PARTIAL_IDX 占用的空間大?。?/p>

SELECT sum(bytes)/1024/1024 AS MB
FROM user_segments
WHERE segment_type ='INDEX'
AND segment_name = 'PARTIAL_IDX';
MB    |
------|
0.4375|

 

索引只有 0.4375 MB,而不是 28 MB;索引縮小到了原來的 1/64,因?yàn)榻^大多數(shù)訂單都處于完成狀態(tài)。
實(shí)現(xiàn)業(yè)務(wù)約束

利用部分索引還可以用于實(shí)現(xiàn)其他的功能。例如,我們可以將索引 PARTIAL_IDX 定義為唯一索引,從而實(shí)現(xiàn)每個客戶只能存在一個未完成訂單的約束。

DROP INDEX partial_idx;
TRUNCATE TABLE orders;

CREATE UNIQUE INDEX partial_idx ON orders (f_order_status(status, customer_id));

INSERT INTO orders(id, customer_id, status) VALUES (1, 1, 'pending');

INSERT INTO orders(id, customer_id, status) VALUES (2, 1, 'pending');
SQL Error [1] [23000]: ORA-00001: unique constraint (TONY.PARTIAL_IDX) violated

 

增加唯一約束之后,客戶必須完成一個訂單之后才能繼續(xù)創(chuàng)建新的訂單。
總結(jié)

數(shù)據(jù)庫中的部分索引只針對滿足特定條件的數(shù)據(jù)進(jìn)行索引,通常比普通索引更加節(jié)省空間,可以用于優(yōu)化特定條件的查詢。利用 Oracle 不對空值進(jìn)行索引的特性可以模擬實(shí)現(xiàn)一個部分索引。