PostgreSQL 11 新特性之窗口函數(shù)(window function)增強(qiáng)

作者: 不剪發(fā)的Tony老師
畢業(yè)于北京航空航天大學(xué),十多年數(shù)據(jù)庫(kù)管理與開(kāi)發(fā)經(jīng)驗(yàn),目前在一家全球性的金融公司從事數(shù)據(jù)庫(kù)架構(gòu)設(shè)計(jì)。CSDN學(xué)院簽約講師以及GitChat專(zhuān)欄作者。csdn上的博客收藏于以下地址:https://tonydong.blog.csdn.net

文章目錄

PostgreSQL 11 針對(duì)窗口函數(shù)(window function)進(jìn)行了增強(qiáng),添加了 SQL:2011 標(biāo)準(zhǔn)中的所有窗口范圍(window frame)子句選項(xiàng)。具體來(lái)說(shuō),新增了以下功能:

    允許在RANGE窗口模式中使用off_set PRECEDING/FOLLOWING限定窗口范圍
    支持GROUPS窗口模式,使用基于組的方式限定窗口范圍
    支持窗口范圍子句的排除選項(xiàng),用于排除窗口內(nèi)的某些數(shù)據(jù)行

首先,對(duì)于RANGE窗口模式,增加了off_set PRECEDING/FOLLOWING子句,現(xiàn)在可以支持所有的選項(xiàng):

RANGE frame_start
RANGE BETWEEN frame_start AND frame_end

   

其中,frame_start 和 frame_end 可以是以下選項(xiàng)之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

    
創(chuàng)建一個(gè)測(cè)試表:

CREATE TABLE test
(
  id      SERIAL PRIMARY KEY,
  val     INT4,
  logtime TIMESTAMP
);

INSERT INTO test(val, logtime)
VALUES
(1, '2019-01-15 08:08:17'),
(1, '2019-01-15 08:14:30'),
(3, '2019-01-15 08:36:00'),
(6, '2019-01-15 09:20:56'),
(6, '2019-01-15 10:15:41');

以下示例比較了ROWS模式和RANGE模式的區(qū)別:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_rows,
       sum(val) OVER (ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_range,
       sum(val) OVER (ORDER BY logtime RANGE BETWEEN INTERVAL '10 minute' PRECEDING AND '10 minute' FOLLOWING) as sum_range_time
  FROM test;
 id | val |       logtime       | sum_rows | sum_range | sum_range_time
----+-----+---------------------+----------+-----------+----------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |         2 |              2
  2 |   1 | 2019-01-15 08:14:30 |        5 |         2 |              2
  3 |   3 | 2019-01-15 08:36:00 |       10 |         3 |              3
  4 |   6 | 2019-01-15 09:20:56 |       15 |        12 |              6
  5 |   6 | 2019-01-15 10:15:41 |       12 |        12 |              6
(5 rows)

   
其中,

    sum_rows 列計(jì)算按照字段 val 排序后,每一行以及前后各一行的合計(jì)值;
    sum_range 列計(jì)算按照字段 val 排序后,每一行以及前后和它的值相差小于等于 1 的那些行的合計(jì)值;
    sum_range_time 列計(jì)算按照字段 logtime 排序后,每一行以及前后和它的值相差小于等于 10 分鐘的那些行的合計(jì)值。

其次,PostgreSQL 11 增加了GROUPS窗口模式。

GROUPS frame_start
GROUPS BETWEEN frame_start AND frame_end

 
其中,frame_start 和 frame_end 可以是以下選項(xiàng)之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

   
對(duì)于GROUPS窗口模式,計(jì)算窗口范圍時(shí)使用組(由ORDER BY決定)為單位,而不是以行ROWS模式)或者值RANGE模式)為單位。

基于上面的示例,增加GROUPS模式:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_rows,
       sum(val) OVER (ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_range,
       sum(val) OVER (ORDER BY val GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_groups
  FROM test;
 id | val |       logtime       | sum_rows | sum_range | sum_groups
----+-----+---------------------+----------+-----------+------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |         2 |          5
  2 |   1 | 2019-01-15 08:14:30 |        5 |         2 |          5
  3 |   3 | 2019-01-15 08:36:00 |       10 |         3 |         17
  4 |   6 | 2019-01-15 09:20:56 |       15 |        12 |         15
  5 |   6 | 2019-01-15 10:15:41 |       12 |        12 |         15
(5 rows)

 
其中,sum_groups 計(jì)算按照字段 val 排序后,每一行所在的組以及前后各一個(gè)組中所有行的合計(jì)值。

ORDER BY val決定了組,示例中存在 3 個(gè)組(1、3、6)。對(duì)于第一行數(shù)據(jù),它所在的組有兩個(gè) val 值(1 和 1),1 PRECEDING組不存在,1 FOLLOWING組有一個(gè)值(3),合計(jì)為 5(1+1+3)。

最后,PostgreSQL 11 支持窗口范圍子句的排除選項(xiàng) frame_exclusion,用于排除窗口內(nèi)的某些數(shù)據(jù):

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

    
其中,frame_start 和 frame_end 可以是以下選項(xiàng)之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

    
frame_exclusion 可以是以下選項(xiàng)之一:

EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS

 
它們的作用分別為:

    EXCLUDE CURRENT ROW排除當(dāng)前行
    EXCLUDE GROUP排除當(dāng)前行所在的組,也就是和當(dāng)前行排序相同的行
    EXCLUDE TIES排除當(dāng)前行所在組中的其他行,但是不排除當(dāng)前行
    EXCLUDE NO OTHERS不排除任何行,默認(rèn)值

以下示例演示了這些選項(xiàng)的效果:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE NO OTHERS) as sum_rows,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW) as exclude_current_row,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) as exclude_group,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) as exclude_ties
  FROM test;
 id | val |       logtime       | sum_rows | exclude_current_row | exclude_group | exclude_ties
----+-----+---------------------+----------+---------------------+---------------+--------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |                   1 |               |            1
  2 |   1 | 2019-01-15 08:14:30 |        5 |                   4 |             3 |            4
  3 |   3 | 2019-01-15 08:36:00 |       10 |                   7 |             7 |           10
  4 |   6 | 2019-01-15 09:20:56 |       15 |                   9 |             3 |            9
  5 |   6 | 2019-01-15 10:15:41 |       12 |                   6 |               |            6
(5 rows)

    
至此,PostgreSQL 11 支持 SQL:2011 標(biāo)準(zhǔn)中定義的所有窗口范圍子句選項(xiàng)。

官方文檔:
https://www.postgresql.org/docs/11/tutorial-window.html
https://www.postgresql.org/docs/11/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-WINDOW
https://www.postgresql.org/docs/11/functions-window.html
https://www.postgresql.org/docs/11/sql-select.html#SQL-WINDOW