You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Alessandro Solimando (Jira)" <ji...@apache.org> on 2022/01/28 13:01:00 UTC

[jira] [Updated] (HIVE-25909) Add test for 'hive.default.nulls.last' property for windows with ordering

     [ https://issues.apache.org/jira/browse/HIVE-25909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Solimando updated HIVE-25909:
----------------------------------------
    Description: 
Add a test around "hive.default.nulls.last" configuration property and its interaction with order by clauses within windows.

The property is known to respect such properties:

 
||hive.default.nulls.last||ASC||DESC||
|true|NULL LAST|NULL FIRST|
|false|NULL FIRST|NULL LAST|

 

The test can be based along the line of the following examples:
{noformat}
-- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
set hive.default.nulls.last;

OUT:
hive.default.nulls.last=true

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5

-- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we expect NULLS_LAST
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS LAST)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we have explicit NULLS_FIRST, we expect NULLS_FIRST
SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS FIRST)
FROM test1;

--OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5{noformat}
 

  was:
Add a test around "hive.default.nulls.last" configuration property and its interaction with order by clauses within windows.

The property is known to respect such properties:

 
||hive.default.nulls.last||ASC||DESC||
|true|NULL LAST|NULL FIRST|
|false|NULL FIRST|NULL LAST|

 

 

The test can be based along the line of the following examples:
{noformat}
-- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
set hive.default.nulls.last;

OUT:
hive.default.nulls.last=true

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5

-- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we expect NULLS_LAST
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS LAST)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we have explicit NULLS_FIRST, we expect NULLS_FIRST
SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS FIRST)
FROM test1;

--OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5{noformat}
 


> Add test for 'hive.default.nulls.last' property for windows with ordering
> -------------------------------------------------------------------------
>
>                 Key: HIVE-25909
>                 URL: https://issues.apache.org/jira/browse/HIVE-25909
>             Project: Hive
>          Issue Type: Test
>          Components: CBO
>    Affects Versions: 4.0.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>
> Add a test around "hive.default.nulls.last" configuration property and its interaction with order by clauses within windows.
> The property is known to respect such properties:
>  
> ||hive.default.nulls.last||ASC||DESC||
> |true|NULL LAST|NULL FIRST|
> |false|NULL FIRST|NULL LAST|
>  
> The test can be based along the line of the following examples:
> {noformat}
> -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
> set hive.default.nulls.last;
> OUT:
> hive.default.nulls.last=true
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     NULL    1
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5
> -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     NULL    5
> -- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we expect NULLS_LAST
> set hive.default.nulls.last=false;
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS LAST)
> FROM test1;
> OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     NULL    5
> -- we have explicit NULLS_FIRST, we expect NULLS_FIRST
> SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS FIRST)
> FROM test1;
> --OUT:
> John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
> John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
> John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
> John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
> John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
> John Doe        1987-05-10 00:00:00     NULL    1
> John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
> John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
> John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
> John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5{noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)