Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/10 15:55:00 UTC

[jira] [Work logged] (HIVE-25856) Intermittent null ordering in plans of queries with GROUP BY and LIMIT

     [ https://issues.apache.org/jira/browse/HIVE-25856?focusedWorklogId=706254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-706254 ]

ASF GitHub Bot logged work on HIVE-25856:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Jan/22 15:54
            Start Date: 10/Jan/22 15:54
    Worklog Time Spent: 10m 
      Work Description: zabetak opened a new pull request #2932:
URL: https://github.com/apache/hive/pull/2932


   ### What changes were proposed in this pull request?
   Remove the singleton instance of `HiveAggregateSortLimitRule`, which bound the default null ordering behavior for every subsequent query.
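
The pitfall can be illustrated with a minimal, self-contained sketch. This is not Hive's code: `Conf`, `SortLimitRule`, `SharedRule`, and `perQueryRule` are hypothetical names, and the lazily cached instance is only an assumption about how a shared rule might end up freezing the first query's null ordering. The two direction strings mirror the ones that appear in the CBO plans of the issue.

```java
public class NullOrderingSketch {

    /** Hypothetical stand-in for a per-query configuration (e.g. a null ordering default). */
    static final class Conf {
        private final boolean nullsLast;
        Conf(boolean nullsLast) { this.nullsLast = nullsLast; }
        boolean defaultNullsLast() { return nullsLast; }
    }

    /** Hypothetical planner rule that captures the null ordering at construction time. */
    static final class SortLimitRule {
        private final boolean nullsLast;
        SortLimitRule(boolean nullsLast) { this.nullsLast = nullsLast; }
        String direction() { return nullsLast ? "ASC" : "ASC-nulls-first"; }
    }

    /** Pitfall: the first caller's configuration is frozen into the shared instance. */
    static final class SharedRule {
        private static SortLimitRule instance;
        static synchronized SortLimitRule get(Conf conf) {
            if (instance == null) {
                instance = new SortLimitRule(conf.defaultNullsLast());
            }
            return instance; // later configurations are silently ignored
        }
    }

    /** Alternative: build the rule from the configuration of the query being planned. */
    static SortLimitRule perQueryRule(Conf conf) {
        return new SortLimitRule(conf.defaultNullsLast());
    }

    public static void main(String[] args) {
        Conf firstQuery = new Conf(false);  // nulls first
        Conf secondQuery = new Conf(true);  // nulls last

        // Shared instance: both queries get the ordering of whichever ran first.
        System.out.println(SharedRule.get(firstQuery).direction());   // ASC-nulls-first
        System.out.println(SharedRule.get(secondQuery).direction());  // ASC-nulls-first

        // Per-query instance: each query gets its own configured ordering.
        System.out.println(perQueryRule(firstQuery).direction());     // ASC-nulls-first
        System.out.println(perQueryRule(secondQuery).direction());    // ASC
    }
}
```

With a shared instance, the plan of a query depends on which query happened to run first in the same process, which is consistent with the plan changing between an isolated run and a run alongside other qtest files.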
   
   ### Why are the changes needed?
   1. Avoid intermittent test failures and plan changes in CI.
   2. Respect value of `hive.default.nulls.last` property.
   
   ### Does this PR introduce _any_ user-facing change?
   Small differences in query plans may appear.
   
   ### How was this patch tested?
   mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqtest="cbo_AggregateSortLimitRule.q" (with & without the changes in `HiveAggregateSortLimitRule`)




Issue Time Tracking
-------------------

            Worklog Id:     (was: 706254)
    Remaining Estimate: 0h
            Time Spent: 10m

> Intermittent null ordering in plans of queries with GROUP BY and LIMIT
> ----------------------------------------------------------------------
>
>                 Key: HIVE-25856
>                 URL: https://issues.apache.org/jira/browse/HIVE-25856
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:sql}
> CREATE TABLE person (id INTEGER, country STRING);
> EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5;
> {code}
> The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) from one execution to another.
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
>     HiveAggregate(group=[{1}], agg#0=[count()])
>       HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> {noformat}
> CBO PLAN:
> HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5])
>   HiveProject(country=[$0], $f1=[$1])
>     HiveAggregate(group=[{1}], agg#0=[count()])
>       HiveTableScan(table=[[default, person]], table:alias=[person])
> {noformat}
> This is unlikely to cause wrong results because most (though not all) aggregate functions do not return nulls, so null ordering rarely matters. However, it can lead to other problems such as:
> * intermittent CI failures
> * issues with query/plan caching
> I ran into this problem while investigating test failures in CI. The following query in [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q] produces a different plan when it runs on its own than when it runs together with some other qtest files.
> {code:sql}
> explain
> select * from
> (select key, count(1) from src group by key order by key limit 10,20) subq
> join
> (select key, count(1) from src group by key limit 20,20) subq2
> on subq.key=subq2.key limit 3,5;
> {code}


