Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/19 04:06:00 UTC

[jira] [Work logged] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause

     [ https://issues.apache.org/jira/browse/HIVE-22903?focusedWorklogId=389297&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389297 ]

ASF GitHub Bot logged work on HIVE-22903:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Feb/20 04:05
            Start Date: 19/Feb/20 04:05
    Worklog Time Spent: 10m 
      Work Description: ShubhamChaurasia commented on pull request #918: HIVE-22903: Vectorized row_number() resets the row number after one b…
URL: https://github.com/apache/hive/pull/918
 
 
   Vectorized row_number() resets the row number after one batch when the partition clause is a constant expression. This patch skips the reset in such scenarios.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 389297)
    Remaining Estimate: 0h
            Time Spent: 10m

> Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22903
>                 URL: https://issues.apache.org/jira/browse/HIVE-22903
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF, Vectorization
>    Affects Versions: 4.0.0
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The vectorized row_number() implementation resets the row number after each batch when a constant expression is passed in the partition clause.
> Repro Query
> {code}
> select row_number() over(partition by 1) r1, t from over10k_n8;
> -- or
> select row_number() over() r1, t from over10k_n8;
> {code}
> where table over10k_n8 contains more than 1024 records (1024 is the default vectorized row batch size, so the rows span more than one batch).
> This happens because VectorPTFOperator currently resets the evaluators whenever only a partition clause is present:
> {code:java}
>     // If we are only processing a PARTITION BY, reset our evaluators.
>     if (!isPartitionOrderBy) {
>       groupBatches.resetEvaluators();
>     }
> {code}
> To resolve this, we should also check whether the entire partition clause is a constant expression; if it is, we should not call {{groupBatches.resetEvaluators()}}.
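
The behaviour described in the issue can be illustrated with a small toy model. This is only a sketch and not Hive code: `RowNumberResetSketch`, `RowNumberEvaluator`, `assignRowNumbers`, and the `isPartitionConstant` and `applyFix` flags are hypothetical names invented for the illustration, and the model assumes the evaluator reset happens once per batch, as the issue title suggests. Only the `!isPartitionOrderBy` check is taken from the quoted snippet.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of per-batch evaluator resets. Not Hive code; all names are hypothetical. */
final class RowNumberResetSketch {

  /** Hypothetical stand-in for a vectorized row_number() evaluator. */
  static final class RowNumberEvaluator {
    private long next = 1;

    long nextRowNumber() {
      return next++;
    }

    void reset() {
      next = 1;
    }
  }

  /**
   * Assigns row numbers to totalRows rows processed in batches of batchSize.
   * Assumption: the reset check runs once after every batch.
   */
  static List<Long> assignRowNumbers(int totalRows, int batchSize,
      boolean isPartitionOrderBy, boolean isPartitionConstant, boolean applyFix) {
    RowNumberEvaluator evaluator = new RowNumberEvaluator();
    List<Long> result = new ArrayList<>(totalRows);
    for (int start = 0; start < totalRows; start += batchSize) {
      int rowsInBatch = Math.min(batchSize, totalRows - start);
      for (int i = 0; i < rowsInBatch; i++) {
        result.add(evaluator.nextRowNumber());
      }
      // Proposed extra condition: a constant partition clause means every row
      // belongs to the same partition, so the counter must carry over.
      boolean skipReset = applyFix && isPartitionConstant;
      // Mirrors the check quoted in the issue description.
      if (!isPartitionOrderBy && !skipReset) {
        evaluator.reset();
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int rows = 2000;
    int batch = 1024;
    List<Long> withoutCheck = assignRowNumbers(rows, batch, false, true, false);
    List<Long> withCheck = assignRowNumbers(rows, batch, false, true, true);
    System.out.println("without the constant check, row 1025 gets " + withoutCheck.get(1024));
    System.out.println("with the constant check, row 1025 gets " + withCheck.get(1024));
  }
}
```

In this model, 2000 rows in batches of 1024 restart at 1 for the 1025th row when the constant check is absent, and continue to 1025 when it is present. The model covers only the constant-partition case; handling real partition key changes is outside its scope.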



--
This message was sent by Atlassian Jira
(v8.3.4#803005)