Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/06/06 00:26:01 UTC

[jira] [Work logged] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause

     [ https://issues.apache.org/jira/browse/HIVE-22903?focusedWorklogId=442137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-442137 ]

ASF GitHub Bot logged work on HIVE-22903:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jun/20 00:25
            Start Date: 06/Jun/20 00:25
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on pull request #918:
URL: https://github.com/apache/hive/pull/918#issuecomment-639914123


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 442137)
    Time Spent: 20m  (was: 10m)

> Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22903
>                 URL: https://issues.apache.org/jira/browse/HIVE-22903
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF, Vectorization
>    Affects Versions: 4.0.0
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, HIVE-22903.03.patch, HIVE-22903.04.patch, HIVE-22903.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The vectorized row_number() implementation resets the row number when a constant expression is passed in the partition clause.
> Repro queries:
> {code}
> select row_number() over(partition by 1) r1, t from over10k_n8;
> -- or
> select row_number() over() r1, t from over10k_n8;
> {code}
> where the table over10k_n8 contains more than 1024 records, i.e. more than one vectorized row batch at the default batch size.
> This happens because VectorPTFOperator currently resets the evaluators whenever only a partition clause is present:
> {code:java}
>     // If we are only processing a PARTITION BY, reset our evaluators.
>     if (!isPartitionOrderBy) {
>       groupBatches.resetEvaluators();
>     }
> {code}
> To resolve this, we should also check whether the entire partition clause is a constant expression. If it is, we should not call {{groupBatches.resetEvaluators()}}.
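
The behaviour described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not Hive code: `RowNumberBatchSketch`, `RowNumberEvaluator`, `run`, and the `partitionKeyIsConstant` / `applyProposedCheck` flags are hypothetical names invented for illustration, and the batch size of 1024 is taken from the record threshold in the report. Only the `isPartitionOrderBy` condition mirrors the snippet quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

public class RowNumberBatchSketch {
    // Assumed batch size, matching the 1024-record threshold in the report.
    static final int BATCH_SIZE = 1024;

    /** Hypothetical stand-in for a row_number() evaluator with a running counter. */
    static final class RowNumberEvaluator {
        private long count = 0;
        long next() { return ++count; }
        void reset() { count = 0; }
    }

    /**
     * Feeds totalRows rows through the evaluator in batches and returns the
     * row numbers produced. All rows belong to one partition, as they would
     * for "partition by 1" or an empty over() clause.
     */
    static List<Long> run(int totalRows, boolean isPartitionOrderBy,
                          boolean partitionKeyIsConstant, boolean applyProposedCheck) {
        RowNumberEvaluator evaluator = new RowNumberEvaluator();
        List<Long> out = new ArrayList<>(totalRows);
        for (int start = 0; start < totalRows; start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, totalRows);
            for (int i = start; i < end; i++) {
                out.add(evaluator.next());
            }
            // Proposed extra check (hypothetical form): skip the reset when
            // the whole partition clause is a constant expression.
            boolean skipReset = applyProposedCheck && partitionKeyIsConstant;
            // Mirrors the quoted condition: reset when only PARTITION BY is processed.
            if (!isPartitionOrderBy && !skipReset) {
                evaluator.reset();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> withoutCheck = run(1030, false, true, false);
        List<Long> withCheck = run(1030, false, true, true);
        System.out.println("without check, row 1025 gets " + withoutCheck.get(1024));
        System.out.println("with check, row 1025 gets " + withCheck.get(1024));
    }
}
```

In this sketch the counter restarts at the first row of the second batch when the reset is unconditional, and keeps counting when the constant-partition check suppresses the reset. How the actual patch detects a constant partition expression is not shown in this message.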



--
This message was sent by Atlassian Jira
(v8.3.4#803005)