Posted to issues@hive.apache.org by "Ramesh Kumar Thangarajan (Jira)" <ji...@apache.org> on 2020/02/20 21:40:00 UTC

[jira] [Comment Edited] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause

    [ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041312#comment-17041312 ] 

Ramesh Kumar Thangarajan edited comment on HIVE-22903 at 2/20/20 9:39 PM:
--------------------------------------------------------------------------

[~ShubhamChaurasia] I was thinking something like
{code:java}
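for (VectorPTFEvaluatorBase evaluator : evaluators) {
  // Skip the reset only for row_number() evaluators whose arguments are constant.
  // verifyEvaluatorArgumentsAreConstant is a placeholder that still has to be computed.
  if (!(evaluator instanceof VectorPTFEvaluatorRowNumber && verifyEvaluatorArgumentsAreConstant)) {
    evaluator.resetEvaluator();
  }
}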
{code}
To compute verifyEvaluatorArgumentsAreConstant, we would need to pass in the arguments of each evaluator.

Looking more into this, the problem doesn't look specific to constants either. We reset the evaluators for every batch, so the problem should exist when grouping by columns too. We might notice the issue if we actually group by a column that contains the same value repeated more than 1024 times (spanning more than one VRB). Thinking more about this, it looks like we are not calling resetEvaluators() at the right place in the code. I think we are not differentiating between partition groups and row batch groups. We should only reset at partition group boundaries, not at row batch boundaries.
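
To make the difference between the two reset points concrete, here is a minimal, self-contained sketch. It does not use any of Hive's classes: RowNumberResetSketch, rowNumbers and BATCH_SIZE are hypothetical names invented for illustration, and the batch size of 1024 is only the figure mentioned above. It assumes the rows arrive already grouped by partition key.

```java
import java.util.ArrayList;
import java.util.List;

public class RowNumberResetSketch {

    // Hypothetical stand-in for the VRB size; 1024 is the figure mentioned above.
    static final int BATCH_SIZE = 1024;

    /**
     * Assigns row_number() values to rows that are already grouped by partition key,
     * feeding them through in batches of BATCH_SIZE.
     * resetPerBatch = true also resets the counter at every batch boundary (the reported behavior);
     * resetPerBatch = false resets it only when the partition key changes.
     */
    static List<Long> rowNumbers(int[] partitionKeys, boolean resetPerBatch) {
        List<Long> out = new ArrayList<>(partitionKeys.length);
        long counter = 0;
        boolean first = true;
        int previousKey = 0;
        for (int start = 0; start < partitionKeys.length; start += BATCH_SIZE) {
            if (resetPerBatch) {
                counter = 0; // row batch boundary
            }
            int end = Math.min(start + BATCH_SIZE, partitionKeys.length);
            for (int i = start; i < end; i++) {
                int key = partitionKeys[i];
                if (!first && key != previousKey) {
                    counter = 0; // partition group boundary
                }
                first = false;
                previousKey = key;
                out.add(++counter);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] keys = new int[1500]; // all zeros: one partition spanning two batches
        System.out.println("per-batch reset, row 1025:     " + rowNumbers(keys, true).get(1024));
        System.out.println("per-partition reset, row 1025: " + rowNumbers(keys, false).get(1024));
    }
}
```

With a single partition of 1500 rows, the per-batch variant restarts at 1 on row 1025, while the per-partition variant continues with 1025. The same happens for any partition key that repeats across a batch boundary, constant or not.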

 


> Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22903
>                 URL: https://issues.apache.org/jira/browse/HIVE-22903
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF, Vectorization
>    Affects Versions: 4.0.0
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22903.01.patch, HIVE-22903.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The vectorized row_number() implementation resets the row number after each batch when a constant expression is passed in the partition clause.
> Repro Query
> {code}
> select row_number() over(partition by 1) r1, t from over10k_n8;
> -- or
> select row_number() over() r1, t from over10k_n8;
> {code}
> where table over10k_n8 contains more than 1024 records.
> This happens because VectorPTFOperator currently resets the evaluators whenever only a partition clause is present.
> {code:java}
>     // If we are only processing a PARTITION BY, reset our evaluators.
>     if (!isPartitionOrderBy) {
>       groupBatches.resetEvaluators();
>     }
> {code}
> To resolve this, we should also check whether the entire partition clause is a constant expression; if it is, we should not call {{groupBatches.resetEvaluators()}}.


