Posted to dev@hive.apache.org by "Shubham Chaurasia (Jira)" <ji...@apache.org> on 2020/02/18 14:49:00 UTC

[jira] [Created] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause

Shubham Chaurasia created HIVE-22903:
----------------------------------------

             Summary: Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
                 Key: HIVE-22903
                 URL: https://issues.apache.org/jira/browse/HIVE-22903
             Project: Hive
          Issue Type: Bug
          Components: UDF, Vectorization
    Affects Versions: 4.0.0
            Reporter: Shubham Chaurasia
            Assignee: Shubham Chaurasia


The vectorized row_number() implementation resets the row number when a constant expression is passed in the partition clause.

Repro Query
{code}
select row_number() over(partition by 1) r1, t from over10k_n8;

-- Or

select row_number() over() r1, t from over10k_n8;
{code}
where the table over10k_n8 contains more than 1024 records (1024 is the default vectorized row batch size, so the input spans more than one batch).

This happens because VectorPTFOperator currently resets the evaluators whenever only a partition clause is present:
{code:java}
    // If we are only processing a PARTITION BY, reset our evaluators.
    if (!isPartitionOrderBy) {
      groupBatches.resetEvaluators();
    }
{code}

To resolve this, we should also check whether the entire partition clause is a constant expression; if it is, we should not call {{groupBatches.resetEvaluators()}}.
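
For illustration only, here is a minimal standalone sketch of the behavior described above. It is a simplified model, not Hive code: the class {{RowNumberResetSketch}}, the {{RowNumberEvaluator}} stand-in, and the flag {{skipResetForConstantPartition}} are all hypothetical names, and the assumption is that rows arrive in batches of 1024 with the reset check running once per batch.

```java
// Simplified model of the reported behavior. NOT Hive code: every name
// below is hypothetical and exists only for this illustration.
class RowNumberResetSketch {
    // Assumed batch size; 1024 is the default vectorized row batch size.
    static final int BATCH_SIZE = 1024;

    // Stand-in for a row_number evaluator: a running counter.
    static final class RowNumberEvaluator {
        private long count = 0;
        long next() { return ++count; }
        void reset() { count = 0; }
    }

    // Models "row_number() over(partition by 1)": no ORDER BY and a
    // constant partition expression, so all rows form one partition.
    static long[] rowNumbers(int totalRows, boolean skipResetForConstantPartition) {
        boolean isPartitionOrderBy = false;  // window spec has no ORDER BY
        boolean isPartitionConstant = true;  // PARTITION BY 1, or empty over()
        RowNumberEvaluator evaluator = new RowNumberEvaluator();
        long[] result = new long[totalRows];

        for (int start = 0; start < totalRows; start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, totalRows);
            for (int i = start; i < end; i++) {
                result[i] = evaluator.next();
            }
            // Proposed extra check: keep evaluator state when the whole
            // partition clause is constant.
            boolean keepState = skipResetForConstantPartition && isPartitionConstant;
            if (!isPartitionOrderBy && !keepState) {
                evaluator.reset();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] buggy = rowNumbers(2500, false);
        long[] fixed = rowNumbers(2500, true);
        System.out.println("reset per batch:  row 1025 -> " + buggy[1024]);
        System.out.println("constant checked: row 1025 -> " + fixed[1024]);
    }
}
```

In this model with 2500 input rows, the first variant restarts at 1 on the 1025th row, while the variant with the extra constant-partition check continues counting up to 2500.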


