You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Harish Butani (JIRA)" <ji...@apache.org> on 2013/08/20 19:28:51 UTC

[jira] [Created] (HIVE-5125) Better error messages from Windowing queries

Harish Butani created HIVE-5125:
-----------------------------------

             Summary: Better error messages from Windowing queries
                 Key: HIVE-5125
                 URL: https://issues.apache.org/jira/browse/HIVE-5125
             Project: Hive
          Issue Type: Bug
          Components: PTF-Windowing
            Reporter: Harish Butani
            Assignee: Harish Butani


There are 2 issues.
- Ranking invocations must notify user of their relation to the Order clause.
- When all fn invocations have the same Partition and Order spec, there is no need to talk about componentization in the error message.

Here is more details about these issues. Pasting message posted on Hive User Group:

Regarding the issue posted with rank and dense_rank. The example posted was:

CREATE TABLE test (a INT);
EXPLAIN
SELECT
 DENSE_RANK() OVER (PARTITION BY a),
 a
FROM test;

Some comments on this:

1. The underlying issue is the bug that ranking functions had to be lowercase.  This was fixed in Jira 4954 by Edward.
    The ranking functions are rank, dense_rank, percent_rank and cume_dist.
    Some details about this: we check for ranking functions and this check was looking for lowercase names.
    More below on why we do this check.
    The rest of hive: parser, functionRegistry etc is not case sensitive. So from a language perspective these
    functions are not case sensitive.

2. The ranking functions don't accept any arguments. They operate on the expressions specified in the order spec
   of the window clause. Internally the ranking functions are implemented as UDAFs, so what we do is pass the
   order expressions as arguments to the ranking functions.  So from a language perspective invoke ranking functions
  without any arguments.

3. Now the e.g. above is saying partition by 'a'; but no order is specified. Since no order is specified for the ranking functions
   the order expression assumed is the partition expressions, in this it is 'a'. So the output of this e.g. would be all rows would
   have a dense_rank = 1.

4. The explanation for this case: DENSE_RANK() OVER (PARTITION BY a)
you got:
FAILED: SemanticException Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check
for circular dependencies.
Underlying error:
org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
arguments are expected.

The problem is DENSE_RANK didn't get flagged as a ranking function because of the lowercase bug.
This had the effect that we add the order expressions as arguments.
It got compiled into a dense_rank UDAF invocation. Which is than expecting at least 1 arg.

5. The explanation for this case: dense_rank(a) OVER (PARTITION BY a)

this is the right behavior.
But I can see the confusion. The error message is not helpful. We should say something to the effect that ranking functions work of order expressions.
Adding a Jira for this.

6. What is is 'Failed to breakup Windowing invocations into
Groups. At least 1 group must only depend on input columns. Also check
for circular dependencies.'

We allow for multiple window invocations in a Query. These may be on different partition and order specifications.
Internally we go through a process of breaking up Window invocations into Groups and do translation on each Group.
Again in this case, we should provide better error messages. This message is confusing.
When there is only 1 Group, we should just output the underlying message.
Adding a Jira for this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira