You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Lefty Leverenz <le...@gmail.com> on 2013/09/03 10:03:04 UTC

Re: Semantics of Rank.

What's the answer -- does the "rank" keyword have to be lowercase?

If lowercase is obligatory we need to revise the wiki, which shows all
uppercase (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
).

In the test files it's lowercase (windowing_rank.q,
ptf_negative_WhereWithRankCond.q).
 The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
I don't know if that means lowercase is required:

@WindowFunctionDescription
>
> (
>
> description = @Description(
>
> name = "rank",
>
> value = "_FUNC_(x)"
>
> ),
>
> supportsWindow = false,
>
> pivotResult = true
>
> )
>


And what about the other keywords in the wikidoc?  Same lowercase
requirement?

-- Lefty


On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mp...@gmail.com> wrote:

> Hi all,
>
> Below are some of observations based on the on-going rank function
> discussion.
>
> 1. I executed below mentioned queries  and only the query with "rank"
> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
> SemanticException Failed to breakup Windowing invocations into Groups."
>
> -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
> by cust_id order by ord_dt);
>
> -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
> by cust_id order by ord_dt);
>
> -   select cust_id, ord_dt, rank() w from cust_ord window w as (partition
> by cust_id order by ord_dt);
>
> It seems "rank" keyword is case-sensitive. Attached is the screenshot for
> reference.
>
> 2. I created a dummy table with the data provided in the below mail trail
> and achieved the expected output, using the below mentioned query.
>
> *select cust_id, ord_dt, rank() over (partition by cust_id order by
> ord_dt) from cust_ord;*
>
>  Request all to kindly review these details and suggest if it was of any
> help!
>
> Thanks.
>
>
> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
> j.barrett.strausser@gmail.com> wrote:
>
>> Any further help on this, otherwise I'll file a jira.
>>
>>
>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>> j.barrett.strausser@gmail.com> wrote:
>>
>>> As an example : If I run my query above removing the arg the following
>>> is thrown.
>>>
>>> FAILED: SemanticException Failed to breakup Windowing invocations into
>>> Groups. At least 1 group must only depend on input columns. Also check for
>>> circular dependencies.
>>> Underlying error:
>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>> arguments are expected.
>>>
>>>
>>> Similar issue and fix here:
>>>
>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>
>>> Even if it didn't require an arg it still doesn't explain my anomalous
>>> output.
>>>
>>>
>>>
>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>> j.barrett.strausser@gmail.com> wrote:
>>>
>>>> That isn't true. If you try to run the above HIVE without an argument,
>>>> it will throw an exception. I have seen other users replicate this problem
>>>> as well.
>>>>
>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>> manishbhoge@rocketmail.com> wrote:
>>>>
>>>>> Analytical function doesn't expect any argument. Rank() itself enough
>>>>> to sequence based on the window you have defined in partition by. So
>>>>>
>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>
>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>
>>>>> Sent via Rocket from my HTC
>>>>>
>>>>> ----- Reply message -----
>>>>> From: "j.barrett Strausser" <j....@gmail.com>
>>>>> To: <us...@hive.apache.org>
>>>>> Subject: Semantics of Rank.
>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>
>>>>>
>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>> between
>>>>> rank and the windowing function is wrong.
>>>>>
>>>>> What I want to achieve for the following is : For a given customer id,
>>>>> sort his orders. I thought the below would work.
>>>>>
>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data
>>>>> eh
>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>
>>>>> The rank function instead returns the rank of the order date over all
>>>>> all
>>>>> order dates.
>>>>>
>>>>> Example snippet from above
>>>>>
>>>>> Actual :
>>>>>
>>>>> 6758783    27APR2012    94
>>>>> 6758783    23JUN2012    95
>>>>> 6758785    14DEC2012    96
>>>>> 6758795    18DEC2011    97
>>>>> 6758796    06MAY2012    98
>>>>> 6758798    24MAR2013    99
>>>>> 6758799    23NOV2012    100
>>>>>
>>>>>
>>>>> Expected :
>>>>>
>>>>> 6758783    27APR2012    1
>>>>> 6758783    23JUN2012    2
>>>>> 6758785    14DEC2012    1
>>>>> 6758795    18DEC2011    1
>>>>> 6758796    06MAY2012    1
>>>>> 6758798    24MAR2013    1
>>>>> 6758799    23NOV2012    1
>>>>>
>>>>>
>>>>> -b
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <sglixman@outbrain.com
>>>>> >wrote:
>>>>>
>>>>> > the argument to rank is simply some value, whereas the rank function
>>>>> > compare this value
>>>>> >  to the previous value received, if value is same, rank returns
>>>>> ++index,
>>>>> > otherwise, rank return 1.
>>>>> > pseudo code:
>>>>> >
>>>>> > class Rank {
>>>>> > int index;
>>>>> > Object previousValue = null;
>>>>> > int evaluate(Object value) {
>>>>> >   if (value == previousValue) {
>>>>> >     return ++index;
>>>>> >   }
>>>>> >   previousValue = value;
>>>>> >   index = 1;
>>>>> >   return 1;
>>>>> > }
>>>>> >
>>>>> >
>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>> >
>>>>> >> It seems as though I am required to pass in an argument to RANK().
>>>>> >>
>>>>> >> What is the effect of passing this argument in ?
>>>>> >>
>>>>> >> In a RANK function the output for RANK should be the number of rows
>>>>> >> preceding
>>>>> >> a row in a given window. As  windows are specified by the partition
>>>>> and
>>>>> >> order by fields I don't understand the effect of passing in an arg
>>>>> to RANK
>>>>> >>
>>>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist,
>>>>> Ntile
>>>>> >> --
>>>>> >>
>>>>> >>
>>>>> >> https://github.com/bearrito
>>>>> >> @deepbearrito
>>>>> >>
>>>>> >
>>>>> >
>>>>> > The above terms reflect a potential business arrangement, are
>>>>> provided solely
>>>>> > as a basis for further discussion, and are not intended to be and do
>>>>> not
>>>>> > constitute a legally binding obligation. No legally binding
>>>>> obligations will
>>>>> > be created, implied, or inferred until an agreement in final form is
>>>>> executed
>>>>> > in writing by all parties involved.
>>>>> >
>>>>> > This email and any attachments hereto may be confidential or
>>>>> privileged.
>>>>> >  If you received this communication by mistake, please don't forward
>>>>> it
>>>>> > to anyone else, please erase all copies and attachments, and please
>>>>> let
>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>

Re: Semantics of Rank.

Posted by Lefty Leverenz <le...@gmail.com>.
Thanks, I've added a note about case-insensitivity to the UDF doc and the
Tutorial:

   - https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
   -
   https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Builtinoperatorsandfunctions

-- Lefty


On Tue, Sep 3, 2013 at 7:38 AM, Edward Capriolo <ed...@gmail.com>wrote:

> Beyond that answer please note:
> Udf names are always case insensitive.  RaNk = RANK = rank
> For rank this is a special case that was fixed.
>
>
> On Tue, Sep 3, 2013 at 9:15 AM, j.barrett Strausser <
> j.barrett.strausser@gmail.com> wrote:
>
>> I filed HIVE-5038. My issue was that it seemed incorrect that rank took
>> an arg. I haven't verified that the issue is fixed in trunk as claimed in
>> the JIRA.However, between the reponse in JIRA and in this thread it seems
>> resolved.
>>
>>
>> On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz <le...@gmail.com>wrote:
>>
>>> Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank operator is case-sensitive and has odd semantics") -- it's resolved
>>> as invalid, but is that only for the odd semantics?
>>>
>>> Perhaps this issue is clarified in more recent emails.  I'm catching up
>>> on a huge backlog.
>>>
>>> -- Lefty
>>>
>>>
>>> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <le...@gmail.com>wrote:
>>>
>>>> What's the answer -- does the "rank" keyword have to be lowercase?
>>>>
>>>> If lowercase is obligatory we need to revise the wiki, which shows all
>>>> uppercase (
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>>>> ).
>>>>
>>>> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>>>>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
>>>> I don't know if that means lowercase is required:
>>>>
>>>>  @WindowFunctionDescription
>>>>>
>>>>> (
>>>>>
>>>>> description = @Description(
>>>>>
>>>>> name = "rank",
>>>>>
>>>>> value = "_FUNC_(x)"
>>>>>
>>>>> ),
>>>>>
>>>>> supportsWindow = false,
>>>>>
>>>>> pivotResult = true
>>>>>
>>>>> )
>>>>>
>>>>
>>>>
>>>> And what about the other keywords in the wikidoc?  Same lowercase
>>>> requirement?
>>>>
>>>> -- Lefty
>>>>
>>>>
>>>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mp...@gmail.com>wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Below are some of observations based on the on-going rank function
>>>>> discussion.
>>>>>
>>>>> 1. I executed below mentioned queries  and only the query with "rank"
>>>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>>>>> SemanticException Failed to breakup Windowing invocations into Groups."
>>>>>
>>>>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> -   select cust_id, ord_dt, rank() w from cust_ord window w as
>>>>> (partition by cust_id order by ord_dt);
>>>>>
>>>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>>>>> for reference.
>>>>>
>>>>> 2. I created a dummy table with the data provided in the below mail
>>>>> trail and achieved the expected output, using the below mentioned query.
>>>>>
>>>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>>>>> ord_dt) from cust_ord;*
>>>>>
>>>>>  Request all to kindly review these details and suggest if it was of
>>>>> any help!
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>
>>>>>> Any further help on this, otherwise I'll file a jira.
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>>
>>>>>>> As an example : If I run my query above removing the arg the
>>>>>>> following is thrown.
>>>>>>>
>>>>>>> FAILED: SemanticException Failed to breakup Windowing invocations
>>>>>>> into Groups. At least 1 group must only depend on input columns. Also check
>>>>>>> for circular dependencies.
>>>>>>> Underlying error:
>>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>>>> arguments are expected.
>>>>>>>
>>>>>>>
>>>>>>> Similar issue and fix here:
>>>>>>>
>>>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>>>>
>>>>>>> Even if it didn't require an arg it still doesn't explain my
>>>>>>> anomalous output.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>>>
>>>>>>>> That isn't true. If you try to run the above HIVE without an
>>>>>>>> argument, it will throw an exception. I have seen other users replicate
>>>>>>>> this problem as well.
>>>>>>>>
>>>>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>>>>
>>>>>>>>> Analytical function doesn't expect any argument. Rank() itself
>>>>>>>>> enough to sequence based on the window you have defined in partition by. So
>>>>>>>>>
>>>>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>>>>
>>>>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>>>>
>>>>>>>>> Sent via Rocket from my HTC
>>>>>>>>>
>>>>>>>>> ----- Reply message -----
>>>>>>>>> From: "j.barrett Strausser" <j....@gmail.com>
>>>>>>>>> To: <us...@hive.apache.org>
>>>>>>>>> Subject: Semantics of Rank.
>>>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>>>>> between
>>>>>>>>> rank and the windowing function is wrong.
>>>>>>>>>
>>>>>>>>> What I want to achieve for the following is : For a given customer
>>>>>>>>> id,
>>>>>>>>> sort his orders. I thought the below would work.
>>>>>>>>>
>>>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM
>>>>>>>>> order_data eh
>>>>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>>>>
>>>>>>>>> The rank function instead returns the rank of the order date over
>>>>>>>>> all all
>>>>>>>>> order dates.
>>>>>>>>>
>>>>>>>>> Example snippet from above
>>>>>>>>>
>>>>>>>>> Actual :
>>>>>>>>>
>>>>>>>>> 6758783    27APR2012    94
>>>>>>>>> 6758783    23JUN2012    95
>>>>>>>>> 6758785    14DEC2012    96
>>>>>>>>> 6758795    18DEC2011    97
>>>>>>>>> 6758796    06MAY2012    98
>>>>>>>>> 6758798    24MAR2013    99
>>>>>>>>> 6758799    23NOV2012    100
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Expected :
>>>>>>>>>
>>>>>>>>> 6758783    27APR2012    1
>>>>>>>>> 6758783    23JUN2012    2
>>>>>>>>> 6758785    14DEC2012    1
>>>>>>>>> 6758795    18DEC2011    1
>>>>>>>>> 6758796    06MAY2012    1
>>>>>>>>> 6758798    24MAR2013    1
>>>>>>>>> 6758799    23NOV2012    1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -b
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>>>>> sglixman@outbrain.com>wrote:
>>>>>>>>>
>>>>>>>>> > the argument to rank is simply some value, whereas the rank
>>>>>>>>> function
>>>>>>>>> > compare this value
>>>>>>>>> >  to the previous value received, if value is same, rank returns
>>>>>>>>> ++index,
>>>>>>>>> > otherwise, rank return 1.
>>>>>>>>> > pseudo code:
>>>>>>>>> >
>>>>>>>>> > class Rank {
>>>>>>>>> > int index;
>>>>>>>>> > Object previousValue = null;
>>>>>>>>> > int evaluate(Object value) {
>>>>>>>>> >   if (value == previousValue) {
>>>>>>>>> >     return ++index;
>>>>>>>>> >   }
>>>>>>>>> >   previousValue = value;
>>>>>>>>> >   index = 1;
>>>>>>>>> >   return 1;
>>>>>>>>> > }
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>>>>> >
>>>>>>>>> >> It seems as though I am required to pass in an argument to
>>>>>>>>> RANK().
>>>>>>>>> >>
>>>>>>>>> >> What is the effect of passing this argument in ?
>>>>>>>>> >>
>>>>>>>>> >> In a RANK function the output for RANK should be the number of
>>>>>>>>> rows
>>>>>>>>> >> preceding
>>>>>>>>> >> a row in a given window. As  windows are specified by the
>>>>>>>>> partition and
>>>>>>>>> >> order by fields I don't understand the effect of passing in an
>>>>>>>>> arg to RANK
>>>>>>>>> >>
>>>>>>>>> >> Are the other non-aggregate function similar? Meaning :
>>>>>>>>> Cume_dist, Ntile
>>>>>>>>> >> --
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> https://github.com/bearrito
>>>>>>>>> >> @deepbearrito
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>>>>> provided solely
>>>>>>>>> > as a basis for further discussion, and are not intended to be
>>>>>>>>> and do not
>>>>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>>>>> obligations will
>>>>>>>>> > be created, implied, or inferred until an agreement in final
>>>>>>>>> form is executed
>>>>>>>>> > in writing by all parties involved.
>>>>>>>>> >
>>>>>>>>> > This email and any attachments hereto may be confidential or
>>>>>>>>> privileged.
>>>>>>>>> >  If you received this communication by mistake, please don't
>>>>>>>>> forward it
>>>>>>>>> > to anyone else, please erase all copies and attachments, and
>>>>>>>>> please let
>>>>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://github.com/bearrito
>>>>>>>>> @deepbearrito
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/bearrito
>>>>>>>> @deepbearrito
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/bearrito
>>>>>>> @deepbearrito
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>

Re: Semantics of Rank.

Posted by Edward Capriolo <ed...@gmail.com>.
Beyond that answer please note:
Udf names are always case insensitive.  RaNk = RANK = rank
For rank this is a special case that was fixed.


On Tue, Sep 3, 2013 at 9:15 AM, j.barrett Strausser <
j.barrett.strausser@gmail.com> wrote:

> I filed HIVE-5038. My issue was that it seemed incorrect that rank took an
> arg. I haven't verified that the issue is fixed in trunk as claimed in the
> JIRA.However, between the reponse in JIRA and in this thread it seems
> resolved.
>
>
> On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz <le...@gmail.com>wrote:
>
>> Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank operator is case-sensitive and has odd semantics") -- it's resolved
>> as invalid, but is that only for the odd semantics?
>>
>> Perhaps this issue is clarified in more recent emails.  I'm catching up
>> on a huge backlog.
>>
>> -- Lefty
>>
>>
>> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <le...@gmail.com>wrote:
>>
>>> What's the answer -- does the "rank" keyword have to be lowercase?
>>>
>>> If lowercase is obligatory we need to revise the wiki, which shows all
>>> uppercase (
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>>> ).
>>>
>>> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>>>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
>>> I don't know if that means lowercase is required:
>>>
>>>  @WindowFunctionDescription
>>>>
>>>> (
>>>>
>>>> description = @Description(
>>>>
>>>> name = "rank",
>>>>
>>>> value = "_FUNC_(x)"
>>>>
>>>> ),
>>>>
>>>> supportsWindow = false,
>>>>
>>>> pivotResult = true
>>>>
>>>> )
>>>>
>>>
>>>
>>> And what about the other keywords in the wikidoc?  Same lowercase
>>> requirement?
>>>
>>> -- Lefty
>>>
>>>
>>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mp...@gmail.com>wrote:
>>>
>>>> Hi all,
>>>>
>>>> Below are some of observations based on the on-going rank function
>>>> discussion.
>>>>
>>>> 1. I executed below mentioned queries  and only the query with "rank"
>>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>>>> SemanticException Failed to breakup Windowing invocations into Groups."
>>>>
>>>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as
>>>> (partition by cust_id order by ord_dt);
>>>>
>>>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as
>>>> (partition by cust_id order by ord_dt);
>>>>
>>>> -   select cust_id, ord_dt, rank() w from cust_ord window w as
>>>> (partition by cust_id order by ord_dt);
>>>>
>>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>>>> for reference.
>>>>
>>>> 2. I created a dummy table with the data provided in the below mail
>>>> trail and achieved the expected output, using the below mentioned query.
>>>>
>>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>>>> ord_dt) from cust_ord;*
>>>>
>>>>  Request all to kindly review these details and suggest if it was of
>>>> any help!
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>>>> j.barrett.strausser@gmail.com> wrote:
>>>>
>>>>> Any further help on this, otherwise I'll file a jira.
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>
>>>>>> As an example : If I run my query above removing the arg the
>>>>>> following is thrown.
>>>>>>
>>>>>> FAILED: SemanticException Failed to breakup Windowing invocations
>>>>>> into Groups. At least 1 group must only depend on input columns. Also check
>>>>>> for circular dependencies.
>>>>>> Underlying error:
>>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>>> arguments are expected.
>>>>>>
>>>>>>
>>>>>> Similar issue and fix here:
>>>>>>
>>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>>>
>>>>>> Even if it didn't require an arg it still doesn't explain my
>>>>>> anomalous output.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>>
>>>>>>> That isn't true. If you try to run the above HIVE without an
>>>>>>> argument, it will throw an exception. I have seen other users replicate
>>>>>>> this problem as well.
>>>>>>>
>>>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>>>
>>>>>>>> Analytical function doesn't expect any argument. Rank() itself
>>>>>>>> enough to sequence based on the window you have defined in partition by. So
>>>>>>>>
>>>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>>>
>>>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>>>
>>>>>>>> Sent via Rocket from my HTC
>>>>>>>>
>>>>>>>> ----- Reply message -----
>>>>>>>> From: "j.barrett Strausser" <j....@gmail.com>
>>>>>>>> To: <us...@hive.apache.org>
>>>>>>>> Subject: Semantics of Rank.
>>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>>>> between
>>>>>>>> rank and the windowing function is wrong.
>>>>>>>>
>>>>>>>> What I want to achieve for the following is : For a given customer
>>>>>>>> id,
>>>>>>>> sort his orders. I thought the below would work.
>>>>>>>>
>>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM
>>>>>>>> order_data eh
>>>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>>>
>>>>>>>> The rank function instead returns the rank of the order date over
>>>>>>>> all all
>>>>>>>> order dates.
>>>>>>>>
>>>>>>>> Example snippet from above
>>>>>>>>
>>>>>>>> Actual :
>>>>>>>>
>>>>>>>> 6758783    27APR2012    94
>>>>>>>> 6758783    23JUN2012    95
>>>>>>>> 6758785    14DEC2012    96
>>>>>>>> 6758795    18DEC2011    97
>>>>>>>> 6758796    06MAY2012    98
>>>>>>>> 6758798    24MAR2013    99
>>>>>>>> 6758799    23NOV2012    100
>>>>>>>>
>>>>>>>>
>>>>>>>> Expected :
>>>>>>>>
>>>>>>>> 6758783    27APR2012    1
>>>>>>>> 6758783    23JUN2012    2
>>>>>>>> 6758785    14DEC2012    1
>>>>>>>> 6758795    18DEC2011    1
>>>>>>>> 6758796    06MAY2012    1
>>>>>>>> 6758798    24MAR2013    1
>>>>>>>> 6758799    23NOV2012    1
>>>>>>>>
>>>>>>>>
>>>>>>>> -b
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>>>> sglixman@outbrain.com>wrote:
>>>>>>>>
>>>>>>>> > the argument to rank is simply some value, whereas the rank
>>>>>>>> function
>>>>>>>> > compare this value
>>>>>>>> >  to the previous value received, if value is same, rank returns
>>>>>>>> ++index,
>>>>>>>> > otherwise, rank return 1.
>>>>>>>> > pseudo code:
>>>>>>>> >
>>>>>>>> > class Rank {
>>>>>>>> > int index;
>>>>>>>> > Object previousValue = null;
>>>>>>>> > int evaluate(Object value) {
>>>>>>>> >   if (value == previousValue) {
>>>>>>>> >     return ++index;
>>>>>>>> >   }
>>>>>>>> >   previousValue = value;
>>>>>>>> >   index = 1;
>>>>>>>> >   return 1;
>>>>>>>> > }
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> >> It seems as though I am required to pass in an argument to
>>>>>>>> RANK().
>>>>>>>> >>
>>>>>>>> >> What is the effect of passing this argument in ?
>>>>>>>> >>
>>>>>>>> >> In a RANK function the output for RANK should be the number of
>>>>>>>> rows
>>>>>>>> >> preceding
>>>>>>>> >> a row in a given window. As  windows are specified by the
>>>>>>>> partition and
>>>>>>>> >> order by fields I don't understand the effect of passing in an
>>>>>>>> arg to RANK
>>>>>>>> >>
>>>>>>>> >> Are the other non-aggregate function similar? Meaning :
>>>>>>>> Cume_dist, Ntile
>>>>>>>> >> --
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> https://github.com/bearrito
>>>>>>>> >> @deepbearrito
>>>>>>>> >>
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>>>> provided solely
>>>>>>>> > as a basis for further discussion, and are not intended to be and
>>>>>>>> do not
>>>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>>>> obligations will
>>>>>>>> > be created, implied, or inferred until an agreement in final form
>>>>>>>> is executed
>>>>>>>> > in writing by all parties involved.
>>>>>>>> >
>>>>>>>> > This email and any attachments hereto may be confidential or
>>>>>>>> privileged.
>>>>>>>> >  If you received this communication by mistake, please don't
>>>>>>>> forward it
>>>>>>>> > to anyone else, please erase all copies and attachments, and
>>>>>>>> please let
>>>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> https://github.com/bearrito
>>>>>>>> @deepbearrito
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/bearrito
>>>>>>> @deepbearrito
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
>
>
> https://github.com/bearrito
> @deepbearrito
>

Re: Semantics of Rank.

Posted by "j.barrett Strausser" <j....@gmail.com>.
I filed HIVE-5038. My issue was that it seemed incorrect that rank took an
arg. I haven't verified that the issue is fixed in trunk as claimed in the
JIRA.However, between the reponse in JIRA and in this thread it seems
resolved.


On Tue, Sep 3, 2013 at 4:23 AM, Lefty Leverenz <le...@gmail.com>wrote:

> Another email thread led me to HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank operator is case-sensitive and has odd semantics") -- it's resolved
> as invalid, but is that only for the odd semantics?
>
> Perhaps this issue is clarified in more recent emails.  I'm catching up on
> a huge backlog.
>
> -- Lefty
>
>
> On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <le...@gmail.com>wrote:
>
>> What's the answer -- does the "rank" keyword have to be lowercase?
>>
>> If lowercase is obligatory we need to revise the wiki, which shows all
>> uppercase (
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
>> ).
>>
>> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
>> I don't know if that means lowercase is required:
>>
>>  @WindowFunctionDescription
>>>
>>> (
>>>
>>> description = @Description(
>>>
>>> name = "rank",
>>>
>>> value = "_FUNC_(x)"
>>>
>>> ),
>>>
>>> supportsWindow = false,
>>>
>>> pivotResult = true
>>>
>>> )
>>>
>>
>>
>> And what about the other keywords in the wikidoc?  Same lowercase
>> requirement?
>>
>> -- Lefty
>>
>>
>> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mp...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Below are some of observations based on the on-going rank function
>>> discussion.
>>>
>>> 1. I executed below mentioned queries  and only the query with "rank"
>>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>>> SemanticException Failed to breakup Windowing invocations into Groups."
>>>
>>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
>>> by cust_id order by ord_dt);
>>>
>>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
>>> by cust_id order by ord_dt);
>>>
>>> -   select cust_id, ord_dt, rank() w from cust_ord window w as
>>> (partition by cust_id order by ord_dt);
>>>
>>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>>> for reference.
>>>
>>> 2. I created a dummy table with the data provided in the below mail
>>> trail and achieved the expected output, using the below mentioned query.
>>>
>>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>>> ord_dt) from cust_ord;*
>>>
>>>  Request all to kindly review these details and suggest if it was of any
>>> help!
>>>
>>> Thanks.
>>>
>>>
>>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>>> j.barrett.strausser@gmail.com> wrote:
>>>
>>>> Any further help on this, otherwise I'll file a jira.
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>>> j.barrett.strausser@gmail.com> wrote:
>>>>
>>>>> As an example : If I run my query above removing the arg the following
>>>>> is thrown.
>>>>>
>>>>> FAILED: SemanticException Failed to breakup Windowing invocations into
>>>>> Groups. At least 1 group must only depend on input columns. Also check for
>>>>> circular dependencies.
>>>>> Underlying error:
>>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>>> arguments are expected.
>>>>>
>>>>>
>>>>> Similar issue and fix here:
>>>>>
>>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>>
>>>>> Even if it didn't require an arg it still doesn't explain my anomalous
>>>>> output.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>>> j.barrett.strausser@gmail.com> wrote:
>>>>>
>>>>>> That isn't true. If you try to run the above HIVE without an
>>>>>> argument, it will throw an exception. I have seen other users replicate
>>>>>> this problem as well.
>>>>>>
>>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>>
>>>>>>> Analytical function doesn't expect any argument. Rank() itself
>>>>>>> enough to sequence based on the window you have defined in partition by. So
>>>>>>>
>>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>>
>>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>>
>>>>>>> Sent via Rocket from my HTC
>>>>>>>
>>>>>>> ----- Reply message -----
>>>>>>> From: "j.barrett Strausser" <j....@gmail.com>
>>>>>>> To: <us...@hive.apache.org>
>>>>>>> Subject: Semantics of Rank.
>>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>>
>>>>>>>
>>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>>> between
>>>>>>> rank and the windowing function is wrong.
>>>>>>>
>>>>>>> What I want to achieve for the following is : For a given customer
>>>>>>> id,
>>>>>>> sort his orders. I thought the below would work.
>>>>>>>
>>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM
>>>>>>> order_data eh
>>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>>
>>>>>>> The rank function instead returns the rank of the order date over
>>>>>>> all all
>>>>>>> order dates.
>>>>>>>
>>>>>>> Example snippet from above
>>>>>>>
>>>>>>> Actual :
>>>>>>>
>>>>>>> 6758783    27APR2012    94
>>>>>>> 6758783    23JUN2012    95
>>>>>>> 6758785    14DEC2012    96
>>>>>>> 6758795    18DEC2011    97
>>>>>>> 6758796    06MAY2012    98
>>>>>>> 6758798    24MAR2013    99
>>>>>>> 6758799    23NOV2012    100
>>>>>>>
>>>>>>>
>>>>>>> Expected :
>>>>>>>
>>>>>>> 6758783    27APR2012    1
>>>>>>> 6758783    23JUN2012    2
>>>>>>> 6758785    14DEC2012    1
>>>>>>> 6758795    18DEC2011    1
>>>>>>> 6758796    06MAY2012    1
>>>>>>> 6758798    24MAR2013    1
>>>>>>> 6758799    23NOV2012    1
>>>>>>>
>>>>>>>
>>>>>>> -b
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>>> sglixman@outbrain.com>wrote:
>>>>>>>
>>>>>>> > the argument to rank is simply some value, whereas the rank
>>>>>>> function
>>>>>>> > compare this value
>>>>>>> >  to the previous value received, if value is same, rank returns
>>>>>>> ++index,
>>>>>>> > otherwise, rank return 1.
>>>>>>> > pseudo code:
>>>>>>> >
>>>>>>> > class Rank {
>>>>>>> > int index;
>>>>>>> > Object previousValue = null;
>>>>>>> > int evaluate(Object value) {
>>>>>>> >   if (value == previousValue) {
>>>>>>> >     return ++index;
>>>>>>> >   }
>>>>>>> >   previousValue = value;
>>>>>>> >   index = 1;
>>>>>>> >   return 1;
>>>>>>> > }
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>>> >
>>>>>>> >> It seems as though I am required to pass in an argument to RANK().
>>>>>>> >>
>>>>>>> >> What is the effect of passing this argument in ?
>>>>>>> >>
>>>>>>> >> In a RANK function the output for RANK should be the number of
>>>>>>> rows
>>>>>>> >> preceding
>>>>>>> >> a row in a given window. As  windows are specified by the
>>>>>>> partition and
>>>>>>> >> order by fields I don't understand the effect of passing in an
>>>>>>> arg to RANK
>>>>>>> >>
>>>>>>> >> Are the other non-aggregate function similar? Meaning :
>>>>>>> Cume_dist, Ntile
>>>>>>> >> --
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> https://github.com/bearrito
>>>>>>> >> @deepbearrito
>>>>>>> >>
>>>>>>> >
>>>>>>> >
>>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>>> provided solely
>>>>>>> > as a basis for further discussion, and are not intended to be and
>>>>>>> do not
>>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>>> obligations will
>>>>>>> > be created, implied, or inferred until an agreement in final form
>>>>>>> is executed
>>>>>>> > in writing by all parties involved.
>>>>>>> >
>>>>>>> > This email and any attachments hereto may be confidential or
>>>>>>> privileged.
>>>>>>> >  If you received this communication by mistake, please don't
>>>>>>> forward it
>>>>>>> > to anyone else, please erase all copies and attachments, and
>>>>>>> please let
>>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/bearrito
>>>>>>> @deepbearrito
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>
>


-- 


https://github.com/bearrito
@deepbearrito

Re: Semantics of Rank.

Posted by Lefty Leverenz <le...@gmail.com>.
Another email thread led me to
HIVE-5038<https://issues.apache.org/jira/browse/HIVE-5038>("rank
operator is case-sensitive and has odd semantics") -- it's resolved
as invalid, but is that only for the odd semantics?

Perhaps this issue is clarified in more recent emails.  I'm catching up on
a huge backlog.

-- Lefty


On Tue, Sep 3, 2013 at 4:03 AM, Lefty Leverenz <le...@gmail.com>wrote:

> What's the answer -- does the "rank" keyword have to be lowercase?
>
> If lowercase is obligatory we need to revise the wiki, which shows all
> uppercase (
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
> ).
>
> In the test files it's lowercase (windowing_rank.q, ptf_negative_WhereWithRankCond.q).
>  The patch for HIVE-896 shows a lowercase name in GenericUDAFRank.java but
> I don't know if that means lowercase is required:
>
> @WindowFunctionDescription
>>
>> (
>>
>> description = @Description(
>>
>> name = "rank",
>>
>> value = "_FUNC_(x)"
>>
>> ),
>>
>> supportsWindow = false,
>>
>> pivotResult = true
>>
>> )
>>
>
>
> And what about the other keywords in the wikidoc?  Same lowercase
> requirement?
>
> -- Lefty
>
>
> On Fri, Jul 26, 2013 at 5:30 PM, saurabh <mp...@gmail.com> wrote:
>
>> Hi all,
>>
>> Below are some of observations based on the on-going rank function
>> discussion.
>>
>> 1. I executed below mentioned queries  and only the query with "rank"
>> (lowercase) executed successfully, rest were throwing exceptions "FAILED:
>> SemanticException Failed to breakup Windowing invocations into Groups."
>>
>> -  select cust_id, ord_dt, RANK() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> -  select cust_id, ord_dt, Rank() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> -   select cust_id, ord_dt, rank() w from cust_ord window w as (partition
>> by cust_id order by ord_dt);
>>
>> It seems "rank" keyword is case-sensitive. Attached is the screenshot
>> for reference.
>>
>> 2. I created a dummy table with the data provided in the below mail trail
>> and achieved the expected output, using the below mentioned query.
>>
>> *select cust_id, ord_dt, rank() over (partition by cust_id order by
>> ord_dt) from cust_ord;*
>>
>>  Request all to kindly review these details and suggest if it was of any
>> help!
>>
>> Thanks.
>>
>>
>> On Sat, Jul 27, 2013 at 12:07 AM, j.barrett Strausser <
>> j.barrett.strausser@gmail.com> wrote:
>>
>>> Any further help on this, otherwise I'll file a jira.
>>>
>>>
>>> On Wed, Jul 24, 2013 at 11:32 PM, j.barrett Strausser <
>>> j.barrett.strausser@gmail.com> wrote:
>>>
>>>> As an example : If I run my query above removing the arg the following
>>>> is thrown.
>>>>
>>>> FAILED: SemanticException Failed to breakup Windowing invocations into
>>>> Groups. At least 1 group must only depend on input columns. Also check for
>>>> circular dependencies.
>>>> Underlying error:
>>>> org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more
>>>> arguments are expected.
>>>>
>>>>
>>>> Similar issue and fix here:
>>>>
>>>> http://www.marshut.com/rqvpz/use-rank-over-partition-function-in-hive-11.html
>>>>
>>>> Even if it didn't require an arg it still doesn't explain my anomalous
>>>> output.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 24, 2013 at 11:28 PM, j.barrett Strausser <
>>>> j.barrett.strausser@gmail.com> wrote:
>>>>
>>>>> That isn't true. If you try to run the above HIVE without an argument,
>>>>> it will throw an exception. I have seen other users replicate this problem
>>>>> as well.
>>>>>
>>>>> I can file a JIRA if someone can confirm that my query should work.
>>>>>
>>>>>
>>>>> On Wed, Jul 24, 2013 at 11:02 PM, manishbhoge@rocketmail.com <
>>>>> manishbhoge@rocketmail.com> wrote:
>>>>>
>>>>>> Analytical function doesn't expect any argument. Rank() itself enough
>>>>>> to sequence based on the window you have defined in partition by. So
>>>>>>
>>>>>> Rank() over (partition by cmscustid  order by orderdate)
>>>>>>
>>>>>> Should work as long as I have wrote right syntax for hive.
>>>>>>
>>>>>> Sent via Rocket from my HTC
>>>>>>
>>>>>> ----- Reply message -----
>>>>>> From: "j.barrett Strausser" <j....@gmail.com>
>>>>>> To: <us...@hive.apache.org>
>>>>>> Subject: Semantics of Rank.
>>>>>> Date: Thu, Jul 25, 2013 1:08 AM
>>>>>>
>>>>>>
>>>>>> Thanks for the reply. Perhaps my misunderstanding of the relation
>>>>>> between
>>>>>> rank and the windowing function is wrong.
>>>>>>
>>>>>> What I want to achieve for the following is : For a given customer id,
>>>>>> sort his orders. I thought the below would work.
>>>>>>
>>>>>> SELECT eh.cmsorderid, eh.orderdate, RANK(orderdate) w FROM order_data
>>>>>> eh
>>>>>> window w as (partition by cmscustid  order by orderdate);
>>>>>>
>>>>>> The rank function instead returns the rank of the order date over all
>>>>>> all
>>>>>> order dates.
>>>>>>
>>>>>> Example snippet from above
>>>>>>
>>>>>> Actual :
>>>>>>
>>>>>> 6758783    27APR2012    94
>>>>>> 6758783    23JUN2012    95
>>>>>> 6758785    14DEC2012    96
>>>>>> 6758795    18DEC2011    97
>>>>>> 6758796    06MAY2012    98
>>>>>> 6758798    24MAR2013    99
>>>>>> 6758799    23NOV2012    100
>>>>>>
>>>>>>
>>>>>> Expected :
>>>>>>
>>>>>> 6758783    27APR2012    1
>>>>>> 6758783    23JUN2012    2
>>>>>> 6758785    14DEC2012    1
>>>>>> 6758795    18DEC2011    1
>>>>>> 6758796    06MAY2012    1
>>>>>> 6758798    24MAR2013    1
>>>>>> 6758799    23NOV2012    1
>>>>>>
>>>>>>
>>>>>> -b
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 24, 2013 at 3:17 PM, Shahar Glixman <
>>>>>> sglixman@outbrain.com>wrote:
>>>>>>
>>>>>> > the argument to rank is simply some value, whereas the rank function
>>>>>> > compare this value
>>>>>> >  to the previous value received, if value is same, rank returns
>>>>>> ++index,
>>>>>> > otherwise, rank return 1.
>>>>>> > pseudo code:
>>>>>> >
>>>>>> > class Rank {
>>>>>> > int index;
>>>>>> > Object previousValue = null;
>>>>>> > int evaluate(Object value) {
>>>>>> >   if (value == previousValue) {
>>>>>> >     return ++index;
>>>>>> >   }
>>>>>> >   previousValue = value;
>>>>>> >   index = 1;
>>>>>> >   return 1;
>>>>>> > }
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Jul 24, 2013 at 9:59 PM, j.barrett Strausser <
>>>>>> > j.barrett.strausser@gmail.com> wrote:
>>>>>> >
>>>>>> >> It seems as though I am required to pass in an argument to RANK().
>>>>>> >>
>>>>>> >> What is the effect of passing this argument in ?
>>>>>> >>
>>>>>> >> In a RANK function the output for RANK should be the number of rows
>>>>>> >> preceding
>>>>>> >> a row in a given window. As  windows are specified by the
>>>>>> partition and
>>>>>> >> order by fields I don't understand the effect of passing in an arg
>>>>>> to RANK
>>>>>> >>
>>>>>> >> Are the other non-aggregate function similar? Meaning : Cume_dist,
>>>>>> Ntile
>>>>>> >> --
>>>>>> >>
>>>>>> >>
>>>>>> >> https://github.com/bearrito
>>>>>> >> @deepbearrito
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>> provided solely
>>>>>> > as a basis for further discussion, and are not intended to be and
>>>>>> do not
>>>>>> > constitute a legally binding obligation. No legally binding
>>>>>> obligations will
>>>>>> > be created, implied, or inferred until an agreement in final form
>>>>>> is executed
>>>>>> > in writing by all parties involved.
>>>>>> >
>>>>>> > This email and any attachments hereto may be confidential or
>>>>>> privileged.
>>>>>> >  If you received this communication by mistake, please don't
>>>>>> forward it
>>>>>> > to anyone else, please erase all copies and attachments, and please
>>>>>> let
>>>>>> > me know that it has gone to the wrong person. Thanks.
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> https://github.com/bearrito
>>>>> @deepbearrito
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> https://github.com/bearrito
>>>> @deepbearrito
>>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> https://github.com/bearrito
>>> @deepbearrito
>>>
>>
>>
>