You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2015/03/29 01:10:34 UTC

ORDER BY clause in Hive

Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

----------- ---------------- --------------------- -------------
------------------

       1743              238              94786.13     398.26105
582.26845

        429              231              94819.41    410.473636
615.038404

       2994              227              94862.61     417.89696
624.53793

       6395              268              97010.48    361.979403
577.991448

      12783              240              97573.55    406.556458
591.6785

       4974              235              98006.16    417.047489
625.670115

      42167              266              98585.96     370.62391
592.079099

      10747              256              99578.09    388.976914
601.938312

      11407              248             103412.66    416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive returns for the same query

 

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| customer_id  | number_of_orders  | total_customer_amount  |
average_order    | standard_deviation  |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| 42167.0      | 266               | 98585.96000000002      |
370.6239097744362   | 590.965120684093    |

| 12783.0      | 240               | 97573.54999999996      |
406.5564583333332   | 590.4445500393804   |

| 11407.0      | 248               | 103412.65999999995     |
416.9865322580643   | 622.221465710723    |

| 10747.0      | 256               | 99578.08999999997      |
388.9769140624999   | 600.7615005975689   |

| 6395.0       | 268               | 97010.47999999998      |
361.97940298507456  | 576.9120977984521   |

| 4974.0       | 235               | 98006.16000000002      |
417.0474893617022   | 624.337482834059    |

| 2994.0       | 227               | 94862.61000000006      |
417.89696035242315  | 623.1607772763742   |

| 1743.0       | 238               | 94786.12999999993      |
398.2610504201678   | 581.0439095219863   |

| 429.0        | 231               | 94819.41000000006      |
410.4736363636366   | 613.7057080691426   |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

9 rows selected (215.774 seconds)

 

But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount

 

I also tried this in Sybase and got the same as Oracle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID          Number_of_orders Total_customer_amount
Average_order                                        Standard_deviation

-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------

                1743              238
94786.13                                    398.2610504201680
582.268450

                  429              231
94819.41                                    410.4736363636363
615.038404

                 2994              227
94862.61                                    417.8969603524229
624.537930

                 6395              268
97010.48                                    361.9794029850746
577.991448

                12783              240
97573.55                                    406.5564583333333
591.678500

                 4974              235
98006.16                                    417.0474893617021
625.670115

                42167              266
98585.96                                    370.6239097744360
592.079099

                10747              256
99578.09                                    388.9769140625000
601.938312

                11407              248
103412.66                                    416.9865322580645
623.479751

 

I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 


Re: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
Have you looked at the Hive wiki?  Here's the section on ORDER BY
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
.

-- Lefty

On Sat, Mar 28, 2015 at 9:45 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

>
> > SELECT cust_id AS Customer_ID,
> > Š
> > ORDER BY 3 ;
>
>
> You¹re sorting on a constant (literal value 3). The results are what you
> get when you run a non-stable sort on a constant.
>
> Cheers,
> Gopal
>
>
>

Re: ORDER BY clause in Hive

Posted by Gopal Vijayaraghavan <go...@apache.org>.
 
> SELECT cust_id AS Customer_ID,
> Š
> ORDER BY 3 ;
 

You¹re sorting on a constant (literal value 3). The results are what you
get when you run a non-stable sort on a constant.

Cheers,
Gopal



Re: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
Thanks for the update, Mich.  And thanks to you & Gopal for prompting me to
get the documentation done.

-- Lefty

On Tue, Mar 31, 2015 at 5:20 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

> Thanks Lefty for the information provided.
>
>
>
> My version of hive is 014.0
>
>
>
> hive --version
>
> Hive 0.14.0
>
>
>
> Which should support the configuration parameter at the session level à
> set hive.groupby.orderby.position.alias=true
>
>
>
>
>
> set hive.groupby.orderby.position.alias=true;
>
> SELECT
>
>           rs.Customer_ID
>
>         , rs.Number_of_orders
>
>         , *rs.Total_customer_amount*
>
>         , rs.Average_order
>
>         , rs.Standard_deviation
>
> FROM
>
> (
>
>         SELECT cust_id AS Customer_ID,
>
>         COUNT(amount_sold) AS Number_of_orders,
>
>         SUM(amount_sold) AS Total_customer_amount,
>
>         AVG(amount_sold) AS Average_order,
>
>         stddev_samp(amount_sold) AS Standard_deviation
>
>         FROM sales
>
>         GROUP BY cust_id
>
>         HAVING SUM(amount_sold) > 94000
>
>         AND AVG(amount_sold) < stddev_samp(amount_sold)
>
> ) rs
>
> ORDER BY
>
>           -- Total_customer_amount DESC
>
> *          3 DESC*
>
> *;*
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 623.4797510518939      |
>
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 601.9383117167412      |
>
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 592.0790992800527      |
>
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 625.670115050053       |
>
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 591.6785002882084      |
>
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 577.991447849281       |
>
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 624.5379298449825      |
>
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 615.0384039014772      |
>
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 582.2684502048478      |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (212.535 seconds)
>
>
>
> Indeed this is the correct result ordering by column posirion
>
>
>
> At the Hive server level I added the folowing properties to
> $HIVE_HOME/conf/ hive-site.xml
>
>
>
>   <property>
>
>     <name>hive.groupby.orderby.position.alias</name>
>
>     <value>true</value>
>
>     <description>Eenables using Column Position Alias in GROUP BY and
> ORDER BY clauses of queries.</description>
>
>   </property>
>
>
>
> And ran the above query without session level setting and it worked
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Lefty Leverenz [mailto:leftyleverenz@gmail.com]
> *Sent:* 31 March 2015 07:18
> *To:* user@hive.apache.org
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
>  Give a warning when grouping or ordering by a constant column.
>
>
>
> Thanks Gopal.
>
>
> -- Lefty
>
>
>
> On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com>
> wrote:
>
> ---------- Forwarded message ----------
>
> From: *Lefty Leverenz* <le...@gmail.com>
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
>
> To: Mich Talebzadeh <mi...@peridale.co.uk>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
>  *hive.groupby.orderby.position.alias* to true, and then you can use
> column positions in ORDER BY.
>
>
>
> Here's the new documentation:
>
>    - hive.groupby.orderby.position.alias
>    <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
>    - Order By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
>    - Group By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>

RE: ORDER BY clause in Hive

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks Lefty for the information provided.

 

My version of hive is 014.0

 

hive --version

Hive 0.14.0

 

Which should support the configuration parameter at the session level à set hive.groupby.orderby.position.alias=true

 

 

set hive.groupby.orderby.position.alias=true;

SELECT

          rs.Customer_ID

        , rs.Number_of_orders

        , rs.Total_customer_amount

        , rs.Average_order

        , rs.Standard_deviation

FROM

(

        SELECT cust_id AS Customer_ID,

        COUNT(amount_sold) AS Number_of_orders,

        SUM(amount_sold) AS Total_customer_amount,

        AVG(amount_sold) AS Average_order,

        stddev_samp(amount_sold) AS Standard_deviation

        FROM sales

        GROUP BY cust_id

        HAVING SUM(amount_sold) > 94000

        AND AVG(amount_sold) < stddev_samp(amount_sold)

) rs

ORDER BY

          -- Total_customer_amount DESC

          3 DESC

;

 

+-----------------+----------------------+---------------------------+---------------------+------------------------+--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |  rs.average_order   | rs.standard_deviation  |

+-----------------+----------------------+---------------------------+---------------------+------------------------+--+

| 11407.0         | 248                  | 103412.65999999995        | 416.9865322580643   | 623.4797510518939      |

| 10747.0         | 256                  | 99578.08999999997         | 388.9769140624999   | 601.9383117167412      |

| 42167.0         | 266                  | 98585.96000000002         | 370.6239097744362   | 592.0790992800527      |

| 4974.0          | 235                  | 98006.16000000002         | 417.0474893617022   | 625.670115050053       |

| 12783.0         | 240                  | 97573.54999999996         | 406.5564583333332   | 591.6785002882084      |

| 6395.0          | 268                  | 97010.47999999998         | 361.97940298507456  | 577.991447849281       |

| 2994.0          | 227                  | 94862.61000000006         | 417.89696035242315  | 624.5379298449825      |

| 429.0           | 231                  | 94819.41000000006         | 410.4736363636366   | 615.0384039014772      |

| 1743.0          | 238                  | 94786.12999999993         | 398.2610504201678   | 582.2684502048478      |

+-----------------+----------------------+---------------------------+---------------------+------------------------+--+

9 rows selected (212.535 seconds)

 

Indeed this is the correct result ordering by column posirion

 

At the Hive server level I added the folowing properties to $HIVE_HOME/conf/ hive-site.xml

 

  <property>

    <name>hive.groupby.orderby.position.alias</name>

    <value>true</value>

    <description>Eenables using Column Position Alias in GROUP BY and ORDER BY clauses of queries.</description>

  </property>

 

And ran the above query without session level setting and it worked

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Lefty Leverenz [mailto:leftyleverenz@gmail.com] 
Sent: 31 March 2015 07:18
To: user@hive.apache.org
Subject: Re: ORDER BY clause in Hive

 

I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160> :  Give a warning when grouping or ordering by a constant column.

 

Thanks Gopal.




-- Lefty

 

On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com> wrote:

---------- Forwarded message ----------

From: Lefty Leverenz <le...@gmail.com>
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive

To: Mich Talebzadeh <mi...@peridale.co.uk>



Hive as I see it does not support ORDER BY Column position. It only supports ORDER BY Column name.

 

That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you can set the configuration parameter <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>  hive.groupby.orderby.position.alias to true, and then you can use column positions in ORDER BY.

 

Here's the new documentation:

*	 <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias> hive.groupby.orderby.position.alias
*	Order By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>  
*	Group By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax> 




-- Lefty


Re: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
 Give a warning when grouping or ordering by a constant column.

Thanks Gopal.

-- Lefty

On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com>
wrote:

> ---------- Forwarded message ----------
> From: Lefty Leverenz <le...@gmail.com>
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
> To: Mich Talebzadeh <mi...@peridale.co.uk>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>
> That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
>  *hive.groupby.orderby.position.**alias* to true, and then you can use
> column positions in ORDER BY.
>
> Here's the new documentation:
>
>    - hive.groupby.orderby.position.alias
>    <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
>    - Order By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
>    - Group By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>
> On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
>> Gents,
>>
>>
>>
>> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] *On Behalf
>> Of *Gopal Vijayaraghavan
>> *Sent:* 30 March 2015 05:26
>> *To:* user@hive.apache.org
>> *Cc:* Lefty Leverenz
>>
>> *Subject:* Re: ORDER BY clause in Hive
>>
>>
>>
>> Hi Lefty,
>>
>>
>>
>> Couldn’t find the documentation for what
>> hive.groupby.orderby.position.alias=true does.
>>
>>
>>
>> I suspect that might be what Mich was looking for (though I tend to write
>> the column names explicitly).
>>
>>
>>
>> Cheers,
>>
>> Gopal
>>
>>
>>
>> *From: *Lefty Leverenz <le...@gmail.com>
>> *Reply-To: *"user@hive.apache.org" <us...@hive.apache.org>
>> *Date: *Sunday, March 29, 2015 at 8:32 PM
>> *To: *"user@hive.apache.org" <us...@hive.apache.org>
>> *Subject: *Re: ORDER BY clause in Hive
>>
>>
>>
>> I added information about this in the Order By
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>> section of the wiki.  Thanks, Mich and Gopal!
>>
>>
>> -- Lefty
>>
>>
>>
>> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
>> wrote:
>>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>>           rs.Customer_ID
>>
>>         , rs.Number_of_orders
>>
>>         , rs.Total_customer_amount
>>
>>         , rs.Average_order
>>
>>         , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>>         SELECT cust_id AS Customer_ID,
>>
>>         COUNT(amount_sold) AS Number_of_orders,
>>
>>         SUM(amount_sold) AS Total_customer_amount,
>>
>>         AVG(amount_sold) AS Average_order,
>>
>>         STDDEV(amount_sold) AS Standard_deviation
>>
>>         FROM sales
>>
>>         GROUP BY cust_id
>>
>>         HAVING SUM(amount_sold) > 94000
>>
>>         AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ) rs
>>
>> ORDER BY
>>
>>           *rs.Total_customer_amount*
>>
>> ;
>>
>>
>>
>>
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
>> rs.average_order   | rs.standard_deviation  |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | 1743.0          | 238                  | 94786.12999999993         |
>> 398.2610504201678   | 581.0439095219863      |
>>
>> | 429.0           | 231                  | 94819.41000000006         |
>> 410.4736363636366   | 613.7057080691426      |
>>
>> | 2994.0          | 227                  | 94862.61000000006         |
>> 417.89696035242315  | 623.1607772763742      |
>>
>> | 6395.0          | 268                  | 97010.47999999998         |
>> 361.97940298507456  | 576.9120977984521      |
>>
>> | 12783.0         | 240                  | 97573.54999999996         |
>> 406.5564583333332   | 590.4445500393804      |
>>
>> | 4974.0          | 235                  | 98006.16000000002         |
>> 417.0474893617022   | 624.337482834059       |
>>
>> | 42167.0         | 266                  | 98585.96000000002         |
>> 370.6239097744362   | 590.965120684093       |
>>
>> | 10747.0         | 256                  | 99578.08999999997         |
>> 388.9769140624999   | 600.7615005975689      |
>>
>> | 11407.0         | 248                  | 103412.65999999995        |
>> 416.9865322580643   | 622.221465710723       |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> 9 rows selected (209.699 seconds)
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
>> *Sent:* 29 March 2015 00:11
>> *To:* user@hive.apache.org
>> *Subject:* ORDER BY clause in Hive
>>
>>
>>
>> Hi,
>>
>>
>>
>> Can someone point me to doc or otherwise to see if ORDER BY clause in
>> Hive is working OK
>>
>>
>>
>> I have a simple aggregate query as follows:
>>
>>
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ORDER BY 3 ;
>>
>>
>>
>> The original table and data are from Oracle sh.sales table
>>
>>
>>
>> Oracle comes back for this query with
>>
>>
>>
>> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
>> STANDARD_DEVIATION
>>
>> ----------- ---------------- --------------------- -------------
>> ------------------
>>
>>        1743              238              94786.13     398.26105
>> 582.26845
>>
>>         429              231              94819.41    410.473636
>> 615.038404
>>
>>        2994              227              94862.61     417.89696
>> 624.53793
>>
>>        6395              268              97010.48    361.979403
>> 577.991448
>>
>>       12783              240              97573.55
>> 406.556458           591.6785
>>
>>        4974              235              98006.16    417.047489
>> 625.670115
>>
>>       42167              266              98585.96     370.62391
>> 592.079099
>>
>>       10747              256              99578.09    388.976914
>> 601.938312
>>
>>       11407              248             103412.66    416.986532
>> 623.479751
>>
>>
>>
>> 9 rows selected.
>>
>>
>>
>> Ordered by TOTAL_CUSTOMER_AMOUNT
>>
>>
>>
>> And hive returns for the same query
>>
>>
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | customer_id  | number_of_orders  | total_customer_amount  |
>> average_order    | standard_deviation  |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | 42167.0      | 266               | 98585.96000000002      |
>> 370.6239097744362   | 590.965120684093    |
>>
>> | 12783.0      | 240               | 97573.54999999996      |
>> 406.5564583333332   | 590.4445500393804   |
>>
>> | 11407.0      | 248               | 103412.65999999995     |
>> 416.9865322580643   | 622.221465710723    |
>>
>> | 10747.0      | 256               | 99578.08999999997      |
>> 388.9769140624999   | 600.7615005975689   |
>>
>> | 6395.0       | 268               | 97010.47999999998      |
>> 361.97940298507456  | 576.9120977984521   |
>>
>> | 4974.0       | 235               | 98006.16000000002      |
>> 417.0474893617022   | 624.337482834059    |
>>
>> | 2994.0       | 227               | 94862.61000000006      |
>> 417.89696035242315  | 623.1607772763742   |
>>
>> | 1743.0       | 238               | 94786.12999999993      |
>> 398.2610504201678   | 581.0439095219863   |
>>
>> | 429.0        | 231               | 94819.41000000006      |
>> 410.4736363636366   | 613.7057080691426   |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> 9 rows selected (215.774 seconds)
>>
>>
>>
>> But ordering in Hive does not seem to be correct! Please note ordering is
>> on column three, *total_customer_amount*
>>
>>
>>
>> I also tried this in Sybase and got the same as Oracle.
>>
>>
>>
>> Adaptive Server cpu time: 100 ms.
>>
>> Customer_ID          Number_of_orders
>> Total_customer_amount
>> Average_order                                        Standard_deviation
>>
>> -------------------- ----------------
>> -----------------------------------------
>> ----------------------------------------------------
>> ---------------------------
>>
>>                 1743              238
>> 94786.13
>> 398.2610504201680                  582.268450
>>
>>                   429              231
>> 94819.41
>>                      410.4736363636363                  615.038404
>>
>>                  2994              227
>> 94862.61
>> 417.8969603524229                  624.537930
>>
>>                  6395              268
>> 97010.48
>> 361.9794029850746                  577.991448
>>
>>                 12783              240
>> 97573.55
>> 406.5564583333333                  591.678500
>>
>>                  4974              235
>> 98006.16
>> 417.0474893617021                  625.670115
>>
>>                 42167              266
>>                    98585.96
>> 370.6239097744360                  592.079099
>>
>>                 10747              256
>> 99578.09                                    388.9769140625000
>>     601.938312
>>
>>                 11407              248
>> 103412.66
>> 416.9865322580645                  623.479751
>>
>>
>>
>> I tried Google search and seems to be different suggestions. May be I
>> have to rewrite the code?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>>
>>
>
>
>

Fwd: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
---------- Forwarded message ----------
From: Lefty Leverenz <le...@gmail.com>
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive
To: Mich Talebzadeh <mi...@peridale.co.uk>


Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>

That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you can set
the configuration parameter
<https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
 *hive.groupby.orderby.position.**alias* to true, and then you can use
column positions in ORDER BY.

Here's the new documentation:

   - hive.groupby.orderby.position.alias
   <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
   - Order By
   <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>

   - Group By
   <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>


-- Lefty

On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

> Gents,
>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> Thanks
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] *On Behalf Of
> *Gopal Vijayaraghavan
> *Sent:* 30 March 2015 05:26
> *To:* user@hive.apache.org
> *Cc:* Lefty Leverenz
>
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> Hi Lefty,
>
>
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
>
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
>
>
> Cheers,
>
> Gopal
>
>
>
> *From: *Lefty Leverenz <le...@gmail.com>
> *Reply-To: *"user@hive.apache.org" <us...@hive.apache.org>
> *Date: *Sunday, March 29, 2015 at 8:32 PM
> *To: *"user@hive.apache.org" <us...@hive.apache.org>
> *Subject: *Re: ORDER BY clause in Hive
>
>
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki.  Thanks, Mich and Gopal!
>
>
> -- Lefty
>
>
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
>           rs.Customer_ID
>
>         , rs.Number_of_orders
>
>         , rs.Total_customer_amount
>
>         , rs.Average_order
>
>         , rs.Standard_deviation
>
> FROM
>
> (
>
>         SELECT cust_id AS Customer_ID,
>
>         COUNT(amount_sold) AS Number_of_orders,
>
>         SUM(amount_sold) AS Total_customer_amount,
>
>         AVG(amount_sold) AS Average_order,
>
>         STDDEV(amount_sold) AS Standard_deviation
>
>         FROM sales
>
>         GROUP BY cust_id
>
>         HAVING SUM(amount_sold) > 94000
>
>         AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
>           *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 581.0439095219863      |
>
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 613.7057080691426      |
>
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 623.1607772763742      |
>
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 576.9120977984521      |
>
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 590.4445500393804      |
>
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 624.337482834059       |
>
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 590.965120684093       |
>
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 600.7615005975689      |
>
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 622.221465710723       |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes back for this query with
>
>
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
>
> ----------- ---------------- --------------------- -------------
> ------------------
>
>        1743              238              94786.13     398.26105
> 582.26845
>
>         429              231              94819.41    410.473636
> 615.038404
>
>        2994              227              94862.61     417.89696
> 624.53793
>
>        6395              268              97010.48    361.979403
> 577.991448
>
>       12783              240              97573.55    406.556458
> 591.6785
>
>        4974              235              98006.16    417.047489
> 625.670115
>
>       42167              266              98585.96     370.62391
> 592.079099
>
>       10747              256              99578.09    388.976914
> 601.938312
>
>       11407              248             103412.66    416.986532
> 623.479751
>
>
>
> 9 rows selected.
>
>
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
>
>
> And hive returns for the same query
>
>
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | customer_id  | number_of_orders  | total_customer_amount  |
> average_order    | standard_deviation  |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | 42167.0      | 266               | 98585.96000000002      |
> 370.6239097744362   | 590.965120684093    |
>
> | 12783.0      | 240               | 97573.54999999996      |
> 406.5564583333332   | 590.4445500393804   |
>
> | 11407.0      | 248               | 103412.65999999995     |
> 416.9865322580643   | 622.221465710723    |
>
> | 10747.0      | 256               | 99578.08999999997      |
> 388.9769140624999   | 600.7615005975689   |
>
> | 6395.0       | 268               | 97010.47999999998      |
> 361.97940298507456  | 576.9120977984521   |
>
> | 4974.0       | 235               | 98006.16000000002      |
> 417.0474893617022   | 624.337482834059    |
>
> | 2994.0       | 227               | 94862.61000000006      |
> 417.89696035242315  | 623.1607772763742   |
>
> | 1743.0       | 238               | 94786.12999999993      |
> 398.2610504201678   | 581.0439095219863   |
>
> | 429.0        | 231               | 94819.41000000006      |
> 410.4736363636366   | 613.7057080691426   |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> 9 rows selected (215.774 seconds)
>
>
>
> But ordering in Hive does not seem to be correct! Please note ordering is
> on column three, *total_customer_amount*
>
>
>
> I also tried this in Sybase and got the same as Oracle.
>
>
>
> Adaptive Server cpu time: 100 ms.
>
> Customer_ID          Number_of_orders
> Total_customer_amount
> Average_order                                        Standard_deviation
>
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>
>                 1743              238
> 94786.13
> 398.2610504201680                  582.268450
>
>                   429              231
> 94819.41
>                      410.4736363636363                  615.038404
>
>                  2994              227
> 94862.61
> 417.8969603524229                  624.537930
>
>                  6395              268
> 97010.48
> 361.9794029850746                  577.991448
>
>                 12783              240
> 97573.55
> 406.5564583333333                  591.678500
>
>                  4974              235
> 98006.16
> 417.0474893617021                  625.670115
>
>                 42167              266
>                    98585.96
> 370.6239097744360                  592.079099
>
>                 10747              256
> 99578.09                                    388.9769140625000
>     601.938312
>
>                 11407              248
> 103412.66
> 416.9865322580645                  623.479751
>
>
>
> I tried Google search and seems to be different suggestions. May be I have
> to rewrite the code?
>
>
>
> Thanks
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
>
>

RE: ORDER BY clause in Hive

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Gents,

 

Hive as I see it does not support ORDER BY Column position. It only supports
ORDER BY Column name.

 

Thanks

 

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 30 March 2015 05:26
To: user@hive.apache.org
Cc: Lefty Leverenz
Subject: Re: ORDER BY clause in Hive

 

Hi Lefty,

 

Couldn't find the documentation for what
hive.groupby.orderby.position.alias=true does.

 

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

 

Cheers,

Gopal

 

From: Lefty Leverenz <le...@gmail.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Sunday, March 29, 2015 at 8:32 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Re: ORDER BY clause in Hive

 

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!




-- Lefty

 

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

Hi Lefty, Gopal,

 

It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.

 

So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK

 

SELECT

          rs.Customer_ID

        , rs.Number_of_orders

        , rs.Total_customer_amount

        , rs.Average_order

        , rs.Standard_deviation

FROM

(

        SELECT cust_id AS Customer_ID,

        COUNT(amount_sold) AS Number_of_orders,

        SUM(amount_sold) AS Total_customer_amount,

        AVG(amount_sold) AS Average_order,

        STDDEV(amount_sold) AS Standard_deviation

        FROM sales

        GROUP BY cust_id

        HAVING SUM(amount_sold) > 94000

        AND AVG(amount_sold) < STDDEV(amount_sold)

) rs

ORDER BY

          rs.Total_customer_amount

;

 

 

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
rs.average_order   | rs.standard_deviation  |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| 1743.0          | 238                  | 94786.12999999993         |
398.2610504201678   | 581.0439095219863      |

| 429.0           | 231                  | 94819.41000000006         |
410.4736363636366   | 613.7057080691426      |

| 2994.0          | 227                  | 94862.61000000006         |
417.89696035242315  | 623.1607772763742      |

| 6395.0          | 268                  | 97010.47999999998         |
361.97940298507456  | 576.9120977984521      |

| 12783.0         | 240                  | 97573.54999999996         |
406.5564583333332   | 590.4445500393804      |

| 4974.0          | 235                  | 98006.16000000002         |
417.0474893617022   | 624.337482834059       |

| 42167.0         | 266                  | 98585.96000000002         |
370.6239097744362   | 590.965120684093       |

| 10747.0         | 256                  | 99578.08999999997         |
388.9769140624999   | 600.7615005975689      |

| 11407.0         | 248                  | 103412.65999999995        |
416.9865322580643   | 622.221465710723       |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

9 rows selected (209.699 seconds)

 

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk] 
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive

 

Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

----------- ---------------- --------------------- -------------
------------------

       1743              238              94786.13     398.26105
582.26845

        429              231              94819.41    410.473636
615.038404

       2994              227              94862.61     417.89696
624.53793

       6395              268              97010.48    361.979403
577.991448

      12783              240              97573.55    406.556458
591.6785

       4974              235              98006.16    417.047489
625.670115

      42167              266              98585.96     370.62391
592.079099

      10747              256              99578.09    388.976914
601.938312

      11407              248             103412.66    416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive returns for the same query

 

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| customer_id  | number_of_orders  | total_customer_amount  |
average_order    | standard_deviation  |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| 42167.0      | 266               | 98585.96000000002      |
370.6239097744362   | 590.965120684093    |

| 12783.0      | 240               | 97573.54999999996      |
406.5564583333332   | 590.4445500393804   |

| 11407.0      | 248               | 103412.65999999995     |
416.9865322580643   | 622.221465710723    |

| 10747.0      | 256               | 99578.08999999997      |
388.9769140624999   | 600.7615005975689   |

| 6395.0       | 268               | 97010.47999999998      |
361.97940298507456  | 576.9120977984521   |

| 4974.0       | 235               | 98006.16000000002      |
417.0474893617022   | 624.337482834059    |

| 2994.0       | 227               | 94862.61000000006      |
417.89696035242315  | 623.1607772763742   |

| 1743.0       | 238               | 94786.12999999993      |
398.2610504201678   | 581.0439095219863   |

| 429.0        | 231               | 94819.41000000006      |
410.4736363636366   | 613.7057080691426   |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

9 rows selected (215.774 seconds)

 

But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount

 

I also tried this in Sybase and got the same as Oracle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID          Number_of_orders Total_customer_amount
Average_order                                        Standard_deviation

-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------

                1743              238
94786.13                                    398.2610504201680
582.268450

                  429              231
94819.41                                    410.4736363636363
615.038404

                 2994              227
94862.61                                    417.8969603524229
624.537930

                 6395              268
97010.48                                    361.9794029850746
577.991448

                12783              240
97573.55                                    406.5564583333333
591.678500

                 4974              235
98006.16                                    417.0474893617021
625.670115

                42167              266
98585.96                                    370.6239097744360
592.079099

                10747              256
99578.09                                    388.9769140625000
601.938312

                11407              248
103412.66                                    416.9865322580645
623.479751

 

I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

 


Fwd: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
Oops, failed to send this to user@hive.
-- Lefty

---------- Forwarded message ----------
From: Lefty Leverenz <le...@gmail.com>
Date: Mon, Mar 30, 2015 at 12:42 AM
Subject: Re: ORDER BY clause in Hive
To: Gopal Vijayaraghavan <go...@apache.org>


Oho!  Good point, Gopal.

hive.groupby.orderby.position.alias isn't in the wiki yet, but I can put it
there (and revise the ORDER BY and GROUP BY docs).  It was introduced in
release 0.11.0 by HIVE-581
<https://issues.apache.org/jira/browse/HIVE-581> with
a default of false.

Description from HiveConf.java:  "Whether to enable using Column Position
Alias in Group By or Order By."

But shouldn't Mich have gotten an error message?

-- Lefty

On Mon, Mar 30, 2015 at 12:25 AM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

> Hi Lefty,
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
> Cheers,
> Gopal
>
> From: Lefty Leverenz <le...@gmail.com>
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> Date: Sunday, March 29, 2015 at 8:32 PM
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Subject: Re: ORDER BY clause in Hive
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki.  Thanks, Mich and Gopal!
>
> -- Lefty
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>>           rs.Customer_ID
>>
>>         , rs.Number_of_orders
>>
>>         , rs.Total_customer_amount
>>
>>         , rs.Average_order
>>
>>         , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>>         SELECT cust_id AS Customer_ID,
>>
>>         COUNT(amount_sold) AS Number_of_orders,
>>
>>         SUM(amount_sold) AS Total_customer_amount,
>>
>>         AVG(amount_sold) AS Average_order,
>>
>>         STDDEV(amount_sold) AS Standard_deviation
>>
>>         FROM sales
>>
>>         GROUP BY cust_id
>>
>>         HAVING SUM(amount_sold) > 94000
>>
>>         AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ) rs
>>
>> ORDER BY
>>
>>           *rs.Total_customer_amount*
>>
>> ;
>>
>>
>>
>>
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
>> rs.average_order   | rs.standard_deviation  |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | 1743.0          | 238                  | 94786.12999999993         |
>> 398.2610504201678   | 581.0439095219863      |
>>
>> | 429.0           | 231                  | 94819.41000000006         |
>> 410.4736363636366   | 613.7057080691426      |
>>
>> | 2994.0          | 227                  | 94862.61000000006         |
>> 417.89696035242315  | 623.1607772763742      |
>>
>> | 6395.0          | 268                  | 97010.47999999998         |
>> 361.97940298507456  | 576.9120977984521      |
>>
>> | 12783.0         | 240                  | 97573.54999999996         |
>> 406.5564583333332   | 590.4445500393804      |
>>
>> | 4974.0          | 235                  | 98006.16000000002         |
>> 417.0474893617022   | 624.337482834059       |
>>
>> | 42167.0         | 266                  | 98585.96000000002         |
>> 370.6239097744362   | 590.965120684093       |
>>
>> | 10747.0         | 256                  | 99578.08999999997         |
>> 388.9769140624999   | 600.7615005975689      |
>>
>> | 11407.0         | 248                  | 103412.65999999995        |
>> 416.9865322580643   | 622.221465710723       |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> 9 rows selected (209.699 seconds)
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
>> *Sent:* 29 March 2015 00:11
>> *To:* user@hive.apache.org
>> *Subject:* ORDER BY clause in Hive
>>
>>
>>
>> Hi,
>>
>>
>>
>> Can someone point me to doc or otherwise to see if ORDER BY clause in
>> Hive is working OK
>>
>>
>>
>> I have a simple aggregate query as follows:
>>
>>
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ORDER BY 3 ;
>>
>>
>>
>> The original table and data are from Oracle sh.sales table
>>
>>
>>
>> Oracle comes back for this query with
>>
>>
>>
>> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
>> STANDARD_DEVIATION
>>
>> ----------- ---------------- --------------------- -------------
>> ------------------
>>
>>        1743              238              94786.13     398.26105
>> 582.26845
>>
>>         429              231              94819.41    410.473636
>> 615.038404
>>
>>        2994              227              94862.61     417.89696
>> 624.53793
>>
>>        6395              268              97010.48    361.979403
>> 577.991448
>>
>>       12783              240              97573.55
>> 406.556458           591.6785
>>
>>        4974              235              98006.16    417.047489
>> 625.670115
>>
>>       42167              266              98585.96     370.62391
>> 592.079099
>>
>>       10747              256              99578.09    388.976914
>> 601.938312
>>
>>       11407              248             103412.66    416.986532
>> 623.479751
>>
>>
>>
>> 9 rows selected.
>>
>>
>>
>> Ordered by TOTAL_CUSTOMER_AMOUNT
>>
>>
>>
>> And hive returns for the same query
>>
>>
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | customer_id  | number_of_orders  | total_customer_amount  |
>> average_order    | standard_deviation  |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | 42167.0      | 266               | 98585.96000000002      |
>> 370.6239097744362   | 590.965120684093    |
>>
>> | 12783.0      | 240               | 97573.54999999996      |
>> 406.5564583333332   | 590.4445500393804   |
>>
>> | 11407.0      | 248               | 103412.65999999995     |
>> 416.9865322580643   | 622.221465710723    |
>>
>> | 10747.0      | 256               | 99578.08999999997      |
>> 388.9769140624999   | 600.7615005975689   |
>>
>> | 6395.0       | 268               | 97010.47999999998      |
>> 361.97940298507456  | 576.9120977984521   |
>>
>> | 4974.0       | 235               | 98006.16000000002      |
>> 417.0474893617022   | 624.337482834059    |
>>
>> | 2994.0       | 227               | 94862.61000000006      |
>> 417.89696035242315  | 623.1607772763742   |
>>
>> | 1743.0       | 238               | 94786.12999999993      |
>> 398.2610504201678   | 581.0439095219863   |
>>
>> | 429.0        | 231               | 94819.41000000006      |
>> 410.4736363636366   | 613.7057080691426   |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> 9 rows selected (215.774 seconds)
>>
>>
>>
>> But ordering in Hive does not seem to be correct! Please note ordering is
>> on column three, *total_customer_amount*
>>
>>
>>
>> I also tried this in Sybase and got the same as Oracle.
>>
>>
>>
>> Adaptive Server cpu time: 100 ms.
>>
>> Customer_ID          Number_of_orders
>> Total_customer_amount
>> Average_order                                        Standard_deviation
>>
>> -------------------- ----------------
>> -----------------------------------------
>> ----------------------------------------------------
>> ---------------------------
>>
>>                 1743              238
>> 94786.13
>> 398.2610504201680                  582.268450
>>
>>                   429              231
>> 94819.41
>>                      410.4736363636363                  615.038404
>>
>>                  2994              227
>> 94862.61
>> 417.8969603524229                  624.537930
>>
>>                  6395              268
>> 97010.48
>> 361.9794029850746                  577.991448
>>
>>                 12783              240
>> 97573.55
>> 406.5564583333333                  591.678500
>>
>>                  4974              235
>> 98006.16
>> 417.0474893617021                  625.670115
>>
>>                 42167              266
>>                    98585.96
>> 370.6239097744360                  592.079099
>>
>>                 10747              256
>> 99578.09                                    388.9769140625000
>>     601.938312
>>
>>                 11407              248
>> 103412.66
>> 416.9865322580645                  623.479751
>>
>>
>>
>> I tried Google search and seems to be different suggestions. May be I
>> have to rewrite the code?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>
>

Re: ORDER BY clause in Hive

Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi Lefty,

Couldn¹t find the documentation for what
hive.groupby.orderby.position.alias=true does.

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

Cheers,
Gopal

From:  Lefty Leverenz <le...@gmail.com>
Reply-To:  "user@hive.apache.org" <us...@hive.apache.org>
Date:  Sunday, March 29, 2015 at 8:32 PM
To:  "user@hive.apache.org" <us...@hive.apache.org>
Subject:  Re: ORDER BY clause in Hive

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!

-- Lefty

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Hi Lefty, Gopal,
>  
> It appears that ORDER BY 3 is not interpreted as ORDERR BY ³the result set
> column three² which standard SQL as evident from Oracle and Sybase does.
>  
> So I made it an ORDER BY from the result set EXPLICITELY as shown below and it
> worked OK
>  
> SELECT
>           rs.Customer_ID
>         , rs.Number_of_orders
>         , rs.Total_customer_amount
>         , rs.Average_order
>         , rs.Standard_deviation
> FROM
> (
>         SELECT cust_id AS Customer_ID,
>         COUNT(amount_sold) AS Number_of_orders,
>         SUM(amount_sold) AS Total_customer_amount,
>         AVG(amount_sold) AS Average_order,
>         STDDEV(amount_sold) AS Standard_deviation
>         FROM sales
>         GROUP BY cust_id
>         HAVING SUM(amount_sold) > 94000
>         AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY
>          rs.Total_customer_amount
> ;
>  
>  
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 581.0439095219863      |
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 613.7057080691426      |
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 623.1607772763742      |
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 576.9120977984521      |
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 590.4445500393804      |
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 624.337482834059       |
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 590.965120684093       |
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 600.7615005975689      |
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 622.221465710723       |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> 9 rows selected (209.699 seconds)
>  
>  
> Regards,
>  
> 
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>  
> 
> From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
> Sent: 29 March 2015 00:11
> To: user@hive.apache.org
> Subject: ORDER BY clause in Hive
> 
>  
> Hi,
>  
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is
> working OK
>  
> I have a simple aggregate query as follows:
>  
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ORDER BY 3 ;
>  
> The original table and data are from Oracle sh.sales table
>  
> Oracle comes back for this query with
>  
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
> ----------- ---------------- --------------------- -------------
> ------------------
>        1743              238              94786.13     398.26105
> 582.26845
>         429              231              94819.41    410.473636
> 615.038404
>        2994              227              94862.61     417.89696
> 624.53793
>        6395              268              97010.48    361.979403
> 577.991448
>       12783              240              97573.55    406.556458
> 591.6785
>        4974              235              98006.16    417.047489
> 625.670115
>       42167              266              98585.96     370.62391
> 592.079099
>       10747              256              99578.09    388.976914
> 601.938312
>       11407              248             103412.66    416.986532
> 623.479751
>  
> 9 rows selected.
>  
> Ordered by TOTAL_CUSTOMER_AMOUNT
>  
> And hive returns for the same query
>  
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | customer_id  | number_of_orders  | total_customer_amount  |    average_order
> | standard_deviation  |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | 42167.0      | 266               | 98585.96000000002      |
> 370.6239097744362   | 590.965120684093    |
> | 12783.0      | 240               | 97573.54999999996      |
> 406.5564583333332   | 590.4445500393804   |
> | 11407.0      | 248               | 103412.65999999995     |
> 416.9865322580643   | 622.221465710723    |
> | 10747.0      | 256               | 99578.08999999997      |
> 388.9769140624999   | 600.7615005975689   |
> | 6395.0       | 268               | 97010.47999999998      |
> 361.97940298507456  | 576.9120977984521   |
> | 4974.0       | 235               | 98006.16000000002      |
> 417.0474893617022   | 624.337482834059    |
> | 2994.0       | 227               | 94862.61000000006      |
> 417.89696035242315  | 623.1607772763742   |
> | 1743.0       | 238               | 94786.12999999993      |
> 398.2610504201678   | 581.0439095219863   |
> | 429.0        | 231               | 94819.41000000006      |
> 410.4736363636366   | 613.7057080691426   |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> 9 rows selected (215.774 seconds)
>  
> But ordering in Hive does not seem to be correct! Please note ordering is on
> column three, total_customer_amount
>  
> I also tried this in Sybase and got the same as Oracle.
>  
> Adaptive Server cpu time: 100 ms.
> Customer_ID          Number_of_orders Total_customer_amount
> Average_order                                        Standard_deviation
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>                 1743              238
> 94786.13                                    398.2610504201680
> 582.268450
>                   429              231
> 94819.41                                    410.4736363636363
> 615.038404
>                  2994              227
> 94862.61                                    417.8969603524229
> 624.537930
>                  6395              268
> 97010.48                                    361.9794029850746
> 577.991448
>                 12783              240
> 97573.55                                    406.5564583333333
> 591.678500
>                  4974              235
> 98006.16                                    417.0474893617021
> 625.670115
>                 42167              266
> 98585.96                                    370.6239097744360
> 592.079099
>                 10747              256
> 99578.09                                    388.9769140625000
> 601.938312
>                 11407              248
> 103412.66                                    416.9865322580645
> 623.479751
>  
> I tried Google search and seems to be different suggestions. May be I have to
> rewrite the code?
>  
> Thanks
>  
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>  




Re: ORDER BY clause in Hive

Posted by Lefty Leverenz <le...@gmail.com>.
I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
section of the wiki.  Thanks, Mich and Gopal!

-- Lefty

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
>           rs.Customer_ID
>
>         , rs.Number_of_orders
>
>         , rs.Total_customer_amount
>
>         , rs.Average_order
>
>         , rs.Standard_deviation
>
> FROM
>
> (
>
>         SELECT cust_id AS Customer_ID,
>
>         COUNT(amount_sold) AS Number_of_orders,
>
>         SUM(amount_sold) AS Total_customer_amount,
>
>         AVG(amount_sold) AS Average_order,
>
>         STDDEV(amount_sold) AS Standard_deviation
>
>         FROM sales
>
>         GROUP BY cust_id
>
>         HAVING SUM(amount_sold) > 94000
>
>         AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
>           *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 581.0439095219863      |
>
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 613.7057080691426      |
>
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 623.1607772763742      |
>
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 576.9120977984521      |
>
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 590.4445500393804      |
>
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 624.337482834059       |
>
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 590.965120684093       |
>
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 600.7615005975689      |
>
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 622.221465710723       |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes back for this query with
>
>
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
>
> ----------- ---------------- --------------------- -------------
> ------------------
>
>        1743              238              94786.13     398.26105
> 582.26845
>
>         429              231              94819.41    410.473636
> 615.038404
>
>        2994              227              94862.61     417.89696
> 624.53793
>
>        6395              268              97010.48    361.979403
> 577.991448
>
>       12783              240              97573.55    406.556458
> 591.6785
>
>        4974              235              98006.16    417.047489
> 625.670115
>
>       42167              266              98585.96     370.62391
> 592.079099
>
>       10747              256              99578.09    388.976914
> 601.938312
>
>       11407              248             103412.66    416.986532
> 623.479751
>
>
>
> 9 rows selected.
>
>
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
>
>
> And hive returns for the same query
>
>
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | customer_id  | number_of_orders  | total_customer_amount  |
> average_order    | standard_deviation  |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | 42167.0      | 266               | 98585.96000000002      |
> 370.6239097744362   | 590.965120684093    |
>
> | 12783.0      | 240               | 97573.54999999996      |
> 406.5564583333332   | 590.4445500393804   |
>
> | 11407.0      | 248               | 103412.65999999995     |
> 416.9865322580643   | 622.221465710723    |
>
> | 10747.0      | 256               | 99578.08999999997      |
> 388.9769140624999   | 600.7615005975689   |
>
> | 6395.0       | 268               | 97010.47999999998      |
> 361.97940298507456  | 576.9120977984521   |
>
> | 4974.0       | 235               | 98006.16000000002      |
> 417.0474893617022   | 624.337482834059    |
>
> | 2994.0       | 227               | 94862.61000000006      |
> 417.89696035242315  | 623.1607772763742   |
>
> | 1743.0       | 238               | 94786.12999999993      |
> 398.2610504201678   | 581.0439095219863   |
>
> | 429.0        | 231               | 94819.41000000006      |
> 410.4736363636366   | 613.7057080691426   |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> 9 rows selected (215.774 seconds)
>
>
>
> But ordering in Hive does not seem to be correct! Please note ordering is
> on column three, *total_customer_amount*
>
>
>
> I also tried this in Sybase and got the same as Oracle.
>
>
>
> Adaptive Server cpu time: 100 ms.
>
> Customer_ID          Number_of_orders
> Total_customer_amount
> Average_order                                        Standard_deviation
>
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>
>                 1743              238
> 94786.13
> 398.2610504201680                  582.268450
>
>                   429              231
> 94819.41
>                      410.4736363636363                  615.038404
>
>                  2994              227
> 94862.61
> 417.8969603524229                  624.537930
>
>                  6395              268
> 97010.48
> 361.9794029850746                  577.991448
>
>                 12783              240
> 97573.55
> 406.5564583333333                  591.678500
>
>                  4974              235
> 98006.16
> 417.0474893617021                  625.670115
>
>                 42167              266
>                    98585.96
> 370.6239097744360                  592.079099
>
>                 10747              256
> 99578.09                                    388.9769140625000
>     601.938312
>
>                 11407              248
> 103412.66
> 416.9865322580645                  623.479751
>
>
>
> I tried Google search and seems to be different suggestions. May be I have
> to rewrite the code?
>
>
>
> Thanks
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>

RE: ORDER BY clause in Hive

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi Lefty, Gopal,

 

It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.

 

So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK

 

SELECT

          rs.Customer_ID

        , rs.Number_of_orders

        , rs.Total_customer_amount

        , rs.Average_order

        , rs.Standard_deviation

FROM

(

        SELECT cust_id AS Customer_ID,

        COUNT(amount_sold) AS Number_of_orders,

        SUM(amount_sold) AS Total_customer_amount,

        AVG(amount_sold) AS Average_order,

        STDDEV(amount_sold) AS Standard_deviation

        FROM sales

        GROUP BY cust_id

        HAVING SUM(amount_sold) > 94000

        AND AVG(amount_sold) < STDDEV(amount_sold)

) rs

ORDER BY

          rs.Total_customer_amount

;

 

 

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
rs.average_order   | rs.standard_deviation  |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

| 1743.0          | 238                  | 94786.12999999993         |
398.2610504201678   | 581.0439095219863      |

| 429.0           | 231                  | 94819.41000000006         |
410.4736363636366   | 613.7057080691426      |

| 2994.0          | 227                  | 94862.61000000006         |
417.89696035242315  | 623.1607772763742      |

| 6395.0          | 268                  | 97010.47999999998         |
361.97940298507456  | 576.9120977984521      |

| 12783.0         | 240                  | 97573.54999999996         |
406.5564583333332   | 590.4445500393804      |

| 4974.0          | 235                  | 98006.16000000002         |
417.0474893617022   | 624.337482834059       |

| 42167.0         | 266                  | 98585.96000000002         |
370.6239097744362   | 590.965120684093       |

| 10747.0         | 256                  | 99578.08999999997         |
388.9769140624999   | 600.7615005975689      |

| 11407.0         | 248                  | 103412.65999999995        |
416.9865322580643   | 622.221465710723       |

+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+

9 rows selected (209.699 seconds)

 

 

Regards,

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.

 

From: Mich Talebzadeh [mailto:mich@peridale.co.uk] 
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive

 

Hi,

 

Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK

 

I have a simple aggregate query as follows:

 

SELECT cust_id AS Customer_ID,

COUNT(amount_sold) AS Number_of_orders,

SUM(amount_sold) AS Total_customer_amount,

AVG(amount_sold) AS Average_order,

STDDEV(amount_sold) AS Standard_deviation

FROM sales

GROUP BY cust_id

HAVING SUM(amount_sold) > 94000

AND AVG(amount_sold) < STDDEV(amount_sold)

ORDER BY 3 ;

 

The original table and data are from Oracle sh.sales table

 

Oracle comes back for this query with

 

CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION

----------- ---------------- --------------------- -------------
------------------

       1743              238              94786.13     398.26105
582.26845

        429              231              94819.41    410.473636
615.038404

       2994              227              94862.61     417.89696
624.53793

       6395              268              97010.48    361.979403
577.991448

      12783              240              97573.55    406.556458
591.6785

       4974              235              98006.16    417.047489
625.670115

      42167              266              98585.96     370.62391
592.079099

      10747              256              99578.09    388.976914
601.938312

      11407              248             103412.66    416.986532
623.479751

 

9 rows selected.

 

Ordered by TOTAL_CUSTOMER_AMOUNT

 

And hive returns for the same query

 

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| customer_id  | number_of_orders  | total_customer_amount  |
average_order    | standard_deviation  |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

| 42167.0      | 266               | 98585.96000000002      |
370.6239097744362   | 590.965120684093    |

| 12783.0      | 240               | 97573.54999999996      |
406.5564583333332   | 590.4445500393804   |

| 11407.0      | 248               | 103412.65999999995     |
416.9865322580643   | 622.221465710723    |

| 10747.0      | 256               | 99578.08999999997      |
388.9769140624999   | 600.7615005975689   |

| 6395.0       | 268               | 97010.47999999998      |
361.97940298507456  | 576.9120977984521   |

| 4974.0       | 235               | 98006.16000000002      |
417.0474893617022   | 624.337482834059    |

| 2994.0       | 227               | 94862.61000000006      |
417.89696035242315  | 623.1607772763742   |

| 1743.0       | 238               | 94786.12999999993      |
398.2610504201678   | 581.0439095219863   |

| 429.0        | 231               | 94819.41000000006      |
410.4736363636366   | 613.7057080691426   |

+--------------+-------------------+------------------------+---------------
------+---------------------+--+

9 rows selected (215.774 seconds)

 

But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount

 

I also tried this in Sybase and got the same as Oracle.

 

Adaptive Server cpu time: 100 ms.

Customer_ID          Number_of_orders Total_customer_amount
Average_order                                        Standard_deviation

-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------

                1743              238
94786.13                                    398.2610504201680
582.268450

                  429              231
94819.41                                    410.4736363636363
615.038404

                 2994              227
94862.61                                    417.8969603524229
624.537930

                 6395              268
97010.48                                    361.9794029850746
577.991448

                12783              240
97573.55                                    406.5564583333333
591.678500

                 4974              235
98006.16                                    417.0474893617021
625.670115

                42167              266
98585.96                                    370.6239097744360
592.079099

                10747              256
99578.09                                    388.9769140625000
601.938312

                11407              248
103412.66                                    416.9865322580645
623.479751

 

I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?

 

Thanks

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache

 

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.