You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Mich Talebzadeh <mi...@peridale.co.uk> on 2015/03/29 01:10:34 UTC
ORDER BY clause in Hive
Hi,
Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK
I have a simple aggregate query as follows:
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
STDDEV(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < STDDEV(amount_sold)
ORDER BY 3 ;
The original table and data are from Oracle sh.sales table
Oracle comes back for this query with
CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION
----------- ---------------- --------------------- -------------
------------------
1743 238 94786.13 398.26105
582.26845
429 231 94819.41 410.473636
615.038404
2994 227 94862.61 417.89696
624.53793
6395 268 97010.48 361.979403
577.991448
12783 240 97573.55 406.556458
591.6785
4974 235 98006.16 417.047489
625.670115
42167 266 98585.96 370.62391
592.079099
10747 256 99578.09 388.976914
601.938312
11407 248 103412.66 416.986532
623.479751
9 rows selected.
Ordered by TOTAL_CUSTOMER_AMOUNT
And hive returns for the same query
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| customer_id | number_of_orders | total_customer_amount |
average_order | standard_deviation |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| 42167.0 | 266 | 98585.96000000002 |
370.6239097744362 | 590.965120684093 |
| 12783.0 | 240 | 97573.54999999996 |
406.5564583333332 | 590.4445500393804 |
| 11407.0 | 248 | 103412.65999999995 |
416.9865322580643 | 622.221465710723 |
| 10747.0 | 256 | 99578.08999999997 |
388.9769140624999 | 600.7615005975689 |
| 6395.0 | 268 | 97010.47999999998 |
361.97940298507456 | 576.9120977984521 |
| 4974.0 | 235 | 98006.16000000002 |
417.0474893617022 | 624.337482834059 |
| 2994.0 | 227 | 94862.61000000006 |
417.89696035242315 | 623.1607772763742 |
| 1743.0 | 238 | 94786.12999999993 |
398.2610504201678 | 581.0439095219863 |
| 429.0 | 231 | 94819.41000000006 |
410.4736363636366 | 613.7057080691426 |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
9 rows selected (215.774 seconds)
But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount
I also tried this in Sybase and got the same as Oracle.
Adaptive Server cpu time: 100 ms.
Customer_ID Number_of_orders Total_customer_amount
Average_order Standard_deviation
-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------
1743 238
94786.13 398.2610504201680
582.268450
429 231
94819.41 410.4736363636363
615.038404
2994 227
94862.61 417.8969603524229
624.537930
6395 268
97010.48 361.9794029850746
577.991448
12783 240
97573.55 406.5564583333333
591.678500
4974 235
98006.16 417.0474893617021
625.670115
42167 266
98585.96 370.6239097744360
592.079099
10747 256
99578.09 388.9769140625000
601.938312
11407 248
103412.66 416.9865322580645
623.479751
I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?
Thanks
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.
Re: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
Have you looked at the Hive wiki? Here's the section on ORDER BY
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
.
-- Lefty
On Sat, Mar 28, 2015 at 9:45 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:
>
> > SELECT cust_id AS Customer_ID,
> > Š
> > ORDER BY 3 ;
>
>
> You¹re sorting on a constant (literal value 3). The results are what you
> get when you run a non-stable sort on a constant.
>
> Cheers,
> Gopal
>
>
>
Re: ORDER BY clause in Hive
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> SELECT cust_id AS Customer_ID,
>
> ORDER BY 3 ;
You¹re sorting on a constant (literal value 3). The results are what you
get when you run a non-stable sort on a constant.
Cheers,
Gopal
Re: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
Thanks for the update, Mich. And thanks to you & Gopal for prompting me to
get the documentation done.
-- Lefty
On Tue, Mar 31, 2015 at 5:20 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Thanks Lefty for the information provided.
>
>
>
> My version of hive is 014.0
>
>
>
> hive --version
>
> Hive 0.14.0
>
>
>
> Which should support the configuration parameter at the session level à
> set hive.groupby.orderby.position.alias=true
>
>
>
>
>
> set hive.groupby.orderby.position.alias=true;
>
> SELECT
>
> rs.Customer_ID
>
> , rs.Number_of_orders
>
> , *rs.Total_customer_amount*
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> stddev_samp(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < stddev_samp(amount_sold)
>
> ) rs
>
> ORDER BY
>
> -- Total_customer_amount DESC
>
> * 3 DESC*
>
> *;*
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
> rs.average_order | rs.standard_deviation |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 623.4797510518939 |
>
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 601.9383117167412 |
>
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 592.0790992800527 |
>
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 625.670115050053 |
>
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 591.6785002882084 |
>
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 577.991447849281 |
>
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 624.5379298449825 |
>
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 615.0384039014772 |
>
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 582.2684502048478 |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (212.535 seconds)
>
>
>
> Indeed this is the correct result ordering by column posirion
>
>
>
> At the Hive server level I added the folowing properties to
> $HIVE_HOME/conf/ hive-site.xml
>
>
>
> <property>
>
> <name>hive.groupby.orderby.position.alias</name>
>
> <value>true</value>
>
> <description>Eenables using Column Position Alias in GROUP BY and
> ORDER BY clauses of queries.</description>
>
> </property>
>
>
>
> And ran the above query without session level setting and it worked
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Lefty Leverenz [mailto:leftyleverenz@gmail.com]
> *Sent:* 31 March 2015 07:18
> *To:* user@hive.apache.org
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
> Give a warning when grouping or ordering by a constant column.
>
>
>
> Thanks Gopal.
>
>
> -- Lefty
>
>
>
> On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com>
> wrote:
>
> ---------- Forwarded message ----------
>
> From: *Lefty Leverenz* <le...@gmail.com>
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
>
> To: Mich Talebzadeh <mi...@peridale.co.uk>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
> *hive.groupby.orderby.position.alias* to true, and then you can use
> column positions in ORDER BY.
>
>
>
> Here's the new documentation:
>
> - hive.groupby.orderby.position.alias
> <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
> - Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
> - Group By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>
RE: ORDER BY clause in Hive
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Thanks Lefty for the information provided.
My version of hive is 014.0
hive --version
Hive 0.14.0
Which should support the configuration parameter at the session level à set hive.groupby.orderby.position.alias=true
set hive.groupby.orderby.position.alias=true;
SELECT
rs.Customer_ID
, rs.Number_of_orders
, rs.Total_customer_amount
, rs.Average_order
, rs.Standard_deviation
FROM
(
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
stddev_samp(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < stddev_samp(amount_sold)
) rs
ORDER BY
-- Total_customer_amount DESC
3 DESC
;
+-----------------+----------------------+---------------------------+---------------------+------------------------+--+
| rs.customer_id | rs.number_of_orders | rs.total_customer_amount | rs.average_order | rs.standard_deviation |
+-----------------+----------------------+---------------------------+---------------------+------------------------+--+
| 11407.0 | 248 | 103412.65999999995 | 416.9865322580643 | 623.4797510518939 |
| 10747.0 | 256 | 99578.08999999997 | 388.9769140624999 | 601.9383117167412 |
| 42167.0 | 266 | 98585.96000000002 | 370.6239097744362 | 592.0790992800527 |
| 4974.0 | 235 | 98006.16000000002 | 417.0474893617022 | 625.670115050053 |
| 12783.0 | 240 | 97573.54999999996 | 406.5564583333332 | 591.6785002882084 |
| 6395.0 | 268 | 97010.47999999998 | 361.97940298507456 | 577.991447849281 |
| 2994.0 | 227 | 94862.61000000006 | 417.89696035242315 | 624.5379298449825 |
| 429.0 | 231 | 94819.41000000006 | 410.4736363636366 | 615.0384039014772 |
| 1743.0 | 238 | 94786.12999999993 | 398.2610504201678 | 582.2684502048478 |
+-----------------+----------------------+---------------------------+---------------------+------------------------+--+
9 rows selected (212.535 seconds)
Indeed this is the correct result ordering by column posirion
At the Hive server level I added the folowing properties to $HIVE_HOME/conf/ hive-site.xml
<property>
<name>hive.groupby.orderby.position.alias</name>
<value>true</value>
<description>Eenables using Column Position Alias in GROUP BY and ORDER BY clauses of queries.</description>
</property>
And ran the above query without session level setting and it worked
Regards,
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache
NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
From: Lefty Leverenz [mailto:leftyleverenz@gmail.com]
Sent: 31 March 2015 07:18
To: user@hive.apache.org
Subject: Re: ORDER BY clause in Hive
I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160> : Give a warning when grouping or ordering by a constant column.
Thanks Gopal.
-- Lefty
On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com> wrote:
---------- Forwarded message ----------
From: Lefty Leverenz <le...@gmail.com>
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive
To: Mich Talebzadeh <mi...@peridale.co.uk>
Hive as I see it does not support ORDER BY Column position. It only supports ORDER BY Column name.
That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you can set the configuration parameter <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive> hive.groupby.orderby.position.alias to true, and then you can use column positions in ORDER BY.
Here's the new documentation:
* <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias> hive.groupby.orderby.position.alias
* Order By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
* Group By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
-- Lefty
Re: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
Give a warning when grouping or ordering by a constant column.
Thanks Gopal.
-- Lefty
On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com>
wrote:
> ---------- Forwarded message ----------
> From: Lefty Leverenz <le...@gmail.com>
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
> To: Mich Talebzadeh <mi...@peridale.co.uk>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>
> That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
> *hive.groupby.orderby.position.**alias* to true, and then you can use
> column positions in ORDER BY.
>
> Here's the new documentation:
>
> - hive.groupby.orderby.position.alias
> <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
> - Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
> - Group By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>
> On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
>> Gents,
>>
>>
>>
>> Hive as I see it does not support ORDER BY *Column position*. It only
>> supports ORDER BY *Column name*.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] *On Behalf
>> Of *Gopal Vijayaraghavan
>> *Sent:* 30 March 2015 05:26
>> *To:* user@hive.apache.org
>> *Cc:* Lefty Leverenz
>>
>> *Subject:* Re: ORDER BY clause in Hive
>>
>>
>>
>> Hi Lefty,
>>
>>
>>
>> Couldn’t find the documentation for what
>> hive.groupby.orderby.position.alias=true does.
>>
>>
>>
>> I suspect that might be what Mich was looking for (though I tend to write
>> the column names explicitly).
>>
>>
>>
>> Cheers,
>>
>> Gopal
>>
>>
>>
>> *From: *Lefty Leverenz <le...@gmail.com>
>> *Reply-To: *"user@hive.apache.org" <us...@hive.apache.org>
>> *Date: *Sunday, March 29, 2015 at 8:32 PM
>> *To: *"user@hive.apache.org" <us...@hive.apache.org>
>> *Subject: *Re: ORDER BY clause in Hive
>>
>>
>>
>> I added information about this in the Order By
>> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>> section of the wiki. Thanks, Mich and Gopal!
>>
>>
>> -- Lefty
>>
>>
>>
>> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
>> wrote:
>>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>> rs.Customer_ID
>>
>> , rs.Number_of_orders
>>
>> , rs.Total_customer_amount
>>
>> , rs.Average_order
>>
>> , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ) rs
>>
>> ORDER BY
>>
>> *rs.Total_customer_amount*
>>
>> ;
>>
>>
>>
>>
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
>> rs.average_order | rs.standard_deviation |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | 1743.0 | 238 | 94786.12999999993 |
>> 398.2610504201678 | 581.0439095219863 |
>>
>> | 429.0 | 231 | 94819.41000000006 |
>> 410.4736363636366 | 613.7057080691426 |
>>
>> | 2994.0 | 227 | 94862.61000000006 |
>> 417.89696035242315 | 623.1607772763742 |
>>
>> | 6395.0 | 268 | 97010.47999999998 |
>> 361.97940298507456 | 576.9120977984521 |
>>
>> | 12783.0 | 240 | 97573.54999999996 |
>> 406.5564583333332 | 590.4445500393804 |
>>
>> | 4974.0 | 235 | 98006.16000000002 |
>> 417.0474893617022 | 624.337482834059 |
>>
>> | 42167.0 | 266 | 98585.96000000002 |
>> 370.6239097744362 | 590.965120684093 |
>>
>> | 10747.0 | 256 | 99578.08999999997 |
>> 388.9769140624999 | 600.7615005975689 |
>>
>> | 11407.0 | 248 | 103412.65999999995 |
>> 416.9865322580643 | 622.221465710723 |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> 9 rows selected (209.699 seconds)
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
>> *Sent:* 29 March 2015 00:11
>> *To:* user@hive.apache.org
>> *Subject:* ORDER BY clause in Hive
>>
>>
>>
>> Hi,
>>
>>
>>
>> Can someone point me to doc or otherwise to see if ORDER BY clause in
>> Hive is working OK
>>
>>
>>
>> I have a simple aggregate query as follows:
>>
>>
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ORDER BY 3 ;
>>
>>
>>
>> The original table and data are from Oracle sh.sales table
>>
>>
>>
>> Oracle comes back for this query with
>>
>>
>>
>> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
>> STANDARD_DEVIATION
>>
>> ----------- ---------------- --------------------- -------------
>> ------------------
>>
>> 1743 238 94786.13 398.26105
>> 582.26845
>>
>> 429 231 94819.41 410.473636
>> 615.038404
>>
>> 2994 227 94862.61 417.89696
>> 624.53793
>>
>> 6395 268 97010.48 361.979403
>> 577.991448
>>
>> 12783 240 97573.55
>> 406.556458 591.6785
>>
>> 4974 235 98006.16 417.047489
>> 625.670115
>>
>> 42167 266 98585.96 370.62391
>> 592.079099
>>
>> 10747 256 99578.09 388.976914
>> 601.938312
>>
>> 11407 248 103412.66 416.986532
>> 623.479751
>>
>>
>>
>> 9 rows selected.
>>
>>
>>
>> Ordered by TOTAL_CUSTOMER_AMOUNT
>>
>>
>>
>> And hive returns for the same query
>>
>>
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | customer_id | number_of_orders | total_customer_amount |
>> average_order | standard_deviation |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | 42167.0 | 266 | 98585.96000000002 |
>> 370.6239097744362 | 590.965120684093 |
>>
>> | 12783.0 | 240 | 97573.54999999996 |
>> 406.5564583333332 | 590.4445500393804 |
>>
>> | 11407.0 | 248 | 103412.65999999995 |
>> 416.9865322580643 | 622.221465710723 |
>>
>> | 10747.0 | 256 | 99578.08999999997 |
>> 388.9769140624999 | 600.7615005975689 |
>>
>> | 6395.0 | 268 | 97010.47999999998 |
>> 361.97940298507456 | 576.9120977984521 |
>>
>> | 4974.0 | 235 | 98006.16000000002 |
>> 417.0474893617022 | 624.337482834059 |
>>
>> | 2994.0 | 227 | 94862.61000000006 |
>> 417.89696035242315 | 623.1607772763742 |
>>
>> | 1743.0 | 238 | 94786.12999999993 |
>> 398.2610504201678 | 581.0439095219863 |
>>
>> | 429.0 | 231 | 94819.41000000006 |
>> 410.4736363636366 | 613.7057080691426 |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> 9 rows selected (215.774 seconds)
>>
>>
>>
>> But ordering in Hive does not seem to be correct! Please note ordering is
>> on column three, *total_customer_amount*
>>
>>
>>
>> I also tried this in Sybase and got the same as Oracle.
>>
>>
>>
>> Adaptive Server cpu time: 100 ms.
>>
>> Customer_ID Number_of_orders
>> Total_customer_amount
>> Average_order Standard_deviation
>>
>> -------------------- ----------------
>> -----------------------------------------
>> ----------------------------------------------------
>> ---------------------------
>>
>> 1743 238
>> 94786.13
>> 398.2610504201680 582.268450
>>
>> 429 231
>> 94819.41
>> 410.4736363636363 615.038404
>>
>> 2994 227
>> 94862.61
>> 417.8969603524229 624.537930
>>
>> 6395 268
>> 97010.48
>> 361.9794029850746 577.991448
>>
>> 12783 240
>> 97573.55
>> 406.5564583333333 591.678500
>>
>> 4974 235
>> 98006.16
>> 417.0474893617021 625.670115
>>
>> 42167 266
>> 98585.96
>> 370.6239097744360 592.079099
>>
>> 10747 256
>> 99578.09 388.9769140625000
>> 601.938312
>>
>> 11407 248
>> 103412.66
>> 416.9865322580645 623.479751
>>
>>
>>
>> I tried Google search and seems to be different suggestions. May be I
>> have to rewrite the code?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>>
>>
>
>
>
Fwd: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
---------- Forwarded message ----------
From: Lefty Leverenz <le...@gmail.com>
Date: Tue, Mar 31, 2015 at 1:47 AM
Subject: Re: ORDER BY clause in Hive
To: Mich Talebzadeh <mi...@peridale.co.uk>
Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you can set
the configuration parameter
<https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
*hive.groupby.orderby.position.**alias* to true, and then you can use
column positions in ORDER BY.
Here's the new documentation:
- hive.groupby.orderby.position.alias
<https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
- Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
- Group By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
-- Lefty
On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Gents,
>
>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> Thanks
>
>
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] *On Behalf Of
> *Gopal Vijayaraghavan
> *Sent:* 30 March 2015 05:26
> *To:* user@hive.apache.org
> *Cc:* Lefty Leverenz
>
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> Hi Lefty,
>
>
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
>
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
>
>
> Cheers,
>
> Gopal
>
>
>
> *From: *Lefty Leverenz <le...@gmail.com>
> *Reply-To: *"user@hive.apache.org" <us...@hive.apache.org>
> *Date: *Sunday, March 29, 2015 at 8:32 PM
> *To: *"user@hive.apache.org" <us...@hive.apache.org>
> *Subject: *Re: ORDER BY clause in Hive
>
>
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki. Thanks, Mich and Gopal!
>
>
> -- Lefty
>
>
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
> rs.Customer_ID
>
> , rs.Number_of_orders
>
> , rs.Total_customer_amount
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
> *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
> rs.average_order | rs.standard_deviation |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
>
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
>
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
>
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
>
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
>
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
>
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
>
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
>
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes back for this query with
>
>
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
>
> ----------- ---------------- --------------------- -------------
> ------------------
>
> 1743 238 94786.13 398.26105
> 582.26845
>
> 429 231 94819.41 410.473636
> 615.038404
>
> 2994 227 94862.61 417.89696
> 624.53793
>
> 6395 268 97010.48 361.979403
> 577.991448
>
> 12783 240 97573.55 406.556458
> 591.6785
>
> 4974 235 98006.16 417.047489
> 625.670115
>
> 42167 266 98585.96 370.62391
> 592.079099
>
> 10747 256 99578.09 388.976914
> 601.938312
>
> 11407 248 103412.66 416.986532
> 623.479751
>
>
>
> 9 rows selected.
>
>
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
>
>
> And hive returns for the same query
>
>
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | customer_id | number_of_orders | total_customer_amount |
> average_order | standard_deviation |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
>
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
>
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
>
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
>
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
>
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
>
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
>
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
>
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> 9 rows selected (215.774 seconds)
>
>
>
> But ordering in Hive does not seem to be correct! Please note ordering is
> on column three, *total_customer_amount*
>
>
>
> I also tried this in Sybase and got the same as Oracle.
>
>
>
> Adaptive Server cpu time: 100 ms.
>
> Customer_ID Number_of_orders
> Total_customer_amount
> Average_order Standard_deviation
>
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>
> 1743 238
> 94786.13
> 398.2610504201680 582.268450
>
> 429 231
> 94819.41
> 410.4736363636363 615.038404
>
> 2994 227
> 94862.61
> 417.8969603524229 624.537930
>
> 6395 268
> 97010.48
> 361.9794029850746 577.991448
>
> 12783 240
> 97573.55
> 406.5564583333333 591.678500
>
> 4974 235
> 98006.16
> 417.0474893617021 625.670115
>
> 42167 266
> 98585.96
> 370.6239097744360 592.079099
>
> 10747 256
> 99578.09 388.9769140625000
> 601.938312
>
> 11407 248
> 103412.66
> 416.9865322580645 623.479751
>
>
>
> I tried Google search and seems to be different suggestions. May be I have
> to rewrite the code?
>
>
>
> Thanks
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
>
>
RE: ORDER BY clause in Hive
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Gents,
Hive as I see it does not support ORDER BY Column position. It only supports
ORDER BY Column name.
Thanks
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.
From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of Gopal
Vijayaraghavan
Sent: 30 March 2015 05:26
To: user@hive.apache.org
Cc: Lefty Leverenz
Subject: Re: ORDER BY clause in Hive
Hi Lefty,
Couldn't find the documentation for what
hive.groupby.orderby.position.alias=true does.
I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).
Cheers,
Gopal
From: Lefty Leverenz <le...@gmail.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Sunday, March 29, 2015 at 8:32 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Re: ORDER BY clause in Hive
I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy> section of the wiki. Thanks, Mich and
Gopal!
-- Lefty
On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
Hi Lefty, Gopal,
It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.
So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK
SELECT
rs.Customer_ID
, rs.Number_of_orders
, rs.Total_customer_amount
, rs.Average_order
, rs.Standard_deviation
FROM
(
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
STDDEV(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < STDDEV(amount_sold)
) rs
ORDER BY
rs.Total_customer_amount
;
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
| rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
rs.average_order | rs.standard_deviation |
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
| 1743.0 | 238 | 94786.12999999993 |
398.2610504201678 | 581.0439095219863 |
| 429.0 | 231 | 94819.41000000006 |
410.4736363636366 | 613.7057080691426 |
| 2994.0 | 227 | 94862.61000000006 |
417.89696035242315 | 623.1607772763742 |
| 6395.0 | 268 | 97010.47999999998 |
361.97940298507456 | 576.9120977984521 |
| 12783.0 | 240 | 97573.54999999996 |
406.5564583333332 | 590.4445500393804 |
| 4974.0 | 235 | 98006.16000000002 |
417.0474893617022 | 624.337482834059 |
| 42167.0 | 266 | 98585.96000000002 |
370.6239097744362 | 590.965120684093 |
| 10747.0 | 256 | 99578.08999999997 |
388.9769140624999 | 600.7615005975689 |
| 11407.0 | 248 | 103412.65999999995 |
416.9865322580643 | 622.221465710723 |
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
9 rows selected (209.699 seconds)
Regards,
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.
From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive
Hi,
Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK
I have a simple aggregate query as follows:
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
STDDEV(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < STDDEV(amount_sold)
ORDER BY 3 ;
The original table and data are from Oracle sh.sales table
Oracle comes back for this query with
CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION
----------- ---------------- --------------------- -------------
------------------
1743 238 94786.13 398.26105
582.26845
429 231 94819.41 410.473636
615.038404
2994 227 94862.61 417.89696
624.53793
6395 268 97010.48 361.979403
577.991448
12783 240 97573.55 406.556458
591.6785
4974 235 98006.16 417.047489
625.670115
42167 266 98585.96 370.62391
592.079099
10747 256 99578.09 388.976914
601.938312
11407 248 103412.66 416.986532
623.479751
9 rows selected.
Ordered by TOTAL_CUSTOMER_AMOUNT
And hive returns for the same query
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| customer_id | number_of_orders | total_customer_amount |
average_order | standard_deviation |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| 42167.0 | 266 | 98585.96000000002 |
370.6239097744362 | 590.965120684093 |
| 12783.0 | 240 | 97573.54999999996 |
406.5564583333332 | 590.4445500393804 |
| 11407.0 | 248 | 103412.65999999995 |
416.9865322580643 | 622.221465710723 |
| 10747.0 | 256 | 99578.08999999997 |
388.9769140624999 | 600.7615005975689 |
| 6395.0 | 268 | 97010.47999999998 |
361.97940298507456 | 576.9120977984521 |
| 4974.0 | 235 | 98006.16000000002 |
417.0474893617022 | 624.337482834059 |
| 2994.0 | 227 | 94862.61000000006 |
417.89696035242315 | 623.1607772763742 |
| 1743.0 | 238 | 94786.12999999993 |
398.2610504201678 | 581.0439095219863 |
| 429.0 | 231 | 94819.41000000006 |
410.4736363636366 | 613.7057080691426 |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
9 rows selected (215.774 seconds)
But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount
I also tried this in Sybase and got the same as Oracle.
Adaptive Server cpu time: 100 ms.
Customer_ID Number_of_orders Total_customer_amount
Average_order Standard_deviation
-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------
1743 238
94786.13 398.2610504201680
582.268450
429 231
94819.41 410.4736363636363
615.038404
2994 227
94862.61 417.8969603524229
624.537930
6395 268
97010.48 361.9794029850746
577.991448
12783 240
97573.55 406.5564583333333
591.678500
4974 235
98006.16 417.0474893617021
625.670115
42167 266
98585.96 370.6239097744360
592.079099
10747 256
99578.09 388.9769140625000
601.938312
11407 248
103412.66 416.9865322580645
623.479751
I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?
Thanks
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.
Fwd: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
Oops, failed to send this to user@hive.
-- Lefty
---------- Forwarded message ----------
From: Lefty Leverenz <le...@gmail.com>
Date: Mon, Mar 30, 2015 at 12:42 AM
Subject: Re: ORDER BY clause in Hive
To: Gopal Vijayaraghavan <go...@apache.org>
Oho! Good point, Gopal.
hive.groupby.orderby.position.alias isn't in the wiki yet, but I can put it
there (and revise the ORDER BY and GROUP BY docs). It was introduced in
release 0.11.0 by HIVE-581
<https://issues.apache.org/jira/browse/HIVE-581> with
a default of false.
Description from HiveConf.java: "Whether to enable using Column Position
Alias in Group By or Order By."
But shouldn't Mich have gotten an error message?
-- Lefty
On Mon, Mar 30, 2015 at 12:25 AM, Gopal Vijayaraghavan <go...@apache.org>
wrote:
> Hi Lefty,
>
> Couldn’t find the documentation for what
> hive.groupby.orderby.position.alias=true does.
>
> I suspect that might be what Mich was looking for (though I tend to write
> the column names explicitly).
>
> Cheers,
> Gopal
>
> From: Lefty Leverenz <le...@gmail.com>
> Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
> Date: Sunday, March 29, 2015 at 8:32 PM
> To: "user@hive.apache.org" <us...@hive.apache.org>
> Subject: Re: ORDER BY clause in Hive
>
> I added information about this in the Order By
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
> section of the wiki. Thanks, Mich and Gopal!
>
> -- Lefty
>
> On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
> wrote:
>
>> Hi Lefty, Gopal,
>>
>>
>>
>> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result
>> set column three” which standard SQL as evident from Oracle and Sybase does.
>>
>>
>>
>> So I made it an ORDER BY from the result set EXPLICITELY as shown below
>> and it worked OK
>>
>>
>>
>> SELECT
>>
>> rs.Customer_ID
>>
>> , rs.Number_of_orders
>>
>> , rs.Total_customer_amount
>>
>> , rs.Average_order
>>
>> , rs.Standard_deviation
>>
>> FROM
>>
>> (
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ) rs
>>
>> ORDER BY
>>
>> *rs.Total_customer_amount*
>>
>> ;
>>
>>
>>
>>
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
>> rs.average_order | rs.standard_deviation |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> | 1743.0 | 238 | 94786.12999999993 |
>> 398.2610504201678 | 581.0439095219863 |
>>
>> | 429.0 | 231 | 94819.41000000006 |
>> 410.4736363636366 | 613.7057080691426 |
>>
>> | 2994.0 | 227 | 94862.61000000006 |
>> 417.89696035242315 | 623.1607772763742 |
>>
>> | 6395.0 | 268 | 97010.47999999998 |
>> 361.97940298507456 | 576.9120977984521 |
>>
>> | 12783.0 | 240 | 97573.54999999996 |
>> 406.5564583333332 | 590.4445500393804 |
>>
>> | 4974.0 | 235 | 98006.16000000002 |
>> 417.0474893617022 | 624.337482834059 |
>>
>> | 42167.0 | 266 | 98585.96000000002 |
>> 370.6239097744362 | 590.965120684093 |
>>
>> | 10747.0 | 256 | 99578.08999999997 |
>> 388.9769140624999 | 600.7615005975689 |
>>
>> | 11407.0 | 248 | 103412.65999999995 |
>> 416.9865322580643 | 622.221465710723 |
>>
>>
>> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>>
>> 9 rows selected (209.699 seconds)
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
>> *Sent:* 29 March 2015 00:11
>> *To:* user@hive.apache.org
>> *Subject:* ORDER BY clause in Hive
>>
>>
>>
>> Hi,
>>
>>
>>
>> Can someone point me to doc or otherwise to see if ORDER BY clause in
>> Hive is working OK
>>
>>
>>
>> I have a simple aggregate query as follows:
>>
>>
>>
>> SELECT cust_id AS Customer_ID,
>>
>> COUNT(amount_sold) AS Number_of_orders,
>>
>> SUM(amount_sold) AS Total_customer_amount,
>>
>> AVG(amount_sold) AS Average_order,
>>
>> STDDEV(amount_sold) AS Standard_deviation
>>
>> FROM sales
>>
>> GROUP BY cust_id
>>
>> HAVING SUM(amount_sold) > 94000
>>
>> AND AVG(amount_sold) < STDDEV(amount_sold)
>>
>> ORDER BY 3 ;
>>
>>
>>
>> The original table and data are from Oracle sh.sales table
>>
>>
>>
>> Oracle comes back for this query with
>>
>>
>>
>> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
>> STANDARD_DEVIATION
>>
>> ----------- ---------------- --------------------- -------------
>> ------------------
>>
>> 1743 238 94786.13 398.26105
>> 582.26845
>>
>> 429 231 94819.41 410.473636
>> 615.038404
>>
>> 2994 227 94862.61 417.89696
>> 624.53793
>>
>> 6395 268 97010.48 361.979403
>> 577.991448
>>
>> 12783 240 97573.55
>> 406.556458 591.6785
>>
>> 4974 235 98006.16 417.047489
>> 625.670115
>>
>> 42167 266 98585.96 370.62391
>> 592.079099
>>
>> 10747 256 99578.09 388.976914
>> 601.938312
>>
>> 11407 248 103412.66 416.986532
>> 623.479751
>>
>>
>>
>> 9 rows selected.
>>
>>
>>
>> Ordered by TOTAL_CUSTOMER_AMOUNT
>>
>>
>>
>> And hive returns for the same query
>>
>>
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | customer_id | number_of_orders | total_customer_amount |
>> average_order | standard_deviation |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> | 42167.0 | 266 | 98585.96000000002 |
>> 370.6239097744362 | 590.965120684093 |
>>
>> | 12783.0 | 240 | 97573.54999999996 |
>> 406.5564583333332 | 590.4445500393804 |
>>
>> | 11407.0 | 248 | 103412.65999999995 |
>> 416.9865322580643 | 622.221465710723 |
>>
>> | 10747.0 | 256 | 99578.08999999997 |
>> 388.9769140624999 | 600.7615005975689 |
>>
>> | 6395.0 | 268 | 97010.47999999998 |
>> 361.97940298507456 | 576.9120977984521 |
>>
>> | 4974.0 | 235 | 98006.16000000002 |
>> 417.0474893617022 | 624.337482834059 |
>>
>> | 2994.0 | 227 | 94862.61000000006 |
>> 417.89696035242315 | 623.1607772763742 |
>>
>> | 1743.0 | 238 | 94786.12999999993 |
>> 398.2610504201678 | 581.0439095219863 |
>>
>> | 429.0 | 231 | 94819.41000000006 |
>> 410.4736363636366 | 613.7057080691426 |
>>
>>
>> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>>
>> 9 rows selected (215.774 seconds)
>>
>>
>>
>> But ordering in Hive does not seem to be correct! Please note ordering is
>> on column three, *total_customer_amount*
>>
>>
>>
>> I also tried this in Sybase and got the same as Oracle.
>>
>>
>>
>> Adaptive Server cpu time: 100 ms.
>>
>> Customer_ID Number_of_orders
>> Total_customer_amount
>> Average_order Standard_deviation
>>
>> -------------------- ----------------
>> -----------------------------------------
>> ----------------------------------------------------
>> ---------------------------
>>
>> 1743 238
>> 94786.13
>> 398.2610504201680 582.268450
>>
>> 429 231
>> 94819.41
>> 410.4736363636363 615.038404
>>
>> 2994 227
>> 94862.61
>> 417.8969603524229 624.537930
>>
>> 6395 268
>> 97010.48
>> 361.9794029850746 577.991448
>>
>> 12783 240
>> 97573.55
>> 406.5564583333333 591.678500
>>
>> 4974 235
>> 98006.16
>> 417.0474893617021 625.670115
>>
>> 42167 266
>> 98585.96
>> 370.6239097744360 592.079099
>>
>> 10747 256
>> 99578.09 388.9769140625000
>> 601.938312
>>
>> 11407 248
>> 103412.66
>> 416.9865322580645 623.479751
>>
>>
>>
>> I tried Google search and seems to be different suggestions. May be I
>> have to rewrite the code?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Publications due shortly:*
>>
>> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen
>> and Coherence Cache*
>>
>>
>>
>> NOTE: The information in this email is proprietary and confidential. This
>> message is for the designated recipient only, if you are not the intended
>> recipient, you should destroy it immediately. Any information in this
>> message shall not be understood as given or endorsed by Peridale Ltd, its
>> subsidiaries or their employees, unless expressly so stated. It is the
>> responsibility of the recipient to ensure that this email is virus free,
>> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
>> any responsibility.
>>
>>
>>
>
>
Re: ORDER BY clause in Hive
Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi Lefty,
Couldn¹t find the documentation for what
hive.groupby.orderby.position.alias=true does.
I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).
Cheers,
Gopal
From: Lefty Leverenz <le...@gmail.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Sunday, March 29, 2015 at 8:32 PM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Re: ORDER BY clause in Hive
I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy> section of the wiki. Thanks, Mich and
Gopal!
-- Lefty
On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Hi Lefty, Gopal,
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY ³the result set
> column three² which standard SQL as evident from Oracle and Sybase does.
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below and it
> worked OK
>
> SELECT
> rs.Customer_ID
> , rs.Number_of_orders
> , rs.Total_customer_amount
> , rs.Average_order
> , rs.Standard_deviation
> FROM
> (
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY
> rs.Total_customer_amount
> ;
>
>
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
> rs.average_order | rs.standard_deviation |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> 9 rows selected (209.699 seconds)
>
>
> Regards,
>
>
> Mich Talebzadeh
>
> http://talebzadehmich.wordpress.com
>
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>
>
> From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
> Sent: 29 March 2015 00:11
> To: user@hive.apache.org
> Subject: ORDER BY clause in Hive
>
>
> Hi,
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is
> working OK
>
> I have a simple aggregate query as follows:
>
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ORDER BY 3 ;
>
> The original table and data are from Oracle sh.sales table
>
> Oracle comes back for this query with
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
> ----------- ---------------- --------------------- -------------
> ------------------
> 1743 238 94786.13 398.26105
> 582.26845
> 429 231 94819.41 410.473636
> 615.038404
> 2994 227 94862.61 417.89696
> 624.53793
> 6395 268 97010.48 361.979403
> 577.991448
> 12783 240 97573.55 406.556458
> 591.6785
> 4974 235 98006.16 417.047489
> 625.670115
> 42167 266 98585.96 370.62391
> 592.079099
> 10747 256 99578.09 388.976914
> 601.938312
> 11407 248 103412.66 416.986532
> 623.479751
>
> 9 rows selected.
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
> And hive returns for the same query
>
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | customer_id | number_of_orders | total_customer_amount | average_order
> | standard_deviation |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> 9 rows selected (215.774 seconds)
>
> But ordering in Hive does not seem to be correct! Please note ordering is on
> column three, total_customer_amount
>
> I also tried this in Sybase and got the same as Oracle.
>
> Adaptive Server cpu time: 100 ms.
> Customer_ID Number_of_orders Total_customer_amount
> Average_order Standard_deviation
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
> 1743 238
> 94786.13 398.2610504201680
> 582.268450
> 429 231
> 94819.41 410.4736363636363
> 615.038404
> 2994 227
> 94862.61 417.8969603524229
> 624.537930
> 6395 268
> 97010.48 361.9794029850746
> 577.991448
> 12783 240
> 97573.55 406.5564583333333
> 591.678500
> 4974 235
> 98006.16 417.0474893617021
> 625.670115
> 42167 266
> 98585.96 370.6239097744360
> 592.079099
> 10747 256
> 99578.09 388.9769140625000
> 601.938312
> 11407 248
> 103412.66 416.9865322580645
> 623.479751
>
> I tried Google search and seems to be different suggestions. May be I have to
> rewrite the code?
>
> Thanks
>
> Mich Talebzadeh
>
> http://talebzadehmich.wordpress.com
>
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>
Re: ORDER BY clause in Hive
Posted by Lefty Leverenz <le...@gmail.com>.
I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
section of the wiki. Thanks, Mich and Gopal!
-- Lefty
On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:
> Hi Lefty, Gopal,
>
>
>
> It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set
> column three” which standard SQL as evident from Oracle and Sybase does.
>
>
>
> So I made it an ORDER BY from the result set EXPLICITELY as shown below
> and it worked OK
>
>
>
> SELECT
>
> rs.Customer_ID
>
> , rs.Number_of_orders
>
> , rs.Total_customer_amount
>
> , rs.Average_order
>
> , rs.Standard_deviation
>
> FROM
>
> (
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ) rs
>
> ORDER BY
>
> *rs.Total_customer_amount*
>
> ;
>
>
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
> rs.average_order | rs.standard_deviation |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
>
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
>
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
>
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
>
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
>
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
>
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
>
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
>
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (209.699 seconds)
>
>
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Mich Talebzadeh [mailto:mich@peridale.co.uk]
> *Sent:* 29 March 2015 00:11
> *To:* user@hive.apache.org
> *Subject:* ORDER BY clause in Hive
>
>
>
> Hi,
>
>
>
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
> is working OK
>
>
>
> I have a simple aggregate query as follows:
>
>
>
> SELECT cust_id AS Customer_ID,
>
> COUNT(amount_sold) AS Number_of_orders,
>
> SUM(amount_sold) AS Total_customer_amount,
>
> AVG(amount_sold) AS Average_order,
>
> STDDEV(amount_sold) AS Standard_deviation
>
> FROM sales
>
> GROUP BY cust_id
>
> HAVING SUM(amount_sold) > 94000
>
> AND AVG(amount_sold) < STDDEV(amount_sold)
>
> ORDER BY 3 ;
>
>
>
> The original table and data are from Oracle sh.sales table
>
>
>
> Oracle comes back for this query with
>
>
>
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
>
> ----------- ---------------- --------------------- -------------
> ------------------
>
> 1743 238 94786.13 398.26105
> 582.26845
>
> 429 231 94819.41 410.473636
> 615.038404
>
> 2994 227 94862.61 417.89696
> 624.53793
>
> 6395 268 97010.48 361.979403
> 577.991448
>
> 12783 240 97573.55 406.556458
> 591.6785
>
> 4974 235 98006.16 417.047489
> 625.670115
>
> 42167 266 98585.96 370.62391
> 592.079099
>
> 10747 256 99578.09 388.976914
> 601.938312
>
> 11407 248 103412.66 416.986532
> 623.479751
>
>
>
> 9 rows selected.
>
>
>
> Ordered by TOTAL_CUSTOMER_AMOUNT
>
>
>
> And hive returns for the same query
>
>
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | customer_id | number_of_orders | total_customer_amount |
> average_order | standard_deviation |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> | 42167.0 | 266 | 98585.96000000002 |
> 370.6239097744362 | 590.965120684093 |
>
> | 12783.0 | 240 | 97573.54999999996 |
> 406.5564583333332 | 590.4445500393804 |
>
> | 11407.0 | 248 | 103412.65999999995 |
> 416.9865322580643 | 622.221465710723 |
>
> | 10747.0 | 256 | 99578.08999999997 |
> 388.9769140624999 | 600.7615005975689 |
>
> | 6395.0 | 268 | 97010.47999999998 |
> 361.97940298507456 | 576.9120977984521 |
>
> | 4974.0 | 235 | 98006.16000000002 |
> 417.0474893617022 | 624.337482834059 |
>
> | 2994.0 | 227 | 94862.61000000006 |
> 417.89696035242315 | 623.1607772763742 |
>
> | 1743.0 | 238 | 94786.12999999993 |
> 398.2610504201678 | 581.0439095219863 |
>
> | 429.0 | 231 | 94819.41000000006 |
> 410.4736363636366 | 613.7057080691426 |
>
>
> +--------------+-------------------+------------------------+---------------------+---------------------+--+
>
> 9 rows selected (215.774 seconds)
>
>
>
> But ordering in Hive does not seem to be correct! Please note ordering is
> on column three, *total_customer_amount*
>
>
>
> I also tried this in Sybase and got the same as Oracle.
>
>
>
> Adaptive Server cpu time: 100 ms.
>
> Customer_ID Number_of_orders
> Total_customer_amount
> Average_order Standard_deviation
>
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>
> 1743 238
> 94786.13
> 398.2610504201680 582.268450
>
> 429 231
> 94819.41
> 410.4736363636363 615.038404
>
> 2994 227
> 94862.61
> 417.8969603524229 624.537930
>
> 6395 268
> 97010.48
> 361.9794029850746 577.991448
>
> 12783 240
> 97573.55
> 406.5564583333333 591.678500
>
> 4974 235
> 98006.16
> 417.0474893617021 625.670115
>
> 42167 266
> 98585.96
> 370.6239097744360 592.079099
>
> 10747 256
> 99578.09 388.9769140625000
> 601.938312
>
> 11407 248
> 103412.66
> 416.9865322580645 623.479751
>
>
>
> I tried Google search and seems to be different suggestions. May be I have
> to rewrite the code?
>
>
>
> Thanks
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
RE: ORDER BY clause in Hive
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
Hi Lefty, Gopal,
It appears that ORDER BY 3 is not interpreted as ORDERR BY "the result set
column three" which standard SQL as evident from Oracle and Sybase does.
So I made it an ORDER BY from the result set EXPLICITELY as shown below and
it worked OK
SELECT
rs.Customer_ID
, rs.Number_of_orders
, rs.Total_customer_amount
, rs.Average_order
, rs.Standard_deviation
FROM
(
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
STDDEV(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < STDDEV(amount_sold)
) rs
ORDER BY
rs.Total_customer_amount
;
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
| rs.customer_id | rs.number_of_orders | rs.total_customer_amount |
rs.average_order | rs.standard_deviation |
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
| 1743.0 | 238 | 94786.12999999993 |
398.2610504201678 | 581.0439095219863 |
| 429.0 | 231 | 94819.41000000006 |
410.4736363636366 | 613.7057080691426 |
| 2994.0 | 227 | 94862.61000000006 |
417.89696035242315 | 623.1607772763742 |
| 6395.0 | 268 | 97010.47999999998 |
361.97940298507456 | 576.9120977984521 |
| 12783.0 | 240 | 97573.54999999996 |
406.5564583333332 | 590.4445500393804 |
| 4974.0 | 235 | 98006.16000000002 |
417.0474893617022 | 624.337482834059 |
| 42167.0 | 266 | 98585.96000000002 |
370.6239097744362 | 590.965120684093 |
| 10747.0 | 256 | 99578.08999999997 |
388.9769140624999 | 600.7615005975689 |
| 11407.0 | 248 | 103412.65999999995 |
416.9865322580643 | 622.221465710723 |
+-----------------+----------------------+---------------------------+------
---------------+------------------------+--+
9 rows selected (209.699 seconds)
Regards,
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.
From: Mich Talebzadeh [mailto:mich@peridale.co.uk]
Sent: 29 March 2015 00:11
To: user@hive.apache.org
Subject: ORDER BY clause in Hive
Hi,
Can someone point me to doc or otherwise to see if ORDER BY clause in Hive
is working OK
I have a simple aggregate query as follows:
SELECT cust_id AS Customer_ID,
COUNT(amount_sold) AS Number_of_orders,
SUM(amount_sold) AS Total_customer_amount,
AVG(amount_sold) AS Average_order,
STDDEV(amount_sold) AS Standard_deviation
FROM sales
GROUP BY cust_id
HAVING SUM(amount_sold) > 94000
AND AVG(amount_sold) < STDDEV(amount_sold)
ORDER BY 3 ;
The original table and data are from Oracle sh.sales table
Oracle comes back for this query with
CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
STANDARD_DEVIATION
----------- ---------------- --------------------- -------------
------------------
1743 238 94786.13 398.26105
582.26845
429 231 94819.41 410.473636
615.038404
2994 227 94862.61 417.89696
624.53793
6395 268 97010.48 361.979403
577.991448
12783 240 97573.55 406.556458
591.6785
4974 235 98006.16 417.047489
625.670115
42167 266 98585.96 370.62391
592.079099
10747 256 99578.09 388.976914
601.938312
11407 248 103412.66 416.986532
623.479751
9 rows selected.
Ordered by TOTAL_CUSTOMER_AMOUNT
And hive returns for the same query
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| customer_id | number_of_orders | total_customer_amount |
average_order | standard_deviation |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
| 42167.0 | 266 | 98585.96000000002 |
370.6239097744362 | 590.965120684093 |
| 12783.0 | 240 | 97573.54999999996 |
406.5564583333332 | 590.4445500393804 |
| 11407.0 | 248 | 103412.65999999995 |
416.9865322580643 | 622.221465710723 |
| 10747.0 | 256 | 99578.08999999997 |
388.9769140624999 | 600.7615005975689 |
| 6395.0 | 268 | 97010.47999999998 |
361.97940298507456 | 576.9120977984521 |
| 4974.0 | 235 | 98006.16000000002 |
417.0474893617022 | 624.337482834059 |
| 2994.0 | 227 | 94862.61000000006 |
417.89696035242315 | 623.1607772763742 |
| 1743.0 | 238 | 94786.12999999993 |
398.2610504201678 | 581.0439095219863 |
| 429.0 | 231 | 94819.41000000006 |
410.4736363636366 | 613.7057080691426 |
+--------------+-------------------+------------------------+---------------
------+---------------------+--+
9 rows selected (215.774 seconds)
But ordering in Hive does not seem to be correct! Please note ordering is on
column three, total_customer_amount
I also tried this in Sybase and got the same as Oracle.
Adaptive Server cpu time: 100 ms.
Customer_ID Number_of_orders Total_customer_amount
Average_order Standard_deviation
-------------------- ----------------
-----------------------------------------
----------------------------------------------------
---------------------------
1743 238
94786.13 398.2610504201680
582.268450
429 231
94819.41 410.4736363636363
615.038404
2994 227
94862.61 417.8969603524229
624.537930
6395 268
97010.48 361.9794029850746
577.991448
12783 240
97573.55 406.5564583333333
591.678500
4974 235
98006.16 417.0474893617021
625.670115
42167 266
98585.96 370.6239097744360
592.079099
10747 256
99578.09 388.9769140625000
601.938312
11407 248
103412.66 416.9865322580645
623.479751
I tried Google search and seems to be different suggestions. May be I have
to rewrite the code?
Thanks
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
Coherence Cache
NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Ltd, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.