You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Lefty Leverenz <le...@gmail.com> on 2015/04/01 05:43:34 UTC

Re: ORDER BY clause in Hive

Thanks for the update, Mich.  And thanks to you & Gopal for prompting me to
get the documentation done.

-- Lefty

On Tue, Mar 31, 2015 at 5:20 AM, Mich Talebzadeh <mi...@peridale.co.uk>
wrote:

> Thanks Lefty for the information provided.
>
>
>
> My version of hive is 014.0
>
>
>
> hive --version
>
> Hive 0.14.0
>
>
>
> Which should support the configuration parameter at the session level à
> set hive.groupby.orderby.position.alias=true
>
>
>
>
>
> set hive.groupby.orderby.position.alias=true;
>
> SELECT
>
>           rs.Customer_ID
>
>         , rs.Number_of_orders
>
>         , *rs.Total_customer_amount*
>
>         , rs.Average_order
>
>         , rs.Standard_deviation
>
> FROM
>
> (
>
>         SELECT cust_id AS Customer_ID,
>
>         COUNT(amount_sold) AS Number_of_orders,
>
>         SUM(amount_sold) AS Total_customer_amount,
>
>         AVG(amount_sold) AS Average_order,
>
>         stddev_samp(amount_sold) AS Standard_deviation
>
>         FROM sales
>
>         GROUP BY cust_id
>
>         HAVING SUM(amount_sold) > 94000
>
>         AND AVG(amount_sold) < stddev_samp(amount_sold)
>
> ) rs
>
> ORDER BY
>
>           -- Total_customer_amount DESC
>
> *          3 DESC*
>
> *;*
>
>
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 623.4797510518939      |
>
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 601.9383117167412      |
>
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 592.0790992800527      |
>
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 625.670115050053       |
>
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 591.6785002882084      |
>
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 577.991447849281       |
>
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 624.5379298449825      |
>
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 615.0384039014772      |
>
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 582.2684502048478      |
>
>
> +-----------------+----------------------+---------------------------+---------------------+------------------------+--+
>
> 9 rows selected (212.535 seconds)
>
>
>
> Indeed this is the correct result ordering by column posirion
>
>
>
> At the Hive server level I added the folowing properties to
> $HIVE_HOME/conf/ hive-site.xml
>
>
>
>   <property>
>
>     <name>hive.groupby.orderby.position.alias</name>
>
>     <value>true</value>
>
>     <description>Eenables using Column Position Alias in GROUP BY and
> ORDER BY clauses of queries.</description>
>
>   </property>
>
>
>
> And ran the above query without session level setting and it worked
>
>
>
> Regards,
>
>
>
> Mich Talebzadeh
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Publications due shortly:*
>
> *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache*
>
>
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this
> message shall not be understood as given or endorsed by Peridale Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Ltd, its subsidiaries nor their employees accept
> any responsibility.
>
>
>
> *From:* Lefty Leverenz [mailto:leftyleverenz@gmail.com]
> *Sent:* 31 March 2015 07:18
> *To:* user@hive.apache.org
> *Subject:* Re: ORDER BY clause in Hive
>
>
>
> I've opened HIVE-10160 <https://issues.apache.org/jira/browse/HIVE-10160>:
>  Give a warning when grouping or ordering by a constant column.
>
>
>
> Thanks Gopal.
>
>
> -- Lefty
>
>
>
> On Tue, Mar 31, 2015 at 2:14 AM, Lefty Leverenz <le...@gmail.com>
> wrote:
>
> ---------- Forwarded message ----------
>
> From: *Lefty Leverenz* <le...@gmail.com>
> Date: Tue, Mar 31, 2015 at 1:47 AM
> Subject: Re: ORDER BY clause in Hive
>
> To: Mich Talebzadeh <mi...@peridale.co.uk>
>
> Hive as I see it does not support ORDER BY *Column position*. It only
> supports ORDER BY *Column name*.
>
>
>
> That's just in Hive release 0.10.0 and earlier.  In release 0.11.0+ you
> can set the configuration parameter
> <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive>
>  *hive.groupby.orderby.position.alias* to true, and then you can use
> column positions in ORDER BY.
>
>
>
> Here's the new documentation:
>
>    - hive.groupby.orderby.position.alias
>    <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias>
>    - Order By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy>
>
>    - Group By
>    <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax>
>
>
> -- Lefty
>