Posted to user@hive.apache.org by rakesh sharma <ra...@hotmail.com> on 2017/02/27 04:56:45 UTC

Distinct clause in hive

When I use DISTINCT in a Hive query, it runs for hours; without it, the query finishes in less than a minute. How can I optimise this query?

Thanks



Re: Distinct clause in hive

Posted by "Pushkar.Gujar" <pu...@gmail.com>.
Hi Rakesh,

How big are your files? Is the data ordered/sorted by the column on which
you are running DISTINCT? If the column contains empty strings, NULLs and
spaces, Hive treats them all as different values. Converting them to Hive's
native NULL type can help improve performance.
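To illustrate the point about empty strings, spaces and NULLs, here is a minimal sketch in Python rather than HiveQL, using hypothetical sample values for a single column. It shows that a real NULL, an empty string and space-only strings count as separate distinct values, and that mapping them all to NULL collapses them into one group. The helper names (`normalize_blank`, `distinct_values`) are illustrative only. The HiveQL expression mentioned in the comment is an assumed equivalent, not something from the original thread.

```python
from typing import Iterable, Optional


def normalize_blank(value: Optional[str]) -> Optional[str]:
    """Map None, '' and space-only strings to None (SQL NULL).

    Roughly what an assumed HiveQL expression such as
    CASE WHEN trim(col) = '' THEN NULL ELSE col END would do;
    Hive's trim() removes spaces, so strip(" ") is used here too.
    """
    if value is None or value.strip(" ") == "":
        return None
    return value


def distinct_values(values: Iterable[Optional[str]]) -> set:
    """Return the set of distinct values, as SELECT DISTINCT would."""
    return set(values)


# Hypothetical column contents: NULL, '' and space-only strings mixed in.
raw = ["a", "", " ", "  ", None, "a", "b", None]

before = distinct_values(raw)
after = distinct_values(normalize_blank(v) for v in raw)

print(len(before))  # 6: 'a', 'b', '', ' ', '  ', None
print(len(after))   # 3: 'a', 'b', None
```

In Hive, the same cleanup could be applied once when loading or rewriting the table, so that later DISTINCT queries group fewer spurious "blank" values.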


Thank you,
*Pushkar Gujar*

