Posted to user@hive.apache.org by Alexis Rondeau <al...@gmail.com> on 2009/05/06 19:05:20 UTC

Individual distinction on two or more columns

Hi there,

I'm currently getting my feet wet with Hive and am very impressed by how
quick and easy it was to get going and try things out.

I am trying to run a query using count(distinct) on two separate columns,
but it is failing as follows:

hive> select count(distinct user), count(distinct session) from actions;
FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns not Supported user


Similar semantics work fine in SQL. Is there a recommended workaround for
this limitation, or is this something that will be included in future
releases?

Any pointers are appreciated, thank you very much in advance,


Alexis


-- 
Alexis Rondeau

RE: Individual distinction on two or more columns

Posted by Ashish Thusoo <at...@facebook.com>.
Just wanted to add here...

One of the reasons we did not support this initially was that we were splitting a group by into two jobs, with the first one generating partial counts on the distinct + group by keys. This was a better plan when there was data skew, which was one of the more common cases. Since then, we have added support for running the group by as a single map/reduce job, so now is perhaps a good time to remove this restriction.

Thanks for filing the JIRA...

Ashish
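
As a rough illustration of the two-job plan described above, here is a minimal Python sketch. It is not Hive's actual implementation; the function name and the num_reducers parameter are made up for this example. The first stage shuffles on the group-by key plus the distinct value, so each reducer sees every copy of a given pair and can emit partial counts. The second stage sums those partials per group-by key. Because the first shuffle key includes the distinct column, a second distinct column would presumably need a different partitioning, which may be part of why the restriction existed.

```python
from collections import defaultdict


def two_stage_count_distinct(rows, num_reducers=4):
    """Count distinct values per group key in two simulated jobs.

    rows is an iterable of (group_key, value) pairs.
    Returns a dict mapping group_key to its number of distinct values.
    """
    # Job 1, shuffle: partition on (group_key, value) so identical pairs
    # always land on the same reducer, which can then deduplicate them.
    partitions = [set() for _ in range(num_reducers)]
    for group_key, value in rows:
        index = hash((group_key, value)) % num_reducers
        partitions[index].add((group_key, value))

    # Job 1, reduce: each reducer emits a partial count per group key.
    partials = []
    for pairs in partitions:
        counts = defaultdict(int)
        for group_key, _value in pairs:
            counts[group_key] += 1
        partials.append(dict(counts))

    # Job 2: shuffle on group_key alone and sum the partial counts.
    totals = defaultdict(int)
    for partial in partials:
        for group_key, count in partial.items():
            totals[group_key] += count
    return dict(totals)
```

Each distinct pair lands in exactly one partition, so summing the partial counts gives the distinct count. A heavily skewed group key is also spread over several reducers in the first job instead of piling up on one.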


Re: Individual distinction on two or more columns

Posted by Alexis Rondeau <al...@gmail.com>.
Namit and Prasad

thank you for your fast responses.

Your suggestion of combining two result sets makes sense, should have
thought of that of course. My first impulse would be to somehow join those
tables again, so I don't have to manually go into the files after issuing
the queries. Well, since I'm new to this, it's probably best to try it out.

I will see how that goes and in the meantime will file the jira.


Again, thank you for both of your suggestions!


Alexis


-- 
Alexis Robin Rondeau
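
For the idea of combining the two result sets, a small sketch may help. It uses Python's sqlite3 module purely as a stand-in so the queries can be run locally. The sample rows and function names are invented, and whether a given Hive version accepts the same join syntax is an assumption, not something confirmed in this thread. It computes the two counts once as separate queries combined by hand, and once as two subqueries (each with a single DISTINCT) joined on a constant key.

```python
import sqlite3

# Two separate queries, combined by hand on the client side.
USER_QUERY = "SELECT count(DISTINCT user) FROM actions"
SESSION_QUERY = "SELECT count(DISTINCT session) FROM actions"

# One statement: each subquery uses a single DISTINCT, and the two
# one-row results are joined on a constant key k.
JOIN_QUERY = """
SELECT u.user_count, s.session_count
FROM (SELECT 1 AS k, count(DISTINCT user) AS user_count FROM actions) u
JOIN (SELECT 1 AS k, count(DISTINCT session) AS session_count FROM actions) s
  ON (u.k = s.k)
"""


def make_sample_db():
    """Build an in-memory 'actions' table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (user TEXT, session TEXT)")
    conn.executemany(
        "INSERT INTO actions VALUES (?, ?)",
        [("alice", "s1"), ("alice", "s2"), ("bob", "s3"), ("bob", "s3")],
    )
    return conn


def counts_via_two_queries(conn):
    """Run each count separately and pair up the results manually."""
    users = conn.execute(USER_QUERY).fetchone()[0]
    sessions = conn.execute(SESSION_QUERY).fetchone()[0]
    return (users, sessions)


def counts_via_join(conn):
    """Get both counts as one row from the joined subqueries."""
    return tuple(conn.execute(JOIN_QUERY).fetchone())
```

Both functions return the pair (distinct users, distinct sessions) for the same table. The join form avoids reading result files by hand, at the cost of scanning the table twice.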

RE: Individual distinction on two or more columns

Posted by Namit Jain <nj...@facebook.com>.
Right now, Hive does not support multiple distincts in the same query.

The only workaround would be to run 2 different queries and then combine the results manually.

If you need it, can you file a JIRA? We will try to look at it ASAP.

Thanks,
-namit



Re: Individual distinction on two or more columns

Posted by Prasad Chakka <pc...@facebook.com>.
Hive doesn't support distinct on more than one column. I don't think anyone is working on that right now. The only workaround I could think of is to issue two separate queries.

Or you could try something like this:

from actions
insert overwrite local directory 'a'
    select count(distinct user)
insert overwrite local directory 'b'
    select count(distinct session);

Output will be written to two different local directories named a and b.

Prasad

