Posted to user@hive.apache.org by Alexis Rondeau <al...@gmail.com> on 2009/05/06 19:05:20 UTC
Individual distinction on two or more columns
Hi there,
I'm currently getting my feet wet with Hive and am very impressed by how quick
and easy it was to get going and try things out.
I am trying to run a query using count(distinct) on two separate columns, but
it is failing as follows:
hive> select count(distinct user), count(distinct session) from actions;
FAILED: Error in semantic analysis: line 2:7 DISTINCT on Different Columns not Supported user
While similar semantics work fine in SQL, is there a recommended
workaround for this limitation, or is this something that will be included
in future releases?
Any pointers are appreciated, thank you very much in advance,
Alexis
--
Alexis Rondeau
RE: Individual distinction on two or more columns
Posted by Ashish Thusoo <at...@facebook.com>.
Just wanted to add here...
One of the reasons we did not support this initially was that we were splitting a group by into two jobs, with the first one generating partial counts on the distinct + group by keys. This was a better plan when there was data skew, which was one of the more common cases. Since then, we have added support for running the group by as a single map/reduce job, so now is perhaps a good time to remove this restriction.
Thanks for filing the JIRA...
Ashish
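The two-job plan described above can be pictured roughly as follows: the first job keys on the group by columns plus the distinct column, so each (group, value) pair collapses into one partial record, and the second job counts those records per group. The Python below is only a sketch of that idea, not Hive's actual execution plan; the function name, the dict-shaped rows, and the sample column names are made up for illustration. It also suggests why two different distinct columns are awkward under this plan, since the first stage can only be keyed on one of them.

```python
from collections import Counter


def two_stage_count_distinct(rows, group_key, distinct_key):
    """Count distinct values of distinct_key per group_key in two stages."""
    # Stage 1 ("first job"): key on (group, distinct value). Duplicate pairs
    # collapse into a single entry carrying a partial row count.
    partial = Counter((row[group_key], row[distinct_key]) for row in rows)

    # Stage 2 ("second job"): key on the group alone and count how many
    # distinct (group, value) entries survived stage 1.
    per_group = Counter(group for (group, _value) in partial)
    return dict(per_group)


# Hypothetical sample data: page views by user.
rows = [
    {"page": "a", "user": "u1"},
    {"page": "a", "user": "u1"},
    {"page": "a", "user": "u2"},
    {"page": "b", "user": "u1"},
]
print(two_stage_count_distinct(rows, "page", "user"))
```

Because stage 1 spreads a heavily repeated group across many (group, value) keys, a skewed group does not land on a single reducer until its duplicates have already been collapsed.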
________________________________
From: Alexis Rondeau [mailto:alexis.rondeau@gmail.com]
Sent: Wednesday, May 06, 2009 10:20 AM
To: hive-user@hadoop.apache.org
Subject: Re: Individual distinction on two or more columns
Namit and Prasad
thank you for your fast responses.
Your suggestion of combining two result sets makes sense, should have thought of that of course. My first impulse would be to somehow join those tables again, so I don't have to manually go into the files after issuing the queries. Well, since I'm new to this, it's probably best to try it out.
I will see how that goes and in the meantime will file the jira.
Again, thank you for both of your suggestions!
Alexis
Re: Individual distinction on two or more columns
Posted by Alexis Rondeau <al...@gmail.com>.
Namit and Prasad
thank you for your fast responses.
Your suggestion of combining two result sets makes sense, should have
thought of that of course. My first impulse would be to somehow join those
tables again, so I don't have to manually go into the files after issuing
the queries. Well, since I'm new to this, it's probably best to try it out.
I will see how that goes and in the meantime will file the jira.
Again, thank you for both of your suggestions!
Alexis
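The idea of joining the two result sets back together can be sketched as a single query over two one-row subqueries. The example below runs it against SQLite from Python purely as a stand-in, so that it can be tried without a cluster; the helper name and the constant join key k are made up for illustration, and whether a given Hive version accepts this exact query is an assumption that would need to be checked.

```python
import sqlite3


def count_both_distincts(rows):
    """Return (distinct users, distinct sessions) using two joined subqueries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (user TEXT, session TEXT)")
    conn.executemany("INSERT INTO actions VALUES (?, ?)", rows)

    # Each subquery uses only one DISTINCT column. The constant key k exists
    # only so that the join has an equality condition to match on.
    query = """
        SELECT u.users, s.sessions
        FROM (SELECT 1 AS k, COUNT(DISTINCT user) AS users FROM actions) u
        JOIN (SELECT 1 AS k, COUNT(DISTINCT session) AS sessions FROM actions) s
          ON u.k = s.k
    """
    result = conn.execute(query).fetchone()
    conn.close()
    return result


# Hypothetical sample data: (user, session) pairs.
sample = [("u1", "s1"), ("u1", "s2"), ("u2", "s3"), ("u2", "s3")]
print(count_both_distincts(sample))
```

Both counts come back in one row, so there is no need to open separate output files afterwards.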
On Wed, May 6, 2009 at 1:08 PM, Namit Jain <nj...@facebook.com> wrote:
> Right now, hive does not support multiple distinct in the same query.
>
> The only workaround would be to have 2 different queries and then combine
> the results manually.
>
> If you need it, can you file a jira? We will try to look at it asap
>
> Thanks,
> -namit
--
Alexis Robin Rondeau
RE: Individual distinction on two or more columns
Posted by Namit Jain <nj...@facebook.com>.
Right now, Hive does not support multiple distincts in the same query.
The only workaround would be to run 2 different queries and then combine the results manually.
If you need it, can you file a JIRA? We will try to look at it asap.
Thanks,
-namit
Re: Individual distinction on two or more columns
Posted by Prasad Chakka <pc...@facebook.com>.
Hive doesn't support distinct on more than one column. I don't think anyone is working on that right now. The only workaround I can think of is to issue two separate queries.
Or you could try something like this:
from actions
insert overwrite local directory 'a'
select count(distinct user)
insert overwrite local directory 'b'
select count(distinct session);
Output will be in two different local directories named a and b.
Prasad