You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Matt Pestritto <ma...@pestritto.com> on 2009/04/23 00:22:19 UTC

Aggregrate Query Fails.

Hi - I'm having a problem with a query below.  When I try to run any
aggregate function on a column from the sub-query, the job fails.
The queries and output messages are below.

Suggestions?

thanks in advance.

-- works:  2 map-reduces jobs.
select m.description, o_buyers.num as buyers
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

Successful output: PROD7362, 3

-- fails: 3 map-reduce jobs - 2nd reduce fails.
select  m.description, o_buyers.num as buyers*, count(1) as total*  -- sum
or max(o_buyers.num) and removing from group by also fails.
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
*group by m.description, o_buyers.num* limit 40 ;


java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)

Re: Aggregrate Query Fails.

Posted by Matt Pestritto <ma...@pestritto.com>.

Zheng,

Eureka!  That did it.  I had already changed all of my table definitions so
all columns were strings so all I needed to do was cast the aggregate
function in the join sub-query and that did it for me. --
CAST(count(distinct o1.consumer_id) AS STRING) as num.  This would also
explain some of the exceptions I was seeing on other joins.

Thanks so much
-Matt

On Wed, Apr 22, 2009 at 9:35 PM, Zheng Shao <zs...@gmail.com> wrote:

> Hi Matt,
>
> There is a known bug with JOIN -  all output columns from JOIN will become
> STRING.
> The stacktrace shows this is exactly because of that bug.
>
> The workaround right now is to replace all columns with CAST(xxx as
> STRING), if the column type is not a STRING.
>
>
> select  m.description, o_buyers.num as buyers*, count(1) as total*
> from clickstream_output o
>   join merchandise m on (CAST(o.merchandise_id AS STRING) =
> CAST(m.merchandise_id AS STRING))
>   left outer join ( select CAST(o1.merchandise_id AS STRING) as
> merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
> clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
> o_buyers
>      on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
> STRING))
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> *group by m.description, o_buyers.num* limit 40 ;
>
>
> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>
> Zheng
>
>
> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <at...@facebook.com>wrote:
>
>> Can you do an explain <query> and send us the plan.
>>
>> Another thing that you may try is to put the entire subquery in the from
>> clause and then do an aggregation on it..
>>
>> i.e.
>>
>> select ..., count(1)
>> from (select ...
>>         from clickstream_output o join .....) t
>> group by t.description, t.num limit 40;
>>
>> Also are you using the 0.3.0 release candidate?
>>
>> Ashish
>>
>> ________________________________________
>> From: Matt Pestritto [matt@pestritto.com]
>> Sent: Wednesday, April 22, 2009 3:53 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Aggregrate Query Fails.
>>
>> So the only change was to remove the column alias correct?
>> Still no luck.  Same result.
>>
>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
>> <ma...@facebook.com>> wrote:
>> That is strange... Does below also fail?
>>
>> select  m.description, o_buyers.num , count(1) as total
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> group by m.description, o_buyers.num limit 40 ;
>>
>>
>> ________________________________
>> From: Matt Pestritto <ma...@pestritto.com>>
>> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
>> >>
>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>> To: <hi...@hadoop.apache.org>>
>> Subject: Re: Aggregrate Query Fails.
>>
>>
>> Thanks.
>> I tried that also earlier:
>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>> buyers
>>
>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
>> http://aaa@cloudera.com>> wrote:
>>
>>
>> in the group by, try this instead:
>>
>> group by m.description, buyers limit 40 ;
>>
>> Matt Pestritto wrote:
>> Hi - I'm having a problem with a query below.  When I try to run any
>> aggregate function on a column from the sub-query, the job fails.
>> The queries and output messages are below.
>>
>> Suggestions?
>>
>> thanks in advance.
>>
>> -- works:  2 map-reduces jobs.
>> select m.description, o_buyers.num as buyers
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>
>> Successful output: PROD7362, 3
>>
>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>> select  m.description, o_buyers.num as buyers, count(1) as total  -- sum
>> or max(o_buyers.num) and removing from group by also fails.
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>  group by m.description, o_buyers.num limit 40 ;
>>
>>
>>
>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>  at
>> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>  at
>> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>>
>>
>>
>>
>>
>
>
> --
> Yours,
> Zheng
>

Re: Aggregrate Query Fails.

Posted by Zheng Shao <zs...@gmail.com>.

Hi Jeff,

Thanks for the suggestion. That's a good idea.
I just added that to http://wiki.apache.org/hadoop/Hive/FAQ


Zheng

On Wed, Apr 22, 2009 at 8:22 PM, Jeff Hammerbacher <ha...@cloudera.com>wrote:

> Hey Zheng,
>
> Thanks for the insight. Perhaps these sorts of quirks could be added to the
> documentation on the wiki?
>
> Thanks,
> Jeff
>
>
> On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zs...@gmail.com> wrote:
>
>> Hi Matt,
>>
>> There is a known bug with JOIN -  all output columns from JOIN will become
>> STRING.
>> The stacktrace shows this is exactly because of that bug.
>>
>> The workaround right now is to replace all columns with CAST(xxx as
>> STRING), if the column type is not a STRING.
>>
>>
>> select  m.description, o_buyers.num as buyers*, count(1) as total*
>> from clickstream_output o
>>   join merchandise m on (CAST(o.merchandise_id AS STRING) =
>> CAST(m.merchandise_id AS STRING))
>>   left outer join ( select CAST(o1.merchandise_id AS STRING) as
>> merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
>> clickstream_output o1
>>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
>> o_buyers
>>      on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id
>> AS STRING))
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> *group by m.description, o_buyers.num* limit 40 ;
>>
>>
>> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>>
>> Zheng
>>
>>
>> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <at...@facebook.com>wrote:
>>
>>> Can you do an explain <query> and send us the plan.
>>>
>>> Another thing that you may try is to put the entire subquery in the from
>>> clause and then do an aggregation on it..
>>>
>>> i.e.
>>>
>>> select ..., count(1)
>>> from (select ...
>>>         from clickstream_output o join .....) t
>>> group by t.description, t.num limit 40;
>>>
>>> Also are you using the 0.3.0 release candidate?
>>>
>>> Ashish
>>>
>>> ________________________________________
>>> From: Matt Pestritto [matt@pestritto.com]
>>> Sent: Wednesday, April 22, 2009 3:53 PM
>>> To: hive-user@hadoop.apache.org
>>> Subject: Re: Aggregrate Query Fails.
>>>
>>> So the only change was to remove the column alias correct?
>>> Still no luck.  Same result.
>>>
>>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
>>> <ma...@facebook.com>> wrote:
>>> That is strange... Does below also fail?
>>>
>>> select  m.description, o_buyers.num , count(1) as total
>>> from clickstream_output o
>>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>  left outer join ( select o1.merchandise_id, count(distinct
>>> o1.consumer_id) as num from clickstream_output o1
>>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>> group by m.description, o_buyers.num limit 40 ;
>>>
>>>
>>> ________________________________
>>> From: Matt Pestritto <ma...@pestritto.com>>
>>> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@
>>> hadoop.apache.org>>
>>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>>> To: <hi...@hadoop.apache.org>>
>>> Subject: Re: Aggregrate Query Fails.
>>>
>>>
>>> Thanks.
>>> I tried that also earlier:
>>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>>> buyers
>>>
>>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
>>> http://aaa@cloudera.com>> wrote:
>>>
>>>
>>> in the group by, try this instead:
>>>
>>> group by m.description, buyers limit 40 ;
>>>
>>> Matt Pestritto wrote:
>>> Hi - I'm having a problem with a query below.  When I try to run any
>>> aggregate function on a column from the sub-query, the job fails.
>>> The queries and output messages are below.
>>>
>>> Suggestions?
>>>
>>> thanks in advance.
>>>
>>> -- works:  2 map-reduces jobs.
>>> select m.description, o_buyers.num as buyers
>>> from clickstream_output o
>>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>  left outer join ( select o1.merchandise_id, count(distinct
>>> o1.consumer_id) as num from clickstream_output o1
>>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>>
>>> Successful output: PROD7362, 3
>>>
>>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>>> select  m.description, o_buyers.num as buyers, count(1) as total  -- sum
>>> or max(o_buyers.num) and removing from group by also fails.
>>> from clickstream_output o
>>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>>  left outer join ( select o1.merchandise_id, count(distinct
>>> o1.consumer_id) as num from clickstream_output o1
>>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>>     on (o_buyers.merchandise_id = o.merchandise_id)
>>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>>  group by m.description, o_buyers.num limit 40 ;
>>>
>>>
>>>
>>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>>> cast to org.apache.hadoop.io.Text
>>>
>>>  at
>>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>>  at
>>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>>
>>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>>> cast to org.apache.hadoop.io.Text
>>>  at
>>> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>>
>>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>>
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>>
>>>  at
>>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>>  at
>>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>


-- 
Yours,
Zheng

Re: Aggregrate Query Fails.

Posted by Jeff Hammerbacher <ha...@cloudera.com>.

Hey Zheng,

Thanks for the insight. Perhaps these sorts of quirks could be added to the
documentation on the wiki?

Thanks,
Jeff

On Wed, Apr 22, 2009 at 6:35 PM, Zheng Shao <zs...@gmail.com> wrote:

> Hi Matt,
>
> There is a known bug with JOIN -  all output columns from JOIN will become
> STRING.
> The stacktrace shows this is exactly because of that bug.
>
> The workaround right now is to replace all columns with CAST(xxx as
> STRING), if the column type is not a STRING.
>
>
> select  m.description, o_buyers.num as buyers*, count(1) as total*
> from clickstream_output o
>   join merchandise m on (CAST(o.merchandise_id AS STRING) =
> CAST(m.merchandise_id AS STRING))
>   left outer join ( select CAST(o1.merchandise_id AS STRING) as
> merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
> clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) )
> o_buyers
>      on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
> STRING))
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> *group by m.description, o_buyers.num* limit 40 ;
>
>
> It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.
>
> Zheng
>
>
> On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <at...@facebook.com>wrote:
>
>> Can you do an explain <query> and send us the plan.
>>
>> Another thing that you may try is to put the entire subquery in the from
>> clause and then do an aggregation on it..
>>
>> i.e.
>>
>> select ..., count(1)
>> from (select ...
>>         from clickstream_output o join .....) t
>> group by t.description, t.num limit 40;
>>
>> Also are you using the 0.3.0 release candidate?
>>
>> Ashish
>>
>> ________________________________________
>> From: Matt Pestritto [matt@pestritto.com]
>> Sent: Wednesday, April 22, 2009 3:53 PM
>> To: hive-user@hadoop.apache.org
>> Subject: Re: Aggregrate Query Fails.
>>
>> So the only change was to remove the column alias correct?
>> Still no luck.  Same result.
>>
>> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
>> <ma...@facebook.com>> wrote:
>> That is strange... Does below also fail?
>>
>> select  m.description, o_buyers.num , count(1) as total
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> group by m.description, o_buyers.num limit 40 ;
>>
>>
>> ________________________________
>> From: Matt Pestritto <ma...@pestritto.com>>
>> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
>> >>
>> Date: Wed, 22 Apr 2009 15:43:40 -0700
>> To: <hi...@hadoop.apache.org>>
>> Subject: Re: Aggregrate Query Fails.
>>
>>
>> Thanks.
>> I tried that also earlier:
>> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
>> buyers
>>
>> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
>> http://aaa@cloudera.com>> wrote:
>>
>>
>> in the group by, try this instead:
>>
>> group by m.description, buyers limit 40 ;
>>
>> Matt Pestritto wrote:
>> Hi - I'm having a problem with a query below.  When I try to run any
>> aggregate function on a column from the sub-query, the job fails.
>> The queries and output messages are below.
>>
>> Suggestions?
>>
>> thanks in advance.
>>
>> -- works:  2 map-reduces jobs.
>> select m.description, o_buyers.num as buyers
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>
>> Successful output: PROD7362, 3
>>
>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>> select  m.description, o_buyers.num as buyers, count(1) as total  -- sum
>> or max(o_buyers.num) and removing from group by also fails.
>> from clickstream_output o
>>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>>  left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>         where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>     on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>>  group by m.description, o_buyers.num limit 40 ;
>>
>>
>>
>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
>> cast to org.apache.hadoop.io.Text
>>  at
>> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>  at
>> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>
>>  at
>> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>>  at
>> org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>>
>>
>>
>>
>>
>
>
> --
> Yours,
> Zheng
>

Re: Aggregrate Query Fails.

Posted by Zheng Shao <zs...@gmail.com>.

Hi Matt,

There is a known bug with JOIN -  all output columns from JOIN will become
STRING.
The stacktrace shows this is exactly because of that bug.

The workaround right now is to replace all columns with CAST(xxx as STRING),
if the column type is not a STRING.


select  m.description, o_buyers.num as buyers*, count(1) as total*
from clickstream_output o
  join merchandise m on (CAST(o.merchandise_id AS STRING) =
CAST(m.merchandise_id AS STRING))
  left outer join ( select CAST(o1.merchandise_id AS STRING) as
merchandise_id, CAST(count(distinct o1.consumer_id) AS STRING) as num from
clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
o1.event_id = 'buy' group by CAST(o1.merchandise_id AS STRING) ) o_buyers
     on (CAST(o_buyers.merchandise_id AS STRING) = CAST(o.merchandise_id AS
STRING))
where o.file_date >= '20090216' and o.file_date <= '20090217'
*group by m.description, o_buyers.num* limit 40 ;


It will be fixed by https://issues.apache.org/jira/browse/HIVE-405.

Zheng

On Wed, Apr 22, 2009 at 5:09 PM, Ashish Thusoo <at...@facebook.com> wrote:

> Can you do an explain <query> and send us the plan.
>
> Another thing that you may try is to put the entire subquery in the from
> clause and then do an aggregation on it..
>
> i.e.
>
> select ..., count(1)
> from (select ...
>         from clickstream_output o join .....) t
> group by t.description, t.num limit 40;
>
> Also are you using the 0.3.0 release candidate?
>
> Ashish
>
> ________________________________________
> From: Matt Pestritto [matt@pestritto.com]
> Sent: Wednesday, April 22, 2009 3:53 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Aggregrate Query Fails.
>
> So the only change was to remove the column alias correct?
> Still no luck.  Same result.
>
> On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pchakka@facebook.com
> <ma...@facebook.com>> wrote:
> That is strange... Does below also fail?
>
> select  m.description, o_buyers.num , count(1) as total
> from clickstream_output o
>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
> as num from clickstream_output o1
>         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
> o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>     on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> group by m.description, o_buyers.num limit 40 ;
>
>
> ________________________________
> From: Matt Pestritto <ma...@pestritto.com>>
> Reply-To: <hive-user@hadoop.apache.org<http://hive-user@hadoop.apache.org
> >>
> Date: Wed, 22 Apr 2009 15:43:40 -0700
> To: <hi...@hadoop.apache.org>>
> Subject: Re: Aggregrate Query Fails.
>
>
> Thanks.
> I tried that also earlier:
> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
> buyers
>
> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aaa@cloudera.com<
> http://aaa@cloudera.com>> wrote:
>
>
> in the group by, try this instead:
>
> group by m.description, buyers limit 40 ;
>
> Matt Pestritto wrote:
> Hi - I'm having a problem with a query below.  When I try to run any
> aggregate function on a column from the sub-query, the job fails.
> The queries and output messages are below.
>
> Suggestions?
>
> thanks in advance.
>
> -- works:  2 map-reduces jobs.
> select m.description, o_buyers.num as buyers
> from clickstream_output o
>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
> as num from clickstream_output o1
>         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
> o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>     on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>
> Successful output: PROD7362, 3
>
> -- fails: 3 map-reduce jobs - 2nd reduce fails.
> select  m.description, o_buyers.num as buyers, count(1) as total  -- sum or
> max(o_buyers.num) and removing from group by also fails.
> from clickstream_output o
>  join merchandise m on (o.merchandise_id = m.merchandise_id)
>  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id)
> as num from clickstream_output o1
>         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and
> o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>     on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
>  group by m.description, o_buyers.num limit 40 ;
>
>
>
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.Text
>
>  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.Text
>  at
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>  at
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>
>
>
>
>
>


-- 
Yours,
Zheng

RE: Aggregrate Query Fails.

Posted by Ashish Thusoo <at...@facebook.com>.

Can you do an explain <query> and send us the plan. 

Another thing that you may try is to put the entire subquery in the from clause and then do an aggregation on it..

i.e.

select ..., count(1)
from (select ...
         from clickstream_output o join .....) t
group by t.description, t.num limit 40;

Also are you using the 0.3.0 release candidate?

Ashish

________________________________________
From: Matt Pestritto [matt@pestritto.com]
Sent: Wednesday, April 22, 2009 3:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: Aggregrate Query Fails.

So the only change was to remove the column alias correct?
Still no luck.  Same result.

On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pc...@facebook.com>> wrote:
That is strange... Does below also fail?

select  m.description, o_buyers.num , count(1) as total
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
group by m.description, o_buyers.num limit 40 ;


________________________________
From: Matt Pestritto <ma...@pestritto.com>>
Reply-To: <hi...@hadoop.apache.org>>
Date: Wed, 22 Apr 2009 15:43:40 -0700
To: <hi...@hadoop.apache.org>>
Subject: Re: Aggregrate Query Fails.


Thanks.
I tried that also earlier:
FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers

On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aa...@cloudera.com>> wrote:


in the group by, try this instead:

group by m.description, buyers limit 40 ;

Matt Pestritto wrote:
Hi - I'm having a problem with a query below.  When I try to run any aggregate function on a column from the sub-query, the job fails.
The queries and output messages are below.

Suggestions?

thanks in advance.

-- works:  2 map-reduces jobs.
select m.description, o_buyers.num as buyers
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

Successful output: PROD7362, 3

-- fails: 3 map-reduce jobs - 2nd reduce fails.
select  m.description, o_buyers.num as buyers, count(1) as total  -- sum or max(o_buyers.num) and removing from group by also fails.
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
 group by m.description, o_buyers.num limit 40 ;



java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

 at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
 at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
 at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
 at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)

Re: Aggregrate Query Fails.

Posted by Matt Pestritto <ma...@pestritto.com>.

So the only change was to remove the column alias correct?
Still no luck.  Same result.

On Wed, Apr 22, 2009 at 6:49 PM, Prasad Chakka <pc...@facebook.com> wrote:

>  That is strange... Does below also fail?
>
> select  m.description,* o_buyers.num ,* count(1) as total
>
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
>
> group by m.description, o_buyers.num limit 40 ;
>
>
> ------------------------------
> *From: *Matt Pestritto <ma...@pestritto.com>
> *Reply-To: *<hi...@hadoop.apache.org>
> *Date: *Wed, 22 Apr 2009 15:43:40 -0700
> *To: *<hi...@hadoop.apache.org>
> *Subject: *Re: Aggregrate Query Fails.
>
>
> Thanks.
> I tried that also earlier:
> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
> buyers
>
> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>
>
>
> in the group by, try this instead:
>
> *group by m.description, buyers* limit 40 ;
>
> Matt Pestritto wrote:
>
> Hi - I'm having a problem with a query below.  When I try to run any
> aggregate function on a column from the sub-query, the job fails.
> The queries and output messages are below.
>
> Suggestions?
>
> thanks in advance.
>
> -- works:  2 map-reduces jobs.
> select m.description, o_buyers.num as buyers
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>
> Successful output: PROD7362, 3
>
> -- fails: 3 map-reduce jobs - 2nd reduce fails.
> select  m.description, o_buyers.num as buyers*, count(1) as total*  -- sum
> or max(o_buyers.num) and removing from group by also fails.
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
>  *group by m.description, o_buyers.num* limit 40 ;
>
>
>
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.Text
>
>  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.Text
>  at
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>  at
> org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>
>  at
> org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>  at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>
>
>
>
>
>

Re: Aggregrate Query Fails.

Posted by Matt Pestritto <ma...@pestritto.com>.

Taking a look at the explain, the first stage of both queries are
identical.  The only differences are in the reduce in the 2nd stage.
I'm not sure if this helps or not.
Thanks.

Working Reduce:
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
               Left Outer Join0 to 2
          condition expressions:
            0 {VALUE.0} {VALUE.1}
            1 {VALUE.0} {VALUE.1}
            2 {VALUE.0} {VALUE.1}
          Filter Operator
            predicate:
                expr: ((1 >= '20090216') and (1 <= '20090217'))
                type: boolean
            Select Operator
              expressions:
                    expr: 3
                    type: string
                    expr: 5
                    type: bigint
              Limit
                File Output Operator
                  compressed: false
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Failed Reduce:
     Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
               Left Outer Join0 to 2
          condition expressions:
            0 {VALUE.0} {VALUE.1}
            1 {VALUE.0} {VALUE.1}
            2 {VALUE.0} {VALUE.1}
          Filter Operator
            predicate:
                expr: ((1 >= '20090216') and (1 <= '20090217'))
                type: boolean
            Group By Operator
              aggregations:
                    expr: count(1)
              keys:
                    expr: 3
                    type: string
                    expr: 5
                    type: bigint
              mode: hash
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    name: binary_table



On Wed, Apr 22, 2009 at 6:43 PM, Matt Pestritto <ma...@pestritto.com> wrote:

> Thanks.
> I tried that also earlier:
> FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
> buyers
>
>
> On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aa...@cloudera.com> wrote:
>
>>
>> in the group by, try this instead:
>>
>> *group by m.description, buyers* limit 40 ;
>>
>> Matt Pestritto wrote:
>>
>> Hi - I'm having a problem with a query below.  When I try to run any
>> aggregate function on a column from the sub-query, the job fails.
>> The queries and output messages are below.
>>
>> Suggestions?
>>
>> thanks in advance.
>>
>> -- works:  2 map-reduces jobs.
>> select m.description, o_buyers.num as buyers
>> from clickstream_output o
>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>   left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>      on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>>
>> Successful output: PROD7362, 3
>>
>> -- fails: 3 map-reduce jobs - 2nd reduce fails.
>> select  m.description, o_buyers.num as buyers*, count(1) as total*  --
>> sum or max(o_buyers.num) and removing from group by also fails.
>> from clickstream_output o
>>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>>   left outer join ( select o1.merchandise_id, count(distinct
>> o1.consumer_id) as num from clickstream_output o1
>>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
>> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>>      on (o_buyers.merchandise_id = o.merchandise_id)
>> where o.file_date >= '20090216' and o.file_date <= '20090217'
>> *group by m.description, o_buyers.num* limit 40 ;
>>
>>
>> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>>
>> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
>> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
>> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>>
>> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>>
>> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
>> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>>
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>>
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>>
>> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
>> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>>
>>
>>
>>
>

Re: Aggregrate Query Fails.

Posted by Prasad Chakka <pc...@facebook.com>.

That is strange... Does below also fail?

select  m.description, o_buyers.num , count(1) as total
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
group by m.description, o_buyers.num limit 40 ;


________________________________
From: Matt Pestritto <ma...@pestritto.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Wed, 22 Apr 2009 15:43:40 -0700
To: <hi...@hadoop.apache.org>
Subject: Re: Aggregrate Query Fails.

Thanks.
I tried that also earlier:
FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference buyers

On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aa...@cloudera.com> wrote:


in the group by, try this instead:

group by m.description, buyers limit 40 ;

Matt Pestritto wrote:
Hi - I'm having a problem with a query below.  When I try to run any aggregate function on a column from the sub-query, the job fails.
The queries and output messages are below.

Suggestions?

thanks in advance.

-- works:  2 map-reduces jobs.
select m.description, o_buyers.num as buyers
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;

Successful output: PROD7362, 3

-- fails: 3 map-reduce jobs - 2nd reduce fails.
select  m.description, o_buyers.num as buyers, count(1) as total  -- sum or max(o_buyers.num) and removing from group by also fails.
from clickstream_output o
  join merchandise m on (o.merchandise_id = m.merchandise_id)
  left outer join ( select o1.merchandise_id, count(distinct o1.consumer_id) as num from clickstream_output o1
         where o1.file_date >= '20090216' and o1.file_date <= '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
     on (o_buyers.merchandise_id = o.merchandise_id)
where o.file_date >= '20090216' and o.file_date <= '20090217'
 group by m.description, o_buyers.num limit 40 ;



java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

 at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
 at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)

 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
 at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
 at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)

 at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
 at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)

Re: Aggregrate Query Fails.

Posted by Matt Pestritto <ma...@pestritto.com>.

Thanks.
I tried that also earlier:
FAILED: Error in semantic analysis: line 9:24 Invalid Column Reference
buyers

On Wed, Apr 22, 2009 at 6:40 PM, Amr Awadallah <aa...@cloudera.com> wrote:

>
> in the group by, try this instead:
>
> *group by m.description, buyers* limit 40 ;
>
> Matt Pestritto wrote:
>
> Hi - I'm having a problem with a query below.  When I try to run any
> aggregate function on a column from the sub-query, the job fails.
> The queries and output messages are below.
>
> Suggestions?
>
> thanks in advance.
>
> -- works:  2 map-reduces jobs.
> select m.description, o_buyers.num as buyers
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>
> Successful output: PROD7362, 3
>
> -- fails: 3 map-reduce jobs - 2nd reduce fails.
> select  m.description, o_buyers.num as buyers*, count(1) as total*  -- sum
> or max(o_buyers.num) and removing from group by also fails.
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= '20090217'
> and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> *group by m.description, o_buyers.num* limit 40 ;
>
>
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>
>
>
>

Re: Aggregrate Query Fails.

Posted by Amr Awadallah <aa...@cloudera.com>.

in the group by, try this instead:

*group by m.description, buyers* limit 40 ;

Matt Pestritto wrote:
> Hi - I'm having a problem with a query below.  When I try to run any 
> aggregate function on a column from the sub-query, the job fails. 
> The queries and output messages are below. 
>
> Suggestions?
>
> thanks in advance.
>
> -- works:  2 map-reduces jobs.
> select m.description, o_buyers.num as buyers
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct 
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= 
> '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217' limit 40;
>
> Successful output: PROD7362, 3
>
> -- fails: 3 map-reduce jobs - 2nd reduce fails.
> select  m.description, o_buyers.num as buyers*, count(1) as total*  -- 
> sum or max(o_buyers.num) and removing from group by also fails.
> from clickstream_output o
>   join merchandise m on (o.merchandise_id = m.merchandise_id)
>   left outer join ( select o1.merchandise_id, count(distinct 
> o1.consumer_id) as num from clickstream_output o1
>          where o1.file_date >= '20090216' and o1.file_date <= 
> '20090217' and o1.event_id = 'buy' group by o1.merchandise_id ) o_buyers
>      on (o_buyers.merchandise_id = o.merchandise_id)
> where o.file_date >= '20090216' and o.file_date <= '20090217'
> *group by m.description, o_buyers.num* limit 40 ;
>
>
> java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
>
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:172)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)
>
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:489)
>
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
> 	at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:69)
> 	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:353)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.createForwardJoinObject(JoinOperator.java:273)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:516)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.genObject(JoinOperator.java:505)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.checkAndGenObject(JoinOperator.java:546)
>
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:528)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:143)
>   
>