Posted to user@hive.apache.org by Eva Tse <et...@netflix.com> on 2009/07/17 20:06:46 UTC

Error in running group-by and join hive query...

Hive version: r786648 w/ HIVE-487 2nd patch.

The same script, however, runs fine on Hive 0.3.

Thanks,
Eva.

Running the script in this email gives the following errors:

Hive history 
file=/tmp/dataeng/hive_job_log_dataeng_200907171359_1511035858.txt
OK
Time taken: 3.419 seconds
OK
Time taken: 0.211 seconds
OK
Time taken: 0.364 seconds
OK
Time taken: 0.104 seconds
Total MapReduce jobs = 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Job Submission failed with exception 'java.io.IOException(cannot find dir =
hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in
partToPartitionInfo!)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.ExecDriver

Script:

DROP TABLE facts_details;
DROP TABLE facts;

CREATE TABLE facts (
  xid string,
  devtype_id int)
PARTITIONED BY (dateint int, hour int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\004'
  MAP KEYS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;

CREATE TABLE facts_details (
  xid string,
  cdn_name string,
  utc_ms array<bigint>,
  moff array<int>)
PARTITIONED BY (dateint int, hour int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\004'
  MAP KEYS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;

SELECT f.devtype_id
FROM facts f JOIN facts_details c ON (f.xid = c.xid)
WHERE c.dateint = 20090710 AND f.dateint = 20090710
GROUP BY f.devtype_id;
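Editorial note on the error itself: the stack trace later in this thread shows the failure inside HiveInputFormat.getTableDescFromPath, which resolves each input directory of the submitted job against the plan's path-to-partition map. The sketch below is a hypothetical, heavily simplified reconstruction (not Hive's actual code; the map contents and partition path are invented) of that lookup, showing why an intermediate scratch directory such as /tmp/hive-dataeng/1 that never made it into the map would raise exactly this IOException.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class PartitionLookupSketch {

    // Stand-in for Hive's lookup: every input dir of the MapReduce job must
    // resolve to an entry in the plan's path-to-partition map. We try the
    // exact directory first, then walk up parent directories, so a split
    // file under a registered partition dir still matches.
    static String getTableDescFromPath(Map<String, String> pathToPartitionInfo,
                                       String dir) throws IOException {
        String p = dir;
        while (p != null && !p.isEmpty()) {
            String desc = pathToPartitionInfo.get(p);
            if (desc != null) {
                return desc;
            }
            int slash = p.lastIndexOf('/');
            p = (slash > 0) ? p.substring(0, slash) : null;
        }
        // Mirrors the message reported in this thread.
        throw new IOException("cannot find dir = " + dir + " in partToPartitionInfo!");
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> m = new HashMap<>();
        // Hypothetical partition dir, as a plan would register it.
        m.put("/user/hive/warehouse/facts/dateint=20090710/hour=0", "facts partition desc");

        // A split file inside a registered partition dir resolves fine.
        System.out.println(getTableDescFromPath(
            m, "/user/hive/warehouse/facts/dateint=20090710/hour=0/000000_0"));

        // An intermediate scratch dir (output of the join stage, input of the
        // group-by stage) that was never added to the map reproduces the failure.
        try {
            getTableDescFromPath(m, "/tmp/hive-dataeng/1");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Since the query compiles to two MapReduce jobs, the second job's input is exactly such a scratch directory, which is consistent with the path in the error.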




RE: Error in running group-by and join hive query...

Posted by Ashish Thusoo <at...@facebook.com>.
We tried to run the test case that you gave in your email and it seems to work fine. I am running this on r793646. Can you try with that?

Thanks,
Ashish



Re: Error in running group-by and join hive query...

Posted by Eva Tse <et...@netflix.com>.
Ashish, it is in the attached file.

Thanks,
Eva.




RE: Error in running group-by and join hive query...

Posted by Ashish Thusoo <at...@facebook.com>.
Looks like the pathToPartitionInfo map did not get populated in your case.

Can you also send the output of

explain extended <query>

That will tell us the value of pathToPartitionInfo.

Ashish
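Editorial note: as a rough illustration of what to look for in that output, pathToPartitionInfo is conceptually a map from input directories to partition metadata, and every directory the job reads should appear as a key. The snippet below is purely hypothetical (the sample text is invented, and real "explain extended" output is formatted differently); it only sketches scanning captured explain text for the path keys so they can be compared against the directory named in the IOException.

```java
import java.util.ArrayList;
import java.util.List;

public class ExplainScan {

    // Collects lines that look like path keys under a "pathToPartitionInfo"
    // section of captured explain text (sketch only, not a real parser).
    static List<String> pathKeys(String explainOutput) {
        List<String> keys = new ArrayList<>();
        boolean inSection = false;
        for (String line : explainOutput.split("\n")) {
            String t = line.trim();
            if (t.startsWith("pathToPartitionInfo")) {
                inSection = true;
                continue;
            }
            if (inSection) {
                if (t.startsWith("hdfs://") || t.startsWith("/")) {
                    keys.add(t);
                } else if (t.isEmpty()) {
                    inSection = false;  // in this sketch a blank line ends the section
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Invented sample; real output is more verbose.
        String sample = String.join("\n",
            "pathToPartitionInfo:",
            "  hdfs://namenode:9000/user/hive/warehouse/facts/dateint=20090710/hour=0",
            "  hdfs://namenode:9000/user/hive/warehouse/facts_details/dateint=20090710/hour=0",
            "",
            "other plan output...");
        for (String k : pathKeys(sample)) {
            System.out.println(k);
        }
    }
}
```

If the directory from the error (here, the /tmp scratch dir of the second job) is absent from the listed keys, that confirms the map was not fully populated.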





Re: Error in running group-by and join hive query...

Posted by Eva Tse <et...@netflix.com>.
I believe this is the relevant section. Please let me know if we need add'l info.

Thanks,
Eva.

2009-07-17 13:59:30,953 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:05,605 WARN  mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-17 15:16:05,814 ERROR exec.ExecDriver (SessionState.java:printError(279)) - Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in partToPartitionInfo!)'
java.io.IOException: cannot find dir = hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in partToPartitionInfo!
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:256)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:208)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:387)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:307)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:213)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:234)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:278)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

2009-07-17 15:16:05,821 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver





RE: Error in running group-by and join hive query...

Posted by Ashish Thusoo <at...@facebook.com>.
what does

/tmp/<username>/hive.log contain?

Ashish
