Posted to user@hive.apache.org by Eva Tse <et...@netflix.com> on 2009/07/17 20:06:46 UTC
Error in running group-by and join hive query...
Hive version: r786648 with the 2nd patch from HIVE-487.
The same script works on Hive 0.3.
Thanks,
Eva.
Running the script in this email gives the following errors:
Hive history file=/tmp/dataeng/hive_job_log_dataeng_200907171359_1511035858.txt
OK
Time taken: 3.419 seconds
OK
Time taken: 0.211 seconds
OK
Time taken: 0.364 seconds
OK
Time taken: 0.104 seconds
Total MapReduce jobs = 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in partToPartitionInfo!)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
Script:
drop table facts_details;
drop table facts;
CREATE TABLE facts (
  xid string,
  devtype_id int)
PARTITIONED BY (dateint int, hour int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\004'
  MAP KEYS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;

CREATE TABLE facts_details (
  xid string,
  cdn_name string,
  utc_ms array<bigint>,
  moff array<int>)
PARTITIONED BY (dateint int, hour int)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\004'
  MAP KEYS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;
SELECT f.devtype_id FROM facts f JOIN facts_details c ON (f.xid = c.xid)
WHERE c.dateint = 20090710 AND f.dateint = 20090710
GROUP BY f.devtype_id;
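As the reducer hints in the console output above note, the reducer count for this query can also be pinned or bounded explicitly before running it. An illustrative session snippet (property names are taken verbatim from the output; the values here are arbitrary examples):

```sql
-- Force a constant number of reducers (illustrative value):
set mapred.reduce.tasks=1;
-- Or bound the count indirectly:
set hive.exec.reducers.max=4;
set hive.exec.reducers.bytes.per.reducer=256000000;
```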
RE: Error in running group-by and join hive query...
Posted by Ashish Thusoo <at...@facebook.com>.
We tried to run the test case that you gave in your email and it seems to work fine. I am running this on r793646. Can you try with that?
Thanks,
Ashish
Re: Error in running group-by and join hive query...
Posted by Eva Tse <et...@netflix.com>.
Ashish, it is in the attached file.
Thanks,
Eva.
RE: Error in running group-by and join hive query...
Posted by Ashish Thusoo <at...@facebook.com>.
Looks like the pathToPartitionInfo array did not get populated in your case.
Can you also send the output of
explain extended <query>
That will tell us the value of pathToPartitionInfo.
Ashish
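For reference, the requested command for the query earlier in this thread would look like the following (constructed from the posted script; the EXPLAIN EXTENDED output itself was not included in the archive):

```sql
EXPLAIN EXTENDED
SELECT f.devtype_id FROM facts f JOIN facts_details c ON (f.xid = c.xid)
WHERE c.dateint = 20090710 AND f.dateint = 20090710
GROUP BY f.devtype_id;
```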
Re: Error in running group-by and join hive query...
Posted by Eva Tse <et...@netflix.com>.
I believe this is the relevant section. Please let me know if we need add'l info.
Thanks,
Eva.
2009-07-17 13:59:30,953 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,718 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.resources" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,722 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.core.runtime" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:00,723 ERROR JPOX.Plugin (Log4JLogger.java:error(117)) - Bundle "org.eclipse.jdt.core" requires "org.eclipse.text" but it cannot be resolved.
2009-07-17 15:16:05,605 WARN mapred.JobClient (JobClient.java:configureCommandLineOptions(539)) - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-07-17 15:16:05,814 ERROR exec.ExecDriver (SessionState.java:printError(279)) - Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in partToPartitionInfo!)'
java.io.IOException: cannot find dir = hdfs://ip-10-251-49-188.ec2.internal:9000/tmp/hive-dataeng/1 in partToPartitionInfo!
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getTableDescFromPath(HiveInputFormat.java:256)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:208)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:387)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:307)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:176)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:234)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:278)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
2009-07-17 15:16:05,821 ERROR ql.Driver (SessionState.java:printError(279)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ExecDriver
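For context on the failure mode in the trace above: HiveInputFormat resolves each job input directory to partition metadata through a path-to-partition map built from the query plan, and the error fires when an input directory is absent from that map. The sketch below is NOT Hive's actual code; the function and variable names are illustrative only, and it assumes a plain exact-match lookup:

```python
def get_table_desc_from_path(path_to_partition_info, dir_path):
    # Hypothetical version of the lookup that fails in the stack trace:
    # each input dir must have been registered in the map at plan time.
    desc = path_to_partition_info.get(dir_path)
    if desc is None:
        raise IOError(
            "cannot find dir = %s in partToPartitionInfo!" % dir_path)
    return desc

# The map normally holds the plan's known table/partition paths; an
# intermediate dir produced by an earlier MapReduce stage (such as
# /tmp/hive-dataeng/1 above) misses if it was never added.
plan_paths = {
    "hdfs://nn:9000/warehouse/facts/dateint=20090710/hour=0": "facts-partition-desc",
}

try:
    get_table_desc_from_path(plan_paths, "hdfs://nn:9000/tmp/hive-dataeng/1")
except IOError as e:
    print(e)
```

In this illustration the second stage of the two-job group-by/join plan reads the first stage's temporary output directory, which is exactly the kind of path the error message names.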
RE: Error in running group-by and join hive query...
Posted by Ashish Thusoo <at...@facebook.com>.
what does
/tmp/<username>/hive.log contain?
Ashish