Posted to user@hive.apache.org by Viraj Bhat <vi...@yahoo-inc.com> on 2010/07/02 10:00:18 UTC
CombineInput Format does not seem to work correctly when accessing Dynamic partitions
Hi all,
I have a large number of small files in some partitions, because Hive does
not support merging small files when dynamic partitioning is used.
To process these small files, I tried setting
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
with Hadoop 0.20.
I get the following error. Has anyone seen this before?
java.io.IOException: cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1]
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getPartitionDescFromPath(CombineHiveInputFormat.java:373)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:98)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:832)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:803)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:674)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:273)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1])'
Thanks,
Viraj
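For readers trying to interpret the error message: the path being looked up carries a scheme and authority (hdfs://namenodeurl) and ends in a file name (emptyFile), while the only key listed in partToPartitionInfo is a bare directory path. The following is a minimal Python sketch of a lookup that would tolerate both differences. It is not Hive's implementation; the function name find_partition_dir and its behaviour are assumptions made purely for illustration.

```python
import posixpath
from urllib.parse import urlparse


def find_partition_dir(file_uri, partition_dirs):
    """Return the registered directory that contains file_uri, or None.

    Hypothetical helper for illustration only; not part of Hive or Hadoop.
    """
    # Compare on the path component only, so "hdfs://namenodeurl/tmp/x"
    # and "/tmp/x" are treated as the same location.
    known = {urlparse(d).path.rstrip("/") for d in partition_dirs}
    path = urlparse(file_uri).path

    # Walk up from the file towards the root until a registered
    # directory is found.
    while path and path != "/":
        if path in known:
            return path
        path = posixpath.dirname(path)
    return None


uri = ("hdfs://namenodeurl/tmp/hive-viraj/"
       "hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile")
dirs = ["/tmp/hive-viraj/"
        "hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1"]
print(find_partition_dir(uri, dirs))
```

With the two paths from the error message, this sketch resolves emptyFile to its parent directory, which is the key that appears in the reported map.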
RE: CombineInput Format does not seem to work correctly when accessing Dynamic partitions
Posted by Viraj Bhat <vi...@yahoo-inc.com>.
Hi all,
We appear to be hitting this issue:
https://issues.apache.org/jira/browse/HADOOP-5759
So is there a way to use MultipleInputFormat in Hive?
Viraj
________________________________
From: Viraj Bhat [mailto:viraj@yahoo-inc.com]
Sent: Friday, July 02, 2010 1:00 AM
To: hive-user@hadoop.apache.org
Subject: CombineInput Format does not seem to work correctly when accessing Dynamic partitions