Posted to user@hive.apache.org by Viraj Bhat <vi...@yahoo-inc.com> on 2010/07/02 10:00:18 UTC

CombineInput Format does not seem to work correctly when accessing Dynamic partitions

Hi all,

 I have a large number of small files in some partitions, since Hive does
not support merging small files when using dynamic partitioning.

To process these small files with Hadoop 0.20, I set:

set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

I then get the error below; has anyone seen this before?

 

 

java.io.IOException: cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1]
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getPartitionDescFromPath(CombineHiveInputFormat.java:373)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:98)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:832)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:803)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:752)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:674)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:273)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://namenodeurl/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1/emptyFile in partToPartitionInfo: [/tmp/hive-viraj/hive_2010-07-02_07-53-24_366_9191309191776626031/1318857972/1])'
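A note on what the trace shows: the directory being looked up carries the full hdfs://namenodeurl authority, while the key listed in partToPartitionInfo is the same path without the scheme, so a raw comparison can never match. A minimal, standalone Java sketch of that mismatch (this is not Hive source; the class and method names here are hypothetical, only for illustration):

```java
import java.net.URI;

// Demonstrates the lookup mismatch visible in the IOException above:
// the split directory is fully qualified (hdfs://namenodeurl/...) while
// the map key is scheme-less (/...). A raw equality check misses; stripping
// the scheme and authority makes the two forms comparable.
public class PathMismatchDemo {

    // Hypothetical normalization: keep only the path component of a URI.
    static String normalize(String p) {
        return URI.create(p).getPath();
    }

    public static void main(String[] args) {
        String splitDir = "hdfs://namenodeurl/tmp/hive-viraj/part/1";
        String mapKey   = "/tmp/hive-viraj/part/1";

        System.out.println(splitDir.equals(mapKey));            // false: raw compare misses
        System.out.println(normalize(splitDir).equals(mapKey)); // true after normalization
    }
}
```

This only illustrates why the lookup fails; whether the fix belongs in Hive's split construction or in the map keys is exactly what the JIRA discussion below is about.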

 

Thanks Viraj


RE: CombineInput Format does not seem to work correctly when accessing Dynamic partitions

Posted by Viraj Bhat <vi...@yahoo-inc.com>.
Hi all,

 We are hit by https://issues.apache.org/jira/browse/HADOOP-5759.

Is there a way to use MultipleInputFormat in Hive?

Viraj

 

________________________________

From: Viraj Bhat [mailto:viraj@yahoo-inc.com] 
Sent: Friday, July 02, 2010 1:00 AM
To: hive-user@hadoop.apache.org
Subject: CombineInput Format does not seem to work correctly when
accessing Dynamic partitions

[quoted original message trimmed]