Posted to user@hive.apache.org by phil young <ph...@gmail.com> on 2010/10/30 01:20:27 UTC

Single map task per file in an external table

I'm about to investigate the following situation, but I'd appreciate any
insight that can be given.

We have an external table that consists of 3 HDFS files.
We then run an INSERT OVERWRITE which is just a SELECT * from the external
table.
The table being overwritten has N buckets.
The issue is that the INSERT OVERWRITE job launches only one map task per
input file.

I would have thought that there would be one map task per HDFS block.

The (slightly more general) question is:
Is there a way to use more of the hardware in the cluster when importing
data from flat files into a bucketed table?

Thanks for any help you might be able to provide.

And congratulations on Hive 0.6!

-Phil