Posted to user@hive.apache.org by Gopal Vijayaraghavan <go...@apache.org> on 2015/08/20 01:30:54 UTC

Re: Tez : Anyway to avoid creating subdirectories by "Insert with union all" ?

> Is there anyway to avoid creating sub-directories? Or this is by design
>and can not be changed?

This is because of the way file formats generate Hadoop file names without
collisions.

For instance, any change to that would break Parquet-MR for Tez. That's
why we generate a compatible, but colliding mapreduce.task.attempt.id
artificially for Tez jobs.

"Map 1" and "Map 2" would both have an attempt 0 of task 1, generating
colliding file names (0001_0).

The easy workaround is a "re-load" of the table.

insert overwrite table h1_passwords_target select * from
h1_passwords_target;


The slightly more complex one is to add a DISTRIBUTE BY & trigger a
reducer after the UNION ALL.
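
A rough sketch of that second option, in case it helps. The source table
names (h1_passwords_a, h1_passwords_b) and the column (user_name) are
made up for illustration, so substitute your own. The idea is only that
the DISTRIBUTE BY forces a reducer stage after the union, so a single
vertex writes the final files.

```sql
-- Hypothetical source tables and column; only h1_passwords_target
-- comes from the original thread.
INSERT OVERWRITE TABLE h1_passwords_target
SELECT u.*
FROM (
  SELECT * FROM h1_passwords_a
  UNION ALL
  SELECT * FROM h1_passwords_b
) u
-- DISTRIBUTE BY adds a shuffle, so the output is written by the
-- reducer rather than by "Map 1" and "Map 2" separately.
DISTRIBUTE BY u.user_name;
```

Picking a column with reasonably even values should keep the reducers
balanced, but that depends on your data.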

Cheers,
Gopal