Posted to user@hive.apache.org by Jim Green <op...@gmail.com> on 2015/08/19 23:40:57 UTC
Tez: Any way to avoid creating subdirectories with "insert with union all"?
Hi Team,
The insert below, which uses UNION ALL, creates sub-directories:
set hive.execution.engine=tez;
create table h1_passwords_target like h1_passwords;
insert overwrite table h1_passwords_target
select * from
(select * from h1_passwords limit 1
union all
select * from h1_passwords limit 2 ) sub;
[root@h1 h1_passwords_target]# ls -altr
total 2
drwxrwxrwx 115 xxx xxx 113 Aug 19 21:24 ..
drwxr-xr-x 2 xxx xxx 1 Aug 19 21:25 2
drwxr-xr-x 2 xxx xxx 1 Aug 19 21:25 1
drwxr-xr-x 4 xxx xxx 2 Aug 19 21:25 .
Is there any way to avoid creating sub-directories, or is this by design and
cannot be changed?
I ask because non-Tez queries do not work against this table by default,
since hive.mapred.supports.subdirectories=false.
--
Thanks,
www.openkb.info
(Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)
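A reader-side workaround that is commonly cited, though not part of this thread's reply, is to let the MapReduce engine recurse into the subdirectories. This is a minimal sketch assuming Hadoop 2 property names (older releases spell the second property mapred.input.dir.recursive); verify both settings against your Hive and Hadoop versions before relying on them.

```sql
-- Read the Tez-written table with the MapReduce engine.
set hive.execution.engine=mr;
-- Allow Hive to handle input directories that contain subdirectories.
set hive.mapred.supports.subdirectories=true;
-- Ask Hadoop's FileInputFormat to list input directories recursively
-- (Hadoop 2 name; older clusters use mapred.input.dir.recursive).
set mapreduce.input.fileinputformat.input.dir.recursive=true;
select count(*) from h1_passwords_target;
```

This changes how the table is read, not how it is written, so the subdirectories remain on disk.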
Re: Tez: Any way to avoid creating subdirectories with "insert with union all"?
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Is there anyway to avoid creating sub-directories? Or this is by design
> and can not be changed?
This is because of the way file formats name Hadoop output files to avoid
collisions.
For instance, any change to that would break Parquet-MR for Tez. That's
why we artificially generate a compatible, but colliding,
mapreduce.task.attempt.id for Tez jobs.
"Map 1" and "Map 2" would both have an attempt 0 of task 1, generating
colliding file names (0001_0).
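One way to observe this layout from inside Hive is the INPUT__FILE__NAME virtual column, which reports the file each row was read from. A minimal sketch, assuming the query runs on Tez so the subdirectories are read; if the explanation above holds, the files under the 1 and 2 directories should share the same base name and differ only in their parent directory.

```sql
-- Run on Tez, the engine that wrote the per-branch subdirectories.
set hive.execution.engine=tez;
-- INPUT__FILE__NAME is a Hive virtual column: the path of each row's source file.
select INPUT__FILE__NAME, count(*) as row_count
from h1_passwords_target
group by INPUT__FILE__NAME;
```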
The easy workaround is a "re-load" of the table.
insert overwrite table h1_passwords_target select * from
h1_passwords_target;
The slightly more complex one is to add a DISTRIBUTE BY, which triggers a
reducer after the UNION ALL.
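A sketch of the DISTRIBUTE BY variant applied to the original query. The column user_name is hypothetical (the thread does not show the schema of h1_passwords); substitute any real column. Because a reduce stage now writes the final files instead of each union branch, the per-branch subdirectories should not appear, though this is worth confirming with a directory listing on your own cluster.

```sql
set hive.execution.engine=tez;
-- DISTRIBUTE BY adds a reduce stage after the UNION ALL, so reducers,
-- not the individual union branches, write the table's files.
-- user_name is a hypothetical column; replace it with a real column of h1_passwords.
insert overwrite table h1_passwords_target
select * from
(select * from h1_passwords limit 1
union all
select * from h1_passwords limit 2 ) sub
distribute by user_name;
```

A column with many distinct values spreads rows evenly across reducers; a low-cardinality column concentrates the output in a few files.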
Cheers,
Gopal