Posted to user@tez.apache.org by David Ginzburg <da...@gmail.com> on 2015/05/24 22:34:41 UTC
One reducer only when inserting into an ORC dynamically partitioned table
Hi,
I am running on a 10-node HDP 2.2 cluster, using Tez and YARN.
The Hive version is 0.14.
I have a 90 million row table stored as a plain-text CSV in a 10 GB text file.
When trying to insert into an orc partitioned table using the statement:
"insert overwrite table 2h2 partition (dt) select *,TIME_STAMP from
2h_tmp;"
dt is the dynamic partition key.
Tez allocates only one reducer to the job, which results in a 6 hour run.
I expect about 120 partitions to be created.
How can I increase the number of reducers to speed up this job?
Is this related to https://issues.apache.org/jira/browse/HIVE-7158 ? It is
marked as resolved for Hive 0.14.
I am running with the default values:
hive.tez.auto.reducer.parallelism
Default Value: false
Added In: Hive 0.14.0 with HIVE-7158
hive.tez.max.partition.factor
Default Value: 2
Added In: Hive 0.14.0 with HIVE-7158
hive.tez.min.partition.factor
Default Value: 0.25
Added In: Hive 0.14.0 with HIVE-7158
and hive.exec.dynamic.partition=true;
hive.exec.dynamic.partition.mode=nonstrict;
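For completeness, this is a sketch of the session as I run it (the settings
and table names are exactly those quoted above; everything else is left at
its default):

```sql
-- Session settings for the dynamic partitioned insert (values as listed above).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- dt is the dynamic partition key, taken from the last column of the SELECT.
INSERT OVERWRITE TABLE 2h2 PARTITION (dt)
SELECT *, TIME_STAMP FROM 2h_tmp;
```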
Re: One reducer only when inserting into an ORC dynamically
partitioned table
Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,
This is really a Hive question, so hopefully you can follow up on it on
the Hive user@ mailing list.
But since you're looking at Hive-on-Tez, this issue seems familiar to me.
> "insert overwrite table 2h2 partition (dt) select *,TIME_STAMP from
>2h_tmp;"
>
> Tez allocates only one reducer to the job, which results in a 6 hour run.
That doesn't look like it needs a reducer in normal cases.
Is the destination table bucketed into 1 bucket?
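You can check this with standard Hive DDL commands (the table name here is
the one from your mail):

```sql
-- Inspect the destination table's metadata; look for "Num Buckets" in the
-- output. A value of 1 would explain the single reducer.
DESCRIBE FORMATTED 2h2;

-- Alternatively, the CLUSTERED BY ... INTO n BUCKETS clause (if any) shows
-- up directly in the table definition.
SHOW CREATE TABLE 2h2;
```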
> Is this related to https://issues.apache.org/jira/browse/HIVE-7158 ? It
> is marked as resolved for Hive 0.14.
No, it is not.
This might be related to a feature turned off by default in HDP-2.2.
If you have >1 partition in the dynamic partitioned insert, the feature
you need is in HIVE-6455 + HIVE-6761.
set hive.optimize.sort.dynamic.partition=true;
This is off by default, since it slows down ETL where the destination is
exactly 1 partition.
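A minimal sketch of that fix applied to your statement (table names as
posted in your mail):

```sql
-- Enable the HIVE-6455 + HIVE-6761 feature: rows are sorted by the dynamic
-- partition key before the reducers, so each reducer can write ORC
-- partitions one at a time instead of everything landing on one reducer.
SET hive.optimize.sort.dynamic.partition=true;

INSERT OVERWRITE TABLE 2h2 PARTITION (dt)
SELECT *, TIME_STAMP FROM 2h_tmp;
```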
I keep updating the hive-testbench to do the right thing (because it does
both TPC-DS and TPC-H), so those settings might be of help
https://github.com/hortonworks/hive-testbench/blob/hive14/settings/load-partitioned.sql#L10
Cheers,
Gopal