Posted to user@hive.apache.org by Daniel Haviv <da...@veracity-group.com> on 2014/12/06 15:27:23 UTC

Insert into dynamic partitions performance

Hi,
I'm running an INSERT statement that processes over 1 TB of data.
The map phase goes well, but the reduce stage uses only one reducer, which becomes a major bottleneck.

I've tried setting the number of reducers to four and adding a DISTRIBUTE BY clause to the statement, but the job still uses only one reducer.

How can I increase reducer parallelism?

Thanks,
Daniel