Posted to dev@hive.apache.org by Siying Dong <si...@fb.com> on 2011/05/04 04:15:39 UTC

Review Request: Block Sampling should adjust number of reducers accordingly

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/685/
-----------------------------------------------------------

Review request for hive, Ning Zhang and namit jain.


Summary
-------

Currently, block sampling does not change the number of reducers, so a query like:
select c from tab tablesample(1 percent) group by c;
can launch a huge number of reducers even though the sampled input is small.
We need to shrink the number of reducers to make block sampling more useful.
Because the number of reducers is currently determined before the input splits are computed, there is probably no clean way to do this, but we can make a good estimate.
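As a rough illustration of the "good estimate" idea, the Java sketch below scales the estimated input size by the sampling percentage before deriving the reducer count. It is a hypothetical sketch, not the code in the attached diff. The class and method names are made up, and the two limits are assumed to play the roles of Hive's hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max settings.

```java
// Hypothetical sketch: the names below are illustrative, not Hive's actual API.
class SampledReducerEstimate {

    /**
     * @param totalInputBytes total size of the input files, in bytes
     * @param samplePercent   TABLESAMPLE(n PERCENT) value, or a negative
     *                        number if the query does no block sampling
     * @param bytesPerReducer target input bytes per reducer (assumed to mirror
     *                        hive.exec.reducers.bytes.per.reducer)
     * @param maxReducers     upper bound on reducers (assumed to mirror
     *                        hive.exec.reducers.max)
     */
    static int estimateReducers(long totalInputBytes, double samplePercent,
                                long bytesPerReducer, int maxReducers) {
        long effectiveBytes = totalInputBytes;
        if (samplePercent >= 0 && samplePercent < 100) {
            // Only a fraction of the blocks will be read, so shrink the size
            // estimate. This is a guess: the real splits are not known yet
            // when the reducer count is chosen.
            effectiveBytes = (long) (totalInputBytes * samplePercent / 100.0);
        }
        // Ceiling division: one reducer per bytesPerReducer of input.
        long reducers = (effectiveBytes + bytesPerReducer - 1) / bytesPerReducer;
        reducers = Math.max(1, reducers);           // always at least one reducer
        reducers = Math.min(maxReducers, reducers); // never exceed the cap
        return (int) reducers;
    }

    public static void main(String[] args) {
        long oneGB = 1_000_000_000L;
        // 100 GB of input, 1 GB per reducer, at most 999 reducers.
        System.out.println(estimateReducers(100 * oneGB, -1, oneGB, 999)); // no sampling
        System.out.println(estimateReducers(100 * oneGB, 1, oneGB, 999));  // 1 percent
    }
}
```

With 100 GB of input and 1 GB per reducer, the unsampled query gets 100 reducers while the 1 percent sample gets 1. Because only the size estimate changes, the result stays approximate: block sampling selects whole blocks, so the data actually read can differ from the requested percentage.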


This addresses bug HIVE-2146.
    https://issues.apache.org/jira/browse/HIVE-2146


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MapRedTask.java 1098885 

Diff: https://reviews.apache.org/r/685/diff


Testing
-------


Thanks,

Siying