Posted to dev@hive.apache.org by Yin Huai <hu...@cse.ohio-state.edu> on 2013/08/30 17:29:52 UTC
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------
(Updated Aug. 30, 2013, 3:29 p.m.)
Review request for hive.
Summary (updated)
-----------------
[HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns
Bugs: HIVE-5149
https://issues.apache.org/jira/browse/HIVE-5149
Repository: hive-git
Description
-------
https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d
ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128
ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb
ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471
ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb
Diff: https://reviews.apache.org/r/13862/diff/
Testing
-------
Thanks,
Yin Huai
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
Posted by Yin Huai <hu...@cse.ohio-state.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------
(Updated Sept. 3, 2013, 12:29 a.m.)
Review request for hive.
Changes
-------
addressed Ashutosh's comments
Bugs: HIVE-5149
https://issues.apache.org/jira/browse/HIVE-5149
Repository: hive-git
Description
-------
https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d
ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128
ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb
ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471
ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb
Diff: https://reviews.apache.org/r/13862/diff/
Testing
-------
Thanks,
Yin Huai
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
Posted by Yin Huai <hu...@cse.ohio-state.edu>.
On Sept. 2, 2013, 5:29 a.m., Yin Huai wrote:
> > Thanks for adding comments!
We can have a query like:
explain select * from (select * from src1 cluster by key) tmp sort by key, value;
In this case, we initially have two MR jobs. Since the second job is only used for "sort by", it has no partitioning columns.
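To make the case above concrete, here is a minimal sketch of how the partitioning columns could be chosen when the two jobs are merged. It is not the code in the patch: the class PartitionColumnSketch and the helper pickPartitionColumns are hypothetical names, columns are plain strings, and the rule (fall back to the parent's partitioning columns when the child has none) is an assumption used only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionColumnSketch {

    /**
     * Hypothetical helper, not part of Hive's API: pick the partitioning
     * columns for the merged ReduceSink. Assumption for this sketch: a child
     * without partitioning columns (plain "sort by") inherits the parent's.
     */
    static List<String> pickPartitionColumns(List<String> parentPartCols,
                                             List<String> childPartCols) {
        if (childPartCols.isEmpty()) {
            // "sort by" alone does not partition, so keep the parent's columns.
            return parentPartCols;
        }
        return childPartCols;
    }

    public static void main(String[] args) {
        // Inner query: cluster by key -> partitioned (and sorted) by key.
        List<String> parentPartCols = Arrays.asList("key");
        // Outer query: sort by key, value -> no partitioning columns.
        List<String> childPartCols = Collections.emptyList();

        // prints [key]
        System.out.println(pickPartitionColumns(parentPartCols, childPartCols));
    }
}
```

In this sketch the merged job would still sort by key, value but partition by key only, rather than treating the sort columns as partitioning columns.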
- Yin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25818
-----------------------------------------------------------
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25818
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
<https://reviews.apache.org/r/13862/#comment50365>
else { throw new SemanticException("Not able to correctly identify partitioning columns. Hint: Try hive.optimize.reducededuplication=false; ");}
Thanks for adding comments!
- Ashutosh Chauhan
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
Posted by Yin Huai <hu...@cse.ohio-state.edu>.
On Sept. 2, 2013, 5:39 a.m., Yin Huai wrote:
> > Another sanity check.
Done.
- Yin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25819
-----------------------------------------------------------
Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the
wrong partitioning columns
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25819
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
<https://reviews.apache.org/r/13862/#comment50366>
Here, add: if (result[0] <= 0) { throw new SemanticException("Sort columns and order don't match. Try hive.optimize.reducededuplication=false;"); }
Another sanity check.
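As a rough illustration of this kind of fail-fast check, the sketch below validates that a sort-order string matches the sort columns. It is a simplified stand-in, not the result[0] check suggested above: it assumes the order is encoded as one '+' or '-' character per sort column, and SketchSemanticException and checkSortColumnsAndOrder are hypothetical names introduced so the example compiles without Hive on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class SortOrderCheckSketch {

    /** Stand-in for Hive's SemanticException (hypothetical, for this sketch only). */
    static class SketchSemanticException extends Exception {
        SketchSemanticException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical check. Assumption: the order string holds exactly one
     * '+' (ascending) or '-' (descending) character per sort column.
     */
    static void checkSortColumnsAndOrder(List<String> sortCols, String order)
            throws SketchSemanticException {
        boolean matches = order.length() == sortCols.size()
                && order.matches("[+-]*");
        if (!matches) {
            // Fail fast and tell the user how to turn the optimization off.
            throw new SketchSemanticException(
                "Sort columns and order don't match. "
                + "Try hive.optimize.reducededuplication=false;");
        }
    }

    public static void main(String[] args) throws Exception {
        // Two sort columns with two order characters: passes silently.
        checkSortColumnsAndOrder(Arrays.asList("key", "value"), "++");

        // Two sort columns but one order character: rejected.
        try {
            checkSortColumnsAndOrder(Arrays.asList("key", "value"), "+");
        } catch (SketchSemanticException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing at plan time with a hint in the message means a bad merge surfaces as a clear error instead of a silently wrong plan.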
- Ashutosh Chauhan