Posted to dev@hive.apache.org by Yin Huai <hu...@cse.ohio-state.edu> on 2013/08/27 20:43:39 UTC

Review Request 13862: ReduceSinkDeDuplication can pick the wrong partitioning columns

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-5149
    https://issues.apache.org/jira/browse/HIVE-5149


Repository: hive-git


Description
-------

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d 
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
  ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
  ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 

Diff: https://reviews.apache.org/r/13862/diff/


Testing
-------


Thanks,

Yin Huai


Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Yin Huai <hu...@cse.ohio-state.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------

(Updated Sept. 3, 2013, 12:29 a.m.)


Review request for hive.


Changes
-------

Addressed Ashutosh's comments.


Bugs: HIVE-5149
    https://issues.apache.org/jira/browse/HIVE-5149


Repository: hive-git


Description
-------

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d 
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
  ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
  ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 

Diff: https://reviews.apache.org/r/13862/diff/


Testing
-------


Thanks,

Yin Huai


Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Yin Huai <hu...@cse.ohio-state.edu>.

On Sept. 2, 2013, 5:29 a.m., Yin Huai wrote:
> > Thanks for adding comments!

We can have a query like the following:
explain select * from (select * from src1 cluster by key) tmp sort by key, value;

In this case, we initially have two MR jobs. Because the second job is used only for "sort by", it has no partitioning columns.
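
To make the case above concrete, here is a minimal sketch of how partitioning columns might be chosen when the two ReduceSinks are merged into one. It is not the code in ReduceSinkDeDuplication.java: the class name, method name, and the use of plain column-name lists are all assumptions made for illustration, and IllegalStateException stands in for Hive's SemanticException so the example has no Hive dependency. It only looks at partitioning columns; the sort columns and order mentioned elsewhere in this review are ignored here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the actual ReduceSinkDeDuplication logic. */
final class PartitionColumnSketch {

  /**
   * Picks the partitioning columns for the single ReduceSink that would
   * replace a parent ReduceSink followed by a child ReduceSink.
   */
  static List<String> mergedPartitionColumns(List<String> parent, List<String> child) {
    if (child.isEmpty()) {
      // The child is a plain "sort by": it imposes no partitioning,
      // so the parent's partitioning columns must be kept.
      return parent;
    }
    if (parent.isEmpty()) {
      // Only the child partitions the data.
      return child;
    }
    if (parent.equals(child)) {
      return parent;
    }
    // Assumed fallback: refuse to merge rather than guess.
    throw new IllegalStateException(
        "Not able to correctly identify partitioning columns");
  }

  public static void main(String[] args) {
    // "cluster by key" partitions on key; "sort by key, value" has no partitioning.
    List<String> parent = Arrays.asList("key");
    List<String> child = Collections.emptyList();
    System.out.println(mergedPartitionColumns(parent, child)); // prints [key]
  }
}
```

For the query above, the parent ("cluster by key") contributes the partitioning column key and the child ("sort by key, value") contributes none, so the merged job should still partition on key.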


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25818
-----------------------------------------------------------


On Aug. 30, 2013, 3:29 p.m., Yin Huai wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13862/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2013, 3:29 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5149
>     https://issues.apache.org/jira/browse/HIVE-5149
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d 
>   ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
>   ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
>   ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
>   ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 
> 
> Diff: https://reviews.apache.org/r/13862/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Yin Huai
> 
>


Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25818
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
<https://reviews.apache.org/r/13862/#comment50365>

    else { throw new SemanticException("Not able to correctly identify partitioning columns. Hint: Try hive.optimize.reducededuplication=false;"); }


Thanks for adding comments!

- Ashutosh Chauhan




Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Yin Huai <hu...@cse.ohio-state.edu>.

On Sept. 2, 2013, 5:39 a.m., Yin Huai wrote:
> > Another sanity check.

Done.


- Yin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25819
-----------------------------------------------------------




Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/#review25819
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java
<https://reviews.apache.org/r/13862/#comment50366>

    Here: if (result[0] <= 0) { throw new SemanticException("Sort columns and order don't match. Try hive.optimize.reducededuplication=false;"); }


Another sanity check.

- Ashutosh Chauhan




Re: Review Request 13862: [HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Yin Huai <hu...@cse.ohio-state.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------

(Updated Aug. 30, 2013, 3:29 p.m.)


Review request for hive.


Summary (updated)
-----------------

[HIVE-5149] ReduceSinkDeDuplication can pick the wrong partitioning columns


Bugs: HIVE-5149
    https://issues.apache.org/jira/browse/HIVE-5149


Repository: hive-git


Description
-------

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d 
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
  ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
  ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 

Diff: https://reviews.apache.org/r/13862/diff/


Testing
-------


Thanks,

Yin Huai


Re: Review Request 13862: ReduceSinkDeDuplication can pick the wrong partitioning columns

Posted by Yin Huai <hu...@cse.ohio-state.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13862/
-----------------------------------------------------------

(Updated Aug. 28, 2013, 7:03 p.m.)


Review request for hive.


Changes
-------

Updated comments.


Bugs: HIVE-5149
    https://issues.apache.org/jira/browse/HIVE-5149


Repository: hive-git


Description
-------

https://mail-archives.apache.org/mod_mbox/hive-user/201308.mbox/%3CCAG6Lhyex5XPwszpihKqkPRpzri2k=m4QGc+cpAR5yVR8SJtM4Q@mail.gmail.com%3E


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/ReduceSinkDeDuplication.java c380a2d 
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out da7a128 
  ql/src/test/results/clientpositive/groupby_cube1.q.out a52f4eb 
  ql/src/test/results/clientpositive/groupby_rollup1.q.out f120471 
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out 3297ebb 

Diff: https://reviews.apache.org/r/13862/diff/


Testing
-------


Thanks,

Yin Huai