You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Xuefu Zhang <xz...@cloudera.com> on 2017/01/20 18:07:36 UTC
Review Request 55776: Eliminate unbounded memory usage for orderBy and
groupBy in Hive on Spark
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/
-----------------------------------------------------------
Review request for hive, Chao Sun and Rui Li.
Bugs: HIVE-15580
https://issues.apache.org/jira/browse/HIVE-15580
Repository: hive-git
Description
-------
See JIRA description.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
ql/src/test/queries/clientpositive/union_top_level.q d93fe38
ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
Diff: https://reviews.apache.org/r/55776/diff/
Testing
-------
All test passed
Thanks,
Xuefu Zhang
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Xuefu Zhang <xz...@cloudera.com>.
On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
>
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is because the keys are the same for group by and order by.
>
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile and how come the result is not changed?
>
> Xuefu Zhang wrote:
> MR group by is also sorted, so the order by is something not needed so eliminated during optimization. So you, the test didn't fail in the Jenkins result.
>
> Chao Sun wrote:
> No, I mean the input query is changed, so the output should also be changed. If you look at the MR output qfile, it still has
> ```
> PREHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)b
> PREHOOK: type: QUERY
> POSTHOOK: query: explain
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)a
> union all
> select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)b
> POSTHOOK: type: QUERY
> ```
> which suggest the test is not triggered on the MR path. Anyway, maybe the test is turned off for MR.
Yeah, Got it. Maybe. I don't believe that's a blocker. We can file a followup JIRA for this. Do you have any other comments?
- Xuefu
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Xuefu Zhang <xz...@cloudera.com>.
On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
No. I verified that MR's plan and result don't change at all. This is because the keys are the same for group by and order by.
- Xuefu
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Xuefu Zhang <xz...@cloudera.com>.
On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
>
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is because the keys are the same for group by and order by.
>
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile and how come the result is not changed?
MR group by is also sorted, so the order by is something not needed so eliminated during optimization. So you, the test didn't fail in the Jenkins result.
- Xuefu
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Chao Sun <ch...@cloudera.com>.
On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
>
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is because the keys are the same for group by and order by.
Hmm.. I'm surprised. We changed the input qfile and how come the result is not changed?
- Chao
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Xuefu Zhang <xz...@cloudera.com>.
> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 31
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line31>
> >
> > Is it possible that `numPartitions` equals to 0?
No. If partition number is zero, that means no partition. Then we will not even get here. Nevertheless, if it's set to 0, we take 1 instead.
> On Jan. 20, 2017, 6:26 p.m., Chao Sun wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java, line 34
> > <https://reviews.apache.org/r/55776/diff/1/?file=1610799#file1610799line34>
> >
> > I wonder whether this also has some extra cost comparing to the original `groupByKey`, since it needs to sort all records by key in a single partition, right?
Well, we don't know which one performs better yet. repartitionAndSortWithinPartitions() brings extra softing, but it eliminates grouping in groupByKey(). Also, groupByKey() has unbounded memory usage, which is the problem we are tryig to solve. As described in the JIRA description. We will follow up with performance testing, and may provide an option to use either groupBy() which might be more performing but w/ unlimitted memory usage or the new way where memory usage is bounded.
- Xuefu
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Chao Sun <ch...@cloudera.com>.
On Jan. 20, 2017, 6:26 p.m., Xuefu Zhang wrote:
> > I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
>
> Xuefu Zhang wrote:
> No. I verified that MR's plan and result don't change at all. This is because the keys are the same for group by and order by.
>
> Chao Sun wrote:
> Hmm.. I'm surprised. We changed the input qfile and how come the result is not changed?
>
> Xuefu Zhang wrote:
> MR group by is also sorted, so the order by is something not needed so eliminated during optimization. So you, the test didn't fail in the Jenkins result.
No, I mean the input query is changed, so the output should also be changed. If you look at the MR output qfile, it still has
```
PREHOOK: query: explain
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)a
union all
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)b
PREHOOK: type: QUERY
POSTHOOK: query: explain
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)a
union all
select * from (select s1.key as k, s2.value as v from src s1 join src s2 on (s1.key = s2.key) limit 10)b
POSTHOOK: type: QUERY
```
which suggest the test is not triggered on the MR path. Anyway, maybe the test is turned off for MR.
- Chao
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Chao Sun <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162449
-----------------------------------------------------------
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java (line 31)
<https://reviews.apache.org/r/55776/#comment233782>
Is it possible that `numPartitions` equals to 0?
ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java (line 34)
<https://reviews.apache.org/r/55776/#comment233785>
I wonder whether this also has some extra cost comparing to the original `groupByKey`, since it needs to sort all records by key in a single partition, right?
I think we also need to update ql/src/test/results/clientpositive/union_top_level.q.out
- Chao Sun
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>
Re: Review Request 55776: Eliminate unbounded memory usage for
orderBy and groupBy in Hive on Spark
Posted by Chao Sun <ch...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55776/#review162487
-----------------------------------------------------------
Ship it!
+1. Patch looks good to me, even though a little concerned by the possible performance downgrade. Please file follow up JIRAs for the TODO.
It also may be good to have Rui had a look before this is committed. Thanks
- Chao Sun
On Jan. 20, 2017, 6:07 p.m., Xuefu Zhang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55776/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2017, 6:07 p.m.)
>
>
> Review request for hive, Chao Sun and Rui Li.
>
>
> Bugs: HIVE-15580
> https://issues.apache.org/jira/browse/HIVE-15580
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> See JIRA description.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GroupByShuffler.java e128dd2
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java eeb4443
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d57cac4
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java 3d56876
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SortByShuffler.java 997ab7e
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 66ffe5d
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java 0d31e5f
> ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkShuffler.java 40e251f
> ql/src/test/queries/clientpositive/union_top_level.q d93fe38
> ql/src/test/results/clientpositive/llap/union_top_level.q.out b48ab83
> ql/src/test/results/clientpositive/spark/lateral_view_explode2.q.out 65a6e3e
> ql/src/test/results/clientpositive/spark/union_remove_25.q.out 9fec1d4
> ql/src/test/results/clientpositive/spark/union_top_level.q.out c9cb5d3
> ql/src/test/results/clientpositive/spark/vector_outer_join5.q.out 9e1742f
>
> Diff: https://reviews.apache.org/r/55776/diff/
>
>
> Testing
> -------
>
> All test passed
>
>
> Thanks,
>
> Xuefu Zhang
>
>