You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Harish Butani <rh...@gmail.com> on 2014/05/28 07:40:40 UTC
Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-7062
https://issues.apache.org/jira/browse/HIVE-7062
Repository: hive-git
Description
-------
1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION
ql/src/test/results/clientpositive/ptf.q.out eb4997d
ql/src/test/results/clientpositive/windowing.q.out 7e23497
ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c
Diff: https://reviews.apache.org/r/21970/diff/
Testing
-------
run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.
Thanks,
Harish Butani
Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------
(Updated May 29, 2014, 4:27 p.m.)
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-7062
https://issues.apache.org/jira/browse/HIVE-7062
Repository: hive-git
Description
-------
1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION
ql/src/test/results/clientpositive/ptf.q.out eb4997d
ql/src/test/results/clientpositive/windowing.q.out 7e23497
ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c
Diff: https://reviews.apache.org/r/21970/diff/
Testing
-------
run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.
Thanks,
Harish Butani
Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------
(Updated May 29, 2014, 4:24 p.m.)
Review request for hive and Ashutosh Chauhan.
Changes
-------
addressed Ashutosh'c comments.
added check if Window Size is less than 'hive.join.cache.size' setting
Bugs: HIVE-7062
https://issues.apache.org/jira/browse/HIVE-7062
Repository: hive-git
Description
-------
1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION
ql/src/test/results/clientpositive/ptf.q.out eb4997d
ql/src/test/results/clientpositive/windowing.q.out 7e23497
ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c
Diff: https://reviews.apache.org/r/21970/diff/
Testing
-------
run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.
Thanks,
Harish Butani
Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
Posted by Harish Butani <rh...@gmail.com>.
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java, line 65
> > <https://reviews.apache.org/r/21970/diff/1/?file=597243#file597243line65>
> >
> > Seems like there is no call of this constructor with createElemContainer = false. So, is this new boolean required ?
RollingPartition creation invokes this with false; so that underlying RowContainer is not allocated.
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line 168
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line168>
> >
> > Good to add comment that range based windows not supported yet.
done
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line 172
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line172>
> >
> > Add comment that unbounded windows not supported yet.
done
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java, line 87
> > <https://reviews.apache.org/r/21970/diff/1/?file=597250#file597250line87>
> >
> > Do we need this boolean? All ranking functions can support streaming.. isn't it?
No CumeDist and PercentRank don't support streaming
Also supportsStreaming is used by control how rank computation is done.
In case of Rank/DenseRank there is no separate class for Streaming. Same class handles both modes;
flag is used to distinguish how ranking is achieved.
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 31
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line31>
> >
> > Better name : GenericUDAFStreamingEvaluator ?
as we discussed Enhancer because it provides Streaming mode to any Evaluator (provided incremental aggregations can be done based on cumulative aggregates)
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 52
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line52>
> >
> > This can always be computed using preceding + following, so we can get rid of it.
done
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 78
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line78>
> >
> > Can you add a comment for this? especially last factor of wSize * underlying.
done. (the formula was wrong)
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, line 1225
> > <https://reviews.apache.org/r/21970/diff/1/?file=597257#file597257line1225>
> >
> > Better name : funcArgs?
done
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 66
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line66>
> >
> > I think we can do new ArrayList(precedingSpan + followingSpan). Also, since we currently support only fixed size windows, this could even be an array.
initialized arraylist
yes using array is possible, not doing here though
> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 53
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line53>
> >
> > For more clarity.. it will help to show these four indices using ASCII art, like following :
> >
> > --------------------------------------
> > | | | |
> >
done
- Harish
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/#review44178
-----------------------------------------------------------
On May 28, 2014, 5:40 a.m., Harish Butani wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21970/
> -----------------------------------------------------------
>
> (Updated May 28, 2014, 5:40 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-7062
> https://issues.apache.org/jira/browse/HIVE-7062
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> 1. Have the Windowing Table Function support streaming mode.
> 2. Have special handling for Ranking UDAFs.
> 3. Have special handling for Sum/Avg for fixed size Wdws.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624
> ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION
> ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION
> ql/src/test/results/clientpositive/ptf.q.out eb4997d
> ql/src/test/results/clientpositive/windowing.q.out 7e23497
> ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c
>
> Diff: https://reviews.apache.org/r/21970/diff/
>
>
> Testing
> -------
>
> run existing windowing and ptf tests
> Add unit tests for StreamingSum and StreamingAvg evaluators.
>
>
> Thanks,
>
> Harish Butani
>
>
Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/#review44178
-----------------------------------------------------------
Mostly looks good. Some minor comments.
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
<https://reviews.apache.org/r/21970/#comment78495>
Seems like there is no call of this constructor with createElemContainer = false. So, is this new boolean required ?
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java
<https://reviews.apache.org/r/21970/#comment78500>
For more clarity.. it will help to show these four indices using ASCII art, like following :
--------------------------------------
| | | |
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java
<https://reviews.apache.org/r/21970/#comment78497>
I think we can do new ArrayList(precedingSpan + followingSpan). Also, since we currently support only fixed size windows, this could even be an array.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
<https://reviews.apache.org/r/21970/#comment78501>
Good to add comment that range based windows not supported yet.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
<https://reviews.apache.org/r/21970/#comment78502>
Add comment that unbounded windows not supported yet.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java
<https://reviews.apache.org/r/21970/#comment78503>
Do we need this boolean? All ranking functions can support streaming.. isn't it?
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78504>
Better name : GenericUDAFStreamingEvaluator ?
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78505>
This can always be computed using preceding + following, so we can get rid of it.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78510>
Can you add a comment for this? especially last factor of wSize * underlying.
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
<https://reviews.apache.org/r/21970/#comment78511>
Better name : funcArgs?
- Ashutosh Chauhan
On May 28, 2014, 5:40 a.m., Harish Butani wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21970/
> -----------------------------------------------------------
>
> (Updated May 28, 2014, 5:40 a.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-7062
> https://issues.apache.org/jira/browse/HIVE-7062
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> 1. Have the Windowing Table Function support streaming mode.
> 2. Have special handling for Ranking UDAFs.
> 3. Have special handling for Sum/Avg for fixed size Wdws.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624
> ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION
> ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION
> ql/src/test/results/clientpositive/ptf.q.out eb4997d
> ql/src/test/results/clientpositive/windowing.q.out 7e23497
> ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c
>
> Diff: https://reviews.apache.org/r/21970/diff/
>
>
> Testing
> -------
>
> run existing windowing and ptf tests
> Add unit tests for StreamingSum and StreamingAvg evaluators.
>
>
> Thanks,
>
> Harish Butani
>
>