You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Harish Butani <rh...@gmail.com> on 2014/05/28 07:40:40 UTC

Review Request 21970: HIVE-7062: Support Streaming mode in Windowing

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-7062
    https://issues.apache.org/jira/browse/HIVE-7062


Repository: hive-git


Description
-------

1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
  ql/src/test/results/clientpositive/ptf.q.out eb4997d 
  ql/src/test/results/clientpositive/windowing.q.out 7e23497 
  ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 

Diff: https://reviews.apache.org/r/21970/diff/


Testing
-------

run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.


Thanks,

Harish Butani


Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing

Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------

(Updated May 29, 2014, 4:27 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-7062
    https://issues.apache.org/jira/browse/HIVE-7062


Repository: hive-git


Description
-------

1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
  ql/src/test/results/clientpositive/ptf.q.out eb4997d 
  ql/src/test/results/clientpositive/windowing.q.out 7e23497 
  ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 

Diff: https://reviews.apache.org/r/21970/diff/


Testing
-------

run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.


Thanks,

Harish Butani


Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing

Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/
-----------------------------------------------------------

(Updated May 29, 2014, 4:24 p.m.)


Review request for hive and Ashutosh Chauhan.


Changes
-------

addressed Ashutosh'c comments.
added check if Window Size is less than 'hive.join.cache.size' setting


Bugs: HIVE-7062
    https://issues.apache.org/jira/browse/HIVE-7062


Repository: hive-git


Description
-------

1. Have the Windowing Table Function support streaming mode.
2. Have special handling for Ranking UDAFs.
3. Have special handling for Sum/Avg for fixed size Wdws.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
  ql/src/test/results/clientpositive/ptf.q.out eb4997d 
  ql/src/test/results/clientpositive/windowing.q.out 7e23497 
  ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 

Diff: https://reviews.apache.org/r/21970/diff/


Testing
-------

run existing windowing and ptf tests
Add unit tests for StreamingSum and StreamingAvg evaluators.


Thanks,

Harish Butani


Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing

Posted by Harish Butani <rh...@gmail.com>.

> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java, line 65
> > <https://reviews.apache.org/r/21970/diff/1/?file=597243#file597243line65>
> >
> >     Seems like there is no call of this constructor with createElemContainer = false. So, is this new boolean required ?

RollingPartition creation invokes this with false; so that underlying RowContainer is not allocated.


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line 168
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line168>
> >
> >     Good to add comment that range based windows not supported yet.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java, line 172
> > <https://reviews.apache.org/r/21970/diff/1/?file=597245#file597245line172>
> >
> >     Add comment that unbounded windows not supported yet.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java, line 87
> > <https://reviews.apache.org/r/21970/diff/1/?file=597250#file597250line87>
> >
> >     Do we need this boolean? All ranking functions can support streaming.. isn't it?

No CumeDist and PercentRank don't support streaming
Also supportsStreaming is used by control how rank computation is done.
In case of Rank/DenseRank there is no separate class for Streaming. Same class handles both modes;
flag is used to distinguish how ranking is achieved.


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 31
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line31>
> >
> >     Better name : GenericUDAFStreamingEvaluator ?

as we discussed Enhancer because it provides Streaming mode to any Evaluator (provided incremental aggregations can be done based on cumulative aggregates)


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 52
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line52>
> >
> >     This can always be computed using preceding + following, so we can get rid of it.

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java, line 78
> > <https://reviews.apache.org/r/21970/diff/1/?file=597251#file597251line78>
> >
> >     Can you add a comment for this? especially last factor of wSize * underlying.

done. (the formula was wrong)


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java, line 1225
> > <https://reviews.apache.org/r/21970/diff/1/?file=597257#file597257line1225>
> >
> >     Better name : funcArgs?

done


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 66
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line66>
> >
> >     I think we can do new ArrayList(precedingSpan + followingSpan). Also, since we currently support only fixed size windows, this could even be an array.

initialized arraylist
yes using array is possible, not doing here though


> On May 28, 2014, 10:28 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java, line 53
> > <https://reviews.apache.org/r/21970/diff/1/?file=597244#file597244line53>
> >
> >     For more clarity.. it will help to show these four indices using ASCII art, like following :
> >     
> >     --------------------------------------
> >            |      |         |         |
> >

done


- Harish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/#review44178
-----------------------------------------------------------


On May 28, 2014, 5:40 a.m., Harish Butani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21970/
> -----------------------------------------------------------
> 
> (Updated May 28, 2014, 5:40 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-7062
>     https://issues.apache.org/jira/browse/HIVE-7062
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> 1. Have the Windowing Table Function support streaming mode.
> 2. Have special handling for Ranking UDAFs.
> 3. Have special handling for Sum/Avg for fixed size Wdws.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf.q.out eb4997d 
>   ql/src/test/results/clientpositive/windowing.q.out 7e23497 
>   ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 
> 
> Diff: https://reviews.apache.org/r/21970/diff/
> 
> 
> Testing
> -------
> 
> run existing windowing and ptf tests
> Add unit tests for StreamingSum and StreamingAvg evaluators.
> 
> 
> Thanks,
> 
> Harish Butani
> 
>


Re: Review Request 21970: HIVE-7062: Support Streaming mode in Windowing

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21970/#review44178
-----------------------------------------------------------


Mostly looks good. Some minor comments.


ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
<https://reviews.apache.org/r/21970/#comment78495>

    Seems like there is no call of this constructor with createElemContainer = false. So, is this new boolean required ?



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java
<https://reviews.apache.org/r/21970/#comment78500>

    For more clarity.. it will help to show these four indices using ASCII art, like following :
    
    --------------------------------------
           |      |         |         |
        



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java
<https://reviews.apache.org/r/21970/#comment78497>

    I think we can do new ArrayList(precedingSpan + followingSpan). Also, since we currently support only fixed size windows, this could even be an array.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
<https://reviews.apache.org/r/21970/#comment78501>

    Good to add comment that range based windows not supported yet.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java
<https://reviews.apache.org/r/21970/#comment78502>

    Add comment that unbounded windows not supported yet.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java
<https://reviews.apache.org/r/21970/#comment78503>

    Do we need this boolean? All ranking functions can support streaming.. isn't it?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78504>

    Better name : GenericUDAFStreamingEvaluator ?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78505>

    This can always be computed using preceding + following, so we can get rid of it.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java
<https://reviews.apache.org/r/21970/#comment78510>

    Can you add a comment for this? especially last factor of wSize * underlying.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
<https://reviews.apache.org/r/21970/#comment78511>

    Better name : funcArgs?


- Ashutosh Chauhan


On May 28, 2014, 5:40 a.m., Harish Butani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21970/
> -----------------------------------------------------------
> 
> (Updated May 28, 2014, 5:40 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-7062
>     https://issues.apache.org/jira/browse/HIVE-7062
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> 1. Have the Windowing Table Function support streaming mode.
> 2. Have special handling for Ranking UDAFs.
> 3. Have special handling for Sum/Avg for fixed size Wdws.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java d3800c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java b5adb11 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFRollingPartition.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 814ae37 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCumeDist.java 18c8c8d 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFDenseRank.java c1d43d8 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java 5668a3b 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFPercentRank.java aab1922 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFRank.java 5c8f1e0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFStreamingEnhancer.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 8508ffb 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/ISupportStreamingModeForWindowing.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java d50a542 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java be1f9ab 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 8a1e085 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java cdb5624 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingAvg.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udaf/TestStreamingSum.java PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf.q.out eb4997d 
>   ql/src/test/results/clientpositive/windowing.q.out 7e23497 
>   ql/src/test/results/clientpositive/windowing_windowspec.q.out 6ea068c 
> 
> Diff: https://reviews.apache.org/r/21970/diff/
> 
> 
> Testing
> -------
> 
> run existing windowing and ptf tests
> Add unit tests for StreamingSum and StreamingAvg evaluators.
> 
> 
> Thanks,
> 
> Harish Butani
> 
>