You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Harish Butani <rh...@gmail.com> on 2014/05/07 20:11:53 UTC

Review Request 21168: HIVE-6999: Add streaming mode to PTFs

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
-----------------------------------------------------------

Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-6999
    https://issues.apache.org/jira/browse/HIVE-6999


Repository: hive-git


Description
-------

There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf 
  ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
  ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/21168/diff/


Testing
-------

added new tests


Thanks,

Harish Butani


Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

Posted by Harish Butani <rh...@gmail.com>.

> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 58
> > <https://reviews.apache.org/r/21168/diff/1/?file=576144#file576144line58>
> >
> >     Can you add a comment why we need to keep track for first row processed in Map?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 319
> > <https://reviews.apache.org/r/21168/diff/1/?file=576144#file576144line319>
> >
> >     Better name : outputPartRowsItr?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, line 96
> > <https://reviews.apache.org/r/21168/diff/1/?file=576148#file576148line96>
> >
> >     Comment made sense. Since like those fields are not present in class any more. Shall we just get rid of this?

this is needed; transient based on BeanInfo (get/set methods) in class


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, line 218
> > <https://reviews.apache.org/r/21168/diff/1/?file=576148#file576148line218>
> >
> >     Better name: canAcceptInputAsStream?

done


> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 449
> > <https://reviews.apache.org/r/21168/diff/1/?file=576143#file576143line449>
> >
> >     Instead of adding noop functions statically, we should put these functions in test/ package and than do add jar for testing. Multiple reasons:
> >     * Better to isolate test code from production code.
> >     * It will also exercise add jar functionality for PTF functions for which I am not sure we have coverage.
> >     * These functions also show up in default list of inbuilt functions. It may confuse user to wonder what good these functions are for. show_functions.q failed because of this.

Agree with your comments on Noop. This was done because for testing we need a PTF and Noop has some special short circuit path for Partition handling.
But can we do this as a separate Jira; removing references to Noop in the translation code is non trivial work.


- Harish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
-----------------------------------------------------------


On May 14, 2014, 9:21 p.m., Harish Butani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 9:21 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6999
>     https://issues.apache.org/jira/browse/HIVE-6999
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
> Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf 
>   ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21168/diff/
> 
> 
> Testing
> -------
> 
> added new tests
> 
> 
> Thanks,
> 
> Harish Butani
> 
>


Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
-----------------------------------------------------------


Design looks good. Few implementation related comments.


ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
<https://reviews.apache.org/r/21168/#comment76384>

    Instead of adding noop functions statically, we should put these functions in test/ package and than do add jar for testing. Multiple reasons:
    * Better to isolate test code from production code.
    * It will also exercise add jar functionality for PTF functions for which I am not sure we have coverage.
    * These functions also show up in default list of inbuilt functions. It may confuse user to wonder what good these functions are for. show_functions.q failed because of this.



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76385>

    Can you add a comment why we need to keep track for first row processed in Map?



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76394>

    Nice helpful comments!



ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76386>

    Better name : outputPartRowsItr?



ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
<https://reviews.apache.org/r/21168/#comment76391>

    This doesnt feel right. Why do we need to special handle no-op functions? If there is a good reason for it, we should document in comment here.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java
<https://reviews.apache.org/r/21168/#comment76392>

    Better placed in test package? Also, ASF headers are missing.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java
<https://reviews.apache.org/r/21168/#comment76393>

    Better placed in test package? Also, ASF headers are missing.



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76395>

    Nice helpful comments!



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76396>

    Comment made sense. Since like those fields are not present in class any more. Shall we just get rid of this?



ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76397>

    Better name: canAcceptInputAsStream?


- Ashutosh Chauhan


On May 7, 2014, 6:11 p.m., Harish Butani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> -----------------------------------------------------------
> 
> (Updated May 7, 2014, 6:11 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6999
>     https://issues.apache.org/jira/browse/HIVE-6999
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
> Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf 
>   ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
>   ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/21168/diff/
> 
> 
> Testing
> -------
> 
> added new tests
> 
> 
> Thanks,
> 
> Harish Butani
> 
>


Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs

Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
-----------------------------------------------------------

(Updated May 14, 2014, 9:21 p.m.)


Review request for hive and Ashutosh Chauhan.


Bugs: HIVE-6999
    https://issues.apache.org/jira/browse/HIVE-6999


Repository: hive-git


Description
-------

There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7 
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf 
  ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION 
  ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/21168/diff/


Testing
-------

added new tests


Thanks,

Harish Butani