You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Harish Butani <rh...@gmail.com> on 2014/05/07 20:11:53 UTC
Review Request 21168: HIVE-6999: Add streaming mode to PTFs
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
-----------------------------------------------------------
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-6999
https://issues.apache.org/jira/browse/HIVE-6999
Repository: hive-git
Description
-------
There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.
Diffs
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf
ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION
ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/21168/diff/
Testing
-------
added new tests
Thanks,
Harish Butani
Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs
Posted by Harish Butani <rh...@gmail.com>.
> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 58
> > <https://reviews.apache.org/r/21168/diff/1/?file=576144#file576144line58>
> >
> > Can you add a comment why we need to keep track for first row processed in Map?
done
> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java, line 319
> > <https://reviews.apache.org/r/21168/diff/1/?file=576144#file576144line319>
> >
> > Better name : outputPartRowsItr?
done
> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, line 96
> > <https://reviews.apache.org/r/21168/diff/1/?file=576148#file576148line96>
> >
> > Comment made sense. Since like those fields are not present in class any more. Shall we just get rid of this?
this is needed; transient based on BeanInfo (get/set methods) in class
> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java, line 218
> > <https://reviews.apache.org/r/21168/diff/1/?file=576148#file576148line218>
> >
> > Better name: canAcceptInputAsStream?
done
> On May 9, 2014, 5:48 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java, line 449
> > <https://reviews.apache.org/r/21168/diff/1/?file=576143#file576143line449>
> >
> > Instead of adding noop functions statically, we should put these functions in test/ package and than do add jar for testing. Multiple reasons:
> > * Better to isolate test code from production code.
> > * It will also exercise add jar functionality for PTF functions for which I am not sure we have coverage.
> > * These functions also show up in default list of inbuilt functions. It may confuse user to wonder what good these functions are for. show_functions.q failed because of this.
Agree with your comments on Noop. This was done because for testing we need a PTF and Noop has some special short circuit path for Partition handling.
But can we do this as a separate Jira; removing references to Noop in the translation code is non trivial work.
- Harish
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
-----------------------------------------------------------
On May 14, 2014, 9:21 p.m., Harish Butani wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> -----------------------------------------------------------
>
> (Updated May 14, 2014, 9:21 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-6999
> https://issues.apache.org/jira/browse/HIVE-6999
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
> Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7
> ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf
> ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION
> ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/21168/diff/
>
>
> Testing
> -------
>
> added new tests
>
>
> Thanks,
>
> Harish Butani
>
>
Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs
Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/#review42578
-----------------------------------------------------------
Design looks good. Few implementation related comments.
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
<https://reviews.apache.org/r/21168/#comment76384>
Instead of adding noop functions statically, we should put these functions in test/ package and than do add jar for testing. Multiple reasons:
* Better to isolate test code from production code.
* It will also exercise add jar functionality for PTF functions for which I am not sure we have coverage.
* These functions also show up in default list of inbuilt functions. It may confuse user to wonder what good these functions are for. show_functions.q failed because of this.
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76385>
Can you add a comment why we need to keep track for first row processed in Map?
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76394>
Nice helpful comments!
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
<https://reviews.apache.org/r/21168/#comment76386>
Better name : outputPartRowsItr?
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
<https://reviews.apache.org/r/21168/#comment76391>
This doesnt feel right. Why do we need to special handle no-op functions? If there is a good reason for it, we should document in comment here.
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java
<https://reviews.apache.org/r/21168/#comment76392>
Better placed in test package? Also, ASF headers are missing.
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java
<https://reviews.apache.org/r/21168/#comment76393>
Better placed in test package? Also, ASF headers are missing.
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76395>
Nice helpful comments!
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76396>
Comment made sense. Since like those fields are not present in class any more. Shall we just get rid of this?
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
<https://reviews.apache.org/r/21168/#comment76397>
Better name: canAcceptInputAsStream?
- Ashutosh Chauhan
On May 7, 2014, 6:11 p.m., Harish Butani wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21168/
> -----------------------------------------------------------
>
> (Updated May 7, 2014, 6:11 p.m.)
>
>
> Review request for hive and Ashutosh Chauhan.
>
>
> Bugs: HIVE-6999
> https://issues.apache.org/jira/browse/HIVE-6999
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
> Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
> But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
>
>
> Diffs
> -----
>
> ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9
> ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7
> ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION
> ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf
> ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION
> ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/21168/diff/
>
>
> Testing
> -------
>
> added new tests
>
>
> Thanks,
>
> Harish Butani
>
>
Re: Review Request 21168: HIVE-6999: Add streaming mode to PTFs
Posted by Harish Butani <rh...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21168/
-----------------------------------------------------------
(Updated May 14, 2014, 9:21 p.m.)
Review request for hive and Ashutosh Chauhan.
Bugs: HIVE-6999
https://issues.apache.org/jira/browse/HIVE-6999
Repository: hive-git
Description
-------
There are a set of use cases where the Table Function can operate on a Partition row by row or on a subset(window) of rows as it is being streamed to it.
Windowing has couple of use cases of this:processing of Rank functions, processing of Window Aggregations.
But this is a generic concept: any analysis that operates on an Ordered partition maybe able to operate in Streaming mode.
This patch introduces streaming mode in PTFs and provides the mechanics to handle PTF chains that contain both modes of PTFs.
Subsequent patches will introduce Streaming mode for Windowing.
Diffs (updated)
-----
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 3bb8fa9
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java 4d314b7
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java 34aebf0
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopStreaming.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NoopWithMapStreaming.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java 1087bbf
ql/src/test/queries/clientpositive/ptf_streaming.q PRE-CREATION
ql/src/test/results/clientpositive/ptf_streaming.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/21168/diff/
Testing
-------
added new tests
Thanks,
Harish Butani