You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2014/02/26 19:15:27 UTC
Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/
-----------------------------------------------------------
Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Bugs: PIG-3679
https://issues.apache.org/jira/browse/PIG-3679
Repository: pig-git
Description
-------
I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
Here is my reasoning in more details-
https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
Diffs
-----
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
Diff: https://reviews.apache.org/r/18525/diff/
Testing
-------
All e2e tests pass (except Warning_4 PIG-3739).
Thanks,
Cheolsoo Park
Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL
clean-up
Posted by Cheolsoo Park <pi...@gmail.com>.
> On Feb. 26, 2014, 7:28 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 302-303
> > <https://reviews.apache.org/r/18525/diff/1/?file=504701#file504701line302>
> >
> > Will there be a problem if schema had a bag or map?
No. As far as I understand, tuple is only a problem because UDF has to wrap them in a tuple if multiple fields are returned.
- Cheolsoo
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35554
-----------------------------------------------------------
On Feb. 26, 2014, 6:15 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
>
> (Updated Feb. 26, 2014, 6:15 p.m.)
>
>
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3679
> https://issues.apache.org/jira/browse/PIG-3679
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
>
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
>
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
>
> Diff: https://reviews.apache.org/r/18525/diff/
>
>
> Testing
> -------
>
> All e2e tests pass (except Warning_4 PIG-3739).
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL
clean-up
Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35554
-----------------------------------------------------------
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/18525/#comment66158>
This might fill up the logs. Can we do PigLogger.warn which will aggregate the messages and also increment a counter?
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/18525/#comment66159>
Will there be a problem if schema had a bag or map?
- Rohini Palaniswamy
On Feb. 26, 2014, 6:15 p.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
>
> (Updated Feb. 26, 2014, 6:15 p.m.)
>
>
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3679
> https://issues.apache.org/jira/browse/PIG-3679
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
>
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
>
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
>
>
> Diffs
> -----
>
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
>
> Diff: https://reviews.apache.org/r/18525/diff/
>
>
> Testing
> -------
>
> All e2e tests pass (except Warning_4 PIG-3739).
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL
clean-up
Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35608
-----------------------------------------------------------
Ship it!
Ship It!
- Rohini Palaniswamy
On Feb. 27, 2014, 12:28 a.m., Cheolsoo Park wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
>
> (Updated Feb. 27, 2014, 12:28 a.m.)
>
>
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
>
>
> Bugs: PIG-3679
> https://issues.apache.org/jira/browse/PIG-3679
>
>
> Repository: pig-git
>
>
> Description
> -------
>
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
>
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
>
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
>
>
> Diffs
> -----
>
> src/org/apache/pig/PigWarning.java 523cf30
> src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
>
> Diff: https://reviews.apache.org/r/18525/diff/
>
>
> Testing
> -------
>
> All e2e tests pass (except Warning_4 PIG-3739).
>
>
> Thanks,
>
> Cheolsoo Park
>
>
Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL
clean-up
Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/
-----------------------------------------------------------
(Updated Feb. 27, 2014, 12:28 a.m.)
Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
Changes
-------
Incorporate Rohini's comments.
Bugs: PIG-3679
https://issues.apache.org/jira/browse/PIG-3679
Repository: pig-git
Description
-------
I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
Here is my reasoning in more details-
https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
Diffs (updated)
-----
src/org/apache/pig/PigWarning.java 523cf30
src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d
Diff: https://reviews.apache.org/r/18525/diff/
Testing
-------
All e2e tests pass (except Warning_4 PIG-3739).
Thanks,
Cheolsoo Park