You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2014/02/26 19:15:27 UTC

Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/
-----------------------------------------------------------

Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Bugs: PIG-3679
    https://issues.apache.org/jira/browse/PIG-3679


Repository: pig-git


Description
-------

I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.

Here is my reasoning in more details-
https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966

Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").


Diffs
-----

  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d 

Diff: https://reviews.apache.org/r/18525/diff/


Testing
-------

All e2e tests pass (except Warning_4 PIG-3739).


Thanks,

Cheolsoo Park


Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up

Posted by Cheolsoo Park <pi...@gmail.com>.

> On Feb. 26, 2014, 7:28 p.m., Rohini Palaniswamy wrote:
> > src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java, lines 302-303
> > <https://reviews.apache.org/r/18525/diff/1/?file=504701#file504701line302>
> >
> >     Will there be a problem if schema had a bag or map?

No. As far as I understand, tuple is only a problem because UDF has to wrap them in a tuple if multiple fields are returned.


- Cheolsoo


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35554
-----------------------------------------------------------


On Feb. 26, 2014, 6:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
> 
> (Updated Feb. 26, 2014, 6:15 p.m.)
> 
> 
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3679
>     https://issues.apache.org/jira/browse/PIG-3679
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
> 
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
> 
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d 
> 
> Diff: https://reviews.apache.org/r/18525/diff/
> 
> 
> Testing
> -------
> 
> All e2e tests pass (except Warning_4 PIG-3739).
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35554
-----------------------------------------------------------



src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/18525/#comment66158>

    This might fill up the logs. Can we do PigLogger.warn which will aggregate the messages and also increment a counter?



src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java
<https://reviews.apache.org/r/18525/#comment66159>

    Will there be a problem if schema had a bag or map?


- Rohini Palaniswamy


On Feb. 26, 2014, 6:15 p.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
> 
> (Updated Feb. 26, 2014, 6:15 p.m.)
> 
> 
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3679
>     https://issues.apache.org/jira/browse/PIG-3679
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
> 
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
> 
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d 
> 
> Diff: https://reviews.apache.org/r/18525/diff/
> 
> 
> Testing
> -------
> 
> All e2e tests pass (except Warning_4 PIG-3739).
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/#review35608
-----------------------------------------------------------

Ship it!


Ship It!

- Rohini Palaniswamy


On Feb. 27, 2014, 12:28 a.m., Cheolsoo Park wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18525/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2014, 12:28 a.m.)
> 
> 
> Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.
> 
> 
> Bugs: PIG-3679
>     https://issues.apache.org/jira/browse/PIG-3679
> 
> 
> Repository: pig-git
> 
> 
> Description
> -------
> 
> I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.
> 
> Here is my reasoning in more details-
> https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966
> 
> Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").
> 
> 
> Diffs
> -----
> 
>   src/org/apache/pig/PigWarning.java 523cf30 
>   src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d 
> 
> Diff: https://reviews.apache.org/r/18525/diff/
> 
> 
> Testing
> -------
> 
> All e2e tests pass (except Warning_4 PIG-3739).
> 
> 
> Thanks,
> 
> Cheolsoo Park
> 
>


Re: Review Request 18525: PIG-3679: Fix regression of the STATUS_NULL clean-up

Posted by Cheolsoo Park <pi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18525/
-----------------------------------------------------------

(Updated Feb. 27, 2014, 12:28 a.m.)


Review request for pig, Daniel Dai, Mark Wagner, and Rohini Palaniswamy.


Changes
-------

Incorporate Rohini's comments.


Bugs: PIG-3679
    https://issues.apache.org/jira/browse/PIG-3679


Repository: pig-git


Description
-------

I discovered this regression while debugging the e2e test StreamingPythonUDFs_10 in trunk. To summarize, replacing (STATUS_NULL) with (STATUS_OK + null) has changed how null values are handled in some cases. In particular, some UDFs that used to see no nulls are called with nulls and fail with NPE now. Since this is a major backward incompatibility, I changed POUserFunc to filter out nulls always. Technically, this still changes the behavior with nulls, but it seems ok that UDFs that used to fail with NPE no longer fail.

Here is my reasoning in more details-
https://issues.apache.org/jira/browse/PIG-3679?focusedCommentId=13892966&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13892966

Alternatively, we could let UDFs handle nulls by themselves. That seems cleaner to me, but backward incompatibility is a concern (i.e. "My UDFs used to work with 0.12, but it no longer works with 0.13").


Diffs (updated)
-----

  src/org/apache/pig/PigWarning.java 523cf30 
  src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POUserFunc.java f031b1d 

Diff: https://reviews.apache.org/r/18525/diff/


Testing
-------

All e2e tests pass (except Warning_4 PIG-3739).


Thanks,

Cheolsoo Park