Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/07/14 23:42:31 UTC

[jira] Created: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Casting a byte array that contains a double value to an int results in a null pointer
-------------------------------------------------------------------------------------

                 Key: PIG-312
                 URL: https://issues.apache.org/jira/browse/PIG-312
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Alan Gates
             Fix For: types_branch


{code}
a = load 'myfile' as (name, age, gpa);
c = foreach a generate age * 10, (int)gpa * 2;
store c into 'outfile';
{code}
The values in gpa are doubles.  Because the load statement declares no types, they are read as byte arrays, and when the user casts them to an int the system casts directly from byte array to int, which yields a null.  First, this should yield a zero, not a null (unless the underlying value is null).  Second, we need to clarify the semantics here.  gpa was never declared to be a double, so casting directly from bytearray to int is a reasonable thing for the system to do, but users may not see it that way.  Do we want to cast numbers to double first and then to the requested type to avoid this?  Or should we force users to write this as (int)(double)gpa * 2, so we know to cast to double first and then to int?  In the interest of speed (especially considering the rarity of doubles in most data), I'd vote for the latter.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-312:
---------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-312:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

intcast.patch checked in.



[jira] Updated: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-312:
---------------------------

    Attachment: intcast.patch

Fix for this issue.  This fix differs a bit from what I said in the initial posting.  Anything that cannot be cast to the requested numeric type is still returned as a null rather than a 0.  After further thought, this seems like the better course, since putting a 0 there implies that we managed to cast the data, when in fact we did not know what to do with it.

It does fix the issue of casting double values to ints and longs.  The casts now first try to cast to int (or long); if that fails, they cast to a double and then cast that to an int (or long), checking to make sure there is no overflow.
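
For readers who do not have the patch at hand, the fallback described above can be sketched roughly as follows.  This is an illustration only, not the contents of intcast.patch: the class name ByteArrayCasts and the method bytesToInteger are hypothetical, and the sketch assumes the byte array holds the UTF-8 text form of the number.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names here are assumptions, not Pig's API.
final class ByteArrayCasts {

    // Returns the int value of the text in b, or null if it cannot be cast.
    static Integer bytesToInteger(byte[] b) {
        if (b == null) {
            return null; // a null input stays null
        }
        String s = new String(b, StandardCharsets.UTF_8).trim();

        // First attempt: parse the text directly as an int.
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            // Not a plain integer; fall through and try it as a double.
        }

        // Second attempt: parse as a double, then narrow to int.
        try {
            double d = Double.parseDouble(s);
            // Overflow check: reject NaN and values outside the int range.
            if (Double.isNaN(d) || d > Integer.MAX_VALUE || d < Integer.MIN_VALUE) {
                return null;
            }
            return (int) d; // keeps only the integer portion
        } catch (NumberFormatException e) {
            return null; // unparseable data yields null, not 0
        }
    }

    public static void main(String[] args) {
        String[] samples = {"42", "3.7", "1e12", "abc"};
        for (String sample : samples) {
            byte[] raw = sample.getBytes(StandardCharsets.UTF_8);
            System.out.println(sample + " -> " + bytesToInteger(raw));
        }
    }
}
```

In this sketch "3.7" comes back as 3, while "1e12" (too large for an int) and "abc" both come back as null, which matches the choice above to return null instead of 0 when the cast fails.  A long variant would follow the same pattern with Long.valueOf and the long range.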



[jira] Commented: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613639#action_12613639 ] 

Alan Gates commented on PIG-312:
--------------------------------

Rather than letting (int)gpa return a 0, we could change it to take just the integer portion of the double.  This seems better.



[jira] Assigned: (PIG-312) Casting a byte array that contains a double value to an int results in a null pointer

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-312:
------------------------------

    Assignee: Alan Gates
