Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2008/07/11 00:02:33 UTC

[jira] Created: (PIG-303) POCast does not cast chararray to bytearray

POCast does not cast chararray to bytearray
-------------------------------------------

                 Key: PIG-303
                 URL: https://issues.apache.org/jira/browse/PIG-303
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
            Assignee: Santhosh Srinivasan
             Fix For: types_branch


When chararray is cast to bytearray, the query execution fails due to ClassCastException. The problem is inside the getNext(DataByteArray) code in POCast.
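
The ticket does not show the failing code. As a rough illustration only, one plausible shape of such a failure is an unchecked downcast of a value whose runtime type is String. In the sketch below, Bytes is a hypothetical stand-in for DataByteArray and naiveCast is an invented name; neither is taken from POCast.

```java
class CastFailureSketch {
    // Hypothetical stand-in for Pig's DataByteArray; not the real class.
    static final class Bytes {
        final byte[] data;
        Bytes(byte[] data) { this.data = data; }
    }

    // Unchecked downcast: assumes the upstream operator always yields Bytes.
    static Bytes naiveCast(Object value) {
        return (Bytes) value; // ClassCastException if value is a String
    }

    // Reports whether the naive cast fails for the given runtime value.
    static boolean naiveCastFails(Object value) {
        try {
            naiveCast(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Under these assumptions, naiveCastFails("abc") is true, while a value that is already a Bytes instance passes through unchanged.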

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616671#action_12616671 ] 

Olga Natkovich commented on PIG-303:
------------------------------------

I reviewed the patch.

My feedback is that I don't think we need to support casts to bytearray. I can't think of any reasonable use cases for them, and supporting them makes the more common case of writing a custom load function more complex.



[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617548#action_12617548 ] 

Alan Gates commented on PIG-303:
--------------------------------

One question on parsing complex types in PigStorage.  In the function parseFromBytes, which is called on every complex field that is parsed, you construct a new TextDataParser to parse that field.  Is that necessary?  Can the same one not be used repeatedly?
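
The two options in the question can be sketched as follows. This is a toy model, not the code in the patch: FieldParser, reInit, and both parseFromBytes* method names are hypothetical, and the "parser" only splits a tuple literal such as "(1,2)" on commas.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ParserReuseSketch {
    // Hypothetical stand-in for a generated text parser; counts constructions.
    static final class FieldParser {
        static int constructed = 0;
        private String input;

        FieldParser(String input) {
            constructed++;
            this.input = input;
        }

        // Points the existing parser at new input instead of allocating.
        void reInit(String input) {
            this.input = input;
        }

        // Toy parse: strips the outer parentheses and splits on commas.
        List<String> parse() {
            String body = input.substring(1, input.length() - 1);
            List<String> fields = new ArrayList<>();
            for (String s : body.split(",")) {
                fields.add(s.trim());
            }
            return fields;
        }
    }

    private FieldParser parser; // created on first use, then reused

    // Option 1: a new parser for every field.
    List<String> parseFromBytesPerCall(byte[] b) {
        return new FieldParser(new String(b, StandardCharsets.UTF_8)).parse();
    }

    // Option 2: one parser, re-initialised for every field.
    List<String> parseFromBytesReusing(byte[] b) {
        String text = new String(b, StandardCharsets.UTF_8);
        if (parser == null) {
            parser = new FieldParser(text);
        } else {
            parser.reInit(text);
        }
        return parser.parse();
    }
}
```

With option 2, any number of calls on one ParserReuseSketch instance constructs a single FieldParser. The trade-off is that the instance holds mutable parser state, so it should not be shared across threads without synchronisation.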



[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614107#action_12614107 ] 

Santhosh Srinivasan commented on PIG-303:
-----------------------------------------

This issue will be fixed a bit later, so I am lowering the priority.



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Attachment: remove_cast_to_bytearray.patch

Casts to bytearray are no longer allowed. The patch includes the following:

1. Parser throws an exception when explicit casts to bytearray are seen

2. Type checker throws an exception when implicit casts to bytearray are used

3. The load interface does not have toBytes(PigType) methods

4. The toBytes methods in Utf8StorageConverter.java, BinStorage.java and TextLoader.java are retained for future use

5. A text data parser that converts text data to Pig Types

6. Unit test cases for the conversion routines and the casts.

7. Helper methods in DataType.java for checking equal byte arrays and converting Pig Map to strings.
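
As a rough sketch of items 1 and 2, a check that rejects any cast whose target is bytearray could look like the code below. The PigType enum and checkCast method are hypothetical names for illustration; the actual parser and type checker changes are in the patch and are not reproduced here.

```java
class CastValidationSketch {
    // Hypothetical type tags; not Pig's DataType constants.
    enum PigType { BYTEARRAY, CHARARRAY, INTEGER, LONG, FLOAT, DOUBLE, TUPLE, BAG, MAP }

    // Rejects every cast that targets bytearray, explicit or implicit.
    static void checkCast(PigType from, PigType to) {
        if (to == PigType.BYTEARRAY) {
            throw new IllegalArgumentException(
                "Cannot cast " + from + " to bytearray: casts to bytearray are not allowed");
        }
    }

    // Convenience wrapper that reports the outcome as a boolean.
    static boolean isAllowed(PigType from, PigType to) {
        try {
            checkCast(from, to);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

In this sketch, a cast from bytearray to chararray is still accepted, while chararray to bytearray is rejected before execution rather than failing at run time.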

TODO:

The byte arrays are parsed as strings as the parser cannot distinguish between the two.

Unit tests that still fail are:

    [junit] Running org.apache.pig.test.TestBuiltin
    [junit] Tests run: 23, Failures: 1, Errors: 1, Time elapsed: 14.986 sec
    [junit] Test org.apache.pig.test.TestBuiltin FAILED

    [junit] Running org.apache.pig.test.TestEvalPipeline
    [junit] Tests run: 9, Failures: 0, Errors: 1, Time elapsed: 159.046 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED

    [junit] Running org.apache.pig.test.TestFilterOpNumeric
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 56.258 sec
    [junit] Test org.apache.pig.test.TestFilterOpNumeric FAILED

    [junit] Running org.apache.pig.test.TestStoreOld
    [junit] Tests run: 3, Failures: 0, Errors: 2, Time elapsed: 41.005 sec
    [junit] Test org.apache.pig.test.TestStoreOld FAILED



[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616493#action_12616493 ] 

Pi Song commented on PIG-303:
-----------------------------

BTW, sorry for the late reply.



[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612937#action_12612937 ] 

Santhosh Srinivasan commented on PIG-303:
-----------------------------------------

The conversion from any pig type to byte array is broken. 

The cast functionality is used in the following scenarios:

1. Cast bytes to appropriate pig types during load
2. Cast one pig type to another during execution
3. Cast pig types to appropriate storage representation during a store

Out of these three scenarios, POCast plays a role in the first two. The third scenario influences the behavior of POCast.

Currently, POCast uses the load function to convert bytes to the appropriate pig type (scenario 1). During the pipeline execution, after the load, users can use casts as they deem fit. This covers scenarios like converting a pig type (other than byte array) to byte array followed by a conversion of the byte array to the same or a different pig type (Scenario 2). Consider the hypothetical use of the cast below.

{code}

a = load 'myfile' as (t: tuple(i: int, f: float));

b = foreach a generate (bytearray) $0;

c = foreach b generate (tuple(int, int)) $0;
{code}

The tuple is first cast to a byte array and then cast back to a tuple. In order to facilitate these types of casts, the byte array representation should retain information about the original type it was cast from. This information is conceptually encapsulated in the load function, which supports the ability to convert bytes to pig types. The inverse mechanism of converting pig types to bytes will nicely fit in the context of the load function. This will enable pig to use the conversion and inversion hooks in the load function to convert bytes to pig types and vice versa in the context of the pipeline execution (Scenario 2).

The obvious benefit of this approach is that store functions which understand the byte representation of the data can now convert the bytes back into the format of choice (Scenario 3).

Summary:

1. The load function interface supports toBytes for each pig type, in addition to bytesToInteger, bytesToLong, etc.
2. POCast uses the load function to convert bytes to pig types and vice versa
3. PigStorage will be extended to support complex types (tuples, bags, maps) and to provide the inverse functions, i.e., converting pig types to their byte representation
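
A minimal sketch of the paired conversions in point 1, assuming a UTF-8 text encoding. The method names bytesToInteger, bytesToLong and toBytes come from this comment; the Converter interface, the Utf8TextConverter class and the exact signatures are assumptions, not the real load function interface.

```java
import java.nio.charset.StandardCharsets;

class RoundTripSketch {
    // Hypothetical slice of a load function interface with inverse pairs.
    interface Converter {
        Integer bytesToInteger(byte[] b);
        Long bytesToLong(byte[] b);
        byte[] toBytes(Integer i);
        byte[] toBytes(Long l);
    }

    // Text-based implementation: values are stored as UTF-8 decimal strings.
    static final class Utf8TextConverter implements Converter {
        public Integer bytesToInteger(byte[] b) {
            return Integer.valueOf(new String(b, StandardCharsets.UTF_8));
        }

        public Long bytesToLong(byte[] b) {
            return Long.valueOf(new String(b, StandardCharsets.UTF_8));
        }

        public byte[] toBytes(Integer i) {
            return i.toString().getBytes(StandardCharsets.UTF_8);
        }

        public byte[] toBytes(Long l) {
            return l.toString().getBytes(StandardCharsets.UTF_8);
        }
    }
}
```

The property the proposal relies on is that each toBytes is the inverse of the matching bytesTo method, so bytesToInteger(toBytes(x)) returns x for the same converter. A different load function could choose a binary encoding as long as it keeps that property.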



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Patch Info: [Patch Available]



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Attachment: pig_type_to_bytearray.patch

The patch (pig_type_to_bytearray.patch) includes the following:

1. The load function interface is extended to include toBytes, a method that will convert Pig Types to the appropriate byte representation that can be used with the bytesToPigType routine

2. The methods are implemented in all the classes that implement the load function interface, i.e., PigStorage, BinStorage, TextLoader, private classes used in the unit test cases, etc.

3. A text data parser that converts text data to Pig Types

4. Unit test cases for the conversion routines and the casts.

5. Helper methods in DataType.java for checking equal byte arrays and converting Pig Map to strings.


TODO:

The byte arrays are parsed as strings as the parser cannot distinguish between the two.

Unit test cases that still fail are:

    [junit] Running org.apache.pig.test.TestEvalPipeline
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 179.904 sec
    [junit] Test org.apache.pig.test.TestEvalPipeline FAILED

    [junit] Running org.apache.pig.test.TestFilterOpNumeric
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 56.124 sec
    [junit] Test org.apache.pig.test.TestFilterOpNumeric FAILED

    [junit] Running org.apache.pig.test.TestBuiltin
    [junit] Tests run: 23, Failures: 1, Errors: 1, Time elapsed: 14.8 sec
    [junit] Test org.apache.pig.test.TestBuiltin FAILED

    [junit] Running org.apache.pig.test.TestStoreOld
    [junit] Tests run: 3, Failures: 0, Errors: 2, Time elapsed: 21.453 sec
    [junit] Test org.apache.pig.test.TestStoreOld FAILED




[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616492#action_12616492 ] 

Pi Song commented on PIG-303:
-----------------------------

Is there really a scenario where users might want to cast a known type to bytearray and then cast it back? I think users would be happier always working with the known type.

Going by the Jira title, I guess that if a user wants, for example, the UTF-8 representation of a chararray, then they should use a UDF which returns a bytearray rather than a cast.

My concern is that we shouldn't make LoadFunc more complex just to support an operation which doesn't actually have a real use.
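
The UDF alternative mentioned above might look roughly like the following. This is a self-contained sketch: the class and method names are hypothetical, and a real Pig UDF would extend Pig's EvalFunc and return a DataByteArray, which is left out here so the example compiles without Pig on the classpath.

```java
import java.nio.charset.StandardCharsets;

class Utf8BytesSketch {
    // Hypothetical UDF-style function: chararray in, UTF-8 bytes out.
    // Null input yields null, mirroring how a missing field might be passed through.
    static byte[] exec(String chararray) {
        if (chararray == null) {
            return null;
        }
        return chararray.getBytes(StandardCharsets.UTF_8);
    }
}
```

Because the encoding is chosen explicitly inside the function, the load function does not need to know how to turn pig types back into bytes.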



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-303:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

remove_cast_to_bytearray_reuse_parser.patch checked in.



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Attachment: pig_type_to_bytearray.patch

Replacing the old patch with the new one. The new patch includes the TextDataParser.jjt file.



[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615451#action_12615451 ] 

Santhosh Srinivasan commented on PIG-303:
-----------------------------------------

Ignore my previous patch. I will be uploading a new one shortly. One of the new files was not added to the patch.



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-303:
---------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Attachment:     (was: pig_type_to_bytearray.patch)



[jira] Updated: (PIG-303) POCast does not cast chararray to bytearray

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-303:
------------------------------------

    Attachment: remove_cast_to_bytearray_reuse_parser.patch

The text data parser is reused in the parseFromBytes() method in Utf8StorageConverter.

Unit tests that still fail are:

    [junit] Running org.apache.pig.test.TestBuiltin
    [junit] Tests run: 23, Failures: 1, Errors: 1, Time elapsed: 14.732 sec
    [junit] Test org.apache.pig.test.TestBuiltin FAILED

    [junit] Running org.apache.pig.test.TestFilterOpNumeric
    [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 56.902 sec
    [junit] Test org.apache.pig.test.TestFilterOpNumeric FAILED

    [junit] Running org.apache.pig.test.TestStoreOld
    [junit] Tests run: 3, Failures: 0, Errors: 2, Time elapsed: 41.091 sec
    [junit] Test org.apache.pig.test.TestStoreOld FAILED

