Posted to dev@pig.apache.org by "Santhosh Srinivasan (JIRA)" <ji...@apache.org> on 2008/07/11 19:32:31 UTC

[jira] Commented: (PIG-303) POCast does not cast chararray to bytearray

    [ https://issues.apache.org/jira/browse/PIG-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12612937#action_12612937 ] 

Santhosh Srinivasan commented on PIG-303:
-----------------------------------------

The conversion from any pig type to byte array is broken. 

The cast functionality is used in the following scenarios:

1. Cast bytes to appropriate pig types during load
2. Cast one pig type to another during execution
3. Cast pig types to appropriate storage representation during a store

Out of these three scenarios, POCast plays a role in the first two. The third scenario influences the behavior of POCast.

Currently, POCast uses the load function to convert bytes to the appropriate pig type (Scenario 1). During pipeline execution, after the load, users can apply casts as they see fit. This includes converting a pig type (other than byte array) to a byte array and then converting that byte array to the same or a different pig type (Scenario 2). Consider the hypothetical sequence of casts below.

{code}
a = load 'myfile' as (t: tuple(i: int, f: float));
b = foreach a generate (bytearray) $0;
c = foreach b generate (tuple(int, int)) $0;
{code}

The tuple is first cast to a byte array and then cast back to a tuple. In order to facilitate these types of casts, the byte array representation should retain information about the original type it was cast from. This information is conceptually encapsulated in the load function, which supports the ability to convert bytes to pig types. The inverse mechanism of converting pig types to bytes will nicely fit in the context of the load function. This will enable pig to use the conversion and inversion hooks in the load function to convert bytes to pig types and vice versa in the context of the pipeline execution (Scenario 2).
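To make the conversion/inversion pairing concrete, here is a minimal Java sketch. It is only an illustration, not the proposed patch: the interface name ByteCaster, the class TextByteCaster, and the UTF-8 decimal text encoding are all assumptions. Only the method names bytesToInteger, bytesToLong and toBytes come from the proposal.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical interface: pairs each bytesToX hook with an inverse toBytes hook.
interface ByteCaster {
    Integer bytesToInteger(byte[] b);
    Long bytesToLong(byte[] b);
    byte[] toBytes(Integer i);
    byte[] toBytes(Long l);
}

// Assumed encoding: values are stored as UTF-8 decimal text.
class TextByteCaster implements ByteCaster {
    public Integer bytesToInteger(byte[] b) {
        return Integer.valueOf(new String(b, StandardCharsets.UTF_8).trim());
    }

    public Long bytesToLong(byte[] b) {
        return Long.valueOf(new String(b, StandardCharsets.UTF_8).trim());
    }

    public byte[] toBytes(Integer i) {
        return i.toString().getBytes(StandardCharsets.UTF_8);
    }

    public byte[] toBytes(Long l) {
        return l.toString().getBytes(StandardCharsets.UTF_8);
    }
}

class RoundTripDemo {
    public static void main(String[] args) {
        ByteCaster caster = new TextByteCaster();
        // Cast an int to a byte array, then cast the byte array to a long.
        byte[] raw = caster.toBytes(Integer.valueOf(42));
        Long widened = caster.bytesToLong(raw);
        System.out.println(widened);
    }
}
```

Because the same object both produces and consumes the bytes, the knowledge of how the byte array is encoded stays in one place, which is the point of putting the inverse hooks next to the existing ones.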

The obvious benefit of this approach is that store functions that understand the byte representation of the data can convert the bytes back into the format of their choice (Scenario 3).

Summary:

1. The load function interface supports toBytes for each pig type, in addition to bytesToInteger, bytesToLong, etc.
2. POCast uses the load function to convert bytes to pig types and vice versa
3. PigStorage will be extended to support complex types (tuples, bags, maps) and to provide the inverse functions, i.e., convert pig types to their byte representation
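As a rough illustration of item 2, the Java sketch below contrasts a blind cast with a cast that delegates to inverse hooks. This is not the POCast code: CastToBytesSketch, InverseCaster, blindCast and castToBytes are hypothetical names, byte[] stands in for the byte array type, and the UTF-8 text encoding is an assumption.

```java
import java.nio.charset.StandardCharsets;

class CastToBytesSketch {
    // Hypothetical inverse hooks, standing in for the proposed toBytes methods.
    interface InverseCaster {
        byte[] toBytes(String s);
        byte[] toBytes(Integer i);
    }

    // Assumed text encoding: everything is written as UTF-8 text.
    static final InverseCaster TEXT = new InverseCaster() {
        public byte[] toBytes(String s) {
            return s.getBytes(StandardCharsets.UTF_8);
        }

        public byte[] toBytes(Integer i) {
            return i.toString().getBytes(StandardCharsets.UTF_8);
        }
    };

    // A blind reference cast: fails with ClassCastException unless the
    // value already is a byte array.
    static byte[] blindCast(Object value) {
        return (byte[]) value;
    }

    // Dispatch on the runtime type and delegate the conversion to the hooks.
    static byte[] castToBytes(Object value, InverseCaster caster) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            return (byte[]) value;
        }
        if (value instanceof String) {
            return caster.toBytes((String) value);
        }
        if (value instanceof Integer) {
            return caster.toBytes((Integer) value);
        }
        throw new IllegalArgumentException(
                "no toBytes hook for " + value.getClass().getName());
    }
}
```

Calling CastToBytesSketch.castToBytes("pig", CastToBytesSketch.TEXT) returns the UTF-8 bytes of the string, while CastToBytesSketch.blindCast("pig") throws a ClassCastException.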

> POCast does not cast chararray to bytearray
> -------------------------------------------
>
>                 Key: PIG-303
>                 URL: https://issues.apache.org/jira/browse/PIG-303
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Santhosh Srinivasan
>            Assignee: Santhosh Srinivasan
>             Fix For: types_branch
>
>
> When chararray is cast to bytearray, the query execution fails due to ClassCastException. The problem is inside the getNext(DataByteArray) code in POCast.
