Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/04/16 00:01:05 UTC

[jira] [Commented] (PIG-1997) define semantics of referring to a column within bytearray column

    [ https://issues.apache.org/jira/browse/PIG-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020448#comment-13020448 ] 

Thejas M Nair commented on PIG-1997:
------------------------------------

Referring to a column within a bytearray column results in an error if the underlying object is a DataByteArray:

{code}
grunt> cat bag.txt
1       {(1,2,3)}
4       {(4,5,6)}
grunt> l = load 'bag.txt' as (a, b);
grunt> f = foreach l generate b.$0;

grunt> dump f;
...
java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:451)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:290)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:158)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:333)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:320)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:332)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:284)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
{code}
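
The failure mode in the trace above can be reproduced outside Pig with a minimal sketch. Everything in it is a hypothetical stand-in: ProjectionSketch, Bytes, projectUnchecked and projectGuarded are not Pig classes or methods, and a java.util.List plays the role of a Tuple. The unchecked variant performs a blind cast, which is what the exception message suggests happens for b.$0. The guarded variant returns null for a non-tuple value, which is only one possible semantics and not a decision made in this issue.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only: none of these names exist in Pig.
class ProjectionSketch {

    // Plays the role of an opaque byte-array value (like a DataByteArray).
    static final class Bytes {
        final byte[] data;

        Bytes(String text) {
            this.data = text.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Blind cast: assumes the field is tuple-like and fails otherwise.
    static Object projectUnchecked(Object field, int index) {
        List<?> tuple = (List<?>) field; // ClassCastException for Bytes
        return tuple.get(index);
    }

    // Assumed alternative: check the runtime type first and yield null
    // when the value is not tuple-like or the index is out of range.
    static Object projectGuarded(Object field, int index) {
        if (field instanceof List<?>) {
            List<?> tuple = (List<?>) field;
            return (index >= 0 && index < tuple.size()) ? tuple.get(index) : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Object rawBytes = new Bytes("{(1,2,3)}");      // case 1: really bytes
        Object realTuple = Arrays.asList(1, 2, 3);     // case 2: really a tuple

        System.out.println(projectGuarded(realTuple, 0));
        System.out.println(projectGuarded(rawBytes, 0));
        try {
            projectUnchecked(rawBytes, 0);
        } catch (ClassCastException e) {
            System.out.println("unchecked projection failed: ClassCastException");
        }
    }
}
```

Whether the guarded case should return null, raise a clearer error, or attempt a conversion is exactly the question this issue leaves open.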

> define semantics of referring to a column within bytearray column 
> ------------------------------------------------------------------
>
>                 Key: PIG-1997
>                 URL: https://issues.apache.org/jira/browse/PIG-1997
>             Project: Pig
>          Issue Type: Task
>            Reporter: Thejas M Nair
>
> When column x is of type bytearray, the semantics of x.$0 are not clear.
> We need to define the behavior of this expression in the following cases:
> 1. Column type is bytearray and the actual object is a DataByteArray.
> 2. Column type is bytearray and the actual object is either a Tuple or a DataBag.
> When the bytearray column holds an object of type DataByteArray, the Pig runtime tries to cast the object to a Tuple and fails with a ClassCastException.
> I am not sure of the current behavior if the bytearray column actually contains a Tuple or DataBag object (this needs to be tested).
> This is related to PIG-1281.
