Posted to user@pig.apache.org by Xiaomeng Wan <sh...@gmail.com> on 2010/03/08 17:19:02 UTC

load in nested bags

Hi there,

I am trying to load a relation with nested bags using something like:

a = LOAD '*' AS (x:chararray, y:bag{t:tuple(z:chararray,
b:bag{t1:tuple(u:chararray, v:long)})});

But I get the following error:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.DataBag
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POIsNull.getNext(POIsNull.java:162)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

It seems that it loads the inner bag as an integer and then fails to
cast it to a bag later. Explicit casting didn't work either. Has anybody
had this kind of problem before?

Regards,
Xiaomeng

Re: load in nested bags

Posted by hc busy <hc...@gmail.com>.
There oughta be an "instanceof" operator or something to test casting.


On Mon, Mar 8, 2010 at 11:01 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Write an isCastable UDF, filter by it.
>
> -D

Re: load in nested bags

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Write an isCastable UDF, filter by it.
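
A minimal sketch of the shape check such a filter might perform, written in plain Java so it runs without Pig on the classpath. Everything named here is hypothetical: the class NestedBagCheck and its methods are illustrative, a bag is modelled as a java.util.List and a tuple as an Object[]. In an actual UDF the same tests would presumably sit inside the exec method of a FilterFunc subclass and check for Pig's DataBag and Tuple types instead.

```java
import java.util.List;

/**
 * Sketch of the check an "isCastable" filter could perform for the schema
 * bag{tuple(chararray, bag{tuple(chararray, long)})}.
 * Assumption: List stands in for a Pig bag and Object[] for a Pig tuple;
 * these are stand-ins for illustration, not Pig's API.
 */
final class NestedBagCheck {
    private NestedBagCheck() {}

    /** Returns true only if the value has the full nested-bag shape. */
    static boolean isNestedBag(Object value) {
        if (!(value instanceof List)) {
            return false; // covers null and the Integer seen in the stack trace
        }
        for (Object outer : (List<?>) value) {
            if (!(outer instanceof Object[])) {
                return false;
            }
            Object[] t = (Object[]) outer;
            // Outer tuple: (z:chararray, b:bag)
            if (t.length != 2 || !(t[0] instanceof String) || !isInnerBag(t[1])) {
                return false;
            }
        }
        return true; // an empty bag is accepted
    }

    /** Checks the inner bag{tuple(u:chararray, v:long)}. */
    private static boolean isInnerBag(Object value) {
        if (!(value instanceof List)) {
            return false;
        }
        for (Object inner : (List<?>) value) {
            if (!(inner instanceof Object[])) {
                return false;
            }
            Object[] t1 = (Object[]) inner;
            if (t1.length != 2 || !(t1[0] instanceof String) || !(t1[1] instanceof Long)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this sketch rejects null fields outright; whether nulls should pass the filter is a design choice that depends on how the rest of the script treats them.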

-D

On Mon, Mar 8, 2010 at 10:59 AM, Xiaomeng Wan <sh...@gmail.com> wrote:

> Yes, you are right: some of the data entries are badly formatted and
> cannot be cast into a bag of bags. Is there any way in Pig to check
> whether an entry can be cast to a specified type, instead of just
> throwing an error and failing the whole job?
>
> Thanks!
>
> Xiaomeng

Re: load in nested bags

Posted by Xiaomeng Wan <sh...@gmail.com>.
Yes, you are right: some of the data entries are badly formatted and
cannot be cast into a bag of bags. Is there any way in Pig to check
whether an entry can be cast to a specified type, instead of just
throwing an error and failing the whole job?

Thanks!

Xiaomeng

On Mon, Mar 8, 2010 at 10:44 AM, hc busy <hc...@gmail.com> wrote:
> Can you post a sample data file as well? It seems that the data is corrupt or
> inconsistently formatted.

Re: load in nested bags

Posted by hc busy <hc...@gmail.com>.
Can you post a sample data file as well? It seems that the data is corrupt or
inconsistently formatted.



On Mon, Mar 8, 2010 at 8:19 AM, Xiaomeng Wan <sh...@gmail.com> wrote:

> Hi there,
>
> I am trying to load a relation with nested bags using something like:
>
> a = LOAD '*' AS (x:chararray, y:bag{t:tuple(z:chararray,
> b:bag{t1:tuple(u:chararray, v:long)})});
>
> But I get the following error:
>
> java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.DataBag
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:407)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:188)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POIsNull.getNext(POIsNull.java:162)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:237)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
>        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> It seems that it loads the inner bag as an integer and then fails to
> cast it to a bag later. Explicit casting didn't work either. Has anybody
> had this kind of problem before?
>
> Regards,
> Xiaomeng
>