Posted to user@pig.apache.org by Vikram Oberoi <vi...@meebo-inc.com> on 2010/04/19 23:57:34 UTC

Elephant Bird Protobuf --> Pig Mapping?

Hey folks,

My understanding of the elephant bird code Twitter recently released is that
'repeated' protocol buffer fields map to DataBags in Pig. I'm getting an
error saying "org.apache.pig.data.DefaultDataBag cannot be cast to
java.util.List" when I try to store repeated fields, though, so either I'm
wrong or something more sinister is at work.

I'm trying to *store* data with the following interface:

message Session {
  optional string id = 1;
  repeated Login logins = 2;
  ...
}

message Login {
  optional string protocol = 1;
  optional string sn = 2;
}

I've generated an LzoProtobufB64LinePigStorage store function using
HadoopProtoCodeGenerator (which is enormously convenient, by the way) for
Session messages, and it throws the following error when it tries to store
repeated Login messages:

Pig Stack Trace
---------------
ERROR 0: org.apache.pig.data.DefaultDataBag cannot be cast to java.util.List

org.apache.pig.backend.executionengine.ExecException: ERROR 0: org.apache.pig.data.DefaultDataBag cannot be cast to java.util.List
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:184)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
        at org.apache.pig.PigServer.execute(PigServer.java:766)
        at org.apache.pig.PigServer.access$100(PigServer.java:89)
        at org.apache.pig.PigServer$Graph.execute(PigServer.java:937)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:113)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
        at org.apache.pig.Main.main(Main.java:397)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to java.util.List
        at com.google.protobuf.GeneratedMessage$FieldAccessorTable$RepeatedFieldAccessor.set(GeneratedMessage.java:1140)
        at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:206)
        at com.google.protobuf.GeneratedMessage$Builder.setField(GeneratedMessage.java:147)
        at com.twitter.elephantbird.pig.store.LzoProtobufB64LinePigStorage.putNext(Unknown Source)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:121)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
        at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
        at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
        ... 10 more
================================================================================
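
For reference, the "Caused by" frames are consistent with the bag being handed straight to the protobuf builder's setField, which casts the value for a repeated field to java.util.List. A Pig bag is iterable but is not a List, so the cast fails. Below is a minimal Java sketch of that mismatch and of one possible workaround (copying the bag's elements into a List first). It uses only the standard library: FakeBag, castFails and toList are hypothetical stand-ins invented for illustration, not Pig, protobuf or Elephant Bird classes, and a real fix would presumably also need to turn each tuple into a Login message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BagToListSketch {

    // Hypothetical stand-in for a Pig bag: iterable, but NOT a java.util.List.
    public static final class FakeBag implements Iterable<Object> {
        private final List<Object> items;

        public FakeBag(List<Object> items) {
            this.items = new ArrayList<>(items);
        }

        @Override
        public Iterator<Object> iterator() {
            return items.iterator();
        }
    }

    // Mimics the failing step: casting the incoming value to List.
    // Returns true when the cast throws ClassCastException.
    public static boolean castFails(Object value) {
        try {
            List<?> ignored = (List<?>) value;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Possible workaround (an assumption, not the library's actual fix):
    // copy the bag's elements into a List before passing them on.
    public static List<Object> toList(Iterable<?> bag) {
        List<Object> out = new ArrayList<>();
        for (Object element : bag) {
            out.add(element);
        }
        return out;
    }

    public static void main(String[] args) {
        FakeBag logins = new FakeBag(Arrays.<Object>asList("login-1", "login-2"));
        System.out.println("cast fails: " + castFails(logins));
        System.out.println("copied: " + toList(logins));
    }
}
```

The copy keeps the bag's iteration order; whether order matters for the repeated field depends on how the data is consumed downstream.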

Has anyone run into this issue, or can any of the Twitter devs shed some
light on what the problem is here?

Thanks,
Vikram

Re: Elephant Bird Protobuf --> Pig Mapping?

Posted by Vikram Oberoi <vi...@meebo-inc.com>.
Sure thing: http://github.com/kevinweil/elephant-bird/issues/#issue/3

Vikram

On Mon, Apr 19, 2010 at 3:30 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Vikram,
> Sounds like a bug. Can you open a ticket on github?
> Glad to hear you are using this code!
>
> -D

Re: Elephant Bird Protobuf --> Pig Mapping?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Vikram,
Sounds like a bug. Can you open a ticket on github?
Glad to hear you are using this code!

-D
