Posted to user@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2011/05/02 02:27:51 UTC

Re: Problem with a protobuf in EB

Actually, Pig does some class loader magic to find classes in registered
jars on the front end.
We recently added that to Elephant Bird so that it works when the proto or
Thrift classes aren't already on the classpath and are only registered. I
believe I merged that into the 8 branch, so if Gerrit updates his packages
with the most recent version, it should "just work".
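
For readers unfamiliar with the idea, the fallback described above might look
roughly like the sketch below. It is only an illustration, not the actual Pig
or Elephant Bird code: the class name RegisteredJarLoader, the method
loadClass, and passing the registered jars as a list of paths are all
assumptions made for this example. It first asks the normal classpath for the
class and, only if that fails, looks in the registered jars.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration; not part of Pig or Elephant Bird. */
final class RegisteredJarLoader {
    private RegisteredJarLoader() {}

    /**
     * Tries the regular classpath first, then falls back to the given jars.
     * className must be a binary name, e.g. "com.example.Outer$Inner".
     */
    static Class<?> loadClass(String className, List<Path> registeredJars)
            throws ClassNotFoundException {
        try {
            // First attempt: the class loader that loaded this helper,
            // which is normally the application classpath.
            return Class.forName(className);
        } catch (ClassNotFoundException notOnClasspath) {
            // Fallback: build a loader over the registered jars.
            URL[] urls = new URL[registeredJars.size()];
            for (int i = 0; i < urls.length; i++) {
                try {
                    urls[i] = registeredJars.get(i).toUri().toURL();
                } catch (MalformedURLException e) {
                    throw new ClassNotFoundException(className, e);
                }
            }
            // The loader is deliberately left open: the returned class may
            // still need it to resolve other classes from the same jar.
            URLClassLoader loader =
                    new URLClassLoader(urls, RegisteredJarLoader.class.getClassLoader());
            return Class.forName(className, true, loader);
        }
    }
}
```

With the jar from this thread, a call would pass the binary name
"com.work.logs.LogFormat$FirstMsg" (with $ for the nested class) and the path
/home/kris/swineflu/logformats-0.1.2.jar.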

D

On Fri, Apr 29, 2011 at 5:11 PM, Gerrit Jansen van Vuuren <
gerritjvv@googlemail.com> wrote:

> Pig has a front end and a back end:
>  Front End:
>     The Pig JVM instance.
>  Back End:
>     Pig classes running your M/R job on Hadoop.
>
>  Pig instantiates the same loader in both the front end and the back end to
> get different information about loading the job files. E.g., which files to
> load? That is decided in the front end. Reading the actual file? That is
> done in the back end.
>
>  The Java classes for your GPB message need to be present in both the front
> end and the back end.
>
>  How?
>   REGISTER <jar> == Back End
>   $PIG_HOME/lib/ == Front End
>
>
> Cheers,
>  Gerrit
>
> On Sat, Apr 30, 2011 at 2:02 AM, Kris Coward <kr...@melon.org> wrote:
>
> >
> > Here we go:
> >
> > META-INF/
> > META-INF/MANIFEST.MF
> > com/work/logs/LogFormat$1.class
> > com/work/logs/LogFormat$Apa$Builder.class
> > com/work/logs/LogFormat$Apa.class
> > com/work/logs/LogFormat.class
> > com/work/logs/LogFormat$Cpu$Builder.class
> > com/work/logs/LogFormat$Cpu.class
> > com/work/logs/LogFormat$Evt$Builder.class
> > com/work/logs/LogFormat$Evt.class
> > com/work/logs/LogFormat$FirstMsg$Builder.class
> > com/work/logs/LogFormat$FirstMsg.class
> > com/work/logs/LogFormat$Gci$Builder.class
> > com/work/logs/LogFormat$Gci.class
> > com/work/logs/LogFormat$Inr$Builder.class
> > com/work/logs/LogFormat$Inr.class
> > com/work/logs/LogFormat$Ins$Builder.class
> > com/work/logs/LogFormat$Ins.class
> > com/work/logs/LogFormat$Mer$Builder.class
> > com/work/logs/LogFormat$Mer.class
> > com/work/logs/LogFormat$Mes$Builder.class
> > com/work/logs/LogFormat$Mes.class
> > com/work/logs/LogFormat$Mtu$Builder.class
> > com/work/logs/LogFormat$Mtu.class
> > com/work/logs/LogFormat$Nei$Builder.class
> > com/work/logs/LogFormat$Nei.class
> > com/work/logs/LogFormat$Nes$Builder.class
> > com/work/logs/LogFormat$Nes.class
> > com/work/logs/LogFormat$Ntr$Builder.class
> > com/work/logs/LogFormat$Ntr.class
> > com/work/logs/LogFormat$Nts$Builder.class
> > com/work/logs/LogFormat$Nts.class
> > com/work/logs/LogFormat$Pgr$Builder.class
> > com/work/logs/LogFormat$Pgr.class
> > com/work/logs/LogFormat$Psr$Builder.class
> > com/work/logs/LogFormat$Psr.class
> > com/work/logs/LogFormat$Pst$Builder.class
> > com/work/logs/LogFormat$Pst.class
> > com/work/logs/LogFormat$Ucc$Builder.class
> > com/work/logs/LogFormat$Ucc.class
> >
> > On Fri, Apr 29, 2011 at 04:16:05PM -0700, Dmitriy Ryaboy wrote:
> > > and the contents of '/home/kris/swineflu/logformats-0.1.2.jar' (jar -tf)
> > >
> > > D
> > >
> > > On Fri, Apr 29, 2011 at 1:15 PM, Kris Coward <kr...@melon.org> wrote:
> > >
> > > >
> > > > Well I'll send up to the point where it fails and exits, since the
> > > > rest seems kinda superfluous.. here it is:
> > > >
> > > > REGISTER '/usr/local/hadoopgpl/lib/slf4j-api-1.5.8.jar'
> > > > REGISTER '/usr/local/hadoopgpl/lib/slf4j-log4j12-1.5.10.jar'
> > > > REGISTER '/usr/local/pig/lib/elephant-bird.jar'
> > > > REGISTER '/usr/local/pig/lib/hadoop-lzo.jar'
> > > > REGISTER '/usr/local/pig/lib/piggybank.jar'
> > > > REGISTER '/usr/local/pig/lib/jackson-core-asl-1.0.1.jar'
> > > > REGISTER '/usr/local/pig/lib/jackson-mapper-asl-1.0.1.jar'
> > > > REGISTER '/usr/local/pig/lib/jsp-2.1-6.1.4.jar'
> > > > REGISTER '/home/kris/swineflu/com.kontagent.swineflu.jar'
> > > > REGISTER '/home/kris/swineflu/logformats-0.1.2.jar'
> > > >
> > > > %declare storage com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore
> > > > %declare loader com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore
> > > >
> > > > -- load the raw data from HDFS
> > > > apaNew = LOAD '$infile/apa' USING $loader('apa');
> > > > apaTable = LOAD '$firstfile/apa' USING $loader('firstp');
> > > >
> > > >
> > > > (where $infile and $firstfile are passed as parameters at runtime,
> > > > and the files were tested as existing)
> > > >
> > > > Cheers,
> > > > Kris
> > > >
> > > > On Fri, Apr 29, 2011 at 01:00:55PM -0700, Dmitriy Ryaboy wrote:
> > > > > Odd.. can you send the full pig script including the register
> > > > > statements?
> > > > >
> > > > > On Fri, Apr 29, 2011 at 11:38 AM, Kris Coward <kr...@melon.org> wrote:
> > > > >
> > > > > >
> > > > > > So I've recently added a protocol/schema to a collection I got from
> > > > > > someone else, recompiled it, and added it to my scripts and am
> > > > > > having problems.
> > > > > >
> > > > > > More specifically, it built just fine, and when REGISTERed in the
> > > > > > script that uses it to store a relation, it seems to work fine, but
> > > > > > when I try to use it to read that same relation back, I get the error:
> > > > > >
> > > > > > [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Error instantiating com.work.logs.LogFormat$FirstMsg referred to by firstp
> > > > > >
> > > > > > With a stack trace of:
> > > > > >
> > > > > > java.lang.RuntimeException: Error instantiating com.work.logs.LogFormat$FirstMsg referred to by firstp
> > > > > >        at com.twitter.elephantbird.pig.proto.ProtobufClassUtil.loadProtoClass(Unknown Source)
> > > > > >        at com.twitter.elephantbird.pig.proto.LzoProtobuffB64LinePigStore.getSchema(Unknown Source)
> > > > > >        at org.apache.pig.impl.logicalLayer.LOLoad.determineSchema(LOLoad.java:186)
> > > > > >        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:151)
> > > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
> > > > > >        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> > > > > >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
> > > > > >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1551)
> > > > > >        at org.apache.pig.PigServer.registerQuery(PigServer.java:523)
> > > > > >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
> > > > > >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
> > > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> > > > > >        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
> > > > > >        at org.apache.pig.Main.run(Main.java:510)
> > > > > >        at org.apache.pig.Main.main(Main.java:107)
> > > > > > Caused by: java.lang.ClassNotFoundException: com.work.logs.LogFormat$FirstMsg
> > > > > >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > > >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> > > > > >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> > > > > >        ... 16 more
> > > > > >
> > > > > > The protocol itself is pretty simple, just:
> > > > > >
> > > > > > message FirstMsg{
> > > > > >    optional string uid = 1;
> > > > > >    optional int64 timestamp = 2;
> > > > > >    optional string type = 3;
> > > > > > }
> > > > > >
> > > > > > The other classes in the jar file seem to be loading just fine,
> > > > > > producing notices along the lines of:
> > > > > >
> > > > > > [main] INFO  com.twitter.elephantbird.pig.proto.ProtobufClassUtil - Using com.work.logs.LogFormat$Apa mapped by apa
> > > > > >
> > > > > > Any help figuring out why this is failing would be appreciated. I
> > > > > > have a strong suspicion that it's something simple that I just keep
> > > > > > looking past.
> > > > > >
> > > > > > Thanks,
> > > > > > Kris
> > > > > >
> > > > > > --
> > > > > > Kris Coward                                     http://unripe.melon.org/
> > > > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > > > >
> > > >
> > > > --
> > > > Kris Coward                                     http://unripe.melon.org/
> > > > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> > > >
> >
> > --
> > Kris Coward                                     http://unripe.melon.org/
> > GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
> >
>

Re: Problem with a protobuf in EB

Posted by Gerrit Jansen van Vuuren <ge...@googlemail.com>.
Hi,

Thanks Dmitriy. I wasn't aware of these changes; they would make life much
easier, given that users do not always have permission to add jars to
the $PIG_HOME/lib dirs or classpaths.
In any case, I've been planning a major update to the
hadoop-gpl-packaging RPMs to include all of the latest changes.

Cheers,
 Gerrit

On Mon, May 2, 2011 at 2:27 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Actually, Pig does some class loader magic to find classes in registered
> jars on the front end.
> We recently added that to Elephant Bird so that it works when the proto or
> Thrift classes aren't already on the classpath and are only registered. I
> believe I merged that into the 8 branch, so if Gerrit updates his packages
> with the most recent version, it should "just work".
>
> D