Posted to user@cassandra.apache.org by Christian Decker <de...@gmail.com> on 2010/08/13 18:30:25 UTC

Cassandra and Pig

Hi all,

I'm trying to get Pig to read data from a Cassandra cluster, which I thought
trivial since Cassandra already provides me with the CassandraStorage class.
Problem is that once I try executing a simple script like this:

register /path/to/pig-0.7.0-core.jar;
register /path/to/libthrift-r917130.jar;
register /path/to/cassandra_loadfunc.jar;

rows = LOAD 'cassandra://Keyspace1/Standard1'
    USING org.apache.cassandra.hadoop.pig.CassandraStorage();
cols = FOREACH rows GENERATE flatten($1);
colnames = FOREACH cols GENERATE $0;
namegroups = GROUP colnames BY $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;
dump topnames;

I just end up with a NoClassDefFoundError:

ERROR org.apache.pig.tools.grunt.Grunt -
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias topnames
at org.apache.pig.PigServer.openIterator(PigServer.java:521)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
 at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002:
Unable to store alias topnames
 at org.apache.pig.PigServer.store(PigServer.java:577)
at org.apache.pig.PigServer.openIterator(PigServer.java:504)
 ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117:
Unexpected error when launching map reduce job.
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
 at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
 at org.apache.pig.PigServer.store(PigServer.java:569)
... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured
when launching map reduce job: java.lang.NoClassDefFoundError:
org/apache/thrift/TBase
 at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

I cannot think of a reason why. As far as I understand it, Pig takes the jar
files registered in the script, unpacks them, creates the execution plan for
the script, bundles everything back into a single jar, and then submits that
jar to HDFS, from where it is executed on Hadoop, right?
I also checked that the class in question actually is in the libthrift jar,
so what's going wrong?
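(A quick sanity check on that last point, as a hedged sketch: a jar is just a zip archive, so its entries can be listed directly. The path below is a placeholder, not an actual install location.)

```shell
# Hedged sketch: confirm that the class Pig complains about really is
# inside the registered jar. A jar is just a zip, so unzip -l lists it.
JAR=/path/to/libthrift-r917130.jar   # placeholder path

if [ -f "$JAR" ] && unzip -l "$JAR" | grep -q 'org/apache/thrift/TBase.class'; then
    result="class present"
else
    result="class missing (or jar not found)"
fi
echo "$result"
```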

Regards,
Chris

Re: Cassandra and Pig

Posted by Christian Decker <de...@gmail.com>.
Hm,
that was my conclusion too, but somehow I don't see what I'm doing wrong. I
checked that the Thrift library is on the CLASSPATH and in PIG_CLASSPATH,
and, as shown in the script above, I'm using register to add the library to
the dependencies. Am I missing something else?
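(For concreteness, a minimal sketch of the kind of check meant here: testing whether a specific jar appears as an entry in a colon-separated classpath. Paths are placeholders.)

```shell
# Hedged sketch: check whether a given jar appears as a classpath entry
# in PIG_CLASSPATH (colon-separated, placeholder paths).
PIG_CLASSPATH="/path/to/pig-0.7.0-core.jar:/path/to/libthrift-r917130.jar"
needle="/path/to/libthrift-r917130.jar"

# Surround with ':' so the match only hits whole entries, not substrings.
case ":$PIG_CLASSPATH:" in
    *":$needle:"*) status="present" ;;
    *)             status="missing" ;;
esac
echo "$needle is $status"
```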

Regards,
Chris
--
Christian Decker
Software Architect
http://blog.snyke.net


On Wed, Aug 18, 2010 at 8:09 PM, Stu Hood <st...@rackspace.com> wrote:

> Needing to manually copy the jars to all of the nodes would mean that you
> aren't applying the Pig 'register <jar>;' command properly.
>
> -----Original Message-----
> From: "Christian Decker" <de...@gmail.com>
> Sent: Wednesday, August 18, 2010 7:08am
> To: user@cassandra.apache.org
> Subject: Re: Cassandra and Pig
>
> I got one step further by cheating a bit, I just took all the Cassandra
> Jars
> and dropped them into the Hadoop lib folder, so at least now I can run some
> pig scripts over the data in Cassandra, but this is far from optimal since
> it means I'd have to distribute my UDFs also to the Hadoop cluster, or did
> I
> miss something?
>
> Regards,
> Chris
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
>
> On Tue, Aug 17, 2010 at 4:04 PM, Christian Decker <
> decker.christian@gmail.com> wrote:
>
> > Ok, by now it's getting very strange. I deleted the entire installation
> and
> > restarted from scratch and now I'm getting a similar error even though
> I'm
> > going through the pig_cassandra script.
> >
> > 2010-08-17 15:54:10,049 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 0% complete
> > 2010-08-17 15:55:10,032 [Thread-10] INFO
> >  org.apache.cassandra.config.DatabaseDescriptor - Auto DiskAccessMode
> > determined to be standard
> > 2010-08-17 15:55:24,652 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - HadoopJobId: job_201008111350_0020
> > 2010-08-17 15:55:24,652 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - More information at:
> > http://hmaster:50030/jobdetails.jsp?jobid=job_201008111350_0020
> > 2010-08-17 15:56:05,690 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 33% complete
> > 2010-08-17 15:56:09,874 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 100% complete
> > 2010-08-17 15:56:09,874 [main] ERROR
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - 1 map reduce job(s) failed!
> > 2010-08-17 15:56:10,261 [main] INFO
> >
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> > - Failed!
> > 2010-08-17 15:56:10,351 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 2997: Unable to recreate exception from backed error: Error:
> > java.lang.ClassNotFoundException: org.apache.thrift.TBase
> >
> >
> > which is a bit different from my original error, but on the backend I get
> a
> > classic ClassNotFoundException.
> >
> > Any ideas?
> > --
> > Christian Decker
> > Software Architect
> > http://blog.snyke.net
> >
>
>
>

Re: Cassandra and Pig

Posted by Stu Hood <st...@rackspace.com>.
Needing to manually copy the jars to all of the nodes would mean that you aren't applying the Pig 'register <jar>;' command properly.

-----Original Message-----
From: "Christian Decker" <de...@gmail.com>
Sent: Wednesday, August 18, 2010 7:08am
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

I got one step further by cheating a bit: I just took all the Cassandra jars
and dropped them into the Hadoop lib folder, so at least now I can run some
Pig scripts over the data in Cassandra. But this is far from optimal, since
it means I'd also have to distribute my UDFs to the Hadoop cluster, or did I
miss something?

Regards,
Chris
--
Christian Decker
Software Architect
http://blog.snyke.net


On Tue, Aug 17, 2010 at 4:04 PM, Christian Decker <
decker.christian@gmail.com> wrote:

> Ok, by now it's getting very strange. I deleted the entire installation and
> restarted from scratch and now I'm getting a similar error even though I'm
> going through the pig_cassandra script.
>
> 2010-08-17 15:54:10,049 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2010-08-17 15:55:10,032 [Thread-10] INFO
>  org.apache.cassandra.config.DatabaseDescriptor - Auto DiskAccessMode
> determined to be standard
> 2010-08-17 15:55:24,652 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201008111350_0020
> 2010-08-17 15:55:24,652 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at:
> http://hmaster:50030/jobdetails.jsp?jobid=job_201008111350_0020
> 2010-08-17 15:56:05,690 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 33% complete
> 2010-08-17 15:56:09,874 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2010-08-17 15:56:09,874 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map reduce job(s) failed!
> 2010-08-17 15:56:10,261 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> 2010-08-17 15:56:10,351 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2997: Unable to recreate exception from backed error: Error:
> java.lang.ClassNotFoundException: org.apache.thrift.TBase
>
>
> which is a bit different from my original error, but on the backend I get a
> classic ClassNotFoundException.
>
> Any ideas?
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>



Re: Cassandra and Pig

Posted by Christian Decker <de...@gmail.com>.
I got one step further by cheating a bit: I just took all the Cassandra jars
and dropped them into the Hadoop lib folder, so at least now I can run some
Pig scripts over the data in Cassandra. But this is far from optimal, since
it means I'd also have to distribute my UDFs to the Hadoop cluster, or did I
miss something?

Regards,
Chris
--
Christian Decker
Software Architect
http://blog.snyke.net


On Tue, Aug 17, 2010 at 4:04 PM, Christian Decker <
decker.christian@gmail.com> wrote:

> Ok, by now it's getting very strange. I deleted the entire installation and
> restarted from scratch and now I'm getting a similar error even though I'm
> going through the pig_cassandra script.
>
> 2010-08-17 15:54:10,049 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2010-08-17 15:55:10,032 [Thread-10] INFO
>  org.apache.cassandra.config.DatabaseDescriptor - Auto DiskAccessMode
> determined to be standard
> 2010-08-17 15:55:24,652 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201008111350_0020
> 2010-08-17 15:55:24,652 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at:
> http://hmaster:50030/jobdetails.jsp?jobid=job_201008111350_0020
> 2010-08-17 15:56:05,690 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 33% complete
> 2010-08-17 15:56:09,874 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2010-08-17 15:56:09,874 [main] ERROR
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map reduce job(s) failed!
> 2010-08-17 15:56:10,261 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
> 2010-08-17 15:56:10,351 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 2997: Unable to recreate exception from backed error: Error:
> java.lang.ClassNotFoundException: org.apache.thrift.TBase
>
>
> which is a bit different from my original error, but on the backend I get a
> classic ClassNotFoundException.
>
> Any ideas?
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>

Re: Cassandra and Pig

Posted by Christian Decker <de...@gmail.com>.
OK, by now it's getting very strange. I deleted the entire installation,
restarted from scratch, and now I'm getting a similar error even though I'm
going through the pig_cassandra script.

2010-08-17 15:54:10,049 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-08-17 15:55:10,032 [Thread-10] INFO
 org.apache.cassandra.config.DatabaseDescriptor - Auto DiskAccessMode
determined to be standard
2010-08-17 15:55:24,652 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201008111350_0020
2010-08-17 15:55:24,652 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://hmaster:50030/jobdetails.jsp?jobid=job_201008111350_0020
2010-08-17 15:56:05,690 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 33% complete
2010-08-17 15:56:09,874 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-08-17 15:56:09,874 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2010-08-17 15:56:10,261 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2010-08-17 15:56:10,351 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2997: Unable to recreate exception from backed error: Error:
java.lang.ClassNotFoundException: org.apache.thrift.TBase


which is a bit different from my original error, but on the backend I get a
classic ClassNotFoundException.

Any ideas?
--
Christian Decker
Software Architect
http://blog.snyke.net

Re: Cassandra and Pig

Posted by Christian Decker <de...@gmail.com>.
I'm using Cassandra 0.6.3 but plan on switching to 0.7.0 later. When
compiling, I have a copy of the storage-conf.xml from the running cluster :-)

On Fri, Aug 13, 2010 at 9:51 PM, Stu Hood <st...@rackspace.com> wrote:

> > Still I get an exception which I cannot explain where it comes
> > from (http://pastebin.com/JYfSSfny)
> Which version of Cassandra are you using? The 0.6 series requires that a
> valid storage-conf.xml is distributed with the job to specify
> connection/partitioner/etc information, but trunk/0.7-beta2 requires
> properties to be set by your startup script.
>
> -----Original Message-----
> From: "Stu Hood" <st...@rackspace.com>
> Sent: Friday, August 13, 2010 2:31pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra and Pig
>
> Hmm, the example code there may not have been run in distributed mode
> recently, or perhaps Pig performs some magic to automatically register Jars
> containing classes directly referenced as UDFs.
>
> -----Original Message-----
> From: "Christian Decker" <de...@gmail.com>
> Sent: Friday, August 13, 2010 12:16pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra and Pig
>
> Wow, that was extremely quick, thanks Stu :-)
> I'm still a bit unclear on what the pig_cassandra script does. It sets some
> variables (PIG_CLASSPATH for one) and then starts the original pig binary
> but injects some libraries in it (libthrift and pig-core) but strangely not
> the cassandra loadfunc, why not?
>
> Anyway now I understand why I was getting different errors when executing
> directly via Pig compared to through pig_cassandra. Still I get an
> exception
> which I cannot explain where it comes from (http://pastebin.com/JYfSSfny):
>
> Caused by: java.lang.RuntimeException: Could not resolve error that occured
> when launching map reduce job: java.lang.ExceptionInInitializerError
>  at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
>
> Any idea? Thanks again for your fast answer :)
>
> On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <st...@rackspace.com> wrote:
>
> > That error is coming from the frontend: the jars must also be on the
> local
> > classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> > $PIG_CLASSPATH.
> >
> > -----Original Message-----
> > From: "Christian Decker" <de...@gmail.com>
> > Sent: Friday, August 13, 2010 11:30am
> > To: user@cassandra.apache.org
> > Subject: Cassandra and Pig
> >
> > Hi all,
> >
> > I'm trying to get Pig to read data from a Cassandra cluster, which I
> > thought
> > trivial since Cassandra already provides me with the CassandraStorage
> > class.
> > Problem is that once I try executing a simple script like this:
> >
> > register /path/to/pig-0.7.0-core.jar;
> > register /path/to/libthrift-r917130.jar;
> > register /path/to/cassandra_loadfunc.jar;
> > rows = LOAD 'cassandra://Keyspace1/Standard1'
> >     USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> > cols = FOREACH rows GENERATE flatten($1);
> > colnames = FOREACH cols GENERATE $0;
> > namegroups = GROUP colnames BY $0;
> > namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> > orderednames = ORDER namecounts BY $0;
> > topnames = LIMIT orderednames 50;
> > dump topnames;
> >
> > I just end up with a NoClassDefFoundError:
> >
> > ERROR org.apache.pig.tools.grunt.Grunt -
> > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> > open iterator for alias topnames
> > at org.apache.pig.PigServer.openIterator(PigServer.java:521)
> >  at
> > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
> > at
> >
> >
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> >  at
> >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> > at
> >
> >
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
> >  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> > at org.apache.pig.Main.main(Main.java:391)
> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR
> 1002:
> > Unable to store alias topnames
> >  at org.apache.pig.PigServer.store(PigServer.java:577)
> > at org.apache.pig.PigServer.openIterator(PigServer.java:504)
> >  ... 6 more
> > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> > 2117:
> > Unexpected error when launching map reduce job.
> > at
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
> >  at
> >
> >
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> > at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
> >  at org.apache.pig.PigServer.store(PigServer.java:569)
> > ... 7 more
> > Caused by: java.lang.RuntimeException: Could not resolve error that
> occured
> > when launching map reduce job: java.lang.NoClassDefFoundError:
> > org/apache/thrift/TBase
> >  at
> >
> >
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
> >  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
> >
> > I cannot think of a reason as to why. As far as I understood it Pig takes
> > the jar files in the script, unpackages them, creates the execution plan
> > for
> > the script itself and then bundles it into a single jar again, then
> submits
> > it to the HDFS from where it will be executed in Hadoop, right?
> > I also checked that the class in question actually is in the libthrift
> jar,
> > so what's going wrong?
> >
> > Regards,
> > Chris
> >
> >
> >
>
>
>
>
>

Re: Cassandra and Pig

Posted by Stu Hood <st...@rackspace.com>.
> Still I get an exception which I cannot explain where it comes
> from (http://pastebin.com/JYfSSfny)
Which version of Cassandra are you using? The 0.6 series requires that a valid storage-conf.xml is distributed with the job to specify connection/partitioner/etc information, but trunk/0.7-beta2 requires properties to be set by your startup script.
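(For the trunk/0.7-beta2 path, the startup script is expected to provide the connection details itself. A hedged sketch follows; the variable names are taken from contrib/pig's README and should be treated as assumptions here, as are the host/port/partitioner values.)

```shell
# Hedged sketch: environment a 0.7-style pig_cassandra startup script is
# expected to export. Names follow contrib/pig's README (assumptions),
# and localhost/9160/RandomPartitioner are placeholder values.
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner

echo "contact point: $PIG_INITIAL_ADDRESS:$PIG_RPC_PORT"
```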

-----Original Message-----
From: "Stu Hood" <st...@rackspace.com>
Sent: Friday, August 13, 2010 2:31pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Hmm, the example code there may not have been run in distributed mode recently, or perhaps Pig performs some magic to automatically register Jars containing classes directly referenced as UDFs.

-----Original Message-----
From: "Christian Decker" <de...@gmail.com>
Sent: Friday, August 13, 2010 12:16pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Wow, that was extremely quick, thanks Stu :-)
I'm still a bit unclear on what the pig_cassandra script does. It sets some
variables (PIG_CLASSPATH for one) and then starts the original pig binary
but injects some libraries in it (libthrift and pig-core) but strangely not
the cassandra loadfunc, why not?

Anyway now I understand why I was getting different errors when executing
directly via Pig compared to through pig_cassandra. Still I get an exception
which I cannot explain where it comes from (http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured
when launching map reduce job: java.lang.ExceptionInInitializerError
 at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)


Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <st...@rackspace.com> wrote:

> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker" <de...@gmail.com>
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought
> trivial since Cassandra already provides me with the CassandraStorage
> class.
> Problem is that once I try executing a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1'
>     USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt -
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias topnames
> at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>  at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
> at
>
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>  at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002:
> Unable to store alias topnames
>  at org.apache.pig.PigServer.store(PigServer.java:577)
> at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>  ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> 2117:
> Unexpected error when launching map reduce job.
> at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>  at
>
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>  at org.apache.pig.PigServer.store(PigServer.java:569)
> ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured
> when launching map reduce job: java.lang.NoClassDefFoundError:
> org/apache/thrift/TBase
>  at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I cannot think of a reason as to why. As far as I understood it Pig takes
> the jar files in the script, unpackages them, creates the execution plan
> for
> the script itself and then bundles it into a single jar again, then submits
> it to the HDFS from where it will be executed in Hadoop, right?
> I also checked that the class in question actually is in the libthrift jar,
> so what's going wrong?
>
> Regards,
> Chris
>
>
>





Re: Cassandra and Pig

Posted by Stu Hood <st...@rackspace.com>.
Hmm, the example code there may not have been run in distributed mode recently, or perhaps Pig performs some magic to automatically register Jars containing classes directly referenced as UDFs.

-----Original Message-----
From: "Christian Decker" <de...@gmail.com>
Sent: Friday, August 13, 2010 12:16pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Wow, that was extremely quick, thanks Stu :-)
I'm still a bit unclear on what the pig_cassandra script does. It sets some
variables (PIG_CLASSPATH for one) and then starts the original pig binary
but injects some libraries in it (libthrift and pig-core) but strangely not
the cassandra loadfunc, why not?

Anyway now I understand why I was getting different errors when executing
directly via Pig compared to through pig_cassandra. Still I get an exception
which I cannot explain where it comes from (http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured
when launching map reduce job: java.lang.ExceptionInInitializerError
 at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)


Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <st...@rackspace.com> wrote:

> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker" <de...@gmail.com>
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought
> trivial since Cassandra already provides me with the CassandraStorage
> class.
> Problem is that once I try executing a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1'
>     USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt -
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias topnames
> at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>  at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
> at
>
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>  at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002:
> Unable to store alias topnames
>  at org.apache.pig.PigServer.store(PigServer.java:577)
> at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>  ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> 2117:
> Unexpected error when launching map reduce job.
> at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>  at
>
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>  at org.apache.pig.PigServer.store(PigServer.java:569)
> ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured
> when launching map reduce job: java.lang.NoClassDefFoundError:
> org/apache/thrift/TBase
>  at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I cannot think of a reason as to why. As far as I understood it Pig takes
> the jar files in the script, unpackages them, creates the execution plan
> for
> the script itself and then bundles it into a single jar again, then submits
> it to the HDFS from where it will be executed in Hadoop, right?
> I also checked that the class in question actually is in the libthrift jar,
> so what's going wrong?
>
> Regards,
> Chris
>
>
>



Re: Cassandra and Pig

Posted by Christian Decker <de...@gmail.com>.
Wow, that was extremely quick, thanks Stu :-)
I'm still a bit unclear on what the pig_cassandra script does. It sets some
variables (PIG_CLASSPATH, for one) and then starts the original pig binary,
injecting some libraries into it (libthrift and pig-core), but strangely not
the Cassandra loadfunc. Why not?

Anyway, now I understand why I was getting different errors when executing
directly via Pig compared to through pig_cassandra. Still, I get an
exception that I cannot explain (http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured
when launching map reduce job: java.lang.ExceptionInInitializerError
 at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)


Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <st...@rackspace.com> wrote:

> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker" <de...@gmail.com>
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought
> trivial since Cassandra already provides me with the CassandraStorage
> class.
> Problem is that once I try executing a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1'
>     USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt -
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias topnames
> at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>  at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
> at
>
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>  at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> at
>
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002:
> Unable to store alias topnames
>  at org.apache.pig.PigServer.store(PigServer.java:577)
> at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>  ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> 2117:
> Unexpected error when launching map reduce job.
> at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>  at
>
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>  at org.apache.pig.PigServer.store(PigServer.java:569)
> ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured
> when launching map reduce job: java.lang.NoClassDefFoundError:
> org/apache/thrift/TBase
>  at
>
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I cannot think of a reason as to why. As far as I understood it Pig takes
> the jar files in the script, unpackages them, creates the execution plan
> for
> the script itself and then bundles it into a single jar again, then submits
> it to the HDFS from where it will be executed in Hadoop, right?
> I also checked that the class in question actually is in the libthrift jar,
> so what's going wrong?
>
> Regards,
> Chris
>
>
>

RE: Cassandra and Pig

Posted by Stu Hood <st...@rackspace.com>.
That error is coming from the frontend: the jars must also be on the local classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up $PIG_CLASSPATH.
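The pointer above can be sketched roughly like this. This is a minimal illustration, not the actual contents of contrib/pig/bin/pig_cassandra; the jar paths are placeholders taken from the original script's register statements:

```shell
# Sketch (assumed placeholder paths): put the Thrift and Cassandra jars on
# the local classpath before starting Pig, similar in spirit to how
# contrib/pig/bin/pig_cassandra builds up $PIG_CLASSPATH.
PIG_CLASSPATH=""
for jar in /path/to/libthrift-r917130.jar /path/to/cassandra_loadfunc.jar; do
  # append with a ':' separator only once the variable is non-empty
  PIG_CLASSPATH="$PIG_CLASSPATH${PIG_CLASSPATH:+:}$jar"
done
export PIG_CLASSPATH
echo "$PIG_CLASSPATH"
```

With the jars visible to the frontend JVM as well as bundled into the job jar, the NoClassDefFoundError during plan compilation should go away.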
