Posted to user@pig.apache.org by Christian Decker <de...@gmail.com> on 2010/08/13 13:21:23 UTC

Pig and Cassandra

Hi all,

I'm pretty new to Pig and Hadoop, so excuse me if this is trivial, but I
couldn't find anyone able to help me.
I'm trying to get Pig to read data from a Cassandra cluster, which I thought
would be trivial since Cassandra already provides the CassandraStorage class
[1]. The problem is that once I try executing a simple script like this:

register /path/to/pig-0.7.0-core.jar;
register /path/to/libthrift-r917130.jar;
register /path/to/cassandra_loadfunc.jar;
rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
cols = FOREACH rows GENERATE flatten($1);
colnames = FOREACH cols GENERATE $0;
namegroups = GROUP colnames BY $0;
namecounts = FOREACH namegroups GENERATE COUNT($1), group;
orderednames = ORDER namecounts BY $0;
topnames = LIMIT orderednames 50;
dump topnames;

I just end up with a NoClassDefFoundError:

ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
    at org.apache.pig.PigServer.openIterator(PigServer.java:521)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
    at org.apache.pig.PigServer.store(PigServer.java:577)
    at org.apache.pig.PigServer.openIterator(PigServer.java:504)
    ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
    at org.apache.pig.PigServer.store(PigServer.java:569)
    ... 7 more
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

(Sorry for posting the whole error message.)
I cannot think of a reason why this happens. As far as I understand it, Pig
takes the jar files registered in the script, unpacks them, creates the
execution plan for the script itself, bundles everything into a single jar
again, and then submits it to HDFS, from where it is executed in Hadoop, right?
I also checked that the class in question actually is in the libthrift jar,
so what's going wrong?

Regards,
Chris

[1]
http://svn.apache.org/viewvc/cassandra/trunk/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?revision=984904&view=markup
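
The check mentioned above (confirming that the class really is in the
libthrift jar) can be scripted. The following is only a minimal sketch in
Python, not part of the original post: the function names are hypothetical,
and the demo jar is built in memory so the example runs without the real
libthrift jar. It relies on the fact that a jar is a zip archive in which a
class such as org.apache.thrift.TBase is stored as the entry
org/apache/thrift/TBase.class.

```python
import io
import zipfile


def jar_contains_class(jar, class_name):
    """Return True if the jar (a path or a file-like object) has an entry for class_name.

    class_name may use dots (org.apache.thrift.TBase) or slashes
    (org/apache/thrift/TBase, as printed in the NoClassDefFoundError).
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


def make_demo_jar(entries):
    """Build an in-memory zip with empty entries; a stand-in for a real jar (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in entries:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


demo = make_demo_jar(["org/apache/thrift/TBase.class"])
print(jar_contains_class(demo, "org.apache.thrift.TBase"))  # prints True
```

With a real file, the first argument would simply be the jar's path, for
example the libthrift jar registered in the script. A positive result only
shows that the class is packaged in that jar; it says nothing about whether
the jar is on the classpath of the process that throws the error.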

Re: Pig and Cassandra

Posted by Christian Decker <de...@gmail.com>.
I've got a partial response on the Cassandra mailing list:
http://www.mail-archive.com/user@cassandra.apache.org/msg05216.html
It still crashes, but now on the Hadoop side, so not quite there but getting
somewhere :-)

On Sat, Aug 14, 2010 at 2:43 AM, Bill Graham <bi...@gmail.com> wrote:

> I've seen that exception in other cases where there is an unmet
> dependency on a superclass that is included in a separate (and not
> provided) jar. Check the thrift source to see if that's the case.

Re: Pig and Cassandra

Posted by Bill Graham <bi...@gmail.com>.
I've seen that exception in other cases where there is an unmet
dependency on a superclass that is included in a separate (and not
provided) jar. Check the thrift source to see if that's the case.
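
As an alternative to reading the source, the superclass of a compiled class
can be read straight from its .class file. This is only a rough sketch in
Python under stated assumptions: superclass_of is a hypothetical helper, not
part of Pig, Thrift or Cassandra, and it parses just enough of the class file
layout (magic number, constant pool, then the access_flags, this_class and
super_class fields) to find the superclass name.

```python
import struct

# Payload size in bytes of each fixed-size constant pool entry, keyed by tag.
_FIXED_SIZES = {3: 4, 4: 4, 5: 8, 6: 8, 8: 2, 9: 4, 10: 4, 11: 4,
                12: 4, 15: 3, 16: 2, 17: 4, 18: 4, 19: 2, 20: 2}


def superclass_of(class_bytes):
    """Return the superclass's internal name (e.g. 'java/lang/Object'), or None."""
    magic, _minor, _major, cp_count = struct.unpack_from(">IHHH", class_bytes, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    pos = 10       # first byte after the header fields unpacked above
    pool = {}      # index -> str for Utf8 entries, index -> name index for Class entries
    index = 1      # constant pool indices start at 1
    while index < cp_count:
        tag = class_bytes[pos]
        pos += 1
        if tag == 1:  # CONSTANT_Utf8: u2 length followed by the bytes
            (length,) = struct.unpack_from(">H", class_bytes, pos)
            pool[index] = class_bytes[pos + 2:pos + 2 + length].decode("utf-8", "replace")
            pos += 2 + length
        elif tag == 7:  # CONSTANT_Class: u2 index of the Utf8 entry holding the name
            (pool[index],) = struct.unpack_from(">H", class_bytes, pos)
            pos += 2
        elif tag in _FIXED_SIZES:
            pos += _FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown constant pool tag {tag}")
        # Long and Double constants occupy two constant pool slots.
        index += 2 if tag in (5, 6) else 1
    _access_flags, _this_class, super_index = struct.unpack_from(">HHH", class_bytes, pos)
    if super_index == 0:  # only java.lang.Object has no superclass
        return None
    return pool[pool[super_index]]
```

The bytes could come from a jar via zipfile.ZipFile(path).read(entry_name).
If the returned superclass is not packaged in any of the registered jars,
that would match the situation described above.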
