Posted to user@pig.apache.org by Shai Harel <sh...@mythings.com> on 2011/07/31 16:48:41 UTC

Pig & Cassandra integration

Hey all, I've been trying to query Cassandra from my Pig script,
so I used the contrib jar from Cassandra, and I'm getting the following
error. It looks like some Thrift failure... :|

ERROR 2998: Unhandled internal error. org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V

java.lang.NoSuchMethodError: org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
    at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)


Has anyone managed to get this up and running?
I'm considering rewriting the CassandraStorage jar using Hector.
Any thoughts about that?

Re: Pig & Cassandra integration

Posted by Jeremy Hanna <je...@gmail.com>.
AFAIK, Amazon still uses Pig 0.6 on EMR, though they've said in discussion threads that they were in the process of upgrading.
http://aws.amazon.com/elasticmapreduce/faqs/#pig-7
https://forums.aws.amazon.com/thread.jspa?messageID=233903&#249998

Pig 0.6 doesn't have the concept of loadfunc/storefunc, which was added in 0.7.  That's the extension point that Cassandra uses.

I've heard that you can just deploy a newer version of Pig yourself in your EMR cluster, but I haven't tried doing that.  We just went with our own cluster in EC2 so that we could control versions, after we got some odd errors with EMR that we couldn't track down or reproduce.

Sorry I can't be of more help there.

On Aug 2, 2011, at 7:40 AM, Shai Harel wrote:

> Jeremy, where you able to make it run on AMAZON elastic map reduce
> machines?
> 
> i'v tried to copy the jars (both pig's and cassandra) to the new machine
> set the PIG_HOME environment variable
> even added the hadoop config files to the class path
> and I'm getting this error
> 
> Error before Pig is launched
> ----------------------------
> ERROR 2999: Unexpected internal error. Failed to create DataStorage
> 
> java.lang.RuntimeException: Failed to create DataStorage
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
>        at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
>        at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
>        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
>        at org.apache.pig.PigServer.<init>(PigServer.java:225)
>        at org.apache.pig.PigServer.<init>(PigServer.java:214)
>        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
>        at org.apache.pig.Main.run(Main.java:462)
>        at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Call to ip-10-56-51-167.eu-west-1.compute.internal/10.56.51.167:9000 failed on local exception: java.io.EOFException
>        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
>        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy0.getProtocolVersion(Unknown Source)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
>        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
>        at
> org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
>        at
> org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
>        at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
>        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
>        at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
>        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
>        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
>        at
> org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
>        ... 9 more
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:375)
>        at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
>        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
> ================================================================================
> 
> Amazon claims to run hadoop v 0.20, what am i doing wrong?
> 
> 
> 

Re: Pig & Cassandra integration

Posted by Shai Harel <sh...@mythings.com>.
Jeremy, were you able to make it run on Amazon Elastic MapReduce
machines?

I've tried copying the jars (both Pig's and Cassandra's) to the new machine,
setting the PIG_HOME environment variable,
and even adding the Hadoop config files to the classpath,
and I'm getting this error:

Error before Pig is launched
----------------------------
ERROR 2999: Unexpected internal error. Failed to create DataStorage

java.lang.RuntimeException: Failed to create DataStorage
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:75)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:58)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:213)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:133)
        at org.apache.pig.impl.PigContext.connect(PigContext.java:183)
        at org.apache.pig.PigServer.<init>(PigServer.java:225)
        at org.apache.pig.PigServer.<init>(PigServer.java:214)
        at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:55)
        at org.apache.pig.Main.run(Main.java:462)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: java.io.IOException: Call to ip-10-56-51-167.eu-west-1.compute.internal/10.56.51.167:9000 failed on local exception: java.io.EOFException
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
        at $Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:213)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:180)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:111)
        at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:72)
        ... 9 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
================================================================================

Amazon claims to run Hadoop 0.20. What am I doing wrong?



On Mon, Aug 1, 2011 at 5:55 PM, Jeremy Hanna <je...@gmail.com>wrote:

> Ah - just saw this, glad you got it working - cheers.
>

Re: Pig & Cassandra integration

Posted by Jeremy Hanna <je...@gmail.com>.
Ah - just saw this, glad you got it working - cheers.

On Aug 1, 2011, at 5:43 AM, Shai Harel wrote:

> hey all, i'v successfully fixed this problem,
> i was missing the cassandra jars,
> so you actually need to build cassandra (ant) and then you need to jar it
> (ant jar)
> and only then it'll work
> 
> BTW if you have hue installed, remove it first!
> 
> 
> 


Re: Pig & Cassandra integration

Posted by Shai Harel <sh...@mythings.com>.
Hey all, I've successfully fixed this problem.
I was missing the Cassandra jars,
so you actually need to build Cassandra (ant) and then jar it
(ant jar),
and only then will it work.

BTW, if you have Hue installed, remove it first!



On Mon, Aug 1, 2011 at 12:41 PM, Shai Harel <sh...@mythings.com> wrote:

> thanks for the help, i'v tried to be conservative and i'm using pig 0.8 &
> cassandra 0.8
> and still getting this error
>
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. Could not initialize class
> org.apache.cassandra.thrift.SliceRange
>
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.cassandra.thrift.SliceRange
>
>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
> Source)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>     at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>     at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>     at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>     at org.apache.pig.PigServer.store(PigServer.java:816)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>     at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>     at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>     at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>     at org.apache.pig.Main.run(Main.java:465)
>     at org.apache.pig.Main.main(Main.java:107)
>
> does anyone else have this problem?
>
>
>

Re: Pig & Cassandra integration

Posted by Jeremy Hanna <je...@gmail.com>.
It looks like you don't have the cassandra libraries in your classpath.  Are you in the contrib/pig directory of the cassandra source and are you running bin/pig_cassandra?  That is a script that puts everything you need from cassandra into the classpath.  That would be the first thing to try, if you aren't using that script already.  At first try it with the -x local flag too to make sure that it's not an issue with trying to distribute it out to your hadoop cluster.

The other reason you might be getting that is that if you're running the script in mapreduce mode, it's going to try to distribute it out to your Hadoop cluster, and your task trackers are going to need a couple of jars in their classpath as well.  Take a look at http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig if you want to run it in mapreduce mode.
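
A rough way to check the classpath theory is a small standalone Java program like the sketch below. It is not part of Pig or Cassandra; the class name ClasspathCheck and its two helper methods are made up for illustration. The default class name is taken from the stack trace in this thread, where <init>(BZ)V is the JVM descriptor for a constructor taking (byte, boolean). A NoSuchMethodError on that constructor often suggests that a different Thrift jar than the one Cassandra was built against comes first on the classpath. Once a class's static initializer has failed, later uses of the class are reported as "NoClassDefFoundError: Could not initialize class", so the two errors here may share a root cause.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not shipped with Pig or Cassandra.
final class ClasspathCheck {

    /** Returns the jar or directory a class was loaded from, or a marker for JDK classes. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap/JDK)";
        }
        return src.getLocation().toString();
    }

    /** True if the class declares a constructor with exactly these parameter types. */
    static boolean hasConstructor(Class<?> cls, Class<?>... params) {
        try {
            cls.getDeclaredConstructor(params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default: the Thrift class named in the NoSuchMethodError above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.thrift.meta_data.FieldValueMetaData";
        try {
            // initialize=false: load the class without running its static initializer.
            Class<?> cls = Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            System.out.println(name + " loaded from " + locationOf(cls));
            System.out.println("has (byte, boolean) constructor: "
                    + hasConstructor(cls, byte.class, boolean.class));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Compile it and run it with the same classpath your Pig launcher ends up using (how that classpath is assembled depends on your setup). If the reported jar is not the Thrift jar that ships with your Cassandra build, or the constructor check prints false, a conflicting jar is a likely suspect.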

On Aug 1, 2011, at 2:41 AM, Shai Harel wrote:

> thanks for the help, i'v tried to be conservative and i'm using pig 0.8 &
> cassandra 0.8
> and still getting this error
> 
> Pig Stack Trace
> ---------------
> ERROR 2998: Unhandled internal error. Could not initialize class
> org.apache.cassandra.thrift.SliceRange
> 
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.cassandra.thrift.SliceRange
>    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
> Source)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>    at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>    at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>    at org.apache.pig.PigServer.store(PigServer.java:816)
>    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>    at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>    at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>    at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>    at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>    at org.apache.pig.Main.run(Main.java:465)
>    at org.apache.pig.Main.main(Main.java:107)
> 
> does anyone else have this problem?
> 
> 
> On Sun, Jul 31, 2011 at 2:04 PM, Jeremy Hanna <je...@gmail.com> wrote:
> 
>> Try following this and see if it helps getting started:
>> https://github.com/jeromatron/pygmalion/wiki/Getting-Started
>> 
>> I haven't tried it with 0.9 yet but I plan to this week.  We use the
>> CassandraStorage jar in production.  If you can, validate your data with
>> Cassandra's schema validators.  CassandraStorage gets the schema from
>> Cassandra and tries to unmarshal the data into Pig data types with the
>> schema information.
>> 
>> See if that helps.
>> 
>> On Jul 31, 2011, at 9:48 AM, Shai Harel wrote:
>> 
>>> hey all, i'v been trying to query cassandra using my pig script,
>>> so i used the contrib jar from cassandra. and i'm getting the following
>>> error...
>>> some thrift failure err.... :|
>>> 
>>> ERROR 2998: Unhandled internal error.
>>> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>> 
>>> java.lang.NoSuchMethodError:
>>> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>>>   at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
>>>   at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>>>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>>>   at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>>>   at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>>>   at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>>>   at org.apache.pig.PigServer.store(PigServer.java:816)
>>>   at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>>>   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>>>   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>>>   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>>>   at org.apache.pig.Main.run(Main.java:465)
>>>   at org.apache.pig.Main.main(Main.java:107)
>>> 
>>> 
>>> does anyone managed to get this up and running?
>>> i'm considering to rewrite the CassandraStorage.jar using Hector,
>>> Any thoughts about that?
>> 
>> 


Re: Pig & Cassandra integration

Posted by Shai Harel <sh...@mythings.com>.
Thanks for the help. I've tried to be conservative and I'm using Pig 0.8 &
Cassandra 0.8,
and I'm still getting this error:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. Could not initialize class
org.apache.cassandra.thrift.SliceRange

java.lang.NoClassDefFoundError: Could not initialize class
org.apache.cassandra.thrift.SliceRange
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
Source)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
    at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
    at org.apache.pig.PigServer.store(PigServer.java:816)
    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
    at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
    at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.run(Main.java:465)
    at org.apache.pig.Main.main(Main.java:107)

does anyone else have this problem?


On Sun, Jul 31, 2011 at 2:04 PM, Jeremy Hanna <je...@gmail.com> wrote:

> Try following this and see if it helps getting started:
> https://github.com/jeromatron/pygmalion/wiki/Getting-Started
>
> I haven't tried it with 0.9 yet but I plan to this week.  We use the
> CassandraStorage jar in production.  If you can, validate your data with
> Cassandra's schema validators.  CassandraStorage gets the schema from
> Cassandra and tries to unmarshal the data into Pig data types with the
> schema information.
>
> See if that helps.
>
> On Jul 31, 2011, at 9:48 AM, Shai Harel wrote:
>
> > hey all, i'v been trying to query cassandra using my pig script,
> > so i used the contrib jar from cassandra. and i'm getting the following
> > error...
> > some thrift failure err.... :|
> >
> > ERROR 2998: Unhandled internal error.
> > org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
> >
> > java.lang.NoSuchMethodError:
> > org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
> >    at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
> >    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown Source)
> >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
> >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
> >    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
> >    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
> >    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
> >    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
> >    at org.apache.pig.PigServer.store(PigServer.java:816)
> >    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
> >    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
> >    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
> >    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> >    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> >    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
> >    at org.apache.pig.Main.run(Main.java:465)
> >    at org.apache.pig.Main.main(Main.java:107)
> >
> >
> > does anyone managed to get this up and running?
> > i'm considering to rewrite the CassandraStorage.jar using Hector,
> > Any thoughts about that?
>
>

Re: Pig & Cassandra integration

Posted by Jeremy Hanna <je...@gmail.com>.
It's been mentioned in this thread, but if you're using tabular (static column names) data, you might consider using Pygmalion.  It will extract the values from Cassandra to simplify grouping by values and other operations.
https://github.com/jeromatron/pygmalion
What you'll want to look at is the FromCassandraBag UDF, which has an example here:
https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig

Hope that helps - we use pygmalion 1.0.0 for all our scripts in production.
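
To make the idea concrete: in the script quoted below, columns.value projects the value field out of every tuple in the bag, so the result is a bag of one-field tuples, and list.$1 then asks for a second field that no tuple has (it does not mean "the second tuple"). What helps is pivoting the (name, value) pairs into one tuple with a fixed field order, which is roughly what an extraction UDF does for tabular data. The following is a plain-Python model of that pivot, not Pygmalion's code; the function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def from_cassandra_bag(wanted, columns):
    """Pick the values of the `wanted` column names out of a bag of
    (name, value) pairs, in the order given; missing columns become None."""
    by_name = dict(columns)
    return tuple(by_name.get(name) for name in wanted)


def group_by_field(rows, wanted, field):
    """Group row keys by the value of one named column."""
    idx = wanted.index(field)
    groups = defaultdict(list)
    for key, columns in rows:
        record = from_cassandra_bag(wanted, columns)
        groups[record[idx]].append(key)
    return dict(groups)


# Hypothetical rows shaped like (key, columns:bag{(name, value)}).
rows = [
    ("k1", [("name", "ann"), ("Age", "30"), ("class", "a")]),
    ("k2", [("name", "bob"), ("Age", "25"), ("class", "b")]),
    ("k3", [("name", "cy"), ("class", "a")]),
]
print(group_by_field(rows, ["name", "Age", "class"], "class"))
```

Once every row has the same fields in the same positions, grouping on class is an ordinary group by on one field.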

On Sep 28, 2011, at 11:18 AM, Tamil Selvan wrote:

> Hi,
> I'm trying to integrate pig with cassandra. 
> My columnfamily in cassandra is
> name -> xxx
> Age -> yyy
> class -> zzz
> This is how I load data
> rows =LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
> as (key,columns:bag{column:tuple(name,value)});
> 
> Now I wish to perform group by based on value of class. I tried
> 
> col_values = FOREACH rows GENERATE (columns.value) as list:bag{};
> 
> This gave me the result in following Schema :bag(:tuple(chararray))
> Ex: on dump col_values i got {(xxx),(yyy),(zzz)} 
> 
> Now if I try to access
> 
> list = FOREACH col_values GENERATE (list.$0, list.$1);
> 
> I'm getting undefined index access error. Like
> list.$1 doesn't exist :bag[:tuple(chararray)] has only one column [But
> there are 3]
> 
> How can i access tuple wise data in such cases?
> I couldn't perform group by based on 1 column because of this.
> 
> I tried TOTUPLE but the problem is, it converts the entire bag a tuple
> and applies group by on that.
> 
> Help me out
> 
> Regards,
> Tamil
> 


Re: Pig & Cassandra integration

Posted by Tamil Selvan <ta...@gmail.com>.
Hi,
 I'm trying to integrate pig with cassandra. 
 My columnfamily in cassandra is
 name -> xxx
 Age -> yyy
 class -> zzz
This is how I load the data:
 rows = LOAD 'cassandra://TestKeySpace/TestPig' USING CassandraStorage()
     AS (key, columns:bag{column:tuple(name, value)});

Now I wish to perform group by based on value of class. I tried

 col_values = FOREACH rows GENERATE (columns.value) as list:bag{};

This gave me a result with the following schema: :bag(:tuple(chararray))
For example, on dump col_values I got {(xxx),(yyy),(zzz)}

Now if I try to access

 list = FOREACH col_values GENERATE (list.$0, list.$1);

I get an undefined index access error, something like:
list.$1 doesn't exist :bag[:tuple(chararray)] has only one column
(but there are 3 values in the bag).

How can I access the data tuple by tuple in such cases?
I couldn't group by a single column because of this.

I tried TOTUPLE, but the problem is that it converts the entire bag to a tuple
and applies the group by on that.

Help me out

Regards,
Tamil


Re: Pig & Cassandra integration

Posted by Jeremy Hanna <je...@gmail.com>.
Try following this and see if it helps getting started:
https://github.com/jeromatron/pygmalion/wiki/Getting-Started

I haven't tried it with 0.9 yet but I plan to this week.  We use the CassandraStorage jar in production.  If you can, validate your data with Cassandra's schema validators.  CassandraStorage gets the schema from Cassandra and tries to unmarshal the data into Pig data types with the schema information.

See if that helps.
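
To illustrate why the validators matter, here is a small sketch of the idea of decoding raw column bytes according to a validator class name. It is not CassandraStorage's actual code; the DECODERS table and the unmarshal function are made up for illustration and cover only a few validator types. Without schema information everything stays as raw bytes, and data that doesn't match its declared validator fails to decode.

```python
import struct

# Illustrative mapping from validator short names to decoders.
DECODERS = {
    "UTF8Type": lambda raw: raw.decode("utf-8"),
    "AsciiType": lambda raw: raw.decode("ascii"),
    "LongType": lambda raw: struct.unpack(">q", raw)[0],  # 8-byte big-endian
    "BytesType": lambda raw: raw,
}


def unmarshal(raw, validator="BytesType"):
    """Decode `raw` bytes using the validator's class name.

    Accepts either a short name ("LongType") or a fully qualified one;
    unknown validators fall back to returning the bytes unchanged.
    """
    short_name = validator.rsplit(".", 1)[-1]
    decode = DECODERS.get(short_name, DECODERS["BytesType"])
    return decode(raw)


print(unmarshal(b"zzz", "UTF8Type"))
print(unmarshal(struct.pack(">q", 42), "org.apache.cassandra.db.marshal.LongType"))
print(unmarshal(b"\x01\x02"))
```

In this sketch, a value declared as LongType but stored as two bytes raises struct.error, which is the kind of mismatch that validating the data up front is meant to catch.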

On Jul 31, 2011, at 9:48 AM, Shai Harel wrote:

> hey all, i'v been trying to query cassandra using my pig script,
> so i used the contrib jar from cassandra. and i'm getting the following
> error...
> some thrift failure err.... :|
> 
> ERROR 2998: Unhandled internal error.
> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
> 
> java.lang.NoSuchMethodError:
> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>    at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
>    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
> Source)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>    at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>    at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>    at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>    at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>    at org.apache.pig.PigServer.store(PigServer.java:816)
>    at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>    at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>    at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>    at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>    at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>    at org.apache.pig.Main.run(Main.java:465)
>    at org.apache.pig.Main.main(Main.java:107)
> 
> 
> does anyone managed to get this up and running?
> i'm considering to rewrite the CassandraStorage.jar using Hector,
> Any thoughts about that?


Re: Pig & Cassandra integration

Posted by Shai Harel <sh...@mythings.com>.
I've migrated to Pig 0.9 and now I get:

Pig Stack Trace
---------------
ERROR 2017: Internal error creating job configuration.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias rows
    at org.apache.pig.PigServer.openIterator(PigServer.java:900)
    at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:655)
    at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:188)
    at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
    at org.apache.pig.Main.run(Main.java:487)
    at org.apache.pig.Main.main(Main.java:108)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias
rows
    at org.apache.pig.PigServer.storeEx(PigServer.java:999)
    at org.apache.pig.PigServer.store(PigServer.java:962)
    at org.apache.pig.PigServer.openIterator(PigServer.java:875)
    ... 7 more
Caused by:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
ERROR 2017: Internal error creating job configuration.
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:712)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1313)
    at
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1298)
    at org.apache.pig.PigServer.storeEx(PigServer.java:995)
    ... 9 more
Caused by: java.lang.NullPointerException
    at org.apache.thrift.TSerializer.serialize(TSerializer.java:79)
    at
org.apache.cassandra.hadoop.pig.CassandraStorage.cfdefToString(Unknown
Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.initSchema(Unknown
Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
Source)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:382)
    ... 14 more


Does anyone have a clue?


On Sun, Jul 31, 2011 at 7:48 AM, Shai Harel <sh...@mythings.com> wrote:

> hey all, i'v been trying to query cassandra using my pig script,
> so i used the contrib jar from cassandra. and i'm getting the following
> error...
> some thrift failure err.... :|
>
> ERROR 2998: Unhandled internal error.
> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>
> java.lang.NoSuchMethodError:
> org.apache.thrift.meta_data.FieldValueMetaData.<init>(BZ)V
>     at org.apache.cassandra.thrift.SliceRange.<clinit>(SliceRange.java:149)
>     at org.apache.cassandra.hadoop.pig.CassandraStorage.setLocation(Unknown
> Source)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:369)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:256)
>     at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:147)
>     at
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:378)
>     at
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1198)
>     at org.apache.pig.PigServer.storeEx(PigServer.java:874)
>     at org.apache.pig.PigServer.store(PigServer.java:816)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:728)
>     at
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>     at
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>     at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>     at
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>     at org.apache.pig.Main.run(Main.java:465)
>     at org.apache.pig.Main.main(Main.java:107)
>
>
> does anyone managed to get this up and running?
> i'm considering to rewrite the CassandraStorage.jar using Hector,
> Any thoughts about that?
>