Posted to user@cassandra.apache.org by Sasha Dolgy <sd...@gmail.com> on 2011/06/20 20:09:16 UTC

pig integration & NoClassDefFoundError TypeParser

I've been trying for a while to get the Pig integration
working with Cassandra 0.8.0.

1.  Downloaded the src for 0.8.0 and ran ant build
2.  Went into contrib/pig and ran ant, which gives me
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar,
and the jar is copied into the lib/ directory
3.  Downloaded pig-0.8.1, modified ivy/libraries.properties so
that it uses Jackson 1.8.2, and ran ant.  It compiles and gives me
two jars:  pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar
----- I did try to run it with Jackson 1.4, as
contrib/pig/README.txt suggests, but that failed.  The referenced
JIRA ticket (PIG-1863) suggests 1.6.0, which still produces the same
results.

Java version is "1.6.0_24", and these environment variables are set:

PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src
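Since the problem in this thread later turned out to be environment-related, it may help to export these settings explicitly and echo them before launching bin/pig_cassandra. A minimal sketch using the values listed above; `check_env` is a hypothetical helper (not part of Cassandra or Pig) that prints each variable and fails if any is unset or empty. It cannot tell whether a path points at the right build, only that something is set.

```shell
#!/bin/sh
# Values taken from the environment listed above; adjust to your install.
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
export PIG_HOME=/usr/local/src/pig-0.8.1
export CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

# Hypothetical helper: print NAME=value for each named variable, report the
# ones that are unset or empty on stderr, and return non-zero if any are missing.
check_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}

check_env PIG_INITIAL_ADDRESS PIG_RPC_PORT PIG_PARTITIONER PIG_HOME CASSANDRA_HOME
```

Comparing the printed paths against the tree you actually built is the useful part; a stale CASSANDRA_HOME or PIG_HOME from an earlier session would show up here.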

I then start up Cassandra ... no issues.  I connect and create a new
keyspace called foo with two column families, bar and foo.  Inside
the CF bar, I create 4 rows with random columns.

From contrib/pig I run:  bin/pig_cassandra -x local ... and immediately
get the error:

[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

-- this refers to line 45:  if [ ! -e $PIG_JAR ]; then

*** The problem is that $PIG_JAR, expanded unquoted, matches two files,
pig-0.8.1-core.jar and pig.jar, so the [ test receives an extra
argument it does not expect.

Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this (as does
pointing it at $PIG_HOME/build/pig*core*.jar or just pig.jar).
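One way to avoid the ambiguity altogether is to pick exactly one jar and quote the variable wherever it is tested. A minimal sketch, assuming the launcher runs under a POSIX /bin/sh (the "[: 45: ... unexpected operator" message format suggests dash); `find_pig_jar` is a hypothetical helper, not the code shipped in bin/pig_cassandra, and its candidate order simply mirrors the alternatives mentioned above.

```shell
#!/bin/sh
# Hypothetical helper (not part of bin/pig_cassandra): print exactly one Pig jar
# found under the given Pig home, trying the alternatives in order:
#   <pig_home>/pig*core*.jar, <pig_home>/build/pig*core*.jar, <pig_home>/pig.jar
find_pig_jar() {
    pig_home="$1"
    for candidate in "$pig_home"/pig*core*.jar "$pig_home"/build/pig*core*.jar "$pig_home"/pig.jar; do
        # A glob with no match stays literal, so the quoted -e test skips it.
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible use in the launcher, replacing the PIG_JAR assignment and the
# unquoted existence test:
#   PIG_JAR=$(find_pig_jar "$PIG_HOME") || { echo "Unable to locate Pig jar" >&2; exit 1; }
```

If several pig*core*.jar files match, this returns the first one in glob (sorted) order, which may not be the jar you just built, so it is still worth checking what gets picked.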

Try again to run:  bin/pig_cassandra -x local and everything loads up nicely:

2011-06-21 02:07:23,671 [main] INFO  org.apache.pig.Main - Logging
error messages to:
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log
2011-06-21 02:07:23,778 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register
/usr/local/src/pig-0.8.1/pig.jar; register
/usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar;
grunt>
grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage();
grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage();
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-21 02:04:53,324 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2011-06-21 02:04:53,447 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1
Operator Key: scope-1)
2011-06-21 02:04:53,458 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-06-21 02:04:53,556 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-06-21 02:04:59,700 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-06-21 02:04:59,718 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,980 [Thread-5] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,220 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-06-21 02:05:00,322 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,340 [Thread-14] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,372 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,374 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,378 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,381 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,491 [Thread-14] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError: org/apache/cassandra/db/marshal/TypeParser
        at org.apache.cassandra.hadoop.pig.CassandraStorage.getDefaultMarshallers(Unknown Source)
        at org.apache.cassandra.hadoop.pig.CassandraStorage.columnToTuple(Unknown Source)
        at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.marshal.TypeParser
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 10 more
2011-06-21 02:05:00,818 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0001
2011-06-21 02:05:05,408 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0001 has failed! Stop running all dependent jobs
2011-06-21 02:05:05,411 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-06-21 02:05:05,412 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s)
failed!
2011-06-21 02:05:05,412 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats
reported below may be incomplete
2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.2  0.8.1   root    2011-06-21 02:04:53     2011-06-21 02:05:05     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0001  rows    MAP_ONLY        Message: Job failed!
cassandra://foo/foo,

Input(s):
Failed to read data from "cassandra://foo/bar"

Output(s):
Failed to produce result in "cassandra://foo/foo"

Job DAG:
job_local_0001


2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-06-21 02:05:05,416 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
grunt>


Any help or insight is appreciated ....

Re: pig integration & NoClassDefFoundError TypeParser

Posted by Sasha Dolgy <sd...@gmail.com>.
Bang on ... no idea why, but after a new day and a fresh login the
environment variables were gone, and it's working now with Cassandra
0.8.0 and Pig 0.8.1.

I went through all my steps again and everything works, except that
line 45 in bin/pig_cassandra breaks when there are multiple pig*.jar
files.

On Mon, Jun 20, 2011 at 10:03 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.

Re: pig integration & NoClassDefFoundError TypeParser

Posted by Jeremy Hanna <je...@gmail.com>.
I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.

I just downloaded 0.8 and did the following:
- Ran the examples/hadoop_word_count/bin/word_count_setup to create some data
- Ran contrib/pig/bin/pig_cassandra -x local example_script.pig (with the keyspace/columnfamily as wordcount/input_words)
- that worked

Then I added the pygmalion data, with a slight change for 0.8 (key_validation_class; listed below), and ran from_to_cassandra_bag_example.pig with bin/pig_cassandra -x local from_to_cassandra_bag_example.pig.  That script reads from one column family, filters the data, and writes the result to another column family.  The script is here (you just need to build pygmalion and point the register statement at your built pygmalion jar): https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig

That worked as well and output to cassandra.

So I suspect your environment is messed up somehow: the CassandraStorage class (for Pig integration) doesn't reference TypeParser in 0.8.0.
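That claim can be checked directly against the jar on your classpath: if the compiled CassandraStorage mentions TypeParser, the jar was probably built from a different source tree than the 0.8.0 release. A minimal sketch, relying on the fact that a .class file stores the names of classes it references as plain strings in its constant pool (for example org/apache/cassandra/db/marshal/TypeParser). `classes_referencing` is a hypothetical helper, and because jar entries are usually compressed, the jar has to be extracted first.

```shell
#!/bin/sh
# Hypothetical helper: list .class files under a directory whose bytes contain
# the given internal class name (slash-separated, as stored in the constant pool).
classes_referencing() {
    dir="$1"
    internal_name="$2"
    find "$dir" -type f -name '*.class' -exec grep -lF "$internal_name" {} +
}

# Example (the extraction directory is an assumption):
#   mkdir /tmp/storage && cd /tmp/storage && jar xf "$CASSANDRA_HOME/contrib/pig/build/cassandra_storage.jar"
#   classes_referencing /tmp/storage org/apache/cassandra/db/marshal/TypeParser
```

Empty output suggests the storage jar does not use TypeParser at all; a hit on CassandraStorage.class points to a jar compiled against newer sources.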

create keyspace pygmalion;
use pygmalion;
create column family account with comparator = UTF8Type and default_validation_class = UTF8Type and key_validation_class = UTF8Type and
    column_metadata=
    [
        {column_name: num_heads, validation_class: LongType},
    ];
create column family betelgeuse with comparator = UTF8Type and default_validation_class = UTF8Type;

set account['hipcat']['first_name'] = 'Zaphod';
set account['hipcat']['last_name'] = 'Beeblebrox';
set account['hipcat']['birth_place'] = 'Betelgeuse Five';
set account['hipcat']['num_heads'] = '2';

set account['hoopyfrood']['first_name'] = 'Ford';
set account['hoopyfrood']['last_name'] = 'Prefect';
set account['hoopyfrood']['birth_place'] = 'Betelgeuse Five';
set account['hoopyfrood']['num_heads'] = '1';

set account['earthman']['first_name'] = 'Arthur';
set account['earthman']['last_name'] = 'Dent';
set account['earthman']['birth_place'] = 'Earth';
set account['earthman']['num_heads'] = '1';


On Jun 20, 2011, at 2:23 PM, Sasha Dolgy wrote:

> cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java
> : doesn't exist
> cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java
> : exists...
> 
> PIG integration with 0.8.0 is no longer working / doesn't work with
> 0.8.0 release, but will with 0.8.1 .. fair assumption?
> 
> -- 
> Sasha Dolgy
> sasha.dolgy@gmail.com


Re: pig integration & NoClassDefFoundError TypeParser

Posted by Jeremy Hanna <je...@gmail.com>.
I seem to recall a last-minute issue with 0.8.0, before release, where TypeParser (needed for the Pig support) wasn't in there.  However, I'm pretty sure that got fixed before release.  I'll test it out in a few minutes - stay tuned :).

Jeremy

On Jun 20, 2011, at 2:23 PM, Sasha Dolgy wrote:

> cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java
> : doesn't exist
> cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java
> : exists...
> 
> PIG integration with 0.8.0 is no longer working / doesn't work with
> 0.8.0 release, but will with 0.8.1 .. fair assumption?
> 
> -- 
> Sasha Dolgy
> sasha.dolgy@gmail.com


Re: pig integration & NoClassDefFoundError TypeParser

Posted by Sasha Dolgy <sd...@gmail.com>.
cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java
: doesn't exist
cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java
: exists...

So Pig integration doesn't work with the 0.8.0 release, but will with
0.8.1 ... fair assumption?

On Mon, Jun 20, 2011 at 9:18 PM, Sasha Dolgy <sd...@gmail.com> wrote:
> Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 src:
>
> /usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/
> AbstractCommutativeType.class  AbstractType.class       LexicalUUIDType.class       UTF8Type.class
> AbstractType$1.class           AbstractUUIDType.class   LocalByPartionerType.class  UTF8Type$UTF8Validator.class
> AbstractType$2.class           AsciiType.class          LongType.class              UTF8Type$UTF8Validator$State.class
> AbstractType$3.class           BytesType.class          MarshalException.class      UUIDType.class
> AbstractType$4.class           CounterColumnType.class  TimeUUIDType.class
> AbstractType$5.class           IntegerType.class        UTF8Type$1.class
>
> /usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser
> /usr/local/src/apache-cassandra-0.8.0-src# echo $?
> 1
> /usr/local/src/apache-cassandra-0.8.0-src#
>
> /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError .
> /usr/local/src/apache-cassandra-0.8.0-src# echo $?
> 1
> /usr/local/src/apache-cassandra-0.8.0-src#
>
> TypeParser does not exist...?
>
>
> On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna
> <je...@gmail.com> wrote:
>> hmmm, did you build the cassandra src in the root of your cassandra directory with ant?  sounds like it can't find that cassandra class.  That's required.
>



-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: pig integration & NoClassDefFoundError TypeParser

Posted by Sasha Dolgy <sd...@gmail.com>.
Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 src:

/usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/
AbstractCommutativeType.class  AbstractType.class       LexicalUUIDType.class       UTF8Type.class
AbstractType$1.class           AbstractUUIDType.class   LocalByPartionerType.class  UTF8Type$UTF8Validator.class
AbstractType$2.class           AsciiType.class          LongType.class              UTF8Type$UTF8Validator$State.class
AbstractType$3.class           BytesType.class          MarshalException.class      UUIDType.class
AbstractType$4.class           CounterColumnType.class  TimeUUIDType.class
AbstractType$5.class           IntegerType.class        UTF8Type$1.class

/usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src#

/usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError .
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src#

TypeParser does not exist...?


On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> hmmm, did you build the cassandra src in the root of your cassandra directory with ant?  sounds like it can't find that cassandra class.  That's required.

Re: pig integration & NoClassDefFoundError TypeParser

Posted by Jeremy Hanna <je...@gmail.com>.
hmmm, did you build the cassandra src in the root of your cassandra directory with ant?  sounds like it can't find that cassandra class.  That's required.

On Jun 20, 2011, at 2:05 PM, Sasha Dolgy wrote:

> Hi ... I still have the same problem with pig-0.8.0-cdh3u0...
> 
> Maybe I'm doing something wrong.  Where does
> org/apache/cassandra/db/marshal/TypeParser exist, or should exist?
> 
> It's not in the $CASSANDRA_HOME/libs or
> /usr/local/src/pig-0.8.0-cdh3u0/lib or
> /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars
> 
> 
> for jar in `ls *.jar`
>  do
>  jar -tf $jar | grep TypeParser
>  if [ $? -eq 0 ]; then
>     echo $jar
>  fi
>  done
> 
> Shows me nothing in all the lib dirs....
> 
> -- 
> Sasha Dolgy
> sasha.dolgy@gmail.com


Re: pig integration & NoClassDefFoundError TypeParser

Posted by Sasha Dolgy <sd...@gmail.com>.
Hi, I still have the same problem with pig-0.8.0-cdh3u0.

Maybe I'm doing something wrong. Where does
org/apache/cassandra/db/marshal/TypeParser live, or where should it live?

It's not in $CASSANDRA_HOME/lib,
/usr/local/src/pig-0.8.0-cdh3u0/lib, or
/usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars.


for jar in *.jar; do
  if jar -tf "$jar" | grep TypeParser; then
    echo "$jar"
  fi
done

Running this in each of those lib directories prints nothing.
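The loop above only checks jars sitting directly in the current directory. Below is a minimal sketch of a broader check. It searches a whole tree recursively, matches the exact jar entry for the class instead of any line containing "TypeParser", and also looks for loose TypeParser.java or TypeParser.class files, since a source build may leave compiled classes outside any jar. The function names are hypothetical, and it assumes the JDK `jar` tool is on PATH as in the loop above. Background: a NoClassDefFoundError caused by a ClassNotFoundException means already-compiled code (here CassandraStorage) referenced the class, but the class was not on the runtime classpath. If the Cassandra source tree has no TypeParser.java at all, one possible explanation, not confirmed in this thread, is that cassandra_storage.jar was built against a different Cassandra version.

```shell
#!/bin/sh
# Hypothetical helpers for locating a Java class on disk.

# Map a class name to the entry path it has inside a jar, e.g.
# org.apache.cassandra.db.marshal.TypeParser
#   -> org/apache/cassandra/db/marshal/TypeParser.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print every jar under DIR ($1, searched recursively) that contains
# the class named in $2. Uses the JDK `jar` tool to list jar contents.
find_class_in_jars() {
  entry=$(class_to_entry "$2")
  find "$1" -type f -name '*.jar' | while IFS= read -r j; do
    # -x: whole-line match, -F: treat the dots in the entry literally
    if jar -tf "$j" 2>/dev/null | grep -qxF "$entry"; then
      echo "$j"
    fi
  done
}

# Print any loose source or class file for the class under DIR ($1),
# matching on the simple class name only (e.g. TypeParser).
find_class_files() {
  base=${2##*.}
  find "$1" -type f \( -name "$base.java" -o -name "$base.class" \)
}
```

For example, `find_class_in_jars /usr/local/src org.apache.cassandra.db.marshal.TypeParser` followed by `find_class_files "$CASSANDRA_HOME" org.apache.cassandra.db.marshal.TypeParser` would show whether the class exists anywhere in the checked-out source or any built jar.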



On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> Try running with the cdh3u0 version of Pig and see if it has the same problem. Cloudera backported the patch that adds the updated Jackson dependency for Avro (the patch is in Pig 0.9, which should be out in time for the Hadoop Summit next week). The download URL is http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz
>
> Alternatively, I believe Brisk beta 2, which has Pig integrated, will be released today. I'm not sure whether that would work in your current environment, though.
>
> See if that works.



-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: pig integration & NoClassDefFoundError TypeParser

Posted by Jeremy Hanna <je...@gmail.com>.
Try running with the cdh3u0 version of Pig and see if it has the same problem. Cloudera backported the patch that adds the updated Jackson dependency for Avro (the patch is in Pig 0.9, which should be out in time for the Hadoop Summit next week). The download URL is http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz

Alternatively, I believe Brisk beta 2, which has Pig integrated, will be released today. I'm not sure whether that would work in your current environment, though.

See if that works.
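Switching to the cdh3u0 build mostly means unpacking the tarball from the URL above and pointing PIG_HOME at it. Below is a minimal sketch, assuming the tree is unpacked as /usr/local/src/pig-0.8.0-cdh3u0 (the path Sasha uses in this thread). `use_pig_home` is a hypothetical helper, not part of pig_cassandra. It reuses the `pig*core*.jar` pattern from the original post and refuses to continue unless that pattern matches exactly one jar, which is the situation that broke the `[ ! -e $PIG_JAR ]` test in the first place.

```shell
#!/bin/sh
# Hypothetical helper: validate a Pig install directory, export PIG_HOME,
# and print the single core jar it resolved.
use_pig_home() {
  home=$1
  # Expand the glob into this function's positional parameters.
  set -- "$home"/pig*core*.jar
  # Zero matches leave the literal pattern (so -e fails);
  # two or more matches make $# greater than 1.
  if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
    echo "expected exactly one pig*core*.jar in $home, got: $*" >&2
    return 1
  fi
  PIG_HOME=$home
  export PIG_HOME
  echo "$1"
}
```

Usage would then look like `use_pig_home /usr/local/src/pig-0.8.0-cdh3u0 && bin/pig_cassandra -x local`, run from contrib/pig as in the original post.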