Posted to user@cassandra.apache.org by Sasha Dolgy <sd...@gmail.com> on 2011/06/20 20:09:16 UTC
pig integration & NoClassDefFoundError TypeParser
I've been trying for a while to get the Pig integration working with
Cassandra 0.8.0.
1. Downloaded the src for 0.8.0 and ran ant build.
2. Went into contrib/pig and ran ant, which gives me
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar
and it is copied into the lib/ directory.
3. Downloaded pig-0.8.1, modified ivy/libraries.properties so that it
uses Jackson 1.8.2, and ran ant. It compiles and gives me two jars:
pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar.
I did try to run it with Jackson 1.4, as contrib/pig/README.txt
suggests, but that failed. The referenced JIRA ticket (PIG-1863)
suggests 1.6.0, which still produces the same results.
Java version is "1.6.0_24". Environment variables are set:
PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src
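Before launching pig_cassandra it can help to confirm that these variables are actually set in the current shell, since a stale or missing value is easy to overlook. The following is only a sketch: check_pig_env is a hypothetical helper name, not something shipped with Cassandra or Pig, and the variable list is simply the one given above.

```shell
#!/bin/sh
# check_pig_env is a hypothetical helper, not part of bin/pig_cassandra.
# It prints each variable from the list above and returns non-zero if any
# of them is unset or empty in the current shell.
check_pig_env() {
    missing=0
    for name in PIG_HOME PIG_INITIAL_ADDRESS PIG_PARTITIONER PIG_RPC_PORT CASSANDRA_HOME; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}
```

Running check_pig_env in the same terminal that starts bin/pig_cassandra shows exactly what that shell has set.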
I then start up Cassandra with no issues. I connect and create a new
keyspace called foo with two column families, bar and foo. Inside the
CF bar, I create 4 rows with random columns.
From contrib/pig I run bin/pig_cassandra -x local and immediately get
the error:
[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator
This refers to line 45 of the script: if [ ! -e $PIG_JAR ]; then
The problem is that $PIG_JAR expands to two files, pig-0.8.1-core.jar
and pig.jar, so the test receives too many arguments.
Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this (as does
referencing $PIG_HOME/build/pig*core*.jar, or just pig.jar).
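The "unexpected operator" error is what the [ ! -e ... ] test produces when an unquoted variable expands to more than one path. A more defensive lookup could count the glob matches first. This is a minimal sketch under assumptions: find_pig_jar is a hypothetical function, not the code in bin/pig_cassandra, and it assumes the jar is named pig*core*.jar as in the workaround above.

```shell
#!/bin/sh
# find_pig_jar is a hypothetical helper (not part of bin/pig_cassandra).
# It prints the single jar matching pig*core*.jar in the given directory
# and fails if the glob matches zero or several files.
find_pig_jar() {
    dir=$1
    # Expand the glob into the positional parameters so matches can be counted.
    set -- "$dir"/pig*core*.jar
    # With no match the pattern stays literal, so also check that it exists.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one pig core jar in $dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

A script could then use PIG_JAR=$(find_pig_jar "$PIG_HOME") || exit 1 and quote "$PIG_JAR" in later tests.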
I run bin/pig_cassandra -x local again and everything loads up nicely:
2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging
error messages to:
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log
2011-06-21 02:07:23,778 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar; register
/usr/local/src/pig-0.8.1/pig.jar; register
/usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar;
grunt>
grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage();
grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage();
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-21 02:04:53,324 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2011-06-21 02:04:53,447 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1
Operator Key: scope-1)
2011-06-21 02:04:53,458 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-06-21 02:04:53,556 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-06-21 02:04:59,700 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-06-21 02:04:59,718 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,980 [Thread-5] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,220 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-06-21 02:05:00,322 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,340 [Thread-14] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,372 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,374 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,378 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,381 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,491 [Thread-14] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError: org/apache/cassandra/db/marshal/TypeParser
at org.apache.cassandra.hadoop.pig.CassandraStorage.getDefaultMarshallers(Unknown
Source)
at org.apache.cassandra.hadoop.pig.CassandraStorage.columnToTuple(Unknown
Source)
at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown
Source)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassNotFoundException:
org.apache.cassandra.db.marshal.TypeParser
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 10 more
2011-06-21 02:05:00,818 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0001
2011-06-21 02:05:05,408 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0001 has failed! Stop running all dependent jobs
2011-06-21 02:05:05,411 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-06-21 02:05:05,412 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s)
failed!
2011-06-21 02:05:05,412 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats
reported below may be incomplete
2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.1 root 2011-06-21 02:04:53 2011-06-21 02:05:05 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0001 rows MAP_ONLY Message: Job failed!
cassandra://foo/foo,
Input(s):
Failed to read data from "cassandra://foo/bar"
Output(s):
Failed to produce result in "cassandra://foo/foo"
Job DAG:
job_local_0001
2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-06-21 02:05:05,416 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
grunt>
Any help or insight is appreciated ....
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Sasha Dolgy <sd...@gmail.com>.
Bang on. No idea why, but after a new day and a fresh login the
environment variables were gone, and it is working now with Cassandra
0.8.0 and Pig 0.8.1.
I went through all my steps again and everything works, except that
line 45 in bin/pig_cassandra misbehaves when there are multiple
pig*.jar files.
On Mon, Jun 20, 2011 at 10:03 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Jeremy Hanna <je...@gmail.com>.
I think you might be having environment/classpath issues with an RC of cassandra 0.8 or something.
I just downloaded 0.8 and did the following:
- Ran the examples/hadoop_word_count/bin/word_count_setup to create some data
- Ran contrib/pig/bin/pig_cassandra -x local example_script.pig (with the keyspace/columnfamily as wordcount/input_words)
- That worked.
Then I added the pygmalion data with a slight change for 0.8 (key_validation_class), listed below, and ran bin/pig_cassandra -x local from_to_cassandra_bag_example.pig. That script reads from one column family, filters the data, and writes the result to another column family. The script is here (you just need to build pygmalion and point the register statement to your built pygmalion jar): https://github.com/jeromatron/pygmalion/blob/master/scripts/from_to_cassandra_bag_example.pig
That worked as well and output to Cassandra.
So I suspect your environment is messed up somehow: the CassandraStorage class (for Pig integration) doesn't reference TypeParser in 0.8.0.
create keyspace pygmalion;
use pygmalion;
create column family account with comparator = UTF8Type and default_validation_class = UTF8Type and key_validation_class = UTF8Type and
column_metadata=
[
{column_name: num_heads, validation_class: LongType},
];
create column family betelgeuse with comparator = UTF8Type and default_validation_class = UTF8Type;
set account['hipcat']['first_name'] = 'Zaphod';
set account['hipcat']['last_name'] = 'Beeblebrox';
set account['hipcat']['birth_place'] = 'Betelgeuse Five';
set account['hipcat']['num_heads'] = '2';
set account['hoopyfrood']['first_name'] = 'Ford';
set account['hoopyfrood']['last_name'] = 'Prefect';
set account['hoopyfrood']['birth_place'] = 'Betelgeuse Five';
set account['hoopyfrood']['num_heads'] = '1';
set account['earthman']['first_name'] = 'Arthur';
set account['earthman']['last_name'] = 'Dent';
set account['earthman']['birth_place'] = 'Earth';
set account['earthman']['num_heads'] = '1';
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Jeremy Hanna <je...@gmail.com>.
I seem to recall a last-minute issue before the 0.8.0 release where TypeParser wasn't in there (for the Pig support). However, I'm pretty sure that got fixed before release. I'll test it out in a few minutes - stay tuned :).
Jeremy
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Sasha Dolgy <sd...@gmail.com>.
cassandra-0.8.0/src/java/org/apache/cassandra/db/marshal/TypeParser.java
: doesn't exist
cassandra-0.8.1/src/java/org/apache/cassandra/db/marshal/TypeParser.java
: exists...
So the Pig integration doesn't work with the 0.8.0 release but will
with 0.8.1. Is that a fair assumption?
--
Sasha Dolgy
sasha.dolgy@gmail.com
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Sasha Dolgy <sd...@gmail.com>.
Yes ... I ran an "ant" in the root directory on a fresh download of 0.8.0 src:
/usr/local/src/apache-cassandra-0.8.0-src# ls /usr/local/src/apache-cassandra-0.8.0-src/build/classes/main/org/apache/cassandra/db/marshal/
AbstractCommutativeType.class
AbstractType$1.class
AbstractType$2.class
AbstractType$3.class
AbstractType$4.class
AbstractType$5.class
AbstractType.class
AbstractUUIDType.class
AsciiType.class
BytesType.class
CounterColumnType.class
IntegerType.class
LexicalUUIDType.class
LocalByPartionerType.class
LongType.class
MarshalException.class
TimeUUIDType.class
UTF8Type$1.class
UTF8Type.class
UTF8Type$UTF8Validator.class
UTF8Type$UTF8Validator$State.class
UUIDType.class
/usr/local/src/apache-cassandra-0.8.0-src# find . | grep TypeParser
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src#
/usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError .
/usr/local/src/apache-cassandra-0.8.0-src# echo $?
1
/usr/local/src/apache-cassandra-0.8.0-src#
TypeParser does not exist...?
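The same check can be narrowed to exact file names, so that only a source or compiled file for the class itself would match. This is just a sketch: find_class_files is a hypothetical helper, and it only looks at loose .java and .class files, not inside jars.

```shell
#!/bin/sh
# find_class_files is a hypothetical helper: list files named <Class>.java
# or <Class>.class anywhere under a directory. It prints nothing when no
# such file exists.
find_class_files() {
    root=$1
    class=$2
    find "$root" -type f \( -name "$class.java" -o -name "$class.class" \)
}
```

For example, find_class_files /usr/local/src/apache-cassandra-0.8.0-src TypeParser would search the source tree used above.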
On Mon, Jun 20, 2011 at 9:11 PM, Jeremy Hanna
<je...@gmail.com> wrote:
> hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required.
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Jeremy Hanna <je...@gmail.com>.
hmmm, did you build the cassandra src in the root of your cassandra directory with ant? sounds like it can't find that cassandra class. That's required.
On Jun 20, 2011, at 2:05 PM, Sasha Dolgy wrote:
> Hi ... I still have the same problem with pig-0.8.0-cdh3u0...
>
> Maybe I'm doing something wrong. Where does
> org/apache/cassandra/db/marshal/TypeParser exist, or should exist?
>
> It's not in the $CASSANDRA_HOME/libs or
> /usr/local/src/pig-0.8.0-cdh3u0/lib or
> /usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars
>
>
> for jar in `ls *.jar`
> do
> jar -tf $jar | grep TypeParser
> if [ $? -eq 0 ]; then
> echo $jar
> fi
> done
>
> Shows me nothing in all the lib dirs....
>
>
>
> On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna
> <je...@gmail.com> wrote:
>> Try running with cdh3u0 version of pig and see if it has the same problem. They backported the patch (to pig 0.9 which should be out in time for the hadoop summit next week) that adds the updated jackson dependency for avro. The download URL for that is - http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz
>>
>> Alternatively, I believe today brisk beta 2 will be out which has pig integrated. Not sure if that would work for your current environment though.
>>
>> See if that works.
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Sasha Dolgy <sd...@gmail.com>.
Hi ... I still have the same problem with pig-0.8.0-cdh3u0...
Maybe I'm doing something wrong. Where does
org/apache/cassandra/db/marshal/TypeParser live, or where should it live?
It's not in the $CASSANDRA_HOME/libs or
/usr/local/src/pig-0.8.0-cdh3u0/lib or
/usr/local/src/apache-cassandra-0.8.0-src/build/lib/jars
for jar in *.jar
do
    if jar -tf "$jar" | grep TypeParser; then
        echo "$jar"
    fi
done
This shows me nothing in any of the lib dirs.
--
Sasha Dolgy
sasha.dolgy@gmail.com
Re: pig integration & NoClassDefFoundError TypeParser
Posted by Jeremy Hanna <je...@gmail.com>.
Try running with the cdh3u0 version of Pig and see if it has the same problem. They backported the patch that adds the updated Jackson dependency for Avro (the patch targets Pig 0.9, which should be out in time for the Hadoop Summit next week). The download URL is http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz
Alternatively, I believe Brisk beta 2 will be out today, and it has Pig integrated. I'm not sure whether that would work for your current environment, though.
See if that works.
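
Switching to that build could look roughly like the sketch below. It is only an illustration: use_pig_dist is a hypothetical helper name, the unpack location is an assumption, and the jar listing is there because the launcher script in the original post tripped over a glob that matched more than one Pig jar.

```shell
# Sketch: point PIG_HOME at an unpacked Pig distribution and list its Pig jars.
# use_pig_dist is a hypothetical helper, not part of Pig or Cassandra.
use_pig_dist() {
    dist_dir=$1
    if [ ! -d "$dist_dir" ]; then
        echo "no such directory: $dist_dir" >&2
        return 1
    fi
    PIG_HOME=$dist_dir
    export PIG_HOME
    # Print every jar a pig*.jar glob would match, so an ambiguous
    # match is visible before the launcher script runs
    for f in "$PIG_HOME"/pig*.jar; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

# Possible usage (the unpack directory is an assumption):
#   wget http://archive.cloudera.com/cdh/3/pig-0.8.0-cdh3u0.tar.gz
#   tar -xzf pig-0.8.0-cdh3u0.tar.gz -C /usr/local/src
#   use_pig_dist /usr/local/src/pig-0.8.0-cdh3u0
#   bin/pig_cassandra -x local
```

If more than one jar is listed, the PIG_JAR line in bin/pig_cassandra may need the same narrowing described in the original post.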