
Posted to user@cassandra.apache.org by Michael Moores <mm...@real.com> on 2010/10/14 02:39:37 UTC

0.7.0-beta2 and Hadoop

What version of Hadoop should I be using with Cassandra 0.7.0-beta2?
I am using the latest version, 0.21.0.

I'm just running a modified version of the WordCount example:
https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/

I get a linkage error thrown from the getSplits method.

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:88)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:401)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:418)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:338)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:960)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:976)
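The error text is the main clue. "Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected" means the Cassandra jar was compiled against a Hadoop in which JobContext is a class (the 0.20 line), but is running against one in which it is an interface (Hadoop 0.21 turned the MapReduce context objects into interfaces). A minimal sketch for confirming which flavour the JVM actually loads, assuming you can run a small class with the same classpath as the job. ClassKindCheck is a hypothetical helper, not part of Cassandra or Hadoop:

```java
// Hypothetical diagnostic (not part of Cassandra or Hadoop): reports whether a
// named type, as loaded from the current classpath, is an interface or a class.
final class ClassKindCheck {
    private ClassKindCheck() {}

    /** Returns "interface", "class", or "missing" for the named type. */
    static String kindOf(String className) {
        try {
            // initialize = false: load the type without running its static initializers
            Class<?> type = Class.forName(className, false, ClassKindCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // Not on the classpath, or present but unloadable (e.g. a missing dependency)
            return "missing";
        }
    }

    public static void main(String[] args) {
        // Defaults to the type named in the IncompatibleClassChangeError
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.JobContext";
        System.out.println(name + " is: " + kindOf(name));
    }
}
```

With a 0.20.x Hadoop on the classpath this should report "class" for JobContext. If it reports "interface", the classpath holds a 0.21-style Hadoop that the 0.7.0-beta2 Cassandra jar was not built against.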

Re: 0.7.0-beta2 and Hadoop

Posted by Michael Moores <mm...@real.com>.

I SOLVED the problem.
It was my misunderstanding of how the Cassandra connection is used when calling getSlices().

On Oct 14, 2010, at 10:06 AM, Michael Moores wrote:

OK, I moved back to Hadoop 0.20.2 and the WordCount example is doing better.
But I am still seeing a problem that may be due to my lack of experience with Hadoop.
I am running "hadoop jar..." on my JobTracker/NameNode machine, which is not running Cassandra.
I have a DataNode/TaskTracker running on every Cassandra node, with my ConfigHelper set up to talk to Cassandra on localhost.
When I run the job, it can't connect (I renamed the main class to "ProfileStats"):

[hadoop@kv-app01 test]$ hadoop jar hadoop-cassandra-0.0.1-SNAPSHOT.jar com.real.uds.hadoop.ProfileStats xyz -libjars ./cassandra-0.7.0-beta2.jar ./libthrift-r959516.jar
10/10/14 09:57:57 INFO hadoop.ProfileStats: main: adding jars...
10/10/14 09:57:58 INFO hadoop.ProfileStats: output reducer type: filesystem
10/10/14 09:57:58 INFO hadoop.ProfileStats: main: adding jars AGAIN...
Exception in thread "main" java.io.IOException: unable to connect to server
        at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.createConnection(ColumnFamilyInputFormat.java:205)
..
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)

Should I expect my job to be executed on the TaskTracker nodes?
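On the question above: in Hadoop MapReduce, the map and reduce tasks run on the TaskTrackers, but InputFormat.getSplits runs in the client JVM that submits the job. The first stack trace shows it being called from JobSubmitter.writeSplits inside Job.submit, and both failures report "Exception in thread "main"". With the Cassandra address set to localhost, the split calculation therefore tries to reach Cassandra on the JobTracker/NameNode machine, where nothing is listening. A hedged pre-flight check, meant to be run on the submitting machine, assuming 9160 (the Cassandra 0.7 default Thrift port) unless told otherwise. CassandraReachability is an illustrative helper, not a Cassandra API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check: can this JVM (the one that will run "hadoop jar")
// open a TCP connection to the Cassandra Thrift endpoint the job is configured with?
final class CassandraReachability {
    private CassandraReachability() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", connect timeouts and unresolvable hosts
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: CassandraReachability <host> [port]");
            System.exit(2);
        }
        String host = args[0];
        // Assumption: 9160 as the default Cassandra 0.7 Thrift rpc_port
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        if (canConnect(host, port, 2000)) {
            System.out.println("Reachable: " + host + ":" + port);
        } else {
            System.err.println("Cannot reach " + host + ":" + port
                    + " from this machine; split calculation at job submission would fail here too.");
            System.exit(1);
        }
    }
}
```

If this fails for "localhost" on the JobTracker machine but succeeds for the address of a real Cassandra node, the likely fix is to configure the job with a Cassandra address reachable from the submitting machine rather than localhost. The exact ConfigHelper setter for that has varied between 0.7 builds, so check the ConfigHelper class in the jar you are using.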



Re: 0.7.0-beta2 and Hadoop

Posted by Jeremy Hanna <je...@gmail.com>.

I would first check whether the unmodified version of the word count example works for you. Also, I don't believe Hadoop 0.21 is meant for production use; it's more of a "let's get the 0.21 release out the door so we can move on" type of release. I would use either 0.20.2 from the Hadoop website, or perhaps the Cloudera CDH2 (http://www.cloudera.com/downloads/) or Yahoo distribution (http://developer.yahoo.com/hadoop/distribution/).
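Following that advice, a job driver could refuse to start on anything outside the 0.20 line instead of failing later with a linkage error. A minimal sketch, assuming that 0.20.x version strings (including vendor builds) all begin with "0.20."; HadoopVersionGuard is hypothetical, and in a real driver the version string would come from Hadoop's org.apache.hadoop.util.VersionInfo.getVersion():

```java
// Hypothetical guard: accept only Hadoop 0.20.x version strings, since the
// Cassandra 0.7.0-beta2 Hadoop classes in this thread fail on 0.21.
final class HadoopVersionGuard {
    private HadoopVersionGuard() {}

    /** True if the version string belongs to the 0.20 release line (e.g. "0.20.2"). */
    static boolean isSupported(String version) {
        return version != null && (version.equals("0.20") || version.startsWith("0.20."));
    }

    public static void main(String[] args) {
        // In a real driver this string would come from VersionInfo.getVersion()
        String version = args.length > 0 ? args[0] : "0.21.0";
        System.out.println(version + (isSupported(version) ? " is" : " is not") + " in the 0.20 line");
    }
}
```

A prefix check keeps vendor suffixes on 0.20-based builds acceptable while rejecting 0.21.0; it does not prove binary compatibility, it only catches the mismatch seen in this thread early.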
