Posted to user@cassandra.apache.org by Sonny Heer <so...@gmail.com> on 2010/04/07 23:47:40 UTC

Iterate through entire data set

I need a way to process my entire data set: every keyspace, CF, row,
and column, performing some operation on each mapped combination.

The map bucket would collect down to the column name.

Is there a map/reduce program that shows how to go about doing this?
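
As a rough illustration of the shape being asked for, here is a minimal in-memory sketch in Python. It is not Cassandra or Hadoop code: the nested dictionary `DATA`, the keyspace and column family names, and both function names are hypothetical stand-ins chosen for the example. It only shows a map step that emits one pair per column and a reduce step that buckets by column name.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a cluster:
# keyspace -> column family -> row key -> {column name: value}
DATA = {
    "Keyspace1": {
        "Standard1": {
            "row1": {"text": "a b", "title": "x"},
            "row2": {"text": "c"},
        },
    },
}


def map_columns(data):
    """Map step: emit one (column name, context) pair per column."""
    for keyspace, column_families in data.items():
        for cf, rows in column_families.items():
            for row_key, columns in rows.items():
                for name, value in columns.items():
                    yield name, (keyspace, cf, row_key, value)


def reduce_by_column(pairs):
    """Reduce step: group the emitted contexts by column name."""
    buckets = defaultdict(list)
    for name, context in pairs:
        buckets[name].append(context)
    return dict(buckets)


buckets = reduce_by_column(map_columns(DATA))
# Report how many columns landed in each bucket.
print({name: len(items) for name, items in buckets.items()})
```

In a real job the map step would receive rows from the cluster rather than walk a dictionary; the bucketing logic would stay the same.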

Re: Iterate through entire data set

Posted by Benjamin Black <b...@b3k.us>.

You are telling Cassandra on the Windows machine to listen only on
localhost. That is the loopback interface, which is accessible only
from the system itself, not from external machines.
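
A small Python sketch of that point, assuming only the standard library: it checks whether a configured bind address is a loopback address, in which case nothing outside the machine can reach it. The function name `is_loopback_only` and the LAN address 192.168.1.10 are hypothetical and used only for illustration.

```python
import ipaddress
import socket


def is_loopback_only(bind_address: str) -> bool:
    """Return True if a server bound to this address is reachable only locally."""
    # Names such as "localhost" are resolved to an IPv4 address first;
    # IPv4 literals are returned unchanged by gethostbyname.
    resolved = socket.gethostbyname(bind_address)
    return ipaddress.ip_address(resolved).is_loopback


print(is_loopback_only("127.0.0.1"))     # loopback: local clients only
print(is_loopback_only("192.168.1.10"))  # hypothetical LAN address
```

Passing "localhost" works the same way, provided the name resolves to a loopback address on the machine in question.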

On Thu, Apr 8, 2010 at 11:56 AM, Sonny Heer <so...@gmail.com> wrote:
> Yeah I agree it's strange, Windows is just my local box, and I'm
> testing before setting up actual boxes :)
>
> the thrift address is 'localhost' on the windows box.
>
> On Thu, Apr 8, 2010 at 11:53 AM, Benjamin Black <b...@b3k.us> wrote:
>> Strange setup, but, ok.  What is your <ThriftAddress> setting on the
>> Windows machine?
>>
>> On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
>>> I have two boxes.  One is a windows box running Cassandra .6, and the
>>> other is an ubuntu box from which I'm trying to run the word count
>>> program as in the readme.
>>>
>>> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>>>
>>> The ubuntu box seed & listen is point to IP of the windows box.
>>>
>>> from contrib/word_count:
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
>>> Unable to locate tools.jar. Expected to find it in
>>> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
>>> Buildfile: build.xml
>>>
>>> init:
>>>
>>> build:
>>>
>>> jar:
>>>      [jar] Building jar:
>>> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>>>
>>> BUILD SUCCESSFUL
>>> Total time: 2 seconds
>>>
>>> =====================
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>>> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>>> determined to be standard
>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
>>> Exception in thread "main" java.net.BindException: Cannot assign
>>> requested address
>>>        at sun.nio.ch.Net.bind(Native Method)
>>>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>>>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>>>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>>>        at WordCountSetup.main(Unknown Source)
>>>
>>>
>>> Sorry, I'm a bit new to this... help?
>>>
>>>
>>> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>>>> Please read the README in the contrib/word_count directory.
>>>>
>>>> -----Original Message-----
>>>> From: "Sonny Heer" <so...@gmail.com>
>>>> Sent: Wednesday, April 7, 2010 6:33pm
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Iterate through entire data set
>>>>
>>>> Jon,
>>>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>>>> run this sample?
>>>>
>>>>
>>>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> Yes
>>>>>
>>>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>>>
>>>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>>>
>>>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>>> I need a way to process all of my data set.
>>>>>>>>
>>>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>>>> operation based on that mapped combination.
>>>>>>>>
>>>>>>>> The map bucket would collect down to column name.
>>>>>>>>
>>>>>>>> Is there a map/reduce program which shows how to go about doing this?
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Yeah, I agree it's strange. Windows is just my local box, and I'm
testing before setting up actual boxes :)

The Thrift address is 'localhost' on the Windows box.

On Thu, Apr 8, 2010 at 11:53 AM, Benjamin Black <b...@b3k.us> wrote:
> Strange setup, but, ok.  What is your <ThriftAddress> setting on the
> Windows machine?
>
> On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
>> I have two boxes.  One is a windows box running Cassandra .6, and the
>> other is an ubuntu box from which I'm trying to run the word count
>> program as in the readme.
>>
>> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>>
>> The ubuntu box seed & listen is point to IP of the windows box.
>>
>> from contrib/word_count:
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
>> Unable to locate tools.jar. Expected to find it in
>> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
>> Buildfile: build.xml
>>
>> init:
>>
>> build:
>>
>> jar:
>>      [jar] Building jar:
>> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>>
>> BUILD SUCCESSFUL
>> Total time: 2 seconds
>>
>> =====================
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>> determined to be standard
>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
>> Exception in thread "main" java.net.BindException: Cannot assign
>> requested address
>>        at sun.nio.ch.Net.bind(Native Method)
>>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>>        at WordCountSetup.main(Unknown Source)
>>
>>
>> Sorry, I'm a bit new to this... help?
>>
>>
>> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>>> Please read the README in the contrib/word_count directory.
>>>
>>> -----Original Message-----
>>> From: "Sonny Heer" <so...@gmail.com>
>>> Sent: Wednesday, April 7, 2010 6:33pm
>>> To: user@cassandra.apache.org
>>> Subject: Re: Iterate through entire data set
>>>
>>> Jon,
>>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>>> run this sample?
>>>
>>>
>>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> Yes
>>>>
>>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>>
>>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>>
>>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>> I need a way to process all of my data set.
>>>>>>>
>>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>>> operation based on that mapped combination.
>>>>>>>
>>>>>>> The map bucket would collect down to column name.
>>>>>>>
>>>>>>> Is there a map/reduce program which shows how to go about doing this?
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>

Re: Iterate through entire data set

Posted by Benjamin Black <b...@b3k.us>.

Strange setup, but, ok.  What is your <ThriftAddress> setting on the
Windows machine?

On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
> I have two boxes.  One is a windows box running Cassandra .6, and the
> other is an ubuntu box from which I'm trying to run the word count
> program as in the readme.
>
> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>
> The ubuntu box seed & listen is point to IP of the windows box.
>
> from contrib/word_count:
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
> Unable to locate tools.jar. Expected to find it in
> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
> Buildfile: build.xml
>
> init:
>
> build:
>
> jar:
>      [jar] Building jar:
> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>
> BUILD SUCCESSFUL
> Total time: 2 seconds
>
> =====================
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
> determined to be standard
> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
> Exception in thread "main" java.net.BindException: Cannot assign
> requested address
>        at sun.nio.ch.Net.bind(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>        at WordCountSetup.main(Unknown Source)
>
>
> Sorry, I'm a bit new to this... help?
>
>
> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>> Please read the README in the contrib/word_count directory.
>>
>> -----Original Message-----
>> From: "Sonny Heer" <so...@gmail.com>
>> Sent: Wednesday, April 7, 2010 6:33pm
>> To: user@cassandra.apache.org
>> Subject: Re: Iterate through entire data set
>>
>> Jon,
>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>> run this sample?
>>
>>
>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Yes
>>>
>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>
>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>
>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>> I need a way to process all of my data set.
>>>>>>
>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>> operation based on that mapped combination.
>>>>>>
>>>>>> The map bucket would collect down to column name.
>>>>>>
>>>>>> Is there a map/reduce program which shows how to go about doing this?
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Right.  The word_count program has a storage-conf.xml file, which I'm
assuming it reads in order to discover the cluster.

I've changed the Thrift listen address on the Windows box to its IP
instead, but I still get the same result.  What is the proper setup
for easily testing this?

On Thu, Apr 8, 2010 at 12:00 PM, Benjamin Black <b...@b3k.us> wrote:
> "The ubuntu box seed & listen is point to IP of the windows box."
>
> If you are setting storage-conf.xml parameters, you are trying to run
> Cassandra on the Ubuntu system.  Regardless, your earlier mail saying
> you are setting ThriftAddress to localhost on the Windows machine
> precludes anything connecting.
>
> On Thu, Apr 8, 2010 at 11:58 AM, Sonny Heer <so...@gmail.com> wrote:
>> Single node cluster (the windows box).  the Ubuntu box is only used to
>> run the word count
>>
>> On Thu, Apr 8, 2010 at 11:54 AM, Benjamin Black <b...@b3k.us> wrote:
>>> Are you actually trying to make the Ubuntu system another node in the
>>> ring?  While the first node is only listening on localhost?  There's
>>> your problem.
>>>
>>> On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
>>>> I have two boxes.  One is a windows box running Cassandra .6, and the
>>>> other is an ubuntu box from which I'm trying to run the word count
>>>> program as in the readme.
>>>>
>>>> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>>>>
>>>> The ubuntu box seed & listen is point to IP of the windows box.
>>>>
>>>> from contrib/word_count:
>>>>
>>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
>>>> Unable to locate tools.jar. Expected to find it in
>>>> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
>>>> Buildfile: build.xml
>>>>
>>>> init:
>>>>
>>>> build:
>>>>
>>>> jar:
>>>>      [jar] Building jar:
>>>> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>>>>
>>>> BUILD SUCCESSFUL
>>>> Total time: 2 seconds
>>>>
>>>> =====================
>>>>
>>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>>>> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>>>> determined to be standard
>>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>>> is deprecated: use KeysCached instead.
>>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>>> is deprecated: use KeysCached instead.
>>>> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
>>>> Exception in thread "main" java.net.BindException: Cannot assign
>>>> requested address
>>>>        at sun.nio.ch.Net.bind(Native Method)
>>>>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>>>>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>>>>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>>>>        at WordCountSetup.main(Unknown Source)
>>>>
>>>>
>>>> Sorry, I'm a bit new to this... help?
>>>>
>>>>
>>>> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>>>>> Please read the README in the contrib/word_count directory.
>>>>>
>>>>> -----Original Message-----
>>>>> From: "Sonny Heer" <so...@gmail.com>
>>>>> Sent: Wednesday, April 7, 2010 6:33pm
>>>>> To: user@cassandra.apache.org
>>>>> Subject: Re: Iterate through entire data set
>>>>>
>>>>> Jon,
>>>>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>>>>> run this sample?
>>>>>
>>>>>
>>>>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> Yes
>>>>>>
>>>>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>>>>
>>>>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>>>>
>>>>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>>>> I need a way to process all of my data set.
>>>>>>>>>
>>>>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>>>>> operation based on that mapped combination.
>>>>>>>>>
>>>>>>>>> The map bucket would collect down to column name.
>>>>>>>>>
>>>>>>>>> Is there a map/reduce program which shows how to go about doing this?
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Yeah, I realized that shortly afterwards... :)

I'm still not able to point word_count at a live cluster.  Say I
have a single-node cluster whose Thrift address is the IP of that
box, and whose seed value is also its own IP.

How do I run word_count remotely then?  Sorry, I must be missing
something obvious here...
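
Before debugging the job itself, it can help to confirm from the client machine that the node's Thrift port accepts TCP connections at all. The following Python sketch does only that, assuming the standard library; `can_connect` is a hypothetical helper name, and a successful connection says nothing about whether the job configuration is otherwise correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or unresolvable.
        return False
```

For example, `can_connect("192.168.1.10", 9160)` would probe the default Thrift port (9160) on a node at the hypothetical address 192.168.1.10. If this returns False from the remote box but True on the node itself, the node is probably still bound to a loopback address or blocked by a firewall.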

On Thu, Apr 8, 2010 at 1:07 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> those aren't shipped with Cassandra.
>
> On Thu, Apr 8, 2010 at 3:00 PM, Sonny Heer <so...@gmail.com> wrote:
>> Missing the commons logging and commons httpclient jars.  Must be
>> using the wrong JDK?
>>
>> On Thu, Apr 8, 2010 at 12:38 PM, Sonny Heer <so...@gmail.com> wrote:
>>> Is there other documentation on how to setup all the pieces?
>>>
>>> Currently I'm simply trying to test the example word_count, but will
>>> likely need to write other map/reduce programs over the cassandra data
>>> set.
>>>
>>> For this test I have one box (ubuntu)  where i have moved cass .6 rc1
>>> binary , and started the default configuration.
>>>
>>> I checked out the .6rc1 project code from svn.  'cd' to
>>> contrib/word_count and trying to run the sample.  Is this the correct
>>> way to run contrib stuff?  where does the hadoop cluster come in?
>>>
>>> On Thu, Apr 8, 2010 at 12:18 PM, Sonny Heer <so...@gmail.com> wrote:
>>>> Okay I moved everything to the ubuntu box:
>>>>
>>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>>>> 10/04/08 11:15:10 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>>>> determined to be standard
>>>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>>>> is deprecated: use KeysCached instead.
>>>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>>>> is deprecated: use KeysCached instead.
>>>> 10/04/08 11:15:10 INFO service.StorageService: Starting up client gossip
>>>> 10/04/08 11:15:10 INFO WordCountSetup: Sleeping 3000
>>>> 10/04/08 11:15:11 INFO gms.Gossiper: Node /127.0.0.1 is now part of the cluster
>>>> 10/04/08 11:15:12 INFO gms.Gossiper: InetAddress /127.0.0.1 is now UP
>>>> 10/04/08 11:15:13 INFO WordCountSetup: added text1
>>>> 10/04/08 11:15:13 INFO WordCountSetup: added text2
>>>> 10/04/08 11:15:14 INFO WordCountSetup: added text3
>>>>
>>>>
>>>> =====================
>>>>
>>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count
>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>> org/apache/commons/logging/LogFactory
>>>>        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:138)
>>>>        at WordCount.main(Unknown Source)
>>>> Caused by: java.lang.ClassNotFoundException:
>>>> org.apache.commons.logging.LogFactory
>>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
>>>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
>>>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
>>>>        ... 2 more
>>>>
>>>
>>
>

Re: Iterate through entire data set

Posted by Jonathan Ellis <jb...@gmail.com>.

Those aren't shipped with Cassandra.
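
One way to see which jar, if any, provides a missing class is to scan a directory of jars for the class file, since a jar is a zip archive. This is a minimal Python sketch under that assumption; the function name `jars_containing` and the directory passed to it are hypothetical, and it does not inspect the classpath the word_count script actually builds.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_containing(lib_dir: str, class_name: str) -> List[str]:
    """List the jar files in lib_dir that contain the given class."""
    # org.apache.commons.logging.LogFactory ->
    # org/apache/commons/logging/LogFactory.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as archive:
            if entry in archive.namelist():
                hits.append(jar.name)
    return hits
```

Calling `jars_containing("lib", "org.apache.commons.logging.LogFactory")` against a Hadoop lib directory, for instance, would return the names of any jars that supply the class from the stack trace; an empty list means the jar has to be obtained separately and added to the classpath.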

On Thu, Apr 8, 2010 at 3:00 PM, Sonny Heer <so...@gmail.com> wrote:
> Missing the commons logging and commons httpclient jars.  Must be
> using the wrong JDK?
>
> On Thu, Apr 8, 2010 at 12:38 PM, Sonny Heer <so...@gmail.com> wrote:
>> Is there other documentation on how to setup all the pieces?
>>
>> Currently I'm simply trying to test the example word_count, but will
>> likely need to write other map/reduce programs over the cassandra data
>> set.
>>
>> For this test I have one box (ubuntu)  where i have moved cass .6 rc1
>> binary , and started the default configuration.
>>
>> I checked out the .6rc1 project code from svn.  'cd' to
>> contrib/word_count and trying to run the sample.  Is this the correct
>> way to run contrib stuff?  where does the hadoop cluster come in?
>>
>> On Thu, Apr 8, 2010 at 12:18 PM, Sonny Heer <so...@gmail.com> wrote:
>>> Okay I moved everything to the ubuntu box:
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>>> 10/04/08 11:15:10 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>>> determined to be standard
>>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 11:15:10 INFO service.StorageService: Starting up client gossip
>>> 10/04/08 11:15:10 INFO WordCountSetup: Sleeping 3000
>>> 10/04/08 11:15:11 INFO gms.Gossiper: Node /127.0.0.1 is now part of the cluster
>>> 10/04/08 11:15:12 INFO gms.Gossiper: InetAddress /127.0.0.1 is now UP
>>> 10/04/08 11:15:13 INFO WordCountSetup: added text1
>>> 10/04/08 11:15:13 INFO WordCountSetup: added text2
>>> 10/04/08 11:15:14 INFO WordCountSetup: added text3
>>>
>>>
>>> =====================
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/commons/logging/LogFactory
>>>        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:138)
>>>        at WordCount.main(Unknown Source)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.commons.logging.LogFactory
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
>>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
>>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
>>>        ... 2 more
>>>
>>
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Missing the commons logging and commons httpclient jars.  Must be
using the wrong JDK?

On Thu, Apr 8, 2010 at 12:38 PM, Sonny Heer <so...@gmail.com> wrote:
> Is there other documentation on how to setup all the pieces?
>
> Currently I'm simply trying to test the example word_count, but will
> likely need to write other map/reduce programs over the cassandra data
> set.
>
> For this test I have one box (ubuntu)  where i have moved cass .6 rc1
> binary , and started the default configuration.
>
> I checked out the .6rc1 project code from svn.  'cd' to
> contrib/word_count and trying to run the sample.  Is this the correct
> way to run contrib stuff?  where does the hadoop cluster come in?
>
> On Thu, Apr 8, 2010 at 12:18 PM, Sonny Heer <so...@gmail.com> wrote:
>> Okay I moved everything to the ubuntu box:
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>> 10/04/08 11:15:10 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>> determined to be standard
>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 11:15:10 INFO service.StorageService: Starting up client gossip
>> 10/04/08 11:15:10 INFO WordCountSetup: Sleeping 3000
>> 10/04/08 11:15:11 INFO gms.Gossiper: Node /127.0.0.1 is now part of the cluster
>> 10/04/08 11:15:12 INFO gms.Gossiper: InetAddress /127.0.0.1 is now UP
>> 10/04/08 11:15:13 INFO WordCountSetup: added text1
>> 10/04/08 11:15:13 INFO WordCountSetup: added text2
>> 10/04/08 11:15:14 INFO WordCountSetup: added text3
>>
>>
>> =====================
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/commons/logging/LogFactory
>>        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:138)
>>        at WordCount.main(Unknown Source)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.commons.logging.LogFactory
>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
>>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
>>        ... 2 more
>>
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Is there other documentation on how to set up all the pieces?

Currently I'm simply trying to test the example word_count, but I will
likely need to write other map/reduce programs over the Cassandra data
set.

For this test I have one box (Ubuntu) where I have copied the
Cassandra 0.6 rc1 binary and started it with the default configuration.

I checked out the 0.6 rc1 project code from svn, did a 'cd' to
contrib/word_count, and am trying to run the sample.  Is this the
correct way to run contrib stuff?  Where does the Hadoop cluster come in?

On Thu, Apr 8, 2010 at 12:18 PM, Sonny Heer <so...@gmail.com> wrote:
> Okay I moved everything to the ubuntu box:
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
> 10/04/08 11:15:10 INFO config.DatabaseDescriptor: Auto DiskAccessMode
> determined to be standard
> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 11:15:10 INFO service.StorageService: Starting up client gossip
> 10/04/08 11:15:10 INFO WordCountSetup: Sleeping 3000
> 10/04/08 11:15:11 INFO gms.Gossiper: Node /127.0.0.1 is now part of the cluster
> 10/04/08 11:15:12 INFO gms.Gossiper: InetAddress /127.0.0.1 is now UP
> 10/04/08 11:15:13 INFO WordCountSetup: added text1
> 10/04/08 11:15:13 INFO WordCountSetup: added text2
> 10/04/08 11:15:14 INFO WordCountSetup: added text3
>
>
> =====================
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/commons/logging/LogFactory
>        at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:138)
>        at WordCount.main(Unknown Source)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.logging.LogFactory
>        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
>        at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
>        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
>        ... 2 more
>

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Okay, I moved everything to the Ubuntu box:

~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
10/04/08 11:15:10 INFO config.DatabaseDescriptor: Auto DiskAccessMode determined to be standard
10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction is deprecated: use KeysCached instead.
10/04/08 11:15:10 WARN config.DatabaseDescriptor: KeysCachedFraction is deprecated: use KeysCached instead.
10/04/08 11:15:10 INFO service.StorageService: Starting up client gossip
10/04/08 11:15:10 INFO WordCountSetup: Sleeping 3000
10/04/08 11:15:11 INFO gms.Gossiper: Node /127.0.0.1 is now part of the cluster
10/04/08 11:15:12 INFO gms.Gossiper: InetAddress /127.0.0.1 is now UP
10/04/08 11:15:13 INFO WordCountSetup: added text1
10/04/08 11:15:13 INFO WordCountSetup: added text2
10/04/08 11:15:14 INFO WordCountSetup: added text3


=====================

~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:138)
	at WordCount.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:264)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:332)
	... 2 more

Re: Iterate through entire data set

Posted by Benjamin Black <b...@b3k.us>.

"The ubuntu box seed & listen is point to IP of the windows box."

If you are setting storage-conf.xml parameters, you are trying to run
Cassandra on the Ubuntu system.  Regardless, your earlier mail saying
you are setting ThriftAddress to localhost on the Windows machine
precludes anything connecting.
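
A likely reading of the "Cannot assign requested address" BindException in the quoted trace is that the process on the Ubuntu box tried to bind to the Windows box's IP, and a socket can only be bound to an address assigned to the local machine. The Python sketch below reproduces that behaviour with the standard library; `can_bind` is a hypothetical helper, and 192.0.2.1 is a documentation-only address (TEST-NET-1) that is assumed not to be configured on the local host.

```python
import socket


def can_bind(address: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to address on this machine."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick any free port, so only the address matters.
        sock.bind((address, port))
        return True
    except OSError:
        # Raised, for example, when the address belongs to another machine.
        return False
    finally:
        sock.close()


print(can_bind("127.0.0.1"))   # local loopback address
print(can_bind("192.0.2.1"))   # address assumed not to be local
```

On that reading, the listen address in the client's storage-conf.xml has to be one of the client machine's own addresses, while the seed is what points at the remote node.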

On Thu, Apr 8, 2010 at 11:58 AM, Sonny Heer <so...@gmail.com> wrote:
> Single node cluster (the windows box).  the Ubuntu box is only used to
> run the word count
>
> On Thu, Apr 8, 2010 at 11:54 AM, Benjamin Black <b...@b3k.us> wrote:
>> Are you actually trying to make the Ubuntu system another node in the
>> ring?  While the first node is only listening on localhost?  There's
>> your problem.
>>
>> On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
>>> I have two boxes.  One is a windows box running Cassandra .6, and the
>>> other is an ubuntu box from which I'm trying to run the word count
>>> program as in the readme.
>>>
>>> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>>>
>>> The ubuntu box seed & listen is point to IP of the windows box.
>>>
>>> from contrib/word_count:
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
>>> Unable to locate tools.jar. Expected to find it in
>>> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
>>> Buildfile: build.xml
>>>
>>> init:
>>>
>>> build:
>>>
>>> jar:
>>>      [jar] Building jar:
>>> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>>>
>>> BUILD SUCCESSFUL
>>> Total time: 2 seconds
>>>
>>> =====================
>>>
>>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>>> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>>> determined to be standard
>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>>> is deprecated: use KeysCached instead.
>>> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
>>> Exception in thread "main" java.net.BindException: Cannot assign
>>> requested address
>>>        at sun.nio.ch.Net.bind(Native Method)
>>>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>>>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>>>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>>>        at WordCountSetup.main(Unknown Source)
>>>
>>>
>>> Sorry, I'm a bit new to this... help?
>>>
>>>
>>> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>>>> Please read the README in the contrib/word_count directory.
>>>>
>>>> -----Original Message-----
>>>> From: "Sonny Heer" <so...@gmail.com>
>>>> Sent: Wednesday, April 7, 2010 6:33pm
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Iterate through entire data set
>>>>
>>>> Jon,
>>>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>>>> run this sample?
>>>>
>>>>
>>>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> Yes
>>>>>
>>>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>>>
>>>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>>>
>>>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>>> I need a way to process all of my data set.
>>>>>>>>
>>>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>>>> operation based on that mapped combination.
>>>>>>>>
>>>>>>>> The map bucket would collect down to column name.
>>>>>>>>
>>>>>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

It's a single-node cluster (the Windows box). The Ubuntu box is only used to
run the word count program.

On Thu, Apr 8, 2010 at 11:54 AM, Benjamin Black <b...@b3k.us> wrote:
> Are you actually trying to make the Ubuntu system another node in the
> ring?  While the first node is only listening on localhost?  There's
> your problem.
>
> On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
>> I have two boxes.  One is a windows box running Cassandra .6, and the
>> other is an ubuntu box from which I'm trying to run the word count
>> program as in the readme.
>>
>> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>>
>> The ubuntu box seed & listen is point to IP of the windows box.
>>
>> from contrib/word_count:
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
>> Unable to locate tools.jar. Expected to find it in
>> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
>> Buildfile: build.xml
>>
>> init:
>>
>> build:
>>
>> jar:
>>      [jar] Building jar:
>> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>>
>> BUILD SUCCESSFUL
>> Total time: 2 seconds
>>
>> =====================
>>
>> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
>> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
>> determined to be standard
>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
>> is deprecated: use KeysCached instead.
>> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
>> Exception in thread "main" java.net.BindException: Cannot assign
>> requested address
>>        at sun.nio.ch.Net.bind(Native Method)
>>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>>        at WordCountSetup.main(Unknown Source)
>>
>>
>> Sorry, I'm a bit new to this... help?
>>
>>
>> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>>> Please read the README in the contrib/word_count directory.
>>>
>>> -----Original Message-----
>>> From: "Sonny Heer" <so...@gmail.com>
>>> Sent: Wednesday, April 7, 2010 6:33pm
>>> To: user@cassandra.apache.org
>>> Subject: Re: Iterate through entire data set
>>>
>>> Jon,
>>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>>> run this sample?
>>>
>>>
>>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> Yes
>>>>
>>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>>
>>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>>
>>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>>> I need a way to process all of my data set.
>>>>>>>
>>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>>> operation based on that mapped combination.
>>>>>>>
>>>>>>> The map bucket would collect down to column name.
>>>>>>>
>>>>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Benjamin Black <b...@b3k.us>.

Are you actually trying to make the Ubuntu system another node in the
ring?  While the first node is only listening on localhost?  There's
your problem.
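
A note on the trace quoted below: the BindException comes from
MessagingService.listen, called via StorageService.initClient, so the word
count client on the Ubuntu box is itself trying to open a listening socket.
If its listen address is set to the Windows box's IP, as described, that is
probably an address the Ubuntu box does not own, and the bind fails. A
minimal Java sketch of that behaviour, independent of Cassandra (BindCheck
and canBind are illustrative names, and 192.0.2.1 is a documentation-only
address standing in for "another machine's IP"):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

class BindCheck {
    // Returns true if this machine can open a listening socket on the given address.
    static boolean canBind(String host) {
        // Port 0 asks the OS for any free port, so only the address is being tested.
        try (ServerSocket socket = new ServerSocket(0, 50, InetAddress.getByName(host))) {
            return true;
        } catch (IOException e) {
            // java.net.BindException (a subclass of IOException) lands here when
            // the address is not assigned to any local interface.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 192.0.2.1 is not configured on the machine running this.
        for (String host : new String[] {"127.0.0.1", "192.0.2.1"}) {
            System.out.println(host + " bindable here: " + canBind(host));
        }
    }
}
```

If that is what is happening, the thing to try would be setting the Ubuntu
box's listen address to one of its own addresses and leaving only the seed
pointing at the Windows box. The Windows node would also need to listen on
something other than localhost to be reachable from the Ubuntu box.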

On Thu, Apr 8, 2010 at 11:44 AM, Sonny Heer <so...@gmail.com> wrote:
> I have two boxes.  One is a windows box running Cassandra .6, and the
> other is an ubuntu box from which I'm trying to run the word count
> program as in the readme.
>
> The windows box seed is set to 127.0.0.1, and listen address to localhost.
>
> The ubuntu box seed & listen is point to IP of the windows box.
>
> from contrib/word_count:
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
> Unable to locate tools.jar. Expected to find it in
> /usr/lib/jvm/java-6-openjdk/lib/tools.jar
> Buildfile: build.xml
>
> init:
>
> build:
>
> jar:
>      [jar] Building jar:
> /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar
>
> BUILD SUCCESSFUL
> Total time: 2 seconds
>
> =====================
>
> ~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
> 10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode
> determined to be standard
> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction
> is deprecated: use KeysCached instead.
> 10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
> Exception in thread "main" java.net.BindException: Cannot assign
> requested address
>        at sun.nio.ch.Net.bind(Native Method)
>        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
>        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
>        at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
>        at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
>        at WordCountSetup.main(Unknown Source)
>
>
> Sorry, I'm a bit new to this... help?
>
>
> On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
>> Please read the README in the contrib/word_count directory.
>>
>> -----Original Message-----
>> From: "Sonny Heer" <so...@gmail.com>
>> Sent: Wednesday, April 7, 2010 6:33pm
>> To: user@cassandra.apache.org
>> Subject: Re: Iterate through entire data set
>>
>> Jon,
>> I've got the word_count.jar and a Hadoop cluster.  How do you usually
>> run this sample?
>>
>>
>> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Yes
>>>
>>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>>
>>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>>
>>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>>> I need a way to process all of my data set.
>>>>>>
>>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>>> operation based on that mapped combination.
>>>>>>
>>>>>> The map bucket would collect down to column name.
>>>>>>
>>>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

I have two boxes. One is a Windows box running Cassandra 0.6, and the
other is an Ubuntu box from which I'm trying to run the word count
program as described in the README.

On the Windows box, the seed is set to 127.0.0.1 and the listen address to
localhost.

On the Ubuntu box, the seed and listen address both point to the IP of the
Windows box.

from contrib/word_count:

~/dev/cassandra-0.6.0-rc1/contrib/word_count$ ant
Unable to locate tools.jar. Expected to find it in /usr/lib/jvm/java-6-openjdk/lib/tools.jar
Buildfile: build.xml

init:

build:

jar:
      [jar] Building jar: /home/psheer/dev/cassandra-0.6.0-rc1/contrib/word_count/build/word_count.jar

BUILD SUCCESSFUL
Total time: 2 seconds

=====================

~/dev/cassandra-0.6.0-rc1/contrib/word_count$ bin/word_count_setup
10/04/08 10:33:18 INFO config.DatabaseDescriptor: Auto DiskAccessMode determined to be standard
10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction is deprecated: use KeysCached instead.
10/04/08 10:33:18 WARN config.DatabaseDescriptor: KeysCachedFraction is deprecated: use KeysCached instead.
10/04/08 10:33:18 INFO service.StorageService: Starting up client gossip
Exception in thread "main" java.net.BindException: Cannot assign requested address
	at sun.nio.ch.Net.bind(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:137)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:77)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:70)
	at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:138)
	at org.apache.cassandra.service.StorageService.initClient(StorageService.java:289)
	at WordCountSetup.main(Unknown Source)


Sorry, I'm a bit new to this... help?


On Wed, Apr 7, 2010 at 6:07 PM, Stu Hood <st...@rackspace.com> wrote:
> Please read the README in the contrib/word_count directory.
>
> -----Original Message-----
> From: "Sonny Heer" <so...@gmail.com>
> Sent: Wednesday, April 7, 2010 6:33pm
> To: user@cassandra.apache.org
> Subject: Re: Iterate through entire data set
>
> Jon,
> I've got the word_count.jar and a Hadoop cluster.  How do you usually
> run this sample?
>
>
> On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Yes
>>
>> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>>> These examples work on Cassandra .06 and Hadoop .20.2?
>>>
>>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>>
>>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>>> I need a way to process all of my data set.
>>>>>
>>>>> A way to process every keyspace, CF, row, column, and perform some
>>>>> operation based on that mapped combination.
>>>>>
>>>>> The map bucket would collect down to column name.
>>>>>
>>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Stu Hood <st...@rackspace.com>.

Please read the README in the contrib/word_count directory.

-----Original Message-----
From: "Sonny Heer" <so...@gmail.com>
Sent: Wednesday, April 7, 2010 6:33pm
To: user@cassandra.apache.org
Subject: Re: Iterate through entire data set

Jon,
I've got the word_count.jar and a Hadoop cluster.  How do you usually
run this sample?


On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Yes
>
> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>> These examples work on Cassandra .06 and Hadoop .20.2?
>>
>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>
>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>> I need a way to process all of my data set.
>>>>
>>>> A way to process every keyspace, CF, row, column, and perform some
>>>> operation based on that mapped combination.
>>>>
>>>> The map bucket would collect down to column name.
>>>>
>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Jon,
I've got the word_count.jar and a Hadoop cluster.  How do you usually
run this sample?


On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Yes
>
> On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
>> These examples work on Cassandra .06 and Hadoop .20.2?
>>
>> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>> Look at the READMEs for contrib/word_count and contrib/pig.
>>>
>>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>>> I need a way to process all of my data set.
>>>>
>>>> A way to process every keyspace, CF, row, column, and perform some
>>>> operation based on that mapped combination.
>>>>
>>>> The map bucket would collect down to column name.
>>>>
>>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Jonathan Ellis <jb...@gmail.com>.

Yes

On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer <so...@gmail.com> wrote:
> These examples work on Cassandra .06 and Hadoop .20.2?
>
> On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Look at the READMEs for contrib/word_count and contrib/pig.
>>
>> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>>> I need a way to process all of my data set.
>>>
>>> A way to process every keyspace, CF, row, column, and perform some
>>> operation based on that mapped combination.
>>>
>>> The map bucket would collect down to column name.
>>>
>>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Sonny Heer <so...@gmail.com>.

Do these examples work with Cassandra 0.6 and Hadoop 0.20.2?

On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Look at the READMEs for contrib/word_count and contrib/pig.
>
> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
>> I need a way to process all of my data set.
>>
>> A way to process every keyspace, CF, row, column, and perform some
>> operation based on that mapped combination.
>>
>> The map bucket would collect down to column name.
>>
>> Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

Posted by Jonathan Ellis <jb...@gmail.com>.

Look at the READMEs for contrib/word_count and contrib/pig.

On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer <so...@gmail.com> wrote:
> I need a way to process all of my data set.
>
> A way to process every keyspace, CF, row, column, and perform some
> operation based on that mapped combination.
>
> The map bucket would collect down to column name.
>
> Is there a map/reduce program which shows how to go about doing this?
>
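
For readers following the thread: the contrib/word_count README is the
place to look for the real Hadoop job. Purely to illustrate the shape of
what the question describes, here is a plain-Java sketch that walks
keyspace, CF, row and column and collects a count per bucket. It uses no
Hadoop or Cassandra classes. The nested in-memory maps, the names
ColumnBuckets, countByColumn and sample, and the choice of
keyspace/CF/column name as the bucket key are all assumptions made for the
example.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnBuckets {
    // Input layout (hypothetical stand-in for the stored data):
    // keyspace -> column family -> row key -> column name -> value
    static Map<String, Long> countByColumn(
            Map<String, Map<String, Map<String, Map<String, String>>>> data) {
        Map<String, Long> buckets = new TreeMap<>();
        for (Map.Entry<String, Map<String, Map<String, Map<String, String>>>> ks : data.entrySet()) {
            for (Map.Entry<String, Map<String, Map<String, String>>> cf : ks.getValue().entrySet()) {
                for (Map<String, String> columns : cf.getValue().values()) {
                    for (String columnName : columns.keySet()) {
                        // "Map" step: emit (keyspace/cf/column, 1).
                        String bucket = ks.getKey() + "/" + cf.getKey() + "/" + columnName;
                        // "Reduce" step: sum the ones per bucket.
                        buckets.merge(bucket, 1L, Long::sum);
                    }
                }
            }
        }
        return buckets;
    }

    // Tiny made-up data set: one keyspace, one column family, two rows.
    static Map<String, Map<String, Map<String, Map<String, String>>>> sample() {
        Map<String, String> row1 = new TreeMap<>();
        row1.put("name", "a");
        row1.put("email", "a@example.org");
        Map<String, String> row2 = new TreeMap<>();
        row2.put("name", "b");
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("row1", row1);
        rows.put("row2", row2);
        Map<String, Map<String, Map<String, String>>> cfs = new TreeMap<>();
        cfs.put("cf1", rows);
        Map<String, Map<String, Map<String, Map<String, String>>>> keyspaces = new TreeMap<>();
        keyspaces.put("ks1", cfs);
        return keyspaces;
    }

    public static void main(String[] args) {
        System.out.println(countByColumn(sample()));
    }
}
```

In an actual Hadoop job the inner loop body would live in the mapper and
the summing in the reducer; the operation performed per bucket (a count
here) is only a placeholder for whatever processing is needed.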