Posted to user@cassandra.apache.org by Mark <st...@gmail.com> on 2010/08/19 19:07:23 UTC
Cassandra w/ Hadoop
Are there any examples/tutorials on the web for reading from and writing
to Cassandra with Hadoop?
I found the example in contrib/word_count, but I really can't make sense
of it... a tutorial/explanation would help.
Re: Cassandra w/ Hadoop
Posted by Mark <st...@gmail.com>.
On 8/19/10 11:14 AM, Mark wrote:
> On 8/19/10 10:23 AM, Jeremy Hanna wrote:
>> [Jeremy's full reply is quoted below in the thread; trimmed here.]
> Thanks!
How does batching across all rows work? Does it just take an arbitrary
start key with a limit of x, then use the last key from that result as
the next start? And does this work with RandomPartitioner?
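The paging pattern asked about here can be sketched in a few lines. This is a toy in-memory model, not Cassandra's actual range-scan code: with RandomPartitioner, "order" means token (hash) order rather than key order, but the last-key-as-next-start idea works the same way within a split. The function name and the plain sorted list are illustrative assumptions.

```python
def page_all_rows(row_keys, batch_size):
    """Toy model of start-key paging: fetch up to `batch_size` rows
    starting at `start`, then reuse the last key returned as the
    (exclusive) start of the next batch, until a batch comes back empty."""
    keys = sorted(row_keys)  # stands in for token order under RandomPartitioner
    out = []
    start = None
    while True:
        if start is None:
            batch = keys[:batch_size]
        else:
            i = keys.index(start) + 1  # resume just after the last key seen
            batch = keys[i:i + batch_size]
        if not batch:
            break
        out.extend(batch)
        start = batch[-1]
    return out
```

Because each pass resumes strictly after the last key seen, every row is visited exactly once even though individual batches are bounded.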
Re: Cassandra w/ Hadoop
Posted by Mark <st...@gmail.com>.
On 8/19/10 10:23 AM, Jeremy Hanna wrote:
> [Jeremy's full reply is quoted below in the thread; trimmed here.]
Thanks!
Re: Cassandra w/ Hadoop
Posted by Mark <st...@gmail.com>.
On 8/19/10 10:34 AM, Christian Decker wrote:
> If, like me, you prefer to write your jobs on the fly, try taking a
> look at Pig. Cassandra provides a LoadFunc under contrib/pig/ in the
> source package that allows you to load data directly from Cassandra.
>
> On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna
> <jeremy.hanna1234@gmail.com> wrote:
> > [Jeremy's full reply is quoted below in the thread; trimmed here.]
That's definitely an option, and I'll probably lean toward it in the
near future. I'm just trying to get a complete understanding of the
whole infrastructure before working with higher-level features.
The same problem exists there, though... I need a nice tutorial :)
Re: Cassandra w/ Hadoop
Posted by Christian Decker <de...@gmail.com>.
If, like me, you prefer to write your jobs on the fly, try taking a look at
Pig. Cassandra provides a LoadFunc under contrib/pig/ in the source package
that allows you to load data directly from Cassandra.
--
Christian Decker
Software Architect
http://blog.snyke.net
On Thu, Aug 19, 2010 at 7:23 PM, Jeremy Hanna <je...@gmail.com> wrote:
> [Jeremy's full reply appears below as its own message; trimmed here.]
Re: Cassandra w/ Hadoop
Posted by Jeremy Hanna <je...@gmail.com>.
I would check out http://wiki.apache.org/cassandra/HadoopSupport for more info. I'll try to explain a bit more here, but I don't think there's a tutorial out there yet.
For input:
- Configure the main class where you start the MapReduce job the way word_count is configured (either with storage-conf or in your code via ConfigHelper). It will complain specifically about anything you haven't configured - especially important are your Cassandra server and port.
- The inputs to your mapper are what comes from Cassandra - your row key with a map of that row's column values.
- You need to set your column name in the overridden setup method of your mapper.
- For the reducer, nothing really changes from a normal map/reduce, unless you want to output to Cassandra.
- Generally, Cassandra just provides an InputFormat and split classes to read from Cassandra - you can find the guts in the org.apache.cassandra.hadoop package.
For output:
- In your reducer, you could just write to Cassandra directly via Thrift. There is a built-in OutputFormat coming in 0.7, but it still might change before 0.7 final - it will queue up changes so it can write large blocks all at once.
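The input-side steps above can be sketched language-agnostically. This is not the actual contrib/word_count code (which is Java against the Hadoop API); it's a minimal in-memory stand-in showing the shape of the data the mapper sees - the row key plus a map of column name to value - and where the column name chosen in setup comes into play. The "text" column name and the class/function names are assumptions for illustration.

```python
from collections import defaultdict

class WordCountMapper:
    def setup(self, column_name):
        # Mirrors setting the column name in the overridden setup method.
        self.column_name = column_name

    def map(self, key, columns):
        # `key` is the Cassandra row key; `columns` is the row's
        # column-name -> value map handed to the mapper.
        for word in columns.get(self.column_name, "").split():
            yield (word, 1)

def reducer(word, counts):
    # Nothing Cassandra-specific here - a plain Hadoop-style reduce.
    return (word, sum(counts))

def run_job(rows, column="text"):
    """Tiny in-memory stand-in for the job's shuffle/group step."""
    m = WordCountMapper()
    m.setup(column)
    grouped = defaultdict(list)
    for key, columns in rows.items():
        for word, one in m.map(key, columns):
            grouped[word].append(one)
    return {w: reducer(w, c)[1] for w, c in grouped.items()}
```

The point is only the data flow: Cassandra's InputFormat feeds (key, column-map) pairs in, and from there the job looks like any other map/reduce.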
On Aug 19, 2010, at 12:07 PM, Mark wrote:
> Are there any examples/tutorials on the web for reading/writing from Cassandra into/from Hadoop?
>
> I found the example in contrib/word_count but I really can't make sense of it... a tutorial/explanation would help.
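The output-side behavior described above - queue up changes, then write large blocks all at once - can be sketched as a small buffering writer. This is not the real 0.7 OutputFormat (which was still subject to change at the time); the class name, the batch size, and the in-memory `flushed` list standing in for actual Thrift batch writes are all illustrative assumptions.

```python
class BatchingWriter:
    """Sketch of queue-and-flush output: mutations accumulate in a queue
    and are written in one large block once the queue reaches
    `batch_size` (or on close)."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.queue = []
        self.flushed = []  # stands in for actual batched Thrift writes

    def write(self, key, column, value):
        self.queue.append((key, column, value))
        if len(self.queue) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.queue:
            self.flushed.append(list(self.queue))  # one "large block" write
            self.queue.clear()

    def close(self):
        # Anything still queued goes out in a final partial block.
        self.flush()
```

Batching this way trades a little memory for far fewer round trips than writing each mutation from the reducer individually.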