Posted to dev@cassandra.apache.org by Jeremy Hanna <je...@gmail.com> on 2010/11/04 20:53:08 UTC

Mahout/Cassandra integration

For people interested in using Cassandra with Mahout, there are a few possible integration points that could be fleshed out.  I was talking with Grant Ingersoll about this at ApacheCon and thought I would send out a note about it.  The motivation would be to enhance Cassandra's analytics capabilities by running Mahout on data stored in Cassandra.

drivers - in the bin directory there is a script that loads drivers.  By default, those drivers feed input to the algorithms from sequence files through the HDFS inputformat.  They could possibly use Cassandra's inputformat instead, or offer a pluggable option.  I'm not sure where the output comes into play, but I would think it could likewise just use Cassandra's outputformat.
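
A minimal sketch of what the pluggable option might look like, assuming the driver reads the inputformat class name from its job configuration and falls back to the sequence file default.  The property name mahout.input.format.class and the InputFormatSelector class are made up for illustration and are not part of Mahout; the two class names are the Hadoop sequence file inputformat and Cassandra's ColumnFamilyInputFormat.

```java
import java.util.HashMap;
import java.util.Map;

public class InputFormatSelector {
    // Hypothetical property name; Mahout does not define this key.
    static final String INPUT_FORMAT_KEY = "mahout.input.format.class";

    // Hadoop's sequence file inputformat, used when nothing is configured.
    static final String DEFAULT_INPUT_FORMAT =
            "org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat";

    // Cassandra's Hadoop inputformat.
    static final String CASSANDRA_INPUT_FORMAT =
            "org.apache.cassandra.hadoop.ColumnFamilyInputFormat";

    // Returns the configured inputformat class name, or the default
    // when the key is missing or blank.
    static String resolveInputFormat(Map<String, String> conf) {
        String configured = conf.get(INPUT_FORMAT_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_INPUT_FORMAT;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveInputFormat(conf));

        conf.put(INPUT_FORMAT_KEY, CASSANDRA_INPUT_FORMAT);
        System.out.println(resolveInputFormat(conf));
    }
}
```

In a real driver the resolved name would presumably be loaded with Class.forName and handed to the Hadoop job in place of the hardcoded inputformat; the output side could be handled the same way.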

datamodel - https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/model/DataModel.html

DataStore - https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/classifier/bayes/interfaces/Datastore.html
Currently there are HBase and in-memory data stores, so adding a Cassandra one would be a relatively simple integration point.

Other integration points in the future might include using Flume for output, which could then go through Flume to Cassandra via the Cassandra sink that Tyler Hobbs wrote - https://github.com/thobbs/flume-cassandra-plugin

Anyway, just wanted to relay that info.