Posted to dev@mahout.apache.org by hustnn <nz...@gmail.com> on 2011/05/05 04:14:47 UTC

integrate with mysql,hadoop,hbase and cassandra

Are there any examples showing how Mahout integrates with MySQL, HBase,
Cassandra, and Hadoop, that is, how to read input data and write output data?

Do I need to implement an InputFormat and an OutputFormat for each specific
database?

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/integrate-with-mysql-hadoop-hbase-and-cassandra-tp2901764p2901764.html
Sent from the Mahout Developer List mailing list archive at Nabble.com.

Re: integrate with mysql,hadoop,hbase and cassandra

Posted by Sean Owen <sr...@gmail.com>.
Yes, you need to be much more specific about what you are trying to
do. There's not one algorithm here, or even one family of algorithms,
which could possibly have one input format.


Re: integrate with mysql,hadoop,hbase and cassandra

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I think that, in the current state of Mahout, one can't make a sweeping
statement that all algorithms support a single universal Mahout
format.

Many algorithms work with the DistributedRowMatrix format, which is a
sequence file of <Writable, VectorWritable> pairs. This is probably
the most widely supported format in Mahout for batch-style training and
transformations.
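
To make the <key, vector> record idea concrete, here is a minimal sketch
using only the JDK. It is an illustration, not Mahout code: the class name
RowMatrixSketch and its byte layout (int row key, int length, then doubles)
are assumptions made up for this example. They are not the Hadoop
SequenceFile format and not the VectorWritable encoding. The point is only
that each record pairs a row key with that row's vector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a row-matrix file: a stream of (row key, vector) records. */
class RowMatrixSketch {

    /** Writes each row as: int key, int vector length, then the vector's doubles. */
    static byte[] write(SortedMap<Integer, double[]> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<Integer, double[]> row : rows.entrySet()) {
                out.writeInt(row.getKey());
                out.writeInt(row.getValue().length);
                for (double value : row.getValue()) {
                    out.writeDouble(value);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads records back until the input is exhausted. */
    static SortedMap<Integer, double[]> read(byte[] data) {
        SortedMap<Integer, double[]> rows = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                int key = in.readInt();
                double[] vector = new double[in.readInt()];
                for (int i = 0; i < vector.length; i++) {
                    vector[i] = in.readDouble();
                }
                rows.put(key, vector);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        SortedMap<Integer, double[]> rows = new TreeMap<>();
        rows.put(0, new double[] {1.0, 0.0, 2.5});
        rows.put(1, new double[] {0.0, 3.0, 0.0});
        byte[] data = write(rows);
        System.out.println("rows read back: " + read(data).size());
    }
}
```

In a real job the same pairs would presumably be written through Hadoop's
SequenceFile writer with a VectorWritable value, so that the existing batch
jobs can consume them directly.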

Many algorithms also expose a Java-level API.

SGD-based regressions also support web services for online-style learning.

I am not sure what the status of integration with HBase is, but I
think any support for this is currently quite scarce.

I don't think there was any effort to integrate with Cassandra.

Since most algorithms work with either matrices or streams of
feature vectors, any effort to integrate with either HBase or
Cassandra would probably require some standardization of how the
content is interpreted, similar to what was done for DistributedRowMatrix
sequence files. AFAIK there is no such effort.

*I think* the thinking is that one can integrate with any kind of
external data store, but the effort to vectorize that data is outside
Mahout's scope (perhaps it could fall under a 'contributed' scope).
Vectorizing data is usually very easy, and there are helpers
available for it, which are described in detail in the book
"Mahout in Action".

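As a rough illustration of what "vectorizing" a record from an outside store
can look like, here is a minimal sketch in plain Java. It assumes a row has
already been fetched (for example from MySQL) into a column-name-to-value
map. The class RowVectorizer and its hashed encoding are hypothetical and
hand-rolled for this example; they are not the Mahout helpers mentioned
above, which should be preferred in practice.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns one database row into a fixed-length feature vector. */
class RowVectorizer {
    private final int cardinality;

    RowVectorizer(int cardinality) {
        if (cardinality <= 0) {
            throw new IllegalArgumentException("cardinality must be positive");
        }
        this.cardinality = cardinality;
    }

    /**
     * Numeric columns add their value at the hashed index of the column name.
     * All other non-null columns add 1.0 at the hashed index of "name=value".
     * Colliding features simply add up in the same slot.
     */
    double[] vectorize(Map<String, Object> row) {
        double[] features = new double[cardinality];
        for (Map.Entry<String, Object> column : row.entrySet()) {
            Object value = column.getValue();
            if (value == null) {
                continue; // skip SQL NULLs in this sketch
            }
            if (value instanceof Number) {
                features[indexOf(column.getKey())] += ((Number) value).doubleValue();
            } else {
                features[indexOf(column.getKey() + "=" + value)] += 1.0;
            }
        }
        return features;
    }

    /** Maps a feature name to a slot in [0, cardinality). */
    private int indexOf(String featureName) {
        return Math.floorMod(featureName.hashCode(), cardinality);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("age", 34);
        row.put("country", "NZ");
        row.put("spend", 12.5);
        System.out.println(Arrays.toString(new RowVectorizer(16).vectorize(row)));
    }
}
```

The resulting double[] is what one would then wrap in a Mahout vector and
write out as one record per row, so the database-specific part stays limited
to fetching the rows.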