Posted to user@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2012/03/17 23:56:33 UTC

Export to MongoDB

I need to digest some Mahout files and merge them into a MongoDB 
database. Digesting would be a lot easier if the Mahout keys were 
indexed, so I wonder whether a "seqdumper --format json or mongodb" 
might be useful. It would make my life easier, but maybe there is 
already a better way to do this?
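
For illustration, the JSON variant of this idea can be approximated today by post-processing seqdumper's text output. The sketch below is not an existing Mahout feature. It assumes each record is printed as one line of the form "Key: <key>: Value: <value>", and the function name and the "value" field are hypothetical.

```python
import json
import re

# Assumed line shape of seqdumper's text output: "Key: <key>: Value: <value>"
LINE_RE = re.compile(r"^Key: (?P<key>.*?): Value: (?P<value>.*)$")


def dump_lines_to_json(lines):
    """Yield one JSON string per recognised key/value line; skip all others."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a record line (for example a header or a count)
        # MongoDB always indexes "_id", so storing the key there makes it indexed.
        yield json.dumps({"_id": match.group("key"),
                          "value": match.group("value")})
```

Each yielded string is one JSON document per line, which is the input shape mongoimport reads by default. The value is kept as an opaque string here; parsing vectors or cluster contents out of it would need format-specific code.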

Re: Export to MongoDB

Posted by Pat Ferrel <pa...@farfetchers.com>.
I have a couple of MongoDB structures that contain docs, the terms 
associated with each vector dimension, term weights, docids for similar 
docs, clusters, the docs included in each cluster, etc. They come from 
several sequence files in HDFS, so I'm just looking for a convenient way 
to do the post-Mahout processing. If each sequence file were in Mongo 
with its keys indexed, I can imagine how to connect the dots. Also, I'm 
creating a prototype, so I'm trying to find the easiest way to do it. 
Since the data has to get into Mongo anyway, I thought doing it sooner 
in the pipeline would be simplest. I realize that I don't need to export 
into human-readable JSON and could write to Mongo directly, and that is 
certainly an option.
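
As a rough sketch of what "connecting the dots" could look like once each sequence file is available as a key-to-value mapping, the records can be joined on their shared key into one document per key. Everything here is an assumption for illustration: the function name, the field names, and the idea that all sources share the same key space.

```python
from collections import defaultdict


def merge_by_key(**sources):
    """Combine several {key: value} mappings into one document per key.

    Each keyword argument name becomes a field name in the merged document.
    """
    merged = defaultdict(dict)
    for field, mapping in sources.items():
        for key, value in mapping.items():
            merged[key][field] = value
    # "_id" is MongoDB's primary key field, so the shared key is indexed.
    return [{"_id": key, **merged[key]} for key in sorted(merged)]
```

For example, merge_by_key(vector=vectors, cluster=clusters) would give each doc id a single document holding whichever of the two fields exist for it. Writing the result to Mongo directly could then be one upsert per document, for instance with pymongo's update_one(..., upsert=True).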

I looked for a way to use Mongo as a generic backing store for 
Hadoop/Mahout but struck out (I'm not even sure that would be a good 
idea anyway). I did see the Pig integration and your code for the 
MongoDBDataModel in the recommender, but they didn't seem to apply to 
my case.

Any advice is appreciated.

On 3/17/12 4:01 PM, Sean Owen wrote:
> What do you mean by indexed here?
>
> On Sat, Mar 17, 2012 at 10:56 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
>> I need to digest some mahout files and merge them into a MongoDB database.
>> Since digesting would be a lot easier if the mahout keys were indexed I
>> wonder if a "seqdumper --format json or mongodb" might be useful. It would
>> make my life easier but maybe there is already a better way to do this?
>>

Re: Export to MongoDB

Posted by Sean Owen <sr...@gmail.com>.
What do you mean by indexed here?

On Sat, Mar 17, 2012 at 10:56 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I need to digest some mahout files and merge them into a MongoDB database.
> Since digesting would be a lot easier if the mahout keys were indexed I
> wonder if a "seqdumper --format json or mongodb" might be useful. It would
> make my life easier but maybe there is already a better way to do this?
>