Posted to user@mahout.apache.org by Jay Vyas <ja...@gmail.com> on 2014/02/16 23:31:55 UTC

Alternative input formats for Distributed Recommenders.

Does Mahout have any kind of record transformer or reader API so that I can
use existing files that aren't perfectly formatted as input to the
recommenders?

The recommender expects its input data set in this format:

jay, skis, .2
jay, iphone, .3


Instead I have:

jay,  xbffX, skis, .2
jay,   x123x, iphone, .3

So I'd like to tell the recommender engine at runtime to read in fields 0,
2, and 3, skipping the garbage text in column 1.
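
To make the intended transformation concrete, here is a minimal plain-Java
sketch of the column selection. It is only an illustration, not a Mahout
API: the class name ColumnSelector and the method selectColumns are
hypothetical, and it assumes fields are separated by commas with optional
surrounding whitespace and contain no quoted commas.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnSelector {

    // Hypothetical helper (not part of Mahout): keep only the given
    // zero-based columns of one comma-separated line, trimming whitespace.
    static String selectColumns(String line, int... keep) {
        String[] fields = line.split(",");
        List<String> out = new ArrayList<>();
        for (int index : keep) {
            out.add(fields[index].trim());
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // Keep fields 0, 2, and 3; drop the garbage text in column 1.
        System.out.println(selectColumns("jay,  xbffX, skis, .2", 0, 2, 3));
        System.out.println(selectColumns("jay,   x123x, iphone, .3", 0, 2, 3));
    }
}
```

Ideally the recommender would apply something like this per record while
reading, rather than requiring a separate pass over the file.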

Any ideas on how to handle this without having to write a MapReduce job
just to extract 3 of the 4 columns from the file?

-- 
Jay Vyas
http://jayunit100.blogspot.com