Posted to user@mahout.apache.org by Sameer Tilak <ss...@live.com> on 2013/11/19 22:55:37 UTC

Parallel Frequent Pattern Mining input format

Hi everyone,
I am interested in using Mahout for data analysis, in particular frequent pattern mining with Mahout's FPG (FP-Growth) algorithm. My data can be expressed as an M x N matrix. Each row represents a user, whereas the columns represent items (1 if the user has viewed a particular item, 0 otherwise). We will have millions of rows and columns. I have the following two questions:
1. Can anyone please tell me the input file format for the FPG algorithm? The documentation says: "Input files have to be in the following format. <optional document id>TAB<TOKEN1>SPACE<TOKEN2>SPACE…." I looked at retail.dat and accident.dat, but I am not sure how the format in the documentation maps onto them. Any thoughts on how to represent my data would be great.
2. Any thoughts on how well the FPG implementation scales to our problem size?
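
To make question 1 concrete, here is a minimal Python sketch of how I would guess a 0/1 row maps onto the quoted format: an optional user id, a TAB, then the column indices of the viewed items separated by spaces. This is only my reading of the documentation, not a confirmed description of what FPG expects, and the function name row_to_transaction is made up for illustration.

```python
def row_to_transaction(row, user_id=None):
    """Turn one 0/1 matrix row into a single transaction line.

    Assumption (not confirmed by the Mahout docs): each item is
    identified by its column index, and only columns holding 1 are
    written out.
    """
    # Keep the column indices of the items this user has viewed.
    items = [str(col) for col, value in enumerate(row) if value == 1]
    line = " ".join(items)
    # The quoted format makes the document id optional, so prepend it
    # (followed by a TAB) only when one is supplied.
    if user_id is not None:
        line = f"{user_id}\t{line}"
    return line


# Tiny hypothetical matrix: 2 users, 4 items.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
lines = [row_to_transaction(row, user_id=f"user{i}") for i, row in enumerate(matrix)]
print("\n".join(lines))
```

With millions of columns the rows would have to be kept in a sparse form rather than as dense lists, but the resulting line per user would look the same.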