Posted to user@mahout.apache.org by ricky lee <ri...@gmail.com> on 2013/03/26 17:14:06 UTC

setting the number of reduce jobs for FPGrowth

Hi,

I have seen some similar questions on this mailing list but could not find a
clear answer yet.
With a fairly large dataset (330 GB), FPGrowth spends most of its time in the
parallel-fpgrowth reduce tasks. Can the number of reduce tasks be set
automatically? In my default Hadoop installation the number of reduce tasks
is one, and the job takes very long. I was able to make it finish much
earlier by raising the Hadoop default number of reduce tasks to over 10.
Do you have a recommendation for setting the number of reduce tasks
automatically, taking the group size and the number of frequent attributes
into account?
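
For reference, this is roughly how I am forcing the reducer count right now.
It is only a sketch against a plain Hadoop 1.x-style setup; the class name
ReducerCountSketch and the value 16 are my own placeholders, not anything
from the Mahout documentation:

    // Sketch: override the reducer count for a single Hadoop job instead of
    // changing the cluster-wide default in mapred-site.xml.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReducerCountSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same effect as setting mapred.reduce.tasks in mapred-site.xml,
            // but scoped to this one job. 16 is an arbitrary value I picked,
            // not something derived from group size or feature count.
            conf.setInt("mapred.reduce.tasks", 16);

            Job job = new Job(conf, "parallel-fpgrowth (sketch)");
            // The explicit API call takes precedence if both are set.
            job.setNumReduceTasks(16);
            // ... the real mapper/reducer/input/output setup would go here ...
        }
    }

On the command line I believe the same thing can be expressed by passing
-Dmapred.reduce.tasks=16 to the mahout fpg driver, though I am not sure
whether the parallel FPGrowth sub-jobs actually pick that value up.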

Thanks.