Posted to user@mahout.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/07/16 07:52:33 UTC

What does the -Dmapred.max.split.size option of org.apache.mahout.df.mapreduce.BuildForest mean for each split ?

Hi,
  Is it the number of rows in each split?

And does the partial implementation example show all the command
options of BuildForest and TestForest?

Regards,

Xiaobo Gu

Re: What does the -Dmapred.max.split.size option of org.apache.mahout.df.mapreduce.BuildForest mean for each split ?

Posted by deneche abdelhakim <ad...@gmail.com>.

Mahout is built on Hadoop, and this is a Hadoop parameter rather than a
Mahout one. The partial implementation uses Hadoop to split the dataset into
multiple partitions, and this parameter tells Hadoop the maximum size of each
partition in bytes. It is not a row count, but Hadoop won't split a row in
half.
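
To make the byte-based sizing concrete, here is a rough sketch in Python of
how the split size and the approximate number of partitions could be
estimated. It assumes the split size is derived as max(minSize, min(maxSize,
blockSize)), which is how Hadoop's new-API FileInputFormat computes it; the
function names, the 64 MiB block size and the file sizes are illustrative
assumptions, and the real split count can differ slightly because Hadoop
allows the last split to be a little larger and keeps rows whole.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of Hadoop or Mahout.

def compute_split_size(block_size, min_size, max_size):
    """Split size in bytes: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))


def estimate_num_splits(file_size, split_size):
    """Approximate number of partitions, using ceiling division on bytes."""
    return -(-file_size // split_size)


MIB = 1024 * 1024

# Assumed values: a 100 MiB dataset, a 64 MiB HDFS block size, and
# -Dmapred.max.split.size set to 10 MiB.
split_size = compute_split_size(block_size=64 * MIB, min_size=1, max_size=10 * MIB)
num_splits = estimate_num_splits(100 * MIB, split_size)
print(split_size, num_splits)
```

Under these assumptions, lowering mapred.max.split.size yields more, smaller
partitions, while a value above the block size leaves the block size as the
effective limit.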

Calling BuildForest (or TestForest) without any parameters should print a
list of all available parameters.

On Sat, Jul 16, 2011 at 6:52 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Hi,
>  Is it the number of rows in each split?
>
> And does the partial implementation example show all the command
> options of BuildForest and TestForest?
>
> Regards,
>
> Xiaobo Gu
>