Posted to user@mahout.apache.org by Mahmood Naderan <nt...@yahoo.com> on 2014/03/31 16:20:01 UTC
Using "split" without partitioning the data to train/test
Hi,
In an older version of Mahout, I used wikipediaDataSetCreator on an input to create the training data:
mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt
and then fed tr-input to trainclassifier using
mahout trainclassifier -i tr-input -o wikimodel
Now, in Mahout 0.9, I see examples that use "split" to keep 80% of the input as training data and hold out the remaining 20% for testing:
mahout split -i input-vectors --trainingOutput tr-vectors --testOutput ts-vectors --randomSelectionPct 20
My question is: how can I use "split" to prepare the input without partitioning it into training and test parts? I want to use one file as the training input and another file as the test input.
Regards,
Mahmood
Re: Using "split" without partitioning the data to train/test
Posted by Mahmood Naderan <nt...@yahoo.com>.
Yeah, you are right. I have to ignore that command.
Regards,
Mahmood
On Monday, March 31, 2014 6:56 PM, Suneel Marthi <su...@yahoo.com> wrote:
So why use 'split'? Separate out the test and training files.
Re: Using "split" without partitioning the data to train/test
Posted by Suneel Marthi <su...@yahoo.com>.
> On Mar 31, 2014, at 4:20 PM, Mahmood Naderan <nt...@yahoo.com> wrote:
>
> Hi,
> In an old Mahout, I used wikipediaDataSetCreator on an input to create the training data
>
> mahout wikipediaDataSetCreator -i wiki-tr/chunks -o tr-input -c labels.txt
>
> and then fed the tr-input to the trainclassifier using
>
> mahout trainclassifier -i tr-input -o wikimodel
>
>
> Now, in Mahout 0.9, I see some examples that create 80% of the input file as training model using "split"
>
> mahout split -i input-vectors --trainingOutput tr-vectors --testOutput ts-vectors --randomSelectionPct 20
>
> My question is how can I use "split" to split the input without partitioning it to train and test parts? I want to use one file as training input and the other file as the test input.
So why use 'split'? Separate out the test and training files.