Posted to user@mahout.apache.org by "TheGeorge1918 ." <zh...@gmail.com> on 2015/07/08 17:54:08 UTC

mahout random forest output with data id?

Hi all

I'm new to Mahout. I'm using its random forest implementation for
classification, running it from the jar file. After prediction, I get a
list of predicted labels for my test data. My question is:

Is it possible to get the data id along with the predicted labels?

When I prepared the data, I included the data id in the training/testing
data, and I marked this field as ignored when describing the data. I
expected to see the data id in the output file, since it's not used in
training or prediction.

If it's not possible to include the data id in the prediction output, is
there a common routine for handling this? The test data I have is quite
big, around 20GB, and I don't see any obvious way to pair each data id
with its predicted label.

I've already searched online, but unfortunately I couldn't find anything
useful.

Thanks a lot

Best

Xuan

Re: mahout random forest output with data id?

Posted by Alessandro Negro <al...@yahoo.it>.
Hi Xuan,
I had a similar issue in the past. To fix it, I changed the code in the Java example, recompiled it, and ran it.
I think this could be the best solution for you as well. If so, I can try to find the piece of code I changed.

Regards,
Alessandro
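
For readers who would rather not modify and recompile the Mahout example, a
different approach is to pair ids and labels after the fact. The following is
only a minimal sketch, and it rests on assumptions not confirmed anywhere in
this thread: that the prediction output has exactly one label per line, in
the same order as the test records, and that the data id is the first
comma-separated field of each test record. The function name and file paths
are hypothetical. It is worth checking these assumptions on a small sample
first, since the real output layout may differ.

```python
import csv
from itertools import zip_longest


def pair_ids_with_predictions(test_path, pred_path, out_path,
                              id_column=0, delimiter=","):
    """Write `id,prediction` rows by reading both files in lockstep.

    ASSUMPTION (not confirmed by the thread): line i of the prediction
    file is the label for record i of the test file.
    """
    missing = object()  # sentinel marking that one file ended early
    rows_written = 0
    with open(test_path, newline="") as test_file, \
         open(pred_path) as pred_file, \
         open(out_path, "w", newline="") as out_file:
        reader = csv.reader(test_file, delimiter=delimiter)
        writer = csv.writer(out_file)
        # Both inputs are iterated lazily, one line at a time.
        for row, pred in zip_longest(reader, pred_file, fillvalue=missing):
            if row is missing or pred is missing:
                raise ValueError(
                    "test data and predictions have different line counts")
            writer.writerow([row[id_column], pred.strip()])
            rows_written += 1
    return rows_written
```

Because both files are streamed line by line, memory use stays flat even for
a test set of around 20GB. The line-count check is deliberate: if the two
files ever get out of step, every later id would be matched to the wrong
label, so the sketch fails loudly instead.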

