Posted to user@hivemall.apache.org by Mars Xu <xu...@gmail.com> on 2017/04/11 08:10:03 UTC

Random Forest on Kaggle Titanic gives an accuracy of just 0.65, lower than the 0.765 stated in the user guide

Hi users,

    I built Hivemall version 0.4.2 with Spark 2.1.0, then ran the random forest algorithm on the Kaggle Titanic tutorial (https://hivemall.incubator.apache.org/userguide/binaryclass/titanic_rf.html).
  
    There is one point where I didn't follow the guide. In the data preparation part, when I ran this command:

awk '{ FPAT="([^,]*)|(\"[^\"]+\")";OFS="|"; } NR >1 {$1=$1;$4=substr($4,2,length($4)-2);print $0}' train.csv
    the resulting data was not correct, as shown below:



So I just used ',' as the field delimiter instead. With that, I got an accuracy of only 0.655 on the Kaggle platform.

Is there anything I can do to improve this result?



Thanks so much!
Mars.


Re: Random Forest on Kaggle Titanic gives an accuracy of just 0.65, lower than the 0.765 stated in the user guide

Posted by Makoto Yui <yu...@gmail.com>.
Hi Mars,

Could you share your training and test data with me?

Also, I need the code snippets you used for training and testing to
reproduce your result.

Thanks,
Makoto
