Posted to user@spark.apache.org by Nirav Patel <np...@xactlycorp.com> on 2016/11/01 12:10:15 UTC

Spark ML - CrossValidation - How to get Evaluation metrics of best model

I am running a classification model. With a normal training/test split I can
check model accuracy and F1 score using MulticlassClassificationEvaluator.
How can I do this with the cross-validation approach?
AFAIK, you fit the entire sample data in CrossValidator, since you don't want
to leave out any observation from either training or testing. But by doing so
I no longer have any unseen data on which to run the finalized model. So
is there a way to get the accuracy and F1 score of the best model resulting
from cross-validation?
Or should I still split the sample data into training and test sets before
running cross-validation on the training data only, so that I can later
evaluate the model against the test data?

-- 


[image: What's New with Xactly] <http://www.xactlycorp.com/email-click/>

<https://www.nyse.com/quote/XNYS:XTLY>  [image: LinkedIn] 
<https://www.linkedin.com/company/xactly-corporation>  [image: Twitter] 
<https://twitter.com/Xactly>  [image: Facebook] 
<https://www.facebook.com/XactlyCorp>  [image: YouTube] 
<http://www.youtube.com/xactlycorporation>

Re: Spark ML - CrossValidation - How to get Evaluation metrics of best model

Posted by Nirav Patel <np...@xactlycorp.com>.
Thanks!

On Tue, Nov 1, 2016 at 6:30 AM, Sean Owen <so...@cloudera.com> wrote:

> CrossValidator splits the data into k sets, and then trains k times,
> holding out one subset for cross-validation each time. You are correct that
> you should actually withhold an additional test set, before you use
> CrossValidator, in order to get an unbiased estimate of the best model's
> performance.

Re: Spark ML - CrossValidation - How to get Evaluation metrics of best model

Posted by Sean Owen <so...@cloudera.com>.
CrossValidator splits the data into k sets, and then trains k times,
holding out one subset for cross-validation each time. You are correct that
you should actually withhold an additional test set, before you use
CrossValidator, in order to get an unbiased estimate of the best model's
performance.
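A minimal, library-free sketch (plain Python, no Spark) of the workflow described above: hold out a test set first, run k-fold cross-validation over a hyperparameter grid on the remaining rows only, refit the best setting on those rows, and only then report accuracy and F1 on the untouched test set. The toy data, the 1-D k-nearest-neighbour classifier and every function name here are hypothetical stand-ins. In Spark ML the corresponding roles are, as far as I know, played by `randomSplit`, `CrossValidator` (whose fitted `CrossValidatorModel` exposes `avgMetrics`, the average metric per parameter map, and `bestModel`, refit on all of the data passed to `fit`), and `MulticlassClassificationEvaluator` with `metricName` set to `"accuracy"` or `"f1"`.

```python
import random
from statistics import mean


# Hypothetical toy data: (feature, label) pairs; class 0 centred at 0.0, class 1 at 2.0.
def make_data(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        rows.append((rng.gauss(2.0 * label, 1.0), label))
    return rows


def train_test_split(rows, test_fraction, seed):
    """Shuffle, then hold out `test_fraction` of the rows *before* any tuning."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_knn(train, k):
    """Hypothetical stand-in estimator: 1-D k-nearest-neighbour majority vote.
    Returns a prediction function; `k` is the hyperparameter being tuned."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        ones = sum(label for _, label in nearest)
        return 1 if 2 * ones > len(nearest) else 0
    return predict


def accuracy(model, rows):
    return mean(1.0 if model(x) == y else 0.0 for x, y in rows)


def weighted_f1(model, rows):
    """Per-class F1 weighted by each true label's frequency; my understanding is
    that this is what MulticlassClassificationEvaluator reports as "f1"."""
    pairs = [(model(x), y) for x, y in rows]
    total = 0.0
    for label in {y for _, y in pairs}:
        tp = sum(1 for p, y in pairs if p == label and y == label)
        fp = sum(1 for p, y in pairs if p == label and y != label)
        fn = sum(1 for p, y in pairs if p != label and y == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * (tp + fn) / len(pairs)
    return total


def cross_validate(train, grid, num_folds):
    """k-fold CV over `grid`, scored by accuracy; refits the best value on all of `train`."""
    folds = [train[i::num_folds] for i in range(num_folds)]
    avg_metrics = []
    for k in grid:
        scores = []
        for i, held_out in enumerate(folds):
            fit_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
            scores.append(accuracy(fit_knn(fit_rows, k), held_out))
        avg_metrics.append(mean(scores))  # one averaged score per grid value
    best = grid[avg_metrics.index(max(avg_metrics))]
    return fit_knn(train, best), best, avg_metrics


if __name__ == "__main__":
    train, test = train_test_split(make_data(200), test_fraction=0.2, seed=42)
    grid = [1, 5, 15]
    model, best_k, avg_metrics = cross_validate(train, grid, num_folds=3)
    print("CV accuracy per k:", dict(zip(grid, avg_metrics)))
    print("best k:", best_k)
    # Only this step touches the held-out rows.
    print("test accuracy:", accuracy(model, test))
    print("test weighted F1:", weighted_f1(model, test))
```

The averaged cross-validation scores are what is used to *choose* the model, so they tend to be optimistic for the winner; the separate test set, never seen during tuning, is what gives the less biased estimate of the best model's performance.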
