Posted to user@spark.apache.org by Nirav Patel <np...@xactlycorp.com> on 2016/11/02 18:05:00 UTC

Spark ML - Is it rule of thumb that all Estimators should only be Fit on Training data

It is clear that for ML algorithms (classification, regression) the
Estimator is fit only on training data, but it is less clear for other
estimators, IDF for example.
IDF is a feature transformation model, but having both an IDF estimator and
an IDF transformer makes it a little confusing what exactly happens when
fitting on one dataset vs. transforming another dataset.

Re: Spark ML - Is it rule of thumb that all Estimators should only be Fit on Training data

Posted by Sean Owen <so...@cloudera.com>.
I would also only fit these on training data. There are probably some
corner cases where letting these ancillary transforms see test data results
in a target leak. Though I can't really think of a good example.

More to the point, you're probably fitting these as part of a pipeline and
that pipeline as a whole is only fed with training data during model
building.
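
To make the fit/transform split concrete for IDF, here is a minimal pure-Python sketch, not Spark's actual implementation. It assumes the smoothed weight log((m + 1) / (df + 1)) described in the Spark MLlib documentation, where m is the number of training documents and df is the number of training documents containing a term. The function names (fit_idf, transform_tfidf) and the toy documents are hypothetical. Fitting learns document frequencies from the training set only; transforming reuses those weights on any other dataset without re-estimating them.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Learn IDF weights from tokenized training documents only."""
    m = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc))  # count each term once per document
    # Smoothed weight log((m + 1) / (df + 1)); assumed to match Spark's docs.
    idf = {t: math.log((m + 1) / (df + 1)) for t, df in doc_freq.items()}
    return idf, m


def transform_tfidf(doc, idf, m):
    """Apply already-learned weights; nothing is re-estimated here."""
    # Assumption of this sketch: a term never seen in training has df = 0.
    unseen = math.log(m + 1)
    tf = Counter(doc)
    return {t: count * idf.get(t, unseen) for t, count in tf.items()}


# Hypothetical toy data: fit on the training split, then transform test data.
train_docs = [
    ["spark", "ml", "idf"],
    ["spark", "pipeline"],
    ["spark", "estimator", "idf"],
]
test_doc = ["idf", "idf", "leak"]

idf, m = fit_idf(train_docs)
test_vec = transform_tfidf(test_doc, idf, m)
```

The test document contributes nothing to the learned weights: "leak" gets a weight derived purely from the training-set size, and the idf dictionary is unchanged by the transform step. Fitting on train plus test instead would shift every df count, which is how test-set information could leak into the features.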

On Wed, Nov 2, 2016 at 6:05 PM Nirav Patel <np...@xactlycorp.com> wrote:

> It is very clear that for ML algorithms (classification, regression) that
> Estimator only fits on training data but it's not very clear of other
> estimators like IDF for example.
> IDF is a feature transformation model but having IDF estimator and
> transformer makes it little confusing that what exactly it does in Fitting
> on one dataset vs Transforming on another dataset.