Posted to user@spark.apache.org by Ablaye FAYE <fa...@gmail.com> on 2020/06/09 08:59:38 UTC
[PySpark CrossValidator] Dropping column randCol before fitting model
Hello,
I have noticed that the _fit method of the CrossValidator class in PySpark
adds a new column (randCol) to the input dataset. This column is used to
split the dataset into k folds.
Is this column removed from the training and validation data of each fold
before the model is fitted?
I ask because I have gone through all the code and have not found a place
where this column is removed before the fitting is executed.
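
To make the question concrete, here is a rough plain-Python sketch of how I understand the splitting. It is only an emulation on lists of dicts, not the actual Spark code: the function name k_fold, the parameter names and the sample rows are made up for illustration, and I am assuming each row gets a uniform random value in [0, 1) and that fold i takes the rows whose value falls in [i/k, (i+1)/k).

```python
import random


def k_fold(rows, n_folds, seed=42, rand_col="randCol"):
    """Emulate the split described above (hypothetical helper, not Spark's API).

    Each row is tagged with a uniform random value in [0, 1); fold i uses the
    rows whose value lies in [i/n_folds, (i+1)/n_folds) as validation data and
    all remaining rows as training data.
    """
    rng = random.Random(seed)
    # Add the extra random column to every row.
    tagged = [dict(row, **{rand_col: rng.random()}) for row in rows]

    h = 1.0 / n_folds
    folds = []
    for i in range(n_folds):
        lower, upper = i * h, (i + 1) * h
        validation = [r for r in tagged if lower <= r[rand_col] < upper]
        train = [r for r in tagged if not (lower <= r[rand_col] < upper)]
        # Note: nothing here removes rand_col from train or validation.
        folds.append((train, validation))
    return folds


# Made-up sample data: 20 rows with one feature and a binary label.
rows = [{"features": [float(i)], "label": i % 2} for i in range(20)]
folds = k_fold(rows, n_folds=4)
```

In this emulation the extra column is still present in both the training and validation rows of every fold, which is exactly the situation I am asking about.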
Thanks for your help