You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/21 03:26:33 UTC

[GitHub] imatiach-msft commented on issue #22087: [SPARK-25097][ML] Support prediction on single instance in KMeans/BiKMeans/GMM

imatiach-msft commented on issue #22087: [SPARK-25097][ML] Support prediction on single instance in KMeans/BiKMeans/GMM
URL: https://github.com/apache/spark/pull/22087#issuecomment-465847633
 
 
   @erikerlandson 
   I think this PR is good as is, we should decouple what it is fixing and the requirement to have a better base class structure.  I agree that longer-term we need to figure out a better base class structure that brings some of the parts of predictor without the supervised learning requirements.  I suggested to break the class hierarchy into unsupervisedmodel/supervisedmodel that inherit from predictionmodel above.  Removing training params from the model would be a much more radical change across all learners.  In Spark ML we share a lot of parameters on both the estimator and model, even when those parameters are only used during training, which is a little weird.  I actually do agree that it would be a better and more clean change logically to only have training-related parameters on the estimator (and base classes of estimator), but it would put us in backwards compatibility hell, as there is probably tons of user code that relies on this fact.  Moreover, it may be that the label is in some cases used in some of the models for some reason, most likely not but we should make sure.  I think we would need to do more research into that approach.  However, if it were done I don't think it would make a lot of users happy, because suddenly parameters would be missing from the model which they could previously access.  While those parameters logically don't _need_ to be there, they still do offer information about how the model was created so in some sense you could argue that it is good to have them on the model.  I guess it's just a matter of opinion, the important thing is to be consistent across the entire codebase.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org