Posted to user@spark.apache.org by Franco Victorio <vi...@gmail.com> on 2018/01/27 17:29:03 UTC

Semi-supervised learning in MLlib

Hi, I'm working on an implementation of a semi-supervised learning algorithm
in Spark, and I want it to implement the interfaces provided by MLlib so that
it can be used with things like model selection.
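
For context, this is the kind of model-selection workflow I would like to
plug into. The sketch below uses a standard supervised estimator
(LogisticRegression) only to illustrate what I mean; `training` is a
placeholder for a fully labeled dataframe:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
    import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

    // A standard supervised estimator, used here only as a stand-in.
    val lr = new LogisticRegression()
      .setFeaturesCol("features")
      .setLabelCol("label")

    // Hyperparameter grid to search over.
    val paramGrid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    // CrossValidator drives the model selection and requires an Estimator.
    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(3)

    // val cvModel = cv.fit(training)  // `training`: a fully labeled dataframe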

My problem is that, as far as I can tell, the provided interfaces are meant
for supervised algorithms (for example, they assume all the training data is
labeled).

The other problem is that this method is transductive, so it would receive a
dataframe with features and label columns, where the label column is mostly
null, and the algorithm would just fill in the null entries. What I mean by
this is that a `fit` stage doesn't really make sense. But if I want to do
model selection, I need to have an Estimator with configurable parameters.
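
To make this concrete, here is a rough sketch of what I have in mind (the
names LabelPropagation, LabelPropagationModel and maxIter are made up):
`fit` would do no real training, it would just hand the configured
parameters to a model, and the model's `transform` would fill in the missing
labels of the dataframe it is given:

    import org.apache.spark.ml.{Estimator, Model}
    import org.apache.spark.ml.param.{IntParam, ParamMap}
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.{DataFrame, Dataset}
    import org.apache.spark.sql.types.StructType

    // Hypothetical transductive estimator: `fit` is essentially a no-op,
    // the real work happens in the model's `transform`.
    class LabelPropagation(override val uid: String)
        extends Estimator[LabelPropagationModel] {

      def this() = this(Identifiable.randomUID("labelProp"))

      // A configurable parameter, so ParamGridBuilder / CrossValidator can tune it.
      final val maxIter = new IntParam(this, "maxIter",
        "maximum number of propagation iterations")
      def setMaxIter(value: Int): this.type = set(maxIter, value)
      setDefault(maxIter -> 10)

      override def fit(dataset: Dataset[_]): LabelPropagationModel = {
        // No training step for a transductive method: just pass the
        // parameter values on to the model.
        new LabelPropagationModel(uid, $(maxIter)).setParent(this)
      }

      override def transformSchema(schema: StructType): StructType = schema

      override def copy(extra: ParamMap): LabelPropagation = defaultCopy(extra)
    }

    class LabelPropagationModel(override val uid: String, val maxIter: Int)
        extends Model[LabelPropagationModel] {

      override def transform(dataset: Dataset[_]): DataFrame = {
        // Placeholder: a real implementation would propagate the labels of
        // the labeled rows to the rows whose label column is null.
        dataset.toDF()
      }

      override def transformSchema(schema: StructType): StructType = schema

      override def copy(extra: ParamMap): LabelPropagationModel =
        copyValues(new LabelPropagationModel(uid, maxIter), extra).setParent(parent)
    }

The awkward part is that `fit` here ignores the dataset entirely, which is
why I'm not sure the current interfaces are meant for this kind of algorithm.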

Is anyone aware of any work already done in Spark with these
characteristics? Are there plans to support this kind of algorithm in the
future?

Thanks.


