Posted to user@spark.apache.org by apu <ap...@gmail.com> on 2016/06/24 17:42:15 UTC

How can I use pyspark.ml.evaluation.BinaryClassificationEvaluator with point predictions instead of confidence intervals?

pyspark.ml.evaluation.BinaryClassificationEvaluator expects
predictions in the form of vectors (apparently designating confidence
intervals), as described in
https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.BinaryClassificationEvaluator

However, I am trying to evaluate ALS predictions, which are given as
single point predictions without confidence intervals. Therefore,
predictions are given as floats rather than vectors.

How can I evaluate these using ml's BinaryClassificationEvaluator?

(Note that this is a different class from mllib's
BinaryClassificationMetrics.)

Thanks!

Apu

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: How can I use pyspark.ml.evaluation.BinaryClassificationEvaluator with point predictions instead of confidence intervals?

Posted by apu <ap...@gmail.com>.

SOLVED.

The rawPredictionCol input to BinaryClassificationEvaluator is a
vector specifying the prediction confidence for each class (not a
confidence interval). Since we are talking about binary
classification, the prediction for class 0 is simply (1 - y_pred),
where y_pred is the prediction for class 1.
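
To make the idea concrete without a Spark session, here is a minimal
plain-Python sketch. It is an illustration only, not Spark's
implementation: the helper names (to_raw_prediction, area_under_roc)
and the sample numbers are made up, and it assumes the evaluator scores
each row by the class 1 entry (index 1) of the vector, which is my
understanding of how it behaves. The metric is computed as the fraction
of (positive, negative) pairs that are ranked correctly, which is
equivalent to the area under the ROC curve.

```python
def to_raw_prediction(y_pred):
    """Return the two-element [class 0, class 1] score vector for one prediction."""
    return [1.0 - y_pred, y_pred]


def area_under_roc(raw_predictions, labels):
    """Rank-based AUC using the class 1 score (index 1) of each vector.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    scores = [v[1] for v in raw_predictions]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical point predictions (e.g. ALS outputs) and true 0/1 labels.
# Note that the scores need not lie in [0, 1].
y_pred = [0.9, 0.2, 0.65, 1.3, -0.1]
labels = [1, 0, 1, 0, 0]

raw = [to_raw_prediction(y) for y in y_pred]
auc = area_under_roc(raw, labels)
```

Because only the ordering of the class 1 scores matters here, point
predictions that fall outside [0, 1] (as ALS outputs can) still give a
meaningful area under the ROC curve, even though (1 - y_pred, y_pred)
is then not a valid pair of probabilities.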

So this can be applied to ALS for boolean ratings as follows:

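# First, train the model and create predictions
from pyspark.ml.recommendation import ALS
model = ALS().fit(trainingdata)
predictions = model.transform(validationdata)

# Vectorize predictions to prep for evaluation
# (on Spark 2.0+, import Vectors and VectorUDT from pyspark.ml.linalg instead)
from pyspark.sql.functions import udf
from pyspark.mllib.linalg import Vectors, VectorUDT
# Build the [class 0, class 1] vector from each point prediction
predictionvectorizer = udf(lambda x: Vectors.dense(1.0 - x, x),
                           returnType=VectorUDT())
vectorizedpredictions = predictions.withColumn(
    "rawPrediction", predictionvectorizer("prediction"))

# Now evaluate predictions (the default metric is areaUnderROC).
# The evaluator reads the true labels from the "label" column by default;
# pass labelCol="rating" if they are in ALS's default rating column.
from pyspark.ml.evaluation import BinaryClassificationEvaluator
evaluator = BinaryClassificationEvaluator()
evaluator.evaluate(vectorizedpredictions)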
