Posted to user@spark.apache.org by Marius FETEANU <ma...@sien.com> on 2014/09/30 11:55:48 UTC
Fwd: Actual Probabilities when Using Naive Bayes classifier
I want to use the MLlib NaiveBayes classifier to predict user responses to
an offer.
I am interested in different types of responses (not just accept/reject),
and I also need the actual probabilities for each prediction (as each
label might come with a different benefit/cost not known at training time).
Does anybody else have experience doing this with Spark? Below is a detailed
explanation of what I have done/tried to do.
The question is how to get those probabilities out of the classifier on
each prediction? A built-in way would be best, but looking at the code it
does not seem possible. Instead I created this method:
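def predictProbs(testData: Vector): (BDV[Double], BDV[Double]) = {
  // Per-class log prior plus log likelihood (unnormalized log posteriors)
  val logLikelihoodRatios = brzPi + brzTheta * new BDV[Double](testData.toArray)
  // Subtract the max before exponentiating so the sum cannot underflow to 0
  val maxLog = logLikelihoodRatios.toArray.max
  val relativeLikelihoods = logLikelihoodRatios.map(x => math.exp(x - maxLog))
  val probMass = relativeLikelihoods.reduceLeft[Double](_ + _)
  // Normalize so the per-class values sum to 1
  (logLikelihoodRatios, relativeLikelihoods.map(x => x / probMass))
}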
def predictProbs(testData: RDD[Vector]): RDD[(BDV[Double], BDV[Double])] = {
  val bcModel = testData.context.broadcast(this)
  testData.map { item =>
    val model = bcModel.value
    model.predictProbs(item)
  }
}
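The per-vector computation above can also be sketched outside the model class, which would avoid patching Spark. This is only a minimal sketch under assumptions: it takes the log priors and log conditional probabilities as plain arrays (NaiveBayesModel appears to expose them as `pi` and `theta`, but check this against your Spark version), and the object and method names are hypothetical. It subtracts the largest log score before exponentiating so that very negative scores do not underflow.

```scala
// Hypothetical helper, not part of MLlib.
object NaiveBayesProbs {
  /** Posterior class probabilities from log priors `pi` (one per class) and
    * log conditional probabilities `theta` (classes x features). */
  def posteriors(pi: Array[Double],
                 theta: Array[Array[Double]],
                 x: Array[Double]): Array[Double] = {
    // Unnormalized log posterior per class: pi(c) + sum_j theta(c)(j) * x(j)
    val logScores = pi.indices.map { c =>
      pi(c) + theta(c).zip(x).map { case (t, v) => t * v }.sum
    }.toArray
    // Shift by the max so the largest term becomes exp(0) = 1
    val maxScore = logScores.max
    val shifted = logScores.map(s => math.exp(s - maxScore))
    val total = shifted.sum
    // Normalize so the values sum to 1
    shifted.map(_ / total)
  }
}
```

With a trained model this might be called as `NaiveBayesProbs.posteriors(model.pi, model.theta, features.toArray)`, assuming those fields are accessible; the returned array is ordered like the model's labels.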
There are two big issues here:
- I have not tested this code (especially for performance), and it requires
me to either recompile Spark or duplicate the class.
- I am not sure about the math (I used to use exp(llr)/(1+exp(llr)) to do
this conversion, but it does not seem to work here).
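On the math question: exp(llr)/(1+exp(llr)) is the logistic function, which converts a log-odds between two classes into a probability. With more than two labels, the corresponding normalization is exp(s_c) / sum_k exp(s_k) over the per-class log scores (a softmax), and the two agree in the two-class case when llr = s_0 - s_1. The sketch below only illustrates that relationship; the object name and the score values are made up for the example.

```scala
// Illustrative only: compares the logistic conversion with softmax normalization.
object SoftmaxVsLogistic {
  // Same as exp(llr) / (1 + exp(llr)), written in the equivalent 1 / (1 + exp(-llr)) form
  def logistic(llr: Double): Double = 1.0 / (1.0 + math.exp(-llr))

  // Normalize per-class log scores into probabilities that sum to 1
  def softmax(scores: Array[Double]): Array[Double] = {
    val m = scores.max
    val e = scores.map(s => math.exp(s - m))
    val z = e.sum
    e.map(_ / z)
  }

  def main(args: Array[String]): Unit = {
    // Two classes: softmax on the scores matches logistic on their difference
    val two = Array(-3.2, -4.7)
    println(softmax(two)(0))
    println(logistic(two(0) - two(1)))

    // Three classes: softmax sums to 1, logistic applied per score does not
    val three = Array(-3.2, -4.7, -3.9)
    println(softmax(three).sum)
    println(three.map(logistic).sum)
  }
}
```

Applying the logistic function to each class score separately treats every score as a log-odds against a single alternative, which is why it stops giving a valid distribution once there are more than two labels.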