Posted to user@spark.apache.org by Thomas Kwan <th...@manage.com> on 2014/12/23 18:00:24 UTC

retry in combineByKey at BinaryClassificationMetrics.scala

Hi there,

We are using MLlib 1.1.1 and doing Logistic Regression on a dataset of
about 150M rows. The training stage usually goes smoothly without any
retries, but during the prediction and BinaryClassificationMetrics
stages I am seeing retries with "fetch failure" errors.

The prediction part is as follows:

        // score each test point and pair the prediction with its true label
        val predictionAndLabel = testRDD.map { point =>
            val prediction = model.predict(point.features)
            (prediction, point.label)
        }
...
        val metrics = new BinaryClassificationMetrics(predictionAndLabel)

The fetch failure happened with the following stack trace:

org.apache.spark.rdd.PairRDDFunctions.combineByKey(PairRDDFunctions.scala:515)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$3$lzycompute(BinaryClassificationMetrics.scala:101)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.x$3(BinaryClassificationMetrics.scala:96)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions$lzycompute(BinaryClassificationMetrics.scala:98)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.confusions(BinaryClassificationMetrics.scala:98)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.createCurve(BinaryClassificationMetrics.scala:142)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.roc(BinaryClassificationMetrics.scala:50)

org.apache.spark.mllib.evaluation.BinaryClassificationMetrics.areaUnderROC(BinaryClassificationMetrics.scala:60)

com.manage.ml.evaluation.BinaryClassificationMetrics.areaUnderROC(BinaryClassificationMetrics.scala:14)

...

We are running this in yarn-client mode, with 32 executors, 16G of
executor memory, and 12 cores per executor in the spark-submit settings.
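
This corresponds roughly to a spark-submit invocation like the
following (the class and jar names here are placeholders, not our real
ones):

        spark-submit --master yarn-client \
            --num-executors 32 \
            --executor-memory 16g \
            --executor-cores 12 \
            --class com.manage.ml.EvaluateJob \
            app.jar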

I wonder if anyone has suggestions on how to debug this.

thanks in advance
thomas

Re: retry in combineByKey at BinaryClassificationMetrics.scala

Posted by Sean Owen <so...@cloudera.com>.
Yes, though my change is slightly downstream of this point in the
processing. The code is still creating a counter for each distinct
score value, and then binning. I don't think that would cause a
failure by itself - it just might be slow. At the extremes, you might
see 'fetch failure' as a symptom of things running too slowly.

Yes, you can sacrifice some fidelity by binning your scores more
aggressively upstream. That would drastically reduce the input size,
at the cost of some accuracy of course.
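
For example, a minimal sketch of upstream binning (assuming
predictionAndLabel is the RDD[(Double, Double)] of (score, label)
pairs from your snippet; the bin count is an arbitrary choice):

        import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

        val numBins = 1000 // more bins = more fidelity but more distinct keys
        val binned = predictionAndLabel.map { case (score, label) =>
            // snap each score down to its bin boundary, so the shuffle in
            // combineByKey sees at most numBins + 1 distinct keys
            (math.floor(score * numBins) / numBins, label)
        }
        val metrics = new BinaryClassificationMetrics(binned)
        val auROC = metrics.areaUnderROC()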

On Tue, Dec 23, 2014 at 7:35 PM, Xiangrui Meng <me...@gmail.com> wrote:
> Sean's PR may be relevant to this issue
> (https://github.com/apache/spark/pull/3702). As a workaround, you can
> try truncating the raw scores to 4 digits (e.g., 0.5643215 -> 0.5643)
> before sending them to BinaryClassificationMetrics. This may not work
> well if the score distribution is very skewed. See the discussion on
> https://issues.apache.org/jira/browse/SPARK-4547 -Xiangrui
>
> On Tue, Dec 23, 2014 at 9:00 AM, Thomas Kwan <th...@manage.com> wrote:
>> [original message snipped; quoted in full at the top of the thread]


Re: retry in combineByKey at BinaryClassificationMetrics.scala

Posted by Xiangrui Meng <me...@gmail.com>.
Sean's PR may be relevant to this issue
(https://github.com/apache/spark/pull/3702). As a workaround, you can
try truncating the raw scores to 4 digits (e.g., 0.5643215 -> 0.5643)
before sending them to BinaryClassificationMetrics. This may not work
well if the score distribution is very skewed. See the discussion on
https://issues.apache.org/jira/browse/SPARK-4547.
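
For example, a minimal sketch of the truncation (assuming
predictionAndLabel is the RDD of (score, label) pairs you pass to
BinaryClassificationMetrics):

        val truncated = predictionAndLabel.map { case (score, label) =>
            // keep 4 decimal digits, e.g., 0.5643215 -> 0.5643, so the
            // metrics computation sees at most ~10001 distinct score keys
            ((score * 10000).toInt / 10000.0, label)
        }
        val metrics = new BinaryClassificationMetrics(truncated)

-Xiangrui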

On Tue, Dec 23, 2014 at 9:00 AM, Thomas Kwan <th...@manage.com> wrote:
> [original message snipped; quoted in full at the top of the thread]
