Posted to dev@spark.apache.org by Sergio Ramírez <sr...@ugr.es> on 2015/06/15 11:56:47 UTC

Spark hangs without notification (broadcasting)

Hi everyone:

I am having several problems with an MLlib algorithm that I am
developing. It uses large broadcast variables over many iterations and
RDDs of Breeze vectors. The problem is that at some stages the Spark
program freezes without any notification. I have tried to reduce the use of
broadcasting and the size of the broadcast variables (switching from hash tables to simple
byte arrays), but the hang then appears again at other lines.
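For illustration only, here is a minimal sketch of the "hash table to byte array" reduction mentioned above. It assumes the table maps dense integer indices in [0, n) to discretized byte values. The object and method names (DenseEncoding, toDenseBytes, lookup) are hypothetical and are not taken from InfoTheory.scala.

```scala
// Hypothetical helper: pack a Map[Int, Byte] into a dense Array[Byte].
// Assumes keys are indices in [0, n); absent keys get `default`.
object DenseEncoding {

  /** Builds an Array[Byte] of length n where out(k) = table(k). */
  def toDenseBytes(table: Map[Int, Byte], n: Int, default: Byte = 0): Array[Byte] = {
    val out = Array.fill[Byte](n)(default)
    table.foreach { case (k, v) =>
      require(k >= 0 && k < n, s"key $k outside [0, $n)")
      out(k) = v
    }
    out
  }

  /** Constant-time replacement for table.getOrElse(k, default); assumes 0 <= k < dense.length. */
  def lookup(dense: Array[Byte], k: Int): Byte = dense(k)

  def main(args: Array[String]): Unit = {
    val table = Map(0 -> 3.toByte, 2 -> 7.toByte)
    val dense = toDenseBytes(table, 4)
    println(dense.mkString(",")) // prints "3,0,7,0"
  }
}
```

When the key space is dense, the resulting array (n bytes plus a small header) is much more compact than a Map of boxed entries. It would be passed to sc.broadcast(...) in place of the map.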

The code is here: 
https://github.com/sramirez/SparkFeatureSelection/blob/efficient-fs/src/main/scala/org/apache/spark/mllib/feature/InfoTheory.scala

There is an issue in JIRA related to mine:
https://issues.apache.org/jira/browse/SPARK-5363
It appears to be fixed, but that is not entirely clear. Although the issue
concerns PySpark, the problem also seems to reproduce in Scala.

I have tried several Spark versions: 1.2.0, 1.3.1, 1.4.0.

I would appreciate any clue or advice.

Thanks,

Sergio R.
