Posted to user@spark.apache.org by "Korb, Michael [USA]" <Ko...@bah.com> on 2014/02/18 18:55:08 UTC

Java Spark job significantly slower than Python

Hi,

I'm experimenting with a Spark analytic on a 9-node cluster. The Python version runs in about 5 minutes, whereas the Java version, with the same SparkContext configuration and everything else being equal, takes 40+ minutes.

Does anyone know what might be causing this performance gap? What is PySpark doing differently?

Thanks,
Mike