Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2019/03/22 15:19:24 UTC

Java Heap Space error - Spark ML

Hi,

I am getting the exception below when using Spark KMeans. Any suggestions
from the experts would be really helpful.

import org.apache.spark.ml.clustering.KMeans

val kMeans = new KMeans().setK(reductionCount).setMaxIter(30)
val kMeansModel = kMeans.fit(df)

The error occurs when calling kMeans.fit.


Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.mllib.linalg.SparseVector.toArray(Vectors.scala:760)
        at org.apache.spark.mllib.clustering.VectorWithNorm.toDense(KMeans.scala:614)
        at org.apache.spark.mllib.clustering.KMeans$$anonfun$initKMeansParallel$3.apply(KMeans.scala:382)
        at org.apache.spark.mllib.clustering.KMeans$$anonfun$initKMeansParallel$3.apply(KMeans.scala:382)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:382)
        at org.apache.spark.mllib.clustering.KMeans.runAlgorithm(KMeans.scala:256)
        at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:227)
        at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:319)
        at com.datamantra.spark.DataBalancing$.createBalancedDataframe(DataBalancing.scala:25)
        at com.datamantra.spark.jobs.IftaMLTraining$.trainML$1(IftaMLTraining.scala:182)
        at com.datamantra.spark.jobs.IftaMLTraining$.main(IftaMLTraining.scala:94)
        at com.datamantra.spark.jobs.IftaMLTraining.main(IftaMLTraining.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks,
Asmath


Re: Java Heap Space error - Spark ML

Posted by "Apostolos N. Papadopoulos" <pa...@csd.auth.gr>.
What is the size of your data and of your cluster? Are you using
spark-submit or an IDE, and which Spark version are you on?

Try spark-submit and increase the memory of the driver or the executors.
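
As a rough sketch of what that could look like: the stack trace is raised on thread "main" inside initKMeansParallel, which suggests the driver JVM is the one running out of heap, so the driver setting is probably the first one to raise. The main class below is taken from the trace; the master, the memory sizes, and the jar path are placeholders (assumptions, not values from this thread) and need to be adapted to the actual cluster.

```shell
# Sketch only: master, memory sizes, and jar path are placeholders.
# --driver-memory sets the driver JVM heap; --executor-memory sets each executor's heap.
spark-submit \
  --class com.datamantra.spark.jobs.IftaMLTraining \
  --master yarn \
  --driver-memory 8g \
  --executor-memory 8g \
  path/to/your-application.jar
```

In client mode the driver heap has to be set this way (or in the Spark properties file), because the driver JVM has already started by the time application code could set spark.driver.memory.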

a.


-- 
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papadopo@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol