Posted to issues@spark.apache.org by "Anuradha Uduwage (JIRA)" <ji...@apache.org> on 2015/08/11 20:26:46 UTC

[jira] [Commented] (SPARK-1006) MLlib ALS gets stack overflow with too many iterations

    [ https://issues.apache.org/jira/browse/SPARK-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682235#comment-14682235 ] 

Anuradha Uduwage commented on SPARK-1006:
-----------------------------------------

Is there a way to find out which build fixes this problem? I am on 1.3.1 and still get this error:

15/08/11 14:08:20 ERROR TaskSetManager: Task 1 in stage 4204.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 4204.0 failed 4 times, most recent failure: Lost task 1.3 in stage 4204.0 (COMPANYDOMAIN-TAKENOUT): java.lang.StackOverflowError
        at java.io.SerialCallbackContext.<init>(SerialCallbackContext.java:48)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1890)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

I am running 50 iterations:
val numIterations = 50
ALS.trainImplicit(trainData, rank, numIterations, 0.01, 0.01)
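
One possible workaround, sketched here under assumptions rather than taken from this thread: the builder-style MLlib ALS API has a setCheckpointInterval setter, which as far as I know is only available from Spark 1.4.0 onward, so it may not apply to 1.3.1 and is worth verifying against the release you run. The object name AlsWithCheckpointing, the train helper, and the checkpointDir parameter are hypothetical; the other values mirror the call above.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

object AlsWithCheckpointing {
  // checkpointDir is a hypothetical path; on a cluster it should point at
  // storage every executor can reach (for example HDFS).
  def train(sc: SparkContext,
            trainData: RDD[Rating],
            rank: Int,
            checkpointDir: String): MatrixFactorizationModel = {
    // The checkpoint interval is ignored unless a checkpoint directory is set.
    sc.setCheckpointDir(checkpointDir)

    new ALS()
      .setImplicitPrefs(true)      // same mode as ALS.trainImplicit
      .setRank(rank)
      .setIterations(50)
      .setLambda(0.01)
      .setAlpha(0.01)
      .setCheckpointInterval(10)   // assumption: Spark 1.4.0 or later
      .run(trainData)
  }
}
```

If the setter is not available in your version, reducing the number of iterations is the only other lever visible from the call above.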

> MLlib ALS gets stack overflow with too many iterations
> ------------------------------------------------------
>
>                 Key: SPARK-1006
>                 URL: https://issues.apache.org/jira/browse/SPARK-1006
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: Matei Zaharia
>
> The tipping point seems to be around 50. We should fix this by checkpointing the RDDs every 10-20 iterations to break the lineage chain, but checkpointing currently requires HDFS installed, which not all users will have.
> We might also be able to fix DAGScheduler to not be recursive.
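
To illustrate the checkpointing idea in the description, here is a minimal sketch of an iterative job that checkpoints its RDD every few iterations so the lineage chain stays short. It is not the ALS implementation: the update step is a stand-in, and the names LineageTruncation, iterate, checkpointInterval, and checkpointDir are hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object LineageTruncation {
  // Applies a stand-in update numIterations times, checkpointing every
  // checkpointInterval iterations so later stages do not carry the full lineage.
  def iterate(sc: SparkContext,
              initial: RDD[Double],
              numIterations: Int,
              checkpointInterval: Int,
              checkpointDir: String): RDD[Double] = {
    // Must be set before any RDD is marked for checkpointing.
    sc.setCheckpointDir(checkpointDir)

    var current = initial
    for (i <- 1 to numIterations) {
      // Stand-in for one iteration of a real algorithm; each map adds a
      // step to the lineage of `current`.
      current = current.map(x => x * 0.5 + 1.0)

      if (i % checkpointInterval == 0) {
        // Mark the RDD for checkpointing, then run an action so the data is
        // actually written. Persisting the RDD first would avoid computing
        // it twice, at the cost of managing the cached copies.
        current.checkpoint()
        current.count()
      }
    }
    current
  }
}
```

With numIterations = 50 and checkpointInterval = 10, no RDD in this loop ever has more than ten map steps in its lineage. In local mode the checkpoint directory can be a local path; on a cluster it needs to be on storage that all executors can reach.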


