Posted to user@spark.apache.org by Xiaoli Li <li...@gmail.com> on 2014/04/15 23:29:38 UTC

StackOverflow Error when running ALS with 100 iterations

Hi,

I am testing ALS on 7 nodes. Each node has 4 cores and 8G of memory. The ALS
program cannot run even on a very small training set (about 91 lines) due to
a StackOverflowError when I set the number of iterations to 100. I think the
problem may be caused by the updateFeatures method, which updates the
products RDD iteratively by joining it with the previous products RDD.


I am writing a program which has an update process similar to ALS's. The
same problem appeared when I iterated too many times (more than 80).

The iterative part of my code is as follows:

solution = outlinks.join(solution).map {
  ...
}
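A self-contained toy version of this pattern (all names and data here are made up) reproduces the behavior: each join/map in the loop extends the RDD lineage, so after enough iterations the DAG becomes very deep.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DeepLineageRepro {
  def main(args: Array[String]): Unit = {
    // Local toy job; the data exists purely to show lineage growth.
    val sc = new SparkContext(
      new SparkConf().setAppName("DeepLineageRepro").setMaster("local[2]"))
    val outlinks = sc.parallelize(Seq((1L, Seq(2L)), (2L, Seq(1L))))
    var solution = sc.parallelize(Seq((1L, 1.0), (2L, 1.0)))

    // Each iteration stacks a join and a map on top of the previous lineage.
    for (_ <- 1 to 100) {
      solution = outlinks.join(solution).map { case (id, (links, v)) =>
        (id, v / links.size) // placeholder update step
      }
    }
    // With a long enough lineage, this action can fail with StackOverflowError.
    solution.count()
    sc.stop()
  }
}
```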


Has anyone had a similar problem? Thanks.


Xiaoli

Re: StackOverflow Error when running ALS with 100 iterations

Posted by Nick Pentreath <ni...@gmail.com>.
I'd also say that running for 100 iterations is a waste of resources: ALS
typically converges quite quickly, within 10-20 iterations.


On Wed, Apr 16, 2014 at 3:54 AM, Xiaoli Li <li...@gmail.com> wrote:

> Thanks a lot for your information. It really helps me.
>
>
> On Tue, Apr 15, 2014 at 7:57 PM, Cheng Lian <li...@gmail.com> wrote:
>
>> Probably this JIRA issue
>> <https://spark-project.atlassian.net/browse/SPARK-1006> solves your
>> problem. When running with a large iteration number, the lineage DAG of
>> ALS becomes very deep, and both the DAGScheduler and the Java serializer
>> may overflow because they are implemented recursively. You may resort to
>> checkpointing as a workaround.
>>
>>
>> On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li <li...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am testing ALS on 7 nodes. Each node has 4 cores and 8G of memory.
>>> The ALS program cannot run even on a very small training set (about 91
>>> lines) due to a StackOverflowError when I set the number of iterations
>>> to 100. I think the problem may be caused by the updateFeatures method,
>>> which updates the products RDD iteratively by joining it with the
>>> previous products RDD.
>>>
>>>
>>> I am writing a program which has an update process similar to ALS's.
>>> The same problem appeared when I iterated too many times (more than 80).
>>>
>>> The iterative part of my code is as follows:
>>>
>>> solution = outlinks.join(solution).map {
>>>   ...
>>> }
>>>
>>>
>>> Has anyone had a similar problem? Thanks.
>>>
>>>
>>> Xiaoli
>>>
>>
>>
>

Re: StackOverflow Error when running ALS with 100 iterations

Posted by Xiaoli Li <li...@gmail.com>.
Thanks a lot for your information. It really helps me.


On Tue, Apr 15, 2014 at 7:57 PM, Cheng Lian <li...@gmail.com> wrote:

> Probably this JIRA issue
> <https://spark-project.atlassian.net/browse/SPARK-1006> solves your
> problem. When running with a large iteration number, the lineage DAG of
> ALS becomes very deep, and both the DAGScheduler and the Java serializer
> may overflow because they are implemented recursively. You may resort to
> checkpointing as a workaround.
>
>
> On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li <li...@gmail.com> wrote:
>
>> Hi,
>>
>> I am testing ALS on 7 nodes. Each node has 4 cores and 8G of memory. The
>> ALS program cannot run even on a very small training set (about 91
>> lines) due to a StackOverflowError when I set the number of iterations
>> to 100. I think the problem may be caused by the updateFeatures method,
>> which updates the products RDD iteratively by joining it with the
>> previous products RDD.
>>
>>
>> I am writing a program which has an update process similar to ALS's. The
>> same problem appeared when I iterated too many times (more than 80).
>>
>> The iterative part of my code is as follows:
>>
>> solution = outlinks.join(solution).map {
>>   ...
>> }
>>
>>
>> Has anyone had a similar problem? Thanks.
>>
>>
>> Xiaoli
>>
>
>

Re: StackOverflow Error when running ALS with 100 iterations

Posted by Xiangrui Meng <me...@gmail.com>.
ALS.setCheckpointInterval was added in Spark 1.3.1. You need to
upgrade Spark to use this feature. -Xiangrui
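For reference, a rough sketch of how this might look once upgraded (the file path, checkpoint directory, and parameters below are illustrative placeholders, not taken from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Sketch only: paths and parameter values are placeholders.
val sc = new SparkContext(new SparkConf().setAppName("ALSCheckpointing"))
sc.setCheckpointDir("/tmp/spark-checkpoints") // must be set before checkpointing

val ratings = sc.textFile("ratings.csv").map { line =>
  val Array(user, item, score) = line.split(',')
  Rating(user.toInt, item.toInt, score.toDouble)
}

// Checkpointing every 10 iterations keeps the lineage DAG shallow
// even when running 100 iterations.
val model = new ALS()
  .setRank(10)
  .setIterations(100)
  .setCheckpointInterval(10)
  .run(ratings)
```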

On Wed, Apr 22, 2015 at 9:03 PM, amghost <zh...@outlook.com> wrote:
> Hi, would you please explain how to checkpoint the training set RDD, since
> everything is done in the ALS.train method?
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/StackOverflow-Error-when-run-ALS-with-100-iterations-tp4296p22619.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: StackOverflow Error when running ALS with 100 iterations

Posted by LeoB <le...@gmail.com>.
Just wanted to add a comment to the JIRA ticket, but I don't think I have
permission to do so, so answering here instead. I am encountering the same
issue with a StackOverflowError.
I would like to point out that there is a localCheckpoint method
(<https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-checkpointing.html>)
which does not require HDFS to be installed. We could use this instead of
checkpoint to cut down the lineage.
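As a sketch (the RDD names and data below are invented, and localCheckpoint requires Spark 1.5+), it can be dropped into an iterative loop in place of checkpoint. Note the trade-off: locally checkpointed data lives only in executor storage, so it is not fault-tolerant.

```scala
// Sketch only: no checkpoint directory (and hence no HDFS) is needed.
val outlinks = sc.parallelize(Seq((1L, Seq(2L)), (2L, Seq(1L))))
var solution = sc.parallelize(Seq((1L, 1.0), (2L, 1.0)))

for (i <- 1 to 100) {
  solution = outlinks.join(solution).map { case (id, (links, v)) =>
    (id, v / links.size) // placeholder update step
  }
  if (i % 10 == 0) {
    solution = solution.localCheckpoint() // truncate lineage in executor memory
    solution.count()                      // force materialization
  }
}
```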






Re: StackOverflow Error when running ALS with 100 iterations

Posted by amghost <zh...@outlook.com>.
Hi, would you please explain how to checkpoint the training set RDD, since
everything is done in the ALS.train method?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/StackOverflow-Error-when-run-ALS-with-100-iterations-tp4296p22619.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: StackOverflow Error when running ALS with 100 iterations

Posted by Cheng Lian <li...@gmail.com>.
Probably this JIRA issue
<https://spark-project.atlassian.net/browse/SPARK-1006> solves your
problem. When running with a large iteration number, the lineage DAG of
ALS becomes very deep, and both the DAGScheduler and the Java serializer
may overflow because they are implemented recursively. You may resort to
checkpointing as a workaround.
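A minimal sketch of that workaround (variable names, the update function, and the interval are illustrative, not from ALS itself): checkpoint the RDD every few iterations so the lineage is truncated rather than growing without bound.

```scala
import org.apache.spark.rdd.RDD

// Sketch only: initialSolution and outlinks stand in for the real RDDs,
// and update() is a hypothetical per-record step.
sc.setCheckpointDir("hdfs:///tmp/checkpoints") // any reliable storage path

var solution: RDD[(Long, Double)] = initialSolution
for (i <- 1 to 100) {
  solution = outlinks.join(solution).map { case (id, (links, v)) =>
    (id, update(links, v))
  }
  if (i % 10 == 0) {
    solution.checkpoint() // cut the lineage DAG here
    solution.count()      // checkpoint is materialized on the next action
  }
}
```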


On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li <li...@gmail.com> wrote:

> Hi,
>
> I am testing ALS on 7 nodes. Each node has 4 cores and 8G of memory. The
> ALS program cannot run even on a very small training set (about 91
> lines) due to a StackOverflowError when I set the number of iterations to
> 100. I think the problem may be caused by the updateFeatures method,
> which updates the products RDD iteratively by joining it with the
> previous products RDD.
>
>
> I am writing a program which has an update process similar to ALS's. The
> same problem appeared when I iterated too many times (more than 80).
>
> The iterative part of my code is as follows:
>
> solution = outlinks.join(solution).map {
>   ...
> }
>
>
> Has anyone had a similar problem? Thanks.
>
>
> Xiaoli
>