Posted to issues@spark.apache.org by "Jeet (Jira)" <ji...@apache.org> on 2021/03/18 16:53:00 UTC

[jira] [Updated] (SPARK-34791) SparkR throws node stack overflow

     [ https://issues.apache.org/jira/browse/SPARK-34791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeet updated SPARK-34791:
-------------------------
    Description: 
SparkR throws a "node stack overflow" error when running the code below on R 4.0.2 with Spark 3.0.1.

The same code works on R 3.3.3 with Spark 2.2.1 (and SparkR 2.4.5).

{code}
library(SparkR)   ## needed for spark.lapply; not in the original snippet
sparkR.session()  ## start the Spark session before calling spark.lapply

source('sample.R')
myclsr <- myclosure_func()
myclsr$get_some_date('2021-01-01')  ## works on the driver

## spark.lapply throws node stack overflow
result <- spark.lapply(c('2021-01-01', '2021-01-02'), function(rdate) {
    source('sample.R')  ## sources sample.R a second time, on the worker
    another_closure <- myclosure_func()
    return(another_closure$get_some_date(rdate))
})
{code}

Sample.R:

{code}
## util function, which calls itself until it lands on a weekday
## (note: format(..., "%A") is locale-dependent; this assumes an
## English locale)
getPreviousBusinessDate <- function(asofdate) {
    asdt <- as.Date(asofdate) - 1

    wd <- format(asdt, "%A")
    if (wd == "Saturday" || wd == "Sunday") {
        return(getPreviousBusinessDate(asdt))
    }

    return(asdt)
}

## closure which calls the util function
myclosure_func <- function() {
    myclosure <- list()

    get_some_date <- function(random_date) {
        return(getPreviousBusinessDate(random_date))
    }
    myclosure$get_some_date <- get_some_date

    return(myclosure)
}
{code}
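
As an aside, the recursion in the util function is not essential; a hypothetical iterative rewrite (not part of the original repro) behaves the same for valid dates and avoids nested self-calls altogether:

{code}
## Hypothetical iterative variant of getPreviousBusinessDate: a while
## loop steps back one day at a time instead of recursing, so the same
## weekday check runs without growing the call stack.
getPreviousBusinessDate <- function(asofdate) {
    asdt <- as.Date(asofdate) - 1
    while (format(asdt, "%A") %in% c("Saturday", "Sunday")) {
        asdt <- asdt - 1
    }
    return(asdt)
}
{code}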
This seems to be caused by sourcing sample.R twice: once on the driver before the Spark session is invoked, and again inside the function that spark.lapply runs on the workers.
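
If that hypothesis is right, guarding the worker-side source() call should avoid redefining the functions a second time. This is an untested sketch, and the exists() check assumes the driver-side definitions are already visible through the serialized closure:

{code}
## Untested workaround sketch: only source sample.R if its definitions
## are not already visible in the function's environment chain.
result <- spark.lapply(c('2021-01-01', '2021-01-02'), function(rdate) {
    if (!exists("myclosure_func", mode = "function")) {
        source('sample.R')
    }
    another_closure <- myclosure_func()
    return(another_closure$get_some_date(rdate))
})
{code}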

 

> SparkR throws node stack overflow
> ---------------------------------
>
>                 Key: SPARK-34791
>                 URL: https://issues.apache.org/jira/browse/SPARK-34791
>             Project: Spark
>          Issue Type: Question
>          Components: SparkR
>    Affects Versions: 3.0.1
>            Reporter: Jeet
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org