Posted to issues@spark.apache.org by "Edmond La Chance (Jira)" <ji...@apache.org> on 2019/10/01 12:41:00 UTC

[jira] [Updated] (SPARK-29315) RDD.cache() called early creates problems

     [ https://issues.apache.org/jira/browse/SPARK-29315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edmond La Chance updated SPARK-29315:
-------------------------------------
    Description: 
This is the first issue I have posted here. When I call RDD.cache() early in my code, the results are wrong.
 If I remove the call to cache(), or move it later in the code, after the first map transformation, the results are correct.
 The graph is created from a data structure that already contains the random values.

 

I have posted versions that work and versions that don't work in this gist:

[https://gist.github.com/mitchi/edd9637687cf47fac2616bb72932f8e7]

Here is the output of a version that works:

_Colors of the graph_

_3 2 1 3 2 1 1 4 2 3_

And here is the output of a version that doesn't work:

_Colors of the graph_

_25 16 36 49 3 1 6 15 10 3_

 

 


> RDD.cache() called early creates problems
> -----------------------------------------
>
>                 Key: SPARK-29315
>                 URL: https://issues.apache.org/jira/browse/SPARK-29315
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: Apache Spark 2.4.4
> Windows 10
>            Reporter: Edmond La Chance
>            Priority: Minor
>


