Posted to issues@spark.apache.org by "Edmond La Chance (Jira)" <ji...@apache.org> on 2019/10/01 12:32:00 UTC
[jira] [Created] (SPARK-29315) RDD.cache() called early creates problems
Edmond La Chance created SPARK-29315:
----------------------------------------
Summary: RDD.cache() called early creates problems
Key: SPARK-29315
URL: https://issues.apache.org/jira/browse/SPARK-29315
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.4
Environment: Apache Spark 2.4.4
Windows 10
Reporter: Edmond La Chance
This is the first issue I have posted here. I noticed that when I call RDD.cache() early in my code, the results are all wrong!
If I remove the call to cache(), or move it later in the code, after the first map transformation, everything works fine.
The graph is created from a data structure that already contains the random values.
I have posted versions that work and versions that don't in this gist:
[https://gist.github.com/mitchi/edd9637687cf47fac2616bb72932f8e7]