Posted to issues@spark.apache.org by "Will Uto (JIRA)" <ji...@apache.org> on 2019/03/02 15:34:00 UTC
[jira] [Commented] (SPARK-26943) Weird behaviour with `.cache()`
[ https://issues.apache.org/jira/browse/SPARK-26943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782418#comment-16782418 ]
Will Uto commented on SPARK-26943:
----------------------------------
Thanks for the explanation [~srowen], that makes sense. I think this is why I couldn't reproduce it locally (on a smaller dataset).
Out of curiosity, is there a way to run a newer version of Spark on a cluster, e.g. within a Python virtual environment, or do I have to upgrade the entire cluster?
> Weird behaviour with `.cache()`
> -------------------------------
>
> Key: SPARK-26943
> URL: https://issues.apache.org/jira/browse/SPARK-26943
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.0
> Reporter: Will Uto
> Priority: Major
>
>
> {code:java}
> sdf.count()
> {code}
>
> works fine. However:
>
> {code:java}
> sdf = sdf.cache()
> sdf.count()
> {code}
> does not, and produces the following error:
> {code:java}
> Py4JJavaError: An error occurred while calling o314.count.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 75 in stage 8.0 failed 4 times, most recent failure: Lost task 75.3 in stage 8.0 (TID 438, uat-datanode-02, executor 1): java.text.ParseException: Unparseable number: "(N/A)"
> at java.text.NumberFormat.parse(NumberFormat.java:350)
> {code}
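
One possible reading of the report above, sketched as a toy Python model rather than real Spark code: assume a plain count() can get by without parsing the column values, while cache() has to materialise every value first and so hits the malformed "(N/A)" field. That assumption is not confirmed in this message. The row data and the names RAW_ROWS, parse_amount, count_without_cache and count_with_cache are hypothetical and do not come from the issue.

```python
# Toy model only: this is NOT Spark code and does not reproduce Spark internals.
from typing import List

# Hypothetical raw rows; the second field of one row is malformed.
RAW_ROWS = ["1,10.5", "2,(N/A)", "3,7.25"]


def parse_amount(text: str) -> float:
    """Strict numeric parse that rejects values such as "(N/A)"."""
    try:
        return float(text)
    except ValueError:
        raise ValueError(f'Unparseable number: "{text}"') from None


def count_without_cache(rows: List[str]) -> int:
    # Counting only needs the number of rows, so no field is parsed here.
    return sum(1 for _ in rows)


def count_with_cache(rows: List[str]) -> int:
    # Assumed behaviour: caching materialises every column of every row,
    # so the malformed value is parsed and the whole job fails.
    cached = [
        (int(row.split(",")[0]), parse_amount(row.split(",")[1]))
        for row in rows
    ]
    return len(cached)


print(count_without_cache(RAW_ROWS))
try:
    count_with_cache(RAW_ROWS)
except ValueError as err:
    print(err)
```

Under the same assumption, a smaller local sample that happens not to contain a malformed value would pass both paths, which would be consistent with the failure not reproducing locally.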