Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2019/12/16 02:47:00 UTC

[jira] [Commented] (SPARK-30264) Unexpected behaviour when using persist MEMORY_ONLY in RDD

    [ https://issues.apache.org/jira/browse/SPARK-30264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996930#comment-16996930 ] 

Jungtaek Lim commented on SPARK-30264:
--------------------------------------

Could you provide a minimized reproducer, and also try to reproduce this with the latest Spark version (2.4.4, and maybe also Spark 3.0.0 preview 1 if you don't mind)?
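
For reference, below is a minimal sketch of the kind of reproducer that would help. It is hypothetical, not taken from the report: it assumes the RDD's elements come from a source that reuses a single mutable record object per partition (as Hadoop RecordReaders do with Writables), which is one common way MEMORY_ONLY caching ends up holding many references to the same object while MEMORY_ONLY_SER copies each record by serializing it.

import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class Spark30264Repro {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SPARK-30264-repro").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            List<Integer> nums = IntStream.range(0, 100).boxed().collect(Collectors.toList());

            // Emit 100 records, but reuse one mutable holder per partition,
            // mimicking a RecordReader that mutates the same object for every record.
            JavaRDD<int[]> reusedRecords = sc.parallelize(nums, 1).mapPartitions(it -> {
                int[] reused = new int[1];
                return new Iterator<int[]>() {
                    @Override public boolean hasNext() { return it.hasNext(); }
                    @Override public int[] next() { reused[0] = it.next(); return reused; }
                };
            });

            // MEMORY_ONLY caches object references, so every cached element may
            // end up pointing at the same (last) state of the reused holder.
            long memoryOnly = reusedRecords.persist(StorageLevel.MEMORY_ONLY())
                    .map(r -> r[0]).distinct().count();
            reusedRecords.unpersist(true);

            // MEMORY_ONLY_SER serializes each record as it is cached, which
            // effectively copies it, so the distinct values survive.
            long memoryOnlySer = reusedRecords.persist(StorageLevel.MEMORY_ONLY_SER())
                    .map(r -> r[0]).distinct().count();

            System.out.println("MEMORY_ONLY: " + memoryOnly + ", MEMORY_ONLY_SER: " + memoryOnlySer);
        }
    }
}

If the actual job reads Hadoop Writables (e.g. via sequenceFile or hadoopRDD), note that the RecordReader reuses the same Writable object for each record, so records generally need to be copied before caching them deserialized.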

> Unexpected behaviour when using persist MEMORY_ONLY in RDD
> ----------------------------------------------------------
>
>                 Key: SPARK-30264
>                 URL: https://issues.apache.org/jira/browse/SPARK-30264
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.4.0
>            Reporter: moshe ohaion
>            Priority: Major
>
> The persist method with MEMORY_ONLY behaves differently than with MEMORY_ONLY_SER.
> persist(StorageLevel.MEMORY_ONLY()).distinct().count() returns 1,
> while persist(StorageLevel.MEMORY_ONLY_SER()).distinct().count() returns 100.
> I expect both to return the same result. The right result is 100; for some reason MEMORY_ONLY causes all the objects in the RDD to be the same one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org