Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/11/06 18:40:34 UTC

[jira] [Commented] (SPARK-993) Don't reuse Writable objects in HadoopRDDs by default

    [ https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200531#comment-14200531 ] 

Matei Zaharia commented on SPARK-993:
-------------------------------------

Arun, you'd see this issue if you call collect() or take() and then println the results. The problem is that the same Text object (for example) is reused for every record in the dataset, so all the collected references end up pointing to the last value read. Counts and other aggregations will still be correct, since they consume each record before it is overwritten.
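The effect Matei describes can be sketched without Spark or Hadoop at all. Below is a minimal, hypothetical Java stand-in for the pattern: one mutable object (mimicking org.apache.hadoop.io.Text) is reused for every record, so retaining the reference (as collect() does) shows only the last value, while copying each value first (e.g. mapping to a String before collecting) preserves the distinct records. The MutableText class here is an illustration, not the real Hadoop type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReuseDemo {
    // Minimal stand-in for org.apache.hadoop.io.Text: mutable, set() overwrites in place.
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "c"};
        MutableText reused = new MutableText();      // one object shared by all records

        List<MutableText> collectedRefs = new ArrayList<>();
        List<String> collectedCopies = new ArrayList<>();
        for (String row : input) {
            reused.set(row);                          // the reader overwrites the same object
            collectedRefs.add(reused);                // keeping the reference, like collect()
            collectedCopies.add(reused.toString());   // copying first, like map(_.toString)
        }

        // Every retained reference now shows the *last* record read.
        for (MutableText t : collectedRefs) System.out.println(t);  // prints "c" three times
        // The copies keep the distinct values.
        System.out.println(collectedCopies);                        // prints [a, b, c]
    }
}
```

This is why mapping each record to an immutable copy before collect() is the usual workaround while objects are being reused.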

> Don't reuse Writable objects in HadoopRDDs by default
> -----------------------------------------------------
>
>                 Key: SPARK-993
>                 URL: https://issues.apache.org/jira/browse/SPARK-993
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Matei Zaharia
>
> Right now we reuse them as an optimization, which leads to surprising results when you call collect() on a file with distinct items: every element appears equal to the last record read. We should instead make that behavior optional through a flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org