Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/09/26 03:52:00 UTC

[jira] [Resolved] (SPARK-36843) Add an iterator method to Dataset

     [ https://issues.apache.org/jira/browse/SPARK-36843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-36843.
----------------------------------
    Resolution: Won't Fix

In that case you would just call first(). I don't think this is worthwhile.

> Add an iterator method to Dataset
> ---------------------------------
>
>                 Key: SPARK-36843
>                 URL: https://issues.apache.org/jira/browse/SPARK-36843
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Li Xian
>            Priority: Minor
>
> The current org.apache.spark.sql.Dataset#toLocalIterator submits multiple jobs when the Dataset has multiple partitions.
> In my case, I would like to collect all partitions at once to save the job scheduling cost, and also to have an iterator that saves memory on deserialization: instead of deserializing all rows at once, I want only one row to be deserialized at a time during iteration.
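
For illustration only, the behaviour requested above can be sketched in plain Python. This is not Spark code and not the Dataset API: the names collect_serialized and lazy_rows are hypothetical, pickle stands in for Spark's row serialization, and a nested list stands in for partitions. The sketch assumes the rows are gathered in serialized form in one pass and then decoded one at a time.

```python
import pickle
from typing import Any, Iterator, List


def collect_serialized(partitions: List[List[Any]]) -> List[bytes]:
    """Hypothetical stand-in for one job that gathers every partition at once.

    Rows are kept in serialized form; nothing is decoded here.
    """
    return [pickle.dumps(row) for part in partitions for row in part]


def lazy_rows(blobs: List[bytes]) -> Iterator[Any]:
    """Decode one row per iteration step instead of all rows up front."""
    for blob in blobs:
        yield pickle.loads(blob)


# Two "partitions" holding three rows in total (made-up sample data).
partitions = [[("a", 1), ("b", 2)], [("c", 3)]]

blobs = collect_serialized(partitions)  # single pass over all partitions
rows = lazy_rows(blobs)                 # no row has been decoded yet
first = next(rows)                      # decodes only the first row
```

In this sketch the serialized bytes for every row are still held in memory at once; only the decoded objects are limited to one at a time, which is the saving the description asks for.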



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
