Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:34:59 UTC

[jira] [Resolved] (SPARK-14534) Should SparkContext.parallelize(List) take an Iterable instead?

     [ https://issues.apache.org/jira/browse/SPARK-14534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-14534.
----------------------------------
    Resolution: Incomplete

> Should SparkContext.parallelize(List) take an Iterable instead?
> ---------------------------------------------------------------
>
>                 Key: SPARK-14534
>                 URL: https://issues.apache.org/jira/browse/SPARK-14534
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.1
>            Reporter: David Wood
>            Priority: Minor
>              Labels: bulk-closed
>
> I am using MongoDB to read the DB and it provides an Iterable (and not a List) to access the results.  This is similar to the ResultSet in SQL and is done this way so that you can process results row by row without pulling a potentially large DB into memory all at once.  It might be nice if parallelize(List) could instead operate on an Iterable to allow similar efficiency.  Since a List is an Iterable, this would be backwards compatible.  However, I'm new to Spark, so I'm not sure whether that might violate some other design point.
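
For anyone hitting the same limitation today, a minimal workaround sketch against the Java API (JavaSparkContext.parallelize takes a java.util.List) is to buffer the Iterable into a List on the driver before parallelizing. Note that parallelize already materializes the data on the driver, so this buffering does not add a new scaling limit. The fetchResults helper below is a hypothetical stand-in for a MongoDB query result and is not part of the original report.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ParallelizeIterable {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("parallelize-iterable")
                    .setMaster("local[*]");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Hypothetical stand-in for a MongoDB query result (an Iterable, not a List).
                Iterable<String> results = fetchResults();

                // parallelize() requires a List, so the Iterable is copied into
                // driver memory first.
                List<String> buffered = new ArrayList<>();
                for (String row : results) {
                    buffered.add(row);
                }

                JavaRDD<String> rdd = sc.parallelize(buffered);
                System.out.println("count = " + rdd.count());
            }
        }

        // Stand-in for a DB query that returns an Iterable rather than a List.
        private static Iterable<String> fetchResults() {
            List<String> rows = new ArrayList<>();
            rows.add("doc1");
            rows.add("doc2");
            return rows;
        }
    }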



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org