Posted to dev@spark.apache.org by WangJianfei <wa...@otcaix.iscas.ac.cn> on 2016/11/12 13:57:46 UTC

does The Design of spark consider the scala parallelize collections?

Hi devs:
   According to the Scala documentation, Scala has parallel collections.
In my experiments, parallel collections do speed up operations such as
map. So I would like to know: does Spark already use Scala parallel
collections, and if not, will Spark consider using them? Thank you!



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/does-The-Design-of-spark-consider-the-scala-parallelize-collections-tp19833.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: does The Design of spark consider the scala parallelize collections?

Posted by Reynold Xin <rx...@databricks.com>.
Some places in Spark do use it:

> git grep "\\.par\\."
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala:
 val models = Range(0, numClasses).par.map { index =>
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala:
           (0 until 10).par.foreach { _ =>
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLExecutionSuite.scala:
     (1 to 100).par.foreach { _ =>
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:
   (1 to 100).par.map { i =>
streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala:
 inputStreams.par.foreach(_.start())
streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala:
 inputStreams.par.foreach(_.stop())


Most of the usages are in tests, not in the actual execution path. Parallel
collections are fairly complicated and difficult to manage (implicit thread
pools). They are good for more basic thread management, but Spark itself
has much more sophisticated parallelization built in.
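
To illustrate the thread pool point, here is a minimal sketch of giving a
parallel collection an explicit pool instead of relying on the implicit
default one. It assumes Scala 2.12, where `.par` is part of the standard
library; on Scala 2.13 it would additionally need the separate
scala-parallel-collections module and its CollectionConverters import. The
names `ParExample` and `sumOfSquares` are hypothetical and do not come from
the Spark code base.

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

// Hypothetical example object; not part of Spark.
object ParExample {

  // Sums i * i for i in [0, n) using a parallel collection that runs on
  // an explicitly created pool with `parallelism` worker threads.
  def sumOfSquares(n: Int, parallelism: Int): Int = {
    val pool = new ForkJoinPool(parallelism)
    try {
      val inputs = (0 until n).par
      // Without this line the collection uses the default, shared pool.
      inputs.tasksupport = new ForkJoinTaskSupport(pool)
      inputs.map(i => i * i).sum
    } finally {
      // The caller owns the pool, so the caller has to shut it down.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    println(sumOfSquares(10, 4))
  }
}
```

Even in this small case the pool has to be created, attached, and shut down
by hand, which gives a sense of the management overhead mentioned above.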


On Sat, Nov 12, 2016 at 5:57 AM, WangJianfei <
wangjianfei15@otcaix.iscas.ac.cn> wrote:

> Hi devs:
>    According to the Scala documentation, Scala has parallel collections.
> In my experiments, parallel collections do speed up operations such as
> map. So I would like to know: does Spark already use Scala parallel
> collections, and if not, will Spark consider using them? Thank you!