Posted to dev@spark.apache.org by gsvic <vi...@gmail.com> on 2015/11/11 13:35:07 UTC

Map Tasks - Disk I/O

According to this paper
<http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf>
Spark's map tasks write their results to disk.

My actual question is about the doExecute() method of BroadcastHashJoin
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L100>.
At line 109 of that file, the mapPartitions
<https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L109>
method is called. At this step, Spark will schedule a number of tasks to
perform the hash join operation. Will the results of these tasks be written
to each worker's disk?
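
To make the step I am asking about concrete, here is a minimal sketch of what
I understand the per-partition function to do. It is plain Python with no
Spark dependency, and it is not Spark's actual code: the names
build_hash_table and join_partition and the sample rows are hypothetical,
chosen only for illustration.

```python
from collections import defaultdict


def build_hash_table(small_rows, key_of):
    """Build side: index the small (broadcast) relation by join key."""
    table = defaultdict(list)
    for row in small_rows:
        table[key_of(row)].append(row)
    return table


def join_partition(streamed_rows, hash_table, key_of):
    """Per-partition function: lazily probe the table for each streamed row."""
    for row in streamed_rows:
        # .get() avoids inserting empty entries for keys with no match.
        for match in hash_table.get(key_of(row), []):
            yield row + match


# Hypothetical data: (key, value) tuples.
small = [(1, "a"), (2, "b")]
partition = [(1, "x"), (3, "y"), (2, "z")]

table = build_hash_table(small, key_of=lambda r: r[0])
joined = list(join_partition(iter(partition), table, key_of=lambda r: r[0]))
```

Nothing in this sketch writes to disk by itself: the function consumes an
iterator and returns an iterator. What I want to know is what Spark does with
the iterator that such a function returns.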
