Posted to issues@spark.apache.org by "Perinkulam I Ganesh (JIRA)" <ji...@apache.org> on 2015/06/26 23:05:05 UTC

[jira] [Commented] (SPARK-3528) Reading data from file:/// should be called NODE_LOCAL not PROCESS_LOCAL

    [ https://issues.apache.org/jira/browse/SPARK-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603576#comment-14603576 ] 

Perinkulam I Ganesh commented on SPARK-3528:
--------------------------------------------

Have a question:

If the driver is on one node and the slave is on another, the file may be local to the driver's node but not to the slave's. So is it proper to tag the file as NODE_LOCAL?
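To make the concern concrete, here is a minimal Scala sketch. It assumes a simplified rule: a task counts as node-local only when the executor's own host actually holds the data. `isNodeLocal` and the host names are hypothetical illustrations, not Spark's scheduler API:

```scala
object NodeLocalitySketch {
  // Hypothetical, simplified rule: node-local only if the executor's host
  // is one of the hosts that actually hold the data.
  def isNodeLocal(hostsWithData: Set[String], executorHost: String): Boolean =
    hostsWithData.contains(executorHost)

  def main(args: Array[String]): Unit = {
    // Hypothetical cluster: the file exists only on the driver's machine.
    val hostsWithFile = Set("driver-host")

    println(isNodeLocal(hostsWithFile, "driver-host")) // true
    println(isNodeLocal(hostsWithFile, "worker-1"))    // false
  }
}
```

Under this rule, labeling a `file:///` read as NODE_LOCAL would only be accurate on executors whose host really has the file.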

thanks

- P. I. 

> Reading data from file:/// should be called NODE_LOCAL not PROCESS_LOCAL
> ------------------------------------------------------------------------
>
>                 Key: SPARK-3528
>                 URL: https://issues.apache.org/jira/browse/SPARK-3528
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Andrew Ash
>            Priority: Critical
>
> Note that reading from {{file:///.../pom.xml}} is called a PROCESS_LOCAL task:
> {noformat}
> spark> sc.textFile("pom.xml").count
> ...
> 14/09/15 00:59:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1191 bytes)
> 14/09/15 00:59:13 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1191 bytes)
> 14/09/15 00:59:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 14/09/15 00:59:13 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 14/09/15 00:59:13 INFO HadoopRDD: Input split: file:/Users/aash/git/spark/pom.xml:20862+20863
> 14/09/15 00:59:13 INFO HadoopRDD: Input split: file:/Users/aash/git/spark/pom.xml:0+20862
> {noformat}
> There is an outstanding TODO in {{HadoopRDD.scala}} that may be related:
> {noformat}
>   override def getPreferredLocations(split: Partition): Seq[String] = {
>     // TODO: Filtering out "localhost" in case of file:// URLs
>     val hadoopSplit = split.asInstanceOf[HadoopPartition]
>     hadoopSplit.inputSplit.value.getLocations.filter(_ != "localhost")
>   }
> {noformat}
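A minimal Scala sketch of what the filter in the TODO does. It assumes, as the TODO comment suggests, that a `file://` split reports `"localhost"` as its only location. `preferredHosts` and the sample host names are hypothetical stand-ins for `getPreferredLocations` and real split locations, not Spark's API:

```scala
object PreferredLocationsSketch {
  // Mirrors the filter in the quoted getPreferredLocations: drop "localhost",
  // which is not a real cluster host name the scheduler could match against.
  def preferredHosts(splitLocations: Seq[String]): Seq[String] =
    splitLocations.filter(_ != "localhost")

  def main(args: Array[String]): Unit = {
    // Hypothetical locations: an HDFS-style split listing two hosts, and a
    // file:// split that (by assumption) only reports "localhost".
    val hdfsSplit = Seq("worker-1", "worker-2")
    val fileSplit = Seq("localhost")

    println(preferredHosts(hdfsSplit)) // List(worker-1, worker-2)
    println(preferredHosts(fileSplit)) // List()
  }
}
```

Once "localhost" is filtered out, a `file://` split has no preferred hosts at all, so the scheduler has no locality information for it; the log above shows such tasks being reported as PROCESS_LOCAL.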



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
