Posted to issues@spark.apache.org by "Xiaoju Wu (JIRA)" <ji...@apache.org> on 2018/04/25 15:46:00 UTC

[jira] [Created] (SPARK-24088) only HadoopRDD leverage HDFS Cache as preferred location

Xiaoju Wu created SPARK-24088:
---------------------------------

             Summary: only HadoopRDD leverage HDFS Cache as preferred location
                 Key: SPARK-24088
                 URL: https://issues.apache.org/jira/browse/SPARK-24088
             Project: Spark
          Issue Type: Improvement
          Components: Input/Output
    Affects Versions: 2.3.0
            Reporter: Xiaoju Wu


Only HadoopRDD implements convertSplitLocationInfo, which converts each split location to an HDFSCacheTaskLocation when the block is cached in DataNode memory, and to a HostTaskLocation otherwise. FileScanRDD does not do this: in FileScanRDD, all split location information is dropped.

private[spark] def convertSplitLocationInfo(
    infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
  Option(infos).map(_.flatMap { loc =>
    val locationStr = loc.getLocation
    if (locationStr != "localhost") {
      if (loc.isInMemory) {
        logDebug(s"Partition $locationStr is cached by Hadoop.")
        Some(HDFSCacheTaskLocation(locationStr).toString)
      } else {
        Some(HostTaskLocation(locationStr).toString)
      }
    } else {
      None
    }
  })
}
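
For illustration only, the same conversion can be sketched outside Spark. SplitLoc below is a hypothetical stand-in for Hadoop's SplitLocationInfo, and the "hdfs_cache_" prefix is my understanding of how HDFSCacheTaskLocation renders its string form; treat both as assumptions rather than as Spark's actual code.

```scala
// Hypothetical stand-in for Hadoop's SplitLocationInfo (not the real class).
final case class SplitLoc(location: String, inMemory: Boolean)

object PreferredLocationSketch {
  // Assumed prefix marking a host that holds the block in the HDFS cache.
  val CacheTag = "hdfs_cache_"

  // Mirrors the logic quoted above: skip "localhost", tag cached hosts,
  // and pass other hosts through unchanged. A null input yields None.
  def convert(infos: Array[SplitLoc]): Option[Seq[String]] =
    Option(infos).map(_.toSeq.flatMap { loc =>
      if (loc.location == "localhost") None
      else if (loc.inMemory) Some(CacheTag + loc.location)
      else Some(loc.location)
    })
}
```

A scan that drops the split location info never produces the cache-tagged entries, so the scheduler cannot prefer the hosts that already hold the block in memory.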


