Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:44:12 UTC

[jira] [Resolved] (SPARK-24088) only HadoopRDD leverage HDFS Cache as preferred location

     [ https://issues.apache.org/jira/browse/SPARK-24088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24088.
----------------------------------
    Resolution: Incomplete

> only HadoopRDD leverage HDFS Cache as preferred location
> --------------------------------------------------------
>
>                 Key: SPARK-24088
>                 URL: https://issues.apache.org/jira/browse/SPARK-24088
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output
>    Affects Versions: 2.3.0
>            Reporter: Xiaoju Wu
>            Priority: Minor
>              Labels: bulk-closed
>
> Only HadoopRDD implements convertSplitLocationInfo, which converts a split location to an HDFSCacheTaskLocation when the block is cached in DataNode memory. FileScanRDD does not; in FileScanRDD, all split location information is dropped.
> private[spark] def convertSplitLocationInfo(
>     infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
>   Option(infos).map(_.flatMap { loc =>
>     val locationStr = loc.getLocation
>     if (locationStr != "localhost") {
>       if (loc.isInMemory) {
>         logDebug(s"Partition $locationStr is cached by Hadoop.")
>         Some(HDFSCacheTaskLocation(locationStr).toString)
>       } else {
>         Some(HostTaskLocation(locationStr).toString)
>       }
>     } else {
>       None
>     }
>   })
> }
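
The conversion quoted above can be tried outside Spark with a small standalone sketch. Everything here is a stand-in rather than Spark's or Hadoop's actual code: SplitLoc is a hypothetical simplification of Hadoop's SplitLocationInfo, PreferredLocations is a hypothetical holder object, and the "hdfs_cache_" prefix is an assumption about how a cached location is rendered as a string.

```scala
// Hypothetical stand-in for Hadoop's SplitLocationInfo (host name plus an
// "is this replica cached in memory" flag). Not the real Hadoop class.
final case class SplitLoc(location: String, inMemory: Boolean)

object PreferredLocations {
  // Assumed prefix used to mark a host whose replica is in the HDFS cache.
  val InMemoryTag = "hdfs_cache_"

  // Mirrors the shape of the quoted method: a null array yields None,
  // "localhost" entries are dropped, cached replicas get the tag,
  // and all other hosts pass through unchanged.
  def convert(infos: Array[SplitLoc]): Option[Seq[String]] =
    Option(infos).map { arr =>
      arr.toSeq.flatMap { loc =>
        if (loc.location == "localhost") None
        else if (loc.inMemory) Some(InMemoryTag + loc.location)
        else Some(loc.location)
      }
    }
}
```

Under these assumptions, a scan RDD that dropped the split information would have nothing to pass to such a function, so its tasks would carry no preference for hosts holding a cached replica; that is the gap the report describes for FileScanRDD.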



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
