Posted to issues@spark.apache.org by "Xiaoju Wu (JIRA)" <ji...@apache.org> on 2018/04/25 15:46:00 UTC
[jira] [Created] (SPARK-24088) only HadoopRDD leverage HDFS Cache as preferred location
Xiaoju Wu created SPARK-24088:
---------------------------------
Summary: only HadoopRDD leverage HDFS Cache as preferred location
Key: SPARK-24088
URL: https://issues.apache.org/jira/browse/SPARK-24088
Project: Spark
Issue Type: Improvement
Components: Input/Output
Affects Versions: 2.3.0
Reporter: Xiaoju Wu
Only HadoopRDD implements convertSplitLocationInfo, which converts a split location to an HDFSCacheTaskLocation when the block is cached in DataNode memory. FileScanRDD does not: in FileScanRDD, all split location information is dropped, so HDFS-cached blocks are never used as preferred locations.
  private[spark] def convertSplitLocationInfo(
      infos: Array[SplitLocationInfo]): Option[Seq[String]] = {
    Option(infos).map(_.flatMap { loc =>
      val locationStr = loc.getLocation
      if (locationStr != "localhost") {
        if (loc.isInMemory) {
          logDebug(s"Partition $locationStr is cached by Hadoop.")
          Some(HDFSCacheTaskLocation(locationStr).toString)
        } else {
          Some(HostTaskLocation(locationStr).toString)
        }
      } else {
        None
      }
    })
  }
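
To make the effect of the conversion concrete, here is a minimal, self-contained sketch of the same logic that runs without Spark or Hadoop on the classpath. It is not Spark's code: SplitLocation and PreferredLocations are hypothetical stand-ins for Hadoop's SplitLocationInfo and for the HadoopRDD helper above, and the "hdfs_cache_" prefix is assumed to match what HDFSCacheTaskLocation.toString produces.

```scala
// Hypothetical stand-in for Hadoop's SplitLocationInfo (illustration only).
final case class SplitLocation(host: String, inMemory: Boolean)

object PreferredLocations {
  // Assumed to match the tag Spark prepends for HDFS-cached hosts.
  val InMemoryTag = "hdfs_cache_"

  // Mirrors the shape of convertSplitLocationInfo:
  //  - a null input yields None
  //  - "localhost" entries are skipped
  //  - hosts holding the block in memory get the cache tag
  //  - all other hosts are passed through as plain host names
  def convert(infos: Array[SplitLocation]): Option[Seq[String]] =
    Option(infos).map(_.toSeq.flatMap { loc =>
      if (loc.host == "localhost") None
      else if (loc.inMemory) Some(InMemoryTag + loc.host)
      else Some(loc.host)
    })
}
```

For example, converting Array(SplitLocation("node1", true), SplitLocation("node2", false), SplitLocation("localhost", false)) keeps node1 with the cache tag, keeps node2 as a plain host, and skips localhost. A scan path that drops the split location information cannot make this distinction between cached and on-disk replicas.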