You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/06/10 00:00:53 UTC
[jira] [Updated] (SPARK-23721) Enhance BlockManagerId to include container's underlying host machine hostname

     [ https://issues.apache.org/jira/browse/SPARK-23721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-23721:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Enhance BlockManagerId to include container's underlying host machine hostname
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-23721
>                 URL: https://issues.apache.org/jira/browse/SPARK-23721
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Mridul Muralidharan
>            Priority: Major
>
> In spark, host and rack locality computation is based on BlockManagerId's hostname - which is the container's hostname.
> When running in containerized environment's like kubernetes, docker support in hadoop 3, mesos docker support, etc; the hostname reported by container is not the actual 'host' the container is running on.
> This results in spark getting affected in multiple ways.
> h3. Suboptimal schedules
> Due to host name mismatch between different containers on same physical host, spark will treat all containers as running on own host.
> Effectively, there is no host-locality schedule at all due to this.
> In addition, depending on how sophisticated locality script is, it can also lead to either suboptimal rack locality computation all the way to no rack-locality schedule entirely.
> Hence the performance degradation in scheduler can be significant - only PROCESS_LOCAL schedules dont get affected.
> h3. HDFS reads
> This is closely related to "suboptimal schedules" above.
> Block locations for hdfs files refer to the datanode hostnames - and not the container's hostname.
> This effectively results in spark ignoring hdfs data placement entirely for scheduling tasks - resulting in very heavy cross-node/cross-rack data movement.
> h3. Speculative execution
> Spark schedules speculative tasks on a different host - in order to minimize the cost of node failures for expensive tasks.
> This gets effectively disabled, resulting in speculative tasks potentially running on the same actual host.
> h3. Block replication
> Similar to "speculative execution" above, block replication minimizes potential cost of node loss by typically leveraging another host; which gets effectively disabled in this case.
> Solution for the above is to enhance BlockManagerId to also include the node's actual hostname via 'nodeHostname' - which should be used for usecases above instead of the container hostname ('host').
> When not relevant, nodeHostname == hostname : which should ensure all existing functionality continues to work as expected with regressions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org