Posted to commits@spark.apache.org by gu...@apache.org on 2021/01/29 14:56:02 UTC

[spark] branch branch-3.0 updated: [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new cc78282  [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
cc78282 is described below

commit cc782829b7e054a8750912d3a96cf034a7ba081a
Author: “attilapiros” <pi...@gmail.com>
AuthorDate: Fri Jan 29 23:54:40 2021 +0900

    [SPARK-34154][YARN][FOLLOWUP] Fix flaky LocalityPlacementStrategySuite test
    
    ### What changes were proposed in this pull request?
    
    Fixing the flaky `handle large number of containers and tasks (SPARK-18750)` test by avoiding the use of `DNSToSwitchMapping`, as in some situations DNS lookups can be extremely slow.
    
    ### Why are the changes needed?
    
    After https://github.com/apache/spark/pull/31363 was merged, the flaky `handle large number of containers and tasks (SPARK-18750)` test failed again in some other PRs, but now we know the exact place where the test gets stuck.
    
    It is in the DNS lookup:
    
    ```
    [info] - handle large number of containers and tasks (SPARK-18750) *** FAILED *** (30 seconds, 4 milliseconds)
    [info]   Failed with an exception or a timeout at thread join:
    [info]
    [info]   java.lang.RuntimeException: Timeout at waiting for thread to stop (its stack trace is added to the exception)
    [info]   	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    [info]   	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
    [info]   	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
    [info]   	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
    [info]   	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    [info]   	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    [info]   	at java.net.InetAddress.getByName(InetAddress.java:1077)
    [info]   	at org.apache.hadoop.net.NetUtils.normalizeHostName(NetUtils.java:568)
    [info]   	at org.apache.hadoop.net.NetUtils.normalizeHostNames(NetUtils.java:585)
    [info]   	at org.apache.hadoop.net.CachedDNSToSwitchMapping.resolve(CachedDNSToSwitchMapping.java:109)
    [info]   	at org.apache.spark.deploy.yarn.SparkRackResolver.coreResolve(SparkRackResolver.scala:75)
    [info]   	at org.apache.spark.deploy.yarn.SparkRackResolver.resolve(SparkRackResolver.scala:66)
    [info]   	at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.$anonfun$localityOfRequestedContainers$3(LocalityPreferredContainerPlacementStrategy.scala:142)
    [info]   	at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy$$Lambda$658/1080992036.apply$mcVI$sp(Unknown Source)
    [info]   	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    [info]   	at org.apache.spark.deploy.yarn.LocalityPreferredContainerPlacementStrategy.localityOfRequestedContainers(LocalityPreferredContainerPlacementStrategy.scala:138)
    [info]   	at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite.org$apache$spark$deploy$yarn$LocalityPlacementStrategySuite$$runTest(LocalityPlacementStrategySuite.scala:94)
    [info]   	at org.apache.spark.deploy.yarn.LocalityPlacementStrategySuite$$anon$1.run(LocalityPlacementStrategySuite.scala:40)
    [info]   	at java.lang.Thread.run(Thread.java:748) (LocalityPlacementStrategySuite.scala:61)
    ...
    ```
    
    This could be because the DNS servers used by those build machines are not configured to handle IPv6 queries, so the client has to wait for the IPv6 query to time out before falling back to IPv4.
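
    As an aside (not part of this patch, which sidesteps DNS in the test entirely): a common JVM-level mitigation for slow IPv6-first lookups is to prefer the IPv4 stack via a standard networking system property, set before any `java.net` classes initialize. A minimal sketch:

    ```scala
    object PreferIPv4 {
      def main(args: Array[String]): Unit = {
        // Must take effect before networking classes are first used;
        // in practice this is usually passed on the command line as
        // -Djava.net.preferIPv4Stack=true rather than set in code.
        System.setProperty("java.net.preferIPv4Stack", "true")
        println(System.getProperty("java.net.preferIPv4Stack"))
      }
    }
    ```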
    
    This also makes the tests more consistent: previously, looking up a single host via `resolve(hostName: String)` could give a different answer from calling `resolve(hostNames: Seq[String])` with a `Seq` containing that single host.
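
    The fix (see the diff below) overrides the batch `resolve` in the test's `MockResolver` so that it delegates to the single-host rack rule and never touches DNS. A self-contained sketch of the same pattern, using simplified stand-in types rather than the real Hadoop `Node`/`NodeBase` classes:

    ```scala
    // Simplified stand-ins for illustration; not the real Hadoop/Spark types.
    case class Node(name: String, rack: String)

    class MockRackResolver {
      // Single-host rule used by the suite: "host3" lives on /rack2.
      def resolveRack(hostName: String): String =
        if (hostName == "host3") "/rack2" else "/rack1"

      // Batch path built on the same single-host rule, so both code
      // paths agree and no real DNS lookup is ever performed.
      def resolve(hostNames: Seq[String]): Seq[Node] =
        hostNames.map(n => Node(n, resolveRack(n)))
    }
    ```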
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Unit tests.
    
    Closes #31397 from attilapiros/SPARK-34154-2nd.
    
    Authored-by: “attilapiros” <pi...@gmail.com>
    Signed-off-by: HyukjinKwon <gu...@apache.org>
    (cherry picked from commit d3f049cbc274ee64bb9b56d6addba4f2cb8f1f0a)
    Signed-off-by: HyukjinKwon <gu...@apache.org>
---
 .../test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala  | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
index 6216d47..0c40c98 100644
--- a/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
+++ b/resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala
@@ -21,6 +21,7 @@ import java.util.Collections
 
 import scala.collection.JavaConverters._
 
+import org.apache.hadoop.net.{Node, NodeBase}
 import org.apache.hadoop.yarn.api.records._
 import org.apache.hadoop.yarn.client.api.AMRMClient
 import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
@@ -47,6 +48,9 @@ class MockResolver extends SparkRackResolver(SparkHadoopUtil.get.conf) {
     if (hostName == "host3") "/rack2" else "/rack1"
   }
 
+  override def resolve(hostNames: Seq[String]): Seq[Node] =
+    hostNames.map(n => new NodeBase(n, resolve(n)))
+
 }
 
 class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfterEach {


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org