Posted to user@spark.apache.org by Asher Krim <ak...@hubspot.com> on 2015/07/14 01:38:34 UTC

spark task hangs at BinaryClassificationMetrics (InetAddress related)

Hey everyone,

We are running into an issue where Spark jobs will sometimes hang
indefinitely. We are on Spark 1.3.1 (working on upgrading soon), Java 8,
and Mesos with spark.mesos.coarse=false (fine-grained mode). I'm fairly
certain the issue comes up when we do shuffle operations.
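For reference, the Mesos mode is just the standard conf flag set on the
driver; a minimal sketch (the master URL is a placeholder, everything else
is left at defaults):

import org.apache.spark.{SparkConf, SparkContext}

// Fine-grained Mesos mode: each Spark task runs as its own Mesos task,
// rather than holding long-lived coarse-grained executors.
val conf = new SparkConf()
  .setAppName("our-job")
  .setMaster("mesos://zk://zk1:2181/mesos") // placeholder master URL
  .set("spark.mesos.coarse", "false")
val sc = new SparkContext(conf)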
My pipeline reads data from HBase and then runs LogisticRegression on it,
using grid search to find the optimal parameters. At each iteration, I use
BinaryClassificationMetrics to compute the areaUnderROC and areaUnderPR.
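For context, the relevant part of the pipeline looks roughly like the
sketch below. This is a minimal reconstruction, not our actual code:
loadFromHbase is a placeholder for our HBase read, and the grid values are
illustrative.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def loadFromHbase(sc: SparkContext): RDD[LabeledPoint] = ??? // placeholder

def run(sc: SparkContext): Unit = {
  val data = loadFromHbase(sc)
  val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 11L)
  training.cache()

  // Grid search over the regularization parameter (values illustrative).
  for (regParam <- Seq(0.001, 0.01, 0.1)) {
    val lr = new LogisticRegressionWithLBFGS()
    lr.optimizer.setRegParam(regParam)
    val model = lr.run(training)
    model.clearThreshold() // emit raw scores rather than 0/1 labels

    // Computing the metrics shuffles scoreAndLabels; this is the stage
    // where the tasks hang for us.
    val scoreAndLabels = test.map(p => (model.predict(p.features), p.label))
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    println(s"regParam=$regParam" +
      s" areaUnderROC=${metrics.areaUnderROC()}" +
      s" areaUnderPR=${metrics.areaUnderPR()}")
  }
}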

We suspect some kind of bug is causing
java.net.Inet6AddressImpl.lookupAllHostAddr to hang: the thread dump below
shows an executor task launch worker stuck in that native lookup while
opening a connection to fetch shuffle blocks.
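As a quick sanity check, the lookup can be timed in isolation on an
affected host; a small diagnostic sketch, independent of Spark (the
default host name is a placeholder):

import java.net.InetAddress

object LookupCheck {
  def main(args: Array[String]): Unit = {
    val host = args.headOption.getOrElse("some-executor-host") // placeholder
    val start = System.nanoTime()
    // Same code path the stuck executor thread is on:
    // InetAddress.getAllByName ends in the native lookupAllHostAddr call.
    val addrs = InetAddress.getAllByName(host)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"resolved ${addrs.length} address(es) for $host in $elapsedMs ms")
    addrs.foreach(println)
  }
}

If this takes seconds (or never returns) on a worker, that would point at
the resolver configuration rather than at Spark itself.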

Any ideas?

Thread dump:
Thread 1086: Executor task launch worker-62 (RUNNABLE)

java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:907)
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1302)
java.net.InetAddress.getAllByName0(InetAddress.java:1255)
java.net.InetAddress.getAllByName(InetAddress.java:1171)
java.net.InetAddress.getAllByName(InetAddress.java:1105)
java.net.InetAddress.getByName(InetAddress.java:1055)
java.net.InetSocketAddress.<init>(InetSocketAddress.java:220)
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:126)
org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:149)
org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:262)
org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:115)
org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.fetch(BlockStoreShuffleFetcher.scala:76)
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:40)
org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
org.apache.spark.scheduler.Task.run(Task.scala:64)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

Thanks,
Asher Krim