Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/11/08 15:55:58 UTC
[jira] [Commented] (SPARK-18343)
FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647931#comment-15647931 ]
Sean Owen commented on SPARK-18343:
-----------------------------------
I'd more readily suspect the S3 library threads, such as java-sdk-progress-listener-callback-thread, because the hang involves an S3 write. Can you look at the stack traces to see which thread is actually stuck, for example waiting on a lock? We don't seem to have evidence of this occurring otherwise.
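One way to gather those stack traces from inside the driver is sketched below. It is a minimal illustration, not part of Spark or Hadoop: `ThreadReport` and its methods are hypothetical names, and it uses only standard `java.lang.Thread` calls. It lists live non-daemon threads, since those are the ones that keep a JVM from exiting once `main` has returned (the `DestroyJavaVM` entry in the driver dump suggests `main` has already finished there).

```scala
// Hypothetical helper for illustration only; not a Spark or Hadoop API.
object ThreadReport {
  // Snapshot of every live thread currently known to the JVM.
  def liveThreads(): Seq[Thread] =
    Thread.getAllStackTraces.keySet.toArray(new Array[Thread](0)).toSeq

  // Non-daemon threads are the ones that keep the JVM alive after main returns.
  def nonDaemon(threads: Seq[Thread]): Seq[Thread] =
    threads.filter(t => t.isAlive && !t.isDaemon)

  // One header line (id, name, state) followed by the thread's stack frames.
  def describe(t: Thread): String = {
    val header = s"${t.getId} ${t.getName} ${t.getState} daemon=${t.isDaemon}"
    val frames = t.getStackTrace.map(f => s"    at $f").toSeq
    (header +: frames).mkString("\n")
  }

  def main(args: Array[String]): Unit =
    nonDaemon(liveThreads()).sortBy(_.getName).foreach(t => println(describe(t)))
}
```

Calling `ThreadReport.main(Array.empty)` right after `sc.stop()` would print whatever non-daemon threads remain at that point. Running `jstack` against the hung driver process gives similar information without any code change.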
> FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write
> ----------------------------------------------------------------------
>
> Key: SPARK-18343
> URL: https://issues.apache.org/jira/browse/SPARK-18343
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.1
> Environment: Spark 2.0.1
> Hadoop 2.7.1
> Mesos 1.0.1
> Ubuntu 14.04
> Reporter: Luke Miner
>
> I have a driver program where I read data in from Cassandra using Spark, perform some operations, and then write out to JSON on S3. The program runs fine when I use Spark 1.6.1 and the spark-cassandra-connector 1.6.0-M1.
> However, if I try to upgrade to Spark 2.0.1 (hadoop 2.7.1) and spark-cassandra-connector 2.0.0-M3, the program completes in the sense that all the expected files are written to S3, but the program never terminates.
> I do run `sc.stop()` at the end of the program. I am also using Mesos 1.0.1. In both cases I use the default output committer.
> From the thread dump (included below) it seems like it could be waiting on: `org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner`
> Code snippet:
> {code}
> // get MongoDB oplog operations
> val operations = sc.cassandraTable[JsonOperation](keyspace, namespace)
>   .where("ts >= ? AND ts < ?", minTimestamp, maxTimestamp)
>
> // replay oplog operations into documents
> val documents = operations
>   .spanBy(op => op.id)
>   .map { case (id: String, ops: Iterable[T]) => (id, apply(ops)) }
>   .filter { case (id, result) => result.isInstanceOf[Document] }
>   .map { case (id, document) =>
>     MergedDocument(id = id, document = document.asInstanceOf[Document])
>   }
>
> // write documents to json on s3
> documents
>   .map(document => document.toJson)
>   .coalesce(partitions)
>   .saveAsTextFile(path, classOf[GzipCodec])
> sc.stop()
> {code}
> Thread dump on the driver:
> {code}
> 60 context-cleaner-periodic-gc TIMED_WAITING
> 46 dag-scheduler-event-loop WAITING
> 4389 DestroyJavaVM RUNNABLE
> 12 dispatcher-event-loop-0 WAITING
> 13 dispatcher-event-loop-1 WAITING
> 14 dispatcher-event-loop-2 WAITING
> 15 dispatcher-event-loop-3 WAITING
> 47 driver-revive-thread TIMED_WAITING
> 3 Finalizer WAITING
> 82 ForkJoinPool-1-worker-17 WAITING
> 43 heartbeat-receiver-event-loop-thread TIMED_WAITING
> 93 java-sdk-http-connection-reaper TIMED_WAITING
> 4387 java-sdk-progress-listener-callback-thread WAITING
> 25 map-output-dispatcher-0 WAITING
> 26 map-output-dispatcher-1 WAITING
> 27 map-output-dispatcher-2 WAITING
> 28 map-output-dispatcher-3 WAITING
> 29 map-output-dispatcher-4 WAITING
> 30 map-output-dispatcher-5 WAITING
> 31 map-output-dispatcher-6 WAITING
> 32 map-output-dispatcher-7 WAITING
> 48 MesosCoarseGrainedSchedulerBackend-mesos-driver RUNNABLE
> 44 netty-rpc-env-timeout TIMED_WAITING
> 92 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
> 62 pool-19-thread-1 TIMED_WAITING
> 2 Reference Handler WAITING
> 61 Scheduler-1112394071 TIMED_WAITING
> 20 shuffle-server-0 RUNNABLE
> 55 shuffle-server-0 RUNNABLE
> 21 shuffle-server-1 RUNNABLE
> 56 shuffle-server-1 RUNNABLE
> 22 shuffle-server-2 RUNNABLE
> 57 shuffle-server-2 RUNNABLE
> 23 shuffle-server-3 RUNNABLE
> 58 shuffle-server-3 RUNNABLE
> 4 Signal Dispatcher RUNNABLE
> 59 Spark Context Cleaner TIMED_WAITING
> 9 SparkListenerBus WAITING
> 35 SparkUI-35-selector-ServerConnectorManager@651d3734/0 RUNNABLE
> 36 SparkUI-36-acceptor-0@467924cb-ServerConnector@3b5eaf92{HTTP/1.1}{0.0.0.0:4040} RUNNABLE
> 37 SparkUI-37-selector-ServerConnectorManager@651d3734/1 RUNNABLE
> 38 SparkUI-38 TIMED_WAITING
> 39 SparkUI-39 TIMED_WAITING
> 40 SparkUI-40 TIMED_WAITING
> 41 SparkUI-41 RUNNABLE
> 42 SparkUI-42 TIMED_WAITING
> 438 task-result-getter-0 WAITING
> 450 task-result-getter-1 WAITING
> 489 task-result-getter-2 WAITING
> 492 task-result-getter-3 WAITING
> 75 threadDeathWatcher-2-1 TIMED_WAITING
> 45 Timer-0 WAITING
> {code}
> Thread dump on the executors. It's the same on all of them:
> {code}
> 24 dispatcher-event-loop-0 WAITING
> 25 dispatcher-event-loop-1 WAITING
> 26 dispatcher-event-loop-2 RUNNABLE
> 27 dispatcher-event-loop-3 WAITING
> 39 driver-heartbeater TIMED_WAITING
> 3 Finalizer WAITING
> 58 java-sdk-http-connection-reaper TIMED_WAITING
> 75 java-sdk-progress-listener-callback-thread WAITING
> 1 main TIMED_WAITING
> 33 netty-rpc-env-timeout TIMED_WAITING
> 55 org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner WAITING
> 59 pool-17-thread-1 TIMED_WAITING
> 2 Reference Handler WAITING
> 28 shuffle-client-0 RUNNABLE
> 35 shuffle-client-0 RUNNABLE
> 41 shuffle-client-0 RUNNABLE
> 37 shuffle-server-0 RUNNABLE
> 5 Signal Dispatcher RUNNABLE
> 23 threadDeathWatcher-2-1 TIMED_WAITING
> {code}