Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/11/08 15:55:58 UTC
[jira] [Commented] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

    [ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647931#comment-15647931 ] 

Sean Owen commented on SPARK-18343:
-----------------------------------

I'd more readily suspect the S3 library threads, such as java-sdk-progress-listener-callback-thread, because the hang involves S3. Can you look at the full stack traces to see which thread is actually stuck, for example waiting on a lock? So far we don't seem to have evidence of this occurring otherwise.
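
One possible way to gather those stack traces from inside the driver is sketched below. It is not part of Spark or of the reporter's program: the object name `NonDaemonThreads` and its methods are hypothetical, and it relies only on the standard `java.lang.Thread` API. It lists live non-daemon threads, since those are the ones that can keep a JVM from exiting, and prints the top frames of each so you can see what a thread is blocked on. A `DestroyJavaVM` entry in a dump usually means `main` has already returned and the JVM is waiting on exactly such threads.

```scala
import scala.collection.JavaConverters._

// Hypothetical helper, not a Spark API: reports threads that can block JVM exit.
object NonDaemonThreads {

  /** Live non-daemon threads other than the calling thread, with their stacks. */
  def blockers(): Seq[(Thread, Array[StackTraceElement])] =
    Thread.getAllStackTraces.asScala.toSeq
      .filter { case (t, _) =>
        t.isAlive && !t.isDaemon && (t ne Thread.currentThread())
      }
      .sortBy { case (t, _) => t.getName }

  /** Human-readable report: name, id, state, and up to `maxFrames` frames per thread. */
  def report(maxFrames: Int = 10): String =
    blockers().map { case (t, stack) =>
      val frames = stack.take(maxFrames).map(f => s"    at $f").mkString("\n")
      s"${t.getName} (id=${t.getId}, state=${t.getState})\n$frames"
    }.mkString("\n\n")
}
```

Calling `println(NonDaemonThreads.report())` right after `sc.stop()` would show which threads are still alive at that point. Running `jstack` against the hung driver process gives the same information, plus lock ownership, without code changes.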

> FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18343
>                 URL: https://issues.apache.org/jira/browse/SPARK-18343
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.1
>         Environment: Spark 2.0.1
> Hadoop 2.7.1
> Mesos 1.0.1
> Ubuntu 14.04
>            Reporter: Luke Miner
>
> I have a driver program where I read data in from Cassandra using Spark, perform some operations, and then write out to JSON on S3. The program runs fine when I use Spark 1.6.1 and the spark-cassandra-connector 1.6.0-M1.
> However, if I try to upgrade to Spark 2.0.1 (hadoop 2.7.1) and spark-cassandra-connector 2.0.0-M3, the program completes in the sense that all the expected files are written to S3, but the program never terminates.
> I do run `sc.stop()` at the end of the program. I am also using Mesos 1.0.1. In both cases I use the default output committer.
> From the thread dump (included below) it seems like it could be waiting on: `org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner`
> Code snippet:
> {code}
>     // get MongoDB oplog operations
>     val operations = sc.cassandraTable[JsonOperation](keyspace, namespace)
>       .where("ts >= ? AND ts < ?", minTimestamp, maxTimestamp)
>     
>     // replay oplog operations into documents
>     val documents = operations
>       .spanBy(op => op.id)
>       .map { case (id: String, ops: Iterable[T]) => (id, apply(ops)) }
>       .filter { case (id, result) => result.isInstanceOf[Document] }
>       .map { case (id, document) =>
>         MergedDocument(id = id, document = document.asInstanceOf[Document])
>       }
>     
>     // write documents to json on s3
>     documents
>       .map(document => document.toJson)
>       .coalesce(partitions)
>       .saveAsTextFile(path, classOf[GzipCodec])
>     sc.stop()
> {code}
> Thread dump on the driver:
> {code}
>     60  context-cleaner-periodic-gc TIMED_WAITING
>     46  dag-scheduler-event-loop    WAITING
>     4389    DestroyJavaVM   RUNNABLE
>     12  dispatcher-event-loop-0 WAITING
>     13  dispatcher-event-loop-1 WAITING
>     14  dispatcher-event-loop-2 WAITING
>     15  dispatcher-event-loop-3 WAITING
>     47  driver-revive-thread    TIMED_WAITING
>     3   Finalizer   WAITING
>     82  ForkJoinPool-1-worker-17    WAITING
>     43  heartbeat-receiver-event-loop-thread    TIMED_WAITING
>     93  java-sdk-http-connection-reaper TIMED_WAITING
>     4387    java-sdk-progress-listener-callback-thread  WAITING
>     25  map-output-dispatcher-0 WAITING
>     26  map-output-dispatcher-1 WAITING
>     27  map-output-dispatcher-2 WAITING
>     28  map-output-dispatcher-3 WAITING
>     29  map-output-dispatcher-4 WAITING
>     30  map-output-dispatcher-5 WAITING
>     31  map-output-dispatcher-6 WAITING
>     32  map-output-dispatcher-7 WAITING
>     48  MesosCoarseGrainedSchedulerBackend-mesos-driver RUNNABLE
>     44  netty-rpc-env-timeout   TIMED_WAITING
>     92  org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner   WAITING
>     62  pool-19-thread-1    TIMED_WAITING
>     2   Reference Handler   WAITING
>     61  Scheduler-1112394071    TIMED_WAITING
>     20  shuffle-server-0    RUNNABLE
>     55  shuffle-server-0    RUNNABLE
>     21  shuffle-server-1    RUNNABLE
>     56  shuffle-server-1    RUNNABLE
>     22  shuffle-server-2    RUNNABLE
>     57  shuffle-server-2    RUNNABLE
>     23  shuffle-server-3    RUNNABLE
>     58  shuffle-server-3    RUNNABLE
>     4   Signal Dispatcher   RUNNABLE
>     59  Spark Context Cleaner   TIMED_WAITING
>     9   SparkListenerBus    WAITING
>     35  SparkUI-35-selector-ServerConnectorManager@651d3734/0   RUNNABLE
>     36  SparkUI-36-acceptor-0@467924cb-ServerConnector@3b5eaf92{HTTP/1.1}{0.0.0.0:4040} RUNNABLE
>     37  SparkUI-37-selector-ServerConnectorManager@651d3734/1   RUNNABLE
>     38  SparkUI-38  TIMED_WAITING
>     39  SparkUI-39  TIMED_WAITING
>     40  SparkUI-40  TIMED_WAITING
>     41  SparkUI-41  RUNNABLE
>     42  SparkUI-42  TIMED_WAITING
>     438 task-result-getter-0    WAITING
>     450 task-result-getter-1    WAITING
>     489 task-result-getter-2    WAITING
>     492 task-result-getter-3    WAITING
>     75  threadDeathWatcher-2-1  TIMED_WAITING
>     45  Timer-0 WAITING
> {code}
> Thread dump on the executors. It's the same on all of them:
> {code}
>     24  dispatcher-event-loop-0 WAITING
>     25  dispatcher-event-loop-1 WAITING
>     26  dispatcher-event-loop-2 RUNNABLE
>     27  dispatcher-event-loop-3 WAITING
>     39  driver-heartbeater  TIMED_WAITING
>     3   Finalizer   WAITING
>     58  java-sdk-http-connection-reaper TIMED_WAITING
>     75  java-sdk-progress-listener-callback-thread  WAITING
>     1   main    TIMED_WAITING
>     33  netty-rpc-env-timeout   TIMED_WAITING
>     55  org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner   WAITING
>     59  pool-17-thread-1    TIMED_WAITING
>     2   Reference Handler   WAITING
>     28  shuffle-client-0    RUNNABLE
>     35  shuffle-client-0    RUNNABLE
>     41  shuffle-client-0    RUNNABLE
>     37  shuffle-server-0    RUNNABLE
>     5   Signal Dispatcher   RUNNABLE
>     23  threadDeathWatcher-2-1  TIMED_WAITING
> {code}


