Posted to issues@spark.apache.org by "Charles Allen (JIRA)" <ji...@apache.org> on 2017/01/06 22:32:58 UTC

[jira] [Created] (SPARK-19111) S3 Mesos history upload fails if too large or if distributed datastore is misbehaving

Charles Allen created SPARK-19111:
-------------------------------------

             Summary: S3 Mesos history upload fails if too large or if distributed datastore is misbehaving
                 Key: SPARK-19111
                 URL: https://issues.apache.org/jira/browse/SPARK-19111
             Project: Spark
          Issue Type: Bug
          Components: EC2, Mesos, Spark Core
    Affects Versions: 2.0.0
            Reporter: Charles Allen


{code}
2017-01-06T21:32:32,928 INFO [main] org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://REDACTED:4041
2017-01-06T21:32:32,938 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.jvmGCTime
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBlocksFetched
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSerializationTime
2017-01-06T21:32:32,939 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(364,WrappedArray())
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.resultSize
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.peakExecutionMemory
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.fetchWaitTime
2017-01-06T21:32:32,939 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.memoryBytesSpilled
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBytesRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.diskBytesSpilled
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.localBytesRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.recordsRead
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorDeserializeTime
2017-01-06T21:32:32,940 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: output/bytes
2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.executorRunTime
2017-01-06T21:32:32,941 INFO [SparkListenerBus] com.metamx.starfire.spark.SparkDriver - emitting metric: internal.metrics.shuffle.read.remoteBlocksFetched
2017-01-06T21:32:32,943 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1387.inprogress' closed. Now beginning upload
2017-01-06T21:32:32,963 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(905,WrappedArray())
2017-01-06T21:32:32,973 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(519,WrappedArray())
2017-01-06T21:32:32,988 ERROR [heartbeat-receiver-event-loop-thread] org.apache.spark.scheduler.LiveListenerBus - SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(596,WrappedArray())
{code}

When running Spark on Mesos, some large jobs never finish uploading their event logs to the S3 location backing the history server.
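
For reference, a minimal sketch of the kind of configuration assumed here (not taken from this ticket; the app name, bucket, and credential wiring are placeholders): event logging enabled and pointed at an {{s3n://}} path, which routes the log through NativeS3FileSystem.

{code}
import org.apache.spark.SparkConf

// Hypothetical configuration illustrating the setup described above; the app name,
// bucket, and credential handling are placeholders, not values from this ticket.
val conf = new SparkConf()
  .setAppName("history-to-s3-example")
  .set("spark.eventLog.enabled", "true")
  // s3n:// resolves to org.apache.hadoop.fs.s3native.NativeS3FileSystem
  .set("spark.eventLog.dir", "s3n://some-bucket/eventLogs/remnant")
  // Hadoop-level s3n credentials, forwarded via Spark's spark.hadoop.* prefix
  .set("spark.hadoop.fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
  .set("spark.hadoop.fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
{code}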

The log sequence for a successful upload looks like this:

{code}
2017-01-06T19:14:32,925 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' writing to tempfile '/mnt/tmp/hadoop/output-2516573909248961808.tmp'
2017-01-06T21:59:14,789 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' closed. Now beginning upload
2017-01-06T21:59:44,679 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem - OutputStream for key 'eventLogs/remnant/46bf8f87-6de6-4da8-9cba-5b2fecd0875e-1434.inprogress' upload complete
{code}
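
The upload only starts at the end because NativeS3FileSystem buffers the whole stream to a local temp file and performs the S3 upload when the stream is closed. A minimal sketch of that behaviour through the Hadoop FileSystem API (bucket and key are placeholders):

{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = new Configuration()
// Placeholder bucket; the s3n scheme resolves to NativeS3FileSystem
val fs = FileSystem.get(new URI("s3n://some-bucket/"), hadoopConf)

val out = fs.create(new Path("s3n://some-bucket/eventLogs/example.inprogress"))
try {
  // Bytes go to a local temp file ("writing to tempfile ..." above);
  // nothing is sent to S3 at this point.
  out.write("spark event log contents".getBytes("UTF-8"))
} finally {
  // close() uploads the whole buffered file in one go
  // ("closed. Now beginning upload" ... "upload complete").
  out.close()
}
{code}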

Large jobs, however, never reach the {{upload complete}} message; the driver exits before the upload finishes.
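
As far as I can tell, that final {{close()}} only happens while the driver is shutting down, inside {{SparkContext.stop()}}, which is where the large uploads get cut off. A hypothetical driver illustrating the timing (app name, master, and job are made up):

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf()
    .setAppName("large-history-job")
    .setMaster("local[2]"))  // placeholder; the ticket's setup runs on Mesos
try {
  sc.parallelize(1 to 10000000).map(_ * 2).count()  // generates the events being logged
} finally {
  // stop() closes the event log's OutputStream. With an s3n:// event log dir that
  // close() is the "Now beginning upload" step, so a very large log file (or a slow
  // or misbehaving S3) keeps shutdown blocked here; if the driver process is torn
  // down before close() returns, the history file never appears in S3.
  sc.stop()
}
{code}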



