Posted to issues@spark.apache.org by "Maciej Bryński (JIRA)" <ji...@apache.org> on 2017/10/24 12:05:00 UTC

[jira] [Commented] (SPARK-22118) Should prevent change epoch in success stage while there is some running stage

    [ https://issues.apache.org/jira/browse/SPARK-22118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216768#comment-16216768 ] 

Maciej Bryński commented on SPARK-22118:
----------------------------------------

I think I am hitting this problem as well:

{code}
2017-10-24 11:23:47,482 DEBUG [org.apache.spark.scheduler.DAGScheduler] - submitStage(ShuffleMapStage 52)
2017-10-24 11:23:47,484 DEBUG [org.apache.hadoop.hdfs.DFSClient] - DFSClient seqno: 2164 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 1761269
2017-10-24 11:23:47,484 DEBUG [org.apache.hadoop.hdfs.DFSClient] - DFSClient seqno: 2165 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 1631471
2017-10-24 11:23:47,485 DEBUG [org.apache.hadoop.hdfs.DFSClient] - DFSClient seqno: 2166 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 1608203
2017-10-24 11:23:47,544 INFO [org.apache.spark.storage.memory.MemoryStore] - Block broadcast_69 stored as values in memory (estimated size 514.2 KB, free 897.0 MB)
2017-10-24 11:23:47,544 DEBUG [org.apache.spark.storage.BlockManager] - Put block broadcast_69 locally took  0 ms
2017-10-24 11:23:47,544 DEBUG [org.apache.spark.storage.BlockManager] - Putting block broadcast_69 without replication took  0 ms
2017-10-24 11:23:47,549 INFO [org.apache.spark.storage.memory.MemoryStore] - Block broadcast_69_piece0 stored as bytes in memory (estimated size 514.6 KB, free 896.5 MB)
2017-10-24 11:23:47,550 INFO [org.apache.spark.storage.BlockManagerInfo] - Added broadcast_69_piece0 in memory on 10.32.32.227:36396 (size: 514.6 KB, free: 910.7 MB)
2017-10-24 11:23:47,550 DEBUG [org.apache.spark.storage.BlockManagerMaster] - Updated info of block broadcast_69_piece0
2017-10-24 11:23:47,550 DEBUG [org.apache.spark.storage.BlockManager] - Told master about block broadcast_69_piece0
2017-10-24 11:23:47,550 DEBUG [org.apache.spark.storage.BlockManager] - Put block broadcast_69_piece0 locally took  1 ms
2017-10-24 11:23:47,550 DEBUG [org.apache.spark.storage.BlockManager] - Putting block broadcast_69_piece0 without replication took  1 ms
2017-10-24 11:23:47,550 INFO [org.apache.spark.MapOutputTracker] - Broadcast mapstatuses size = 430, actual size = 526530
2017-10-24 11:23:47,550 INFO [org.apache.spark.MapOutputTrackerMaster] - Size of output statuses for shuffle 14 is 430 bytes
2017-10-24 11:23:47,550 INFO [org.apache.spark.MapOutputTrackerMaster] - Epoch changed, not caching!
2017-10-24 11:23:47,550 DEBUG [org.apache.spark.broadcast.TorrentBroadcast] - Unpersisting TorrentBroadcast 69
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.BlockManagerSlaveEndpoint] - removing broadcast 69
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.BlockManager] - Removing broadcast 69
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.BlockManager] - Removing block broadcast_69
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.memory.MemoryStore] - Block broadcast_69 of size 526576 dropped from memory (free 940623361)
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.BlockManager] - Removing block broadcast_69_piece0
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.storage.memory.MemoryStore] - Block broadcast_69_piece0 of size 526913 dropped from memory (free 941150274)
2017-10-24 11:23:47,551 DEBUG [org.apache.spark.MapOutputTrackerMaster] - cached status not found for : 14
2017-10-24 11:23:47,552 INFO [org.apache.spark.storage.BlockManagerInfo] - Removed broadcast_69_piece0 on 10.32.32.227:36396 in memory (size: 514.6 KB, free: 911.2 MB)
2017-10-24 11:23:47,554 DEBUG [org.apache.spark.storage.BlockManagerMaster] - Updated info of block broadcast_69_piece0
2017-10-24 11:23:47,554 DEBUG [org.apache.spark.storage.BlockManager] - Told master about block broadcast_69_piece0
2017-10-24 11:23:47,555 DEBUG [org.apache.spark.storage.BlockManagerSlaveEndpoint] - Done removing broadcast 69, response is 0
2017-10-24 11:23:47,555 DEBUG [org.apache.spark.storage.BlockManagerSlaveEndpoint] - Sent response: 0 to 10.32.32.227:53171
2017-10-24 11:23:47,556 INFO [org.apache.spark.MapOutputTrackerMasterEndpoint] - Asked to send map output locations for shuffle 14 to 10.32.32.28:39424
2017-10-24 11:23:47,556 DEBUG [org.apache.spark.MapOutputTrackerMaster] - Handling request to send map output locations for shuffle 14 to 10.32.32.28:39424
2017-10-24 11:23:47,556 DEBUG [org.apache.spark.MapOutputTrackerMaster] - cached status not found for : 14
2017-10-24 11:23:47,607 DEBUG [org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint] - Launching task 31490 on executor id: 2 hostname: dwh-hn28.adpilot.co.
2017-10-24 11:23:47,608 DEBUG [org.apache.spark.ExecutorAllocationManager] - Clearing idle timer for 2 because it is now running a task
2017-10-24 11:23:47,633 DEBUG [org.apache.hadoop.ipc.Client] - IPC Client (766216029) connection to master/10.32.32.3:8032 from bi sending #1418
2017-10-24 11:23:47,634 DEBUG [org.apache.hadoop.ipc.Client] - IPC Client (766216029) connection to master/10.32.32.3:8032 from bi got value #1418
2017-10-24 11:23:47,634 DEBUG [org.apache.hadoop.ipc.ProtobufRpcEngine] - Call: getApplicationReport took 1ms
2017-10-24 11:23:47,635 WARN [org.apache.spark.scheduler.TaskSetManager] - Lost task 9.0 in stage 46.0 (TID 31479, testserver, executor 2): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_69_piece0 of broadcast_69
{code}
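
The sequence in the log above (broadcast_69 is created for shuffle 14, "Epoch changed, not caching!" is logged, broadcast 69 is removed, and a task then fails to fetch broadcast_69_piece0) can be modelled with a small toy program. This is only a sketch of the suspected race, not Spark's actual implementation: BroadcastStore, ToyTracker, and every method name below are hypothetical and exist only for illustration, and the assumption that the broadcast reference is still handed out after removal is my reading of the log.

```python
class BroadcastStore:
    """Hypothetical stand-in for the block storage that holds broadcast values."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def put(self, value):
        bid = self._next_id
        self._next_id += 1
        self._blocks[bid] = value
        return bid

    def remove(self, bid):
        self._blocks.pop(bid, None)

    def get(self, bid):
        if bid not in self._blocks:
            raise OSError(f"Failed to get broadcast_{bid}")
        return self._blocks[bid]


class ToyTracker:
    """Hypothetical, heavily simplified model of a map output tracker."""

    def __init__(self, store):
        self.store = store
        self.epoch = 0
        self.cache = {}  # shuffle id -> broadcast id

    def increment_epoch(self):
        # Assumed behaviour: an epoch change invalidates cached broadcasts.
        self.epoch += 1
        for bid in self.cache.values():
            self.store.remove(bid)
        self.cache.clear()

    def serialized_statuses(self, shuffle_id, statuses, during_serialize=None):
        epoch_seen = self.epoch
        bid = self.store.put(list(statuses))
        if during_serialize is not None:
            # Simulates another stage finishing while we serialize.
            during_serialize()
        if self.epoch == epoch_seen:
            self.cache[shuffle_id] = bid
        else:
            # Corresponds to "Epoch changed, not caching!" followed by removal.
            self.store.remove(bid)
        # Assumption: the reference is returned to the requester either way.
        return bid


def race_fails():
    """Return True if a reader fails after a concurrent epoch change."""
    store = BroadcastStore()
    tracker = ToyTracker(store)
    bid = tracker.serialized_statuses(
        14, ["status"], during_serialize=tracker.increment_epoch
    )
    try:
        store.get(bid)
    except OSError:
        return True
    return False
```

In this model, race_fails() returns True because the requester is given a broadcast id that was already removed. Without the concurrent epoch change the same call caches the id and the read succeeds.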

> Should prevent change epoch in success stage while there is some running stage
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-22118
>                 URL: https://issues.apache.org/jira/browse/SPARK-22118
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: SuYan
>
> In 2.1, the epoch is changed when a stage succeeds, which triggers MapOutputTracker to clean up the cached broadcast.
> But when another stage is still running and wants to read the map status broadcast value, it is possible that it fails to get the broadcast.


