Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/07/15 09:51:04 UTC

[jira] [Commented] (SPARK-6307) Executors fetch the same rdd-block 100s or 1000s of times

    [ https://issues.apache.org/jira/browse/SPARK-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627683#comment-14627683 ] 

Sean Owen commented on SPARK-6307:
----------------------------------

Based on the discussion in the PR, are we saying this is no longer an issue?

> Executors fetch the same rdd-block 100s or 1000s of times
> -------------------------------------------------------------
>
>                 Key: SPARK-6307
>                 URL: https://issues.apache.org/jira/browse/SPARK-6307
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 1.2.0
>         Environment: Linux, Spark Standalone 1.2, running in a PBS grid engine
>            Reporter: Tobias Bertelsen
>
> The block manager kept fetching the same blocks over and over, making tasks with network activity extremely slow. Two identical tasks can take anywhere from 12 seconds to more than an hour (at which point I stopped it).
> Spark should cache fetched blocks so that it does not fetch the same blocks over and over again.
> Here is a simplified version of the code that provokes it:
> {code}
> // Read a few thousand lines (~15 MB) and spread them over 16 partitions
> val fileContents = sc.newAPIHadoopFile(path, ......).repartition(16)
> // parseContent and similarity are application-specific helpers (not shown)
> val data = fileContents.map { x => parseContent(x) }.cache()
> // Pairwise comparison: keep only the pairs whose similarity exceeds 0.9
> val pairs = data.cartesian(data).filter { case (x, y) =>
>   similarity(x, y) > 0.9
> }
> pairs.count()
> {code}
> This is a tiny fraction of the stderr output from one of the workers:
> {code}
> 15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_2 remotely
> 15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_2 remotely
> 15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_1 remotely
> 15/03/12 21:55:09 INFO BlockManager: Found block rdd_8_0 remotely
> ... (thousands more lines, fetching the same 16 remote blocks) ...
> 15/03/12 22:25:44 INFO BlockManager: Found block rdd_8_0 remotely
> 15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
> 15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
> 15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
> 15/03/12 22:25:45 INFO BlockManager: Found block rdd_8_0 remotely
> {code}
> h2. Details for that stage from the UI.
>  - *Total task time across all tasks:* 11.9 h
>  - *Input:* 2.2 GB
>  - *Shuffle read:* 4.5 MB
> h3. Summary Metrics for 176 Completed Tasks
> || Metric || Min || 25th percentile || Median || 75th percentile || Max ||
> | Duration | 7 s | 8 s | 8 s | 12 s | 59 min |
> | GC Time | 0 ms | 99 ms | 0.1 s | 0.2 s | 0.5 s |
> | Input | 6.9 MB | 8.2 MB | 8.4 MB | 9.0 MB | 11.0 MB |
> | Shuffle Read (Remote) | 0.0 B | 0.0 B | 0.0 B | 0.0 B | 676.6 KB |
> h3. Aggregated Metrics by Executor
> || Executor ID || Address || Task Time || Total Tasks || Failed Tasks || Succeeded Tasks || Input || Output || Shuffle Read || Shuffle Write || Shuffle Spill (Memory) || Shuffle Spill (Disk) ||
> | 0 | n-62-23-3:49566 | 5.7 h | 9 | 0 | 9 | 171.0 MB | 0.0 B | 0.0 B | 0.0 B | 0.0 B | 0.0 B |
> | 1 | n-62-23-6:57518 | 16.4 h | 20 | 0 | 20 | 169.9 MB | 0.0 B | 0.0 B | 0.0 B | 0.0 B | 0.0 B |
> | 2 | n-62-18-48:33551 | 0 ms | 0 | 0 | 0 | 169.6 MB | 0.0 B | 0.0 B | 0.0 B | 0.0 B | 0.0 B |
> | 3 | n-62-23-5:58421 | 2.9 min | 12 | 0 | 12 | 266.2 MB | 0.0 B | 4.5 MB | 0.0 B | 0.0 B | 0.0 B |
> | 4 | n-62-23-1:40096 | 23 min | 164 | 0 | 164 | 1430.4 MB | 0.0 B | 0.0 B | 0.0 B | 0.0 B | 0.0 B |
> h3. Tasks
> || Index || ID || Attempt || Status || Locality Level || Executor ID / Host || Launch Time || Duration || GC Time || Input || Shuffle Read || Errors ||
> | 1 | 2 | 0 | SUCCESS | ANY | 3 / n-62-23-5 | 2015/03/12 21:55:00 | 12 s | 0.1 s | 6.9 MB (memory) | 676.6 KB |    | 
> | 0 | 1 | 0 | SUCCESS | ANY | 0 / n-62-23-3 | 2015/03/12 21:55:00 | 39 min | 0.3 s | 8.7 MB (network) | 0.0 B |    | 
> | 4 | 5 | 0 | SUCCESS | ANY | 1 / n-62-23-6 | 2015/03/12 21:55:00 | 38 min | 0.4 s | 8.6 MB (network) | 0.0 B |    | 
> | 3 | 4 | 0 | RUNNING | ANY | 2 / n-62-18-48 | 2015/03/12 21:55:00 | 55 min |  | 8.3 MB (network) | 0.0 B |    | 
> | 2 | 3 | 0 | SUCCESS | ANY | 4 / n-62-23-1 | 2015/03/12 21:55:00 | 11 s | 0.3 s | 8.4 MB (memory) | 0.0 B |    | 
> | 7 | 8 | 0 | SUCCESS | ANY | 4 / n-62-23-1 | 2015/03/12 21:55:00 | 12 s | 0.3 s | 9.2 MB (memory) | 0.0 B |    | 
> | 6 | 7 | 0 | SUCCESS | ANY | 3 / n-62-23-5 | 2015/03/12 21:55:00 | 12 s | 0.1 s | 8.1 MB (memory) | 0.0 B |    | 
> | 5 | 6 | 0 | SUCCESS | ANY | 0 / n-62-23-3 | 2015/03/12 21:55:00 | 39 min | 0.3 s | 8.6 MB (network) | 0.0 B |    | 
> | 9 | 10 | 0 | RUNNING | ANY | 1 / n-62-23-6 | 2015/03/12 21:55:00 | 55 min |  | 8.7 MB (network) | 0.0 B |    | 
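The following is a minimal, self-contained Scala sketch of one plausible mechanism behind the repeated fetches in the `data.cartesian(data)` example above. It rests on a stated assumption: the cartesian computation iterates the right-hand partition inside a nested loop over the left-hand partition, as a for-comprehension over two iterators does. Under that assumption, the right-hand iterator is re-created once per left-hand element, and with it the remote block fetch. The sketch does not use Spark. `RemoteBlock`, `naivePairs` and `materializedPairs` are hypothetical names for illustration, and each call to `RemoteBlock.iterator()` stands in for one "Found block ... remotely" fetch. The second variant shows the behaviour the report asks for: fetch the block once and reuse the local copy.

```scala
// Hypothetical stand-in for a cached partition held by a remote executor.
// Each call to iterator() is counted as one remote block fetch.
final class RemoteBlock[T](data: Seq[T]) {
  var fetches: Int = 0
  def iterator(): Iterator[T] = {
    fetches += 1
    data.iterator
  }
}

object CartesianFetchDemo {
  // Nested iteration (assumed pattern): right.iterator() is evaluated again
  // for every element x, so the block is "fetched" once per left-hand element.
  def naivePairs[T, U](left: Seq[T], right: RemoteBlock[U]): Iterator[(T, U)] =
    for (x <- left.iterator; y <- right.iterator()) yield (x, y)

  // Alternative: fetch the right-hand block once, keep a local copy, reuse it.
  def materializedPairs[T, U](left: Seq[T], right: RemoteBlock[U]): Iterator[(T, U)] = {
    val local = right.iterator().toVector
    for (x <- left.iterator; y <- local.iterator) yield (x, y)
  }

  def main(args: Array[String]): Unit = {
    val left = 1 to 100

    val naiveBlock = new RemoteBlock(1 to 50)
    val naiveCount = naivePairs(left, naiveBlock).size
    println(s"nested:       $naiveCount pairs, ${naiveBlock.fetches} fetches")

    val localBlock = new RemoteBlock(1 to 50)
    val localCount = materializedPairs(left, localBlock).size
    println(s"materialized: $localCount pairs, ${localBlock.fetches} fetches")
  }
}
```

With 100 left-hand elements and a 50-element right-hand block, both variants produce 5,000 pairs. The nested variant performs 100 fetches and the materialized one performs 1. The trade-off is memory: materializing holds the whole right-hand partition locally. That is cheap for blocks of roughly 8 MB like those in the task table above, but may not be for much larger partitions.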



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
