Posted to commits@spark.apache.org by js...@apache.org on 2018/07/25 01:08:48 UTC
spark git commit: [SPARK-24297][CORE] Fetch-to-disk by default for > 2gb
Repository: spark
Updated Branches:
refs/heads/master 3efdf3532 -> 15fff7903
[SPARK-24297][CORE] Fetch-to-disk by default for > 2gb
Fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB,
so we might as well use fetch-to-disk in that case. The message includes
some metadata in addition to the block data itself (in particular
UploadBlock has a lot of metadata), so we leave a little room.
Author: Imran Rashid <ir...@cloudera.com>
Closes #21474 from squito/SPARK-24297.
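As a rough illustration of the reasoning in the commit message, the sketch below works through the new default and the size comparison it implies. It does not depend on Spark: the object name, the constant name, and the shouldFetchToDisk helper are hypothetical and are not Spark APIs. The strict "greater than" comparison is an assumption taken from the documentation wording ("when size of the block is above this threshold").

```scala
object FetchThresholdSketch {
  // New default from this commit: Int.MaxValue - 512 bytes.
  // The subtraction is done on Int values (no overflow), then widened to Long.
  val DefaultMaxRemoteBlockSizeFetchToMem: Long = Int.MaxValue - 512

  // Hypothetical helper, not a Spark API: a block is fetched to disk when its
  // size is strictly above the configured threshold (assumed comparison).
  def shouldFetchToDisk(
      blockSizeBytes: Long,
      thresholdBytes: Long = DefaultMaxRemoteBlockSizeFetchToMem): Boolean =
    blockSizeBytes > thresholdBytes

  def main(args: Array[String]): Unit = {
    val threeGiB = 3L * 1024 * 1024 * 1024
    val twoHundredMiB = 200L * 1024 * 1024 // assumes "200m" means 200 MiB
    val threeHundredMiB = 300L * 1024 * 1024

    println(s"default threshold = $DefaultMaxRemoteBlockSizeFetchToMem bytes")
    println(s"3 GiB block, default threshold -> disk: ${shouldFetchToDisk(threeGiB)}")
    println(s"300 MiB block, default threshold -> disk: ${shouldFetchToDisk(threeHundredMiB)}")
    println(s"300 MiB block, 200m threshold -> disk: ${shouldFetchToDisk(threeHundredMiB, twoHundredMiB)}")
  }
}
```

With the default, only blocks that could never fit in a single in-memory message go to disk. Lowering spark.maxRemoteBlockSizeFetchToMem (for example to 200m) moves the same comparison to a smaller cut-off, which trades memory pressure for disk I/O on mid-sized blocks.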
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/15fff790
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/15fff790
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/15fff790
Branch: refs/heads/master
Commit: 15fff79032f6d708d8570b5e83144f1f84519552
Parents: 3efdf35
Author: Imran Rashid <ir...@cloudera.com>
Authored: Wed Jul 25 09:08:42 2018 +0800
Committer: jerryshao <ss...@hortonworks.com>
Committed: Wed Jul 25 09:08:42 2018 +0800
----------------------------------------------------------------------
.../scala/org/apache/spark/internal/config/package.scala | 6 +++++-
docs/configuration.md | 10 ++++++----
2 files changed, 11 insertions(+), 5 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/15fff790/core/src/main/scala/org/apache/spark/internal/config/package.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index ba892bf..8fef2aa 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -432,7 +432,11 @@ package object config {
"external shuffle service, this feature can only be worked when external shuffle" +
"service is newer than Spark 2.2.")
.bytesConf(ByteUnit.BYTE)
- .createWithDefault(Long.MaxValue)
+ // fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
+ // as well use fetch-to-disk in that case. The message includes some metadata in addition
+ // to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
+ // extra room.
+ .createWithDefault(Int.MaxValue - 512)
private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")
http://git-wip-us.apache.org/repos/asf/spark/blob/15fff790/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 0c7c447..60c0358 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -580,13 +580,15 @@ Apart from these, the following properties are also available, and may be useful
</tr>
<tr>
<td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
- <td>Long.MaxValue</td>
+ <td>Int.MaxValue - 512</td>
<td>
The remote block will be fetched to disk when size of the block is above this threshold in bytes.
- This is to avoid a giant request takes too much memory. We can enable this config by setting
- a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
+ This is to avoid a giant request that takes too much memory. By default, this is only enabled
+ for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
+ available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
+ memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
and block manager remote block fetch. For users who enabled external shuffle service,
- this feature can only be worked when external shuffle service is newer than Spark 2.2.
+ this feature can only be used when external shuffle service is newer than Spark 2.2.
</td>
</tr>
<tr>