Posted to commits@spark.apache.org by ma...@apache.org on 2016/02/23 00:27:33 UTC
spark git commit: [SPARK-12546][SQL] Change default number of open parquet files
Repository: spark
Updated Branches:
refs/heads/master 4a91806a4 -> 173aa949c
[SPARK-12546][SQL] Change default number of open parquet files
A common problem users encounter with Spark 1.6.0 is that writing to a partitioned parquet table runs out of memory (OOM). The root cause is that parquet allocates a significant amount of memory that is not accounted for by Spark's own memory-tracking mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more.
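As a sketch of the user-facing side of this change: a job that has enough executor memory for several concurrent parquet writers can restore the old limit explicitly. The snippet below assumes a Spark 1.6-era SQLContext; the config key is the one touched by this patch, but the chosen value of 5 (the pre-change default) is only illustrative.

```scala
// Assumes an existing Spark 1.6 SQLContext named sqlContext.
// The new default keeps a single open parquet writer per task; raise it
// only if each task has memory headroom for multiple writers.
sqlContext.setConf("spark.sql.sources.maxConcurrentWrites", "5")
```

Equivalently, the value can be supplied at submit time, e.g. `--conf spark.sql.sources.maxConcurrentWrites=5`. Beyond that many distinct partitions per task, the writer falls back on sorting the rows by partition so files can be opened one at a time.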
Author: Michael Armbrust <mi...@databricks.com>
Closes #11308 from marmbrus/parquetWriteOOM.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/173aa949
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/173aa949
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/173aa949
Branch: refs/heads/master
Commit: 173aa949c309ff7a7a03e9d762b9108542219a95
Parents: 4a91806
Author: Michael Armbrust <mi...@databricks.com>
Authored: Mon Feb 22 15:27:29 2016 -0800
Committer: Michael Armbrust <mi...@databricks.com>
Committed: Mon Feb 22 15:27:29 2016 -0800
----------------------------------------------------------------------
sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/173aa949/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
index 61a7b99..a601c87 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala
@@ -430,7 +430,7 @@ private[spark] object SQLConf {

   val PARTITION_MAX_FILES =
     intConf("spark.sql.sources.maxConcurrentWrites",
-      defaultValue = Some(5),
+      defaultValue = Some(1),
       doc = "The maximum number of concurrent files to open before falling back on sorting when " +
         "writing out files using dynamic partitioning.")