Posted to commits@spark.apache.org by do...@apache.org on 2020/03/17 07:53:58 UTC
[spark] branch master updated: [MINOR][SQL] Update the DataFrameWriter.bucketBy comment
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 124b4ce [MINOR][SQL] Update the DataFrameWriter.bucketBy comment
124b4ce is described below
commit 124b4ce2e6e8f84294f8fc13d3e731a82325dacb
Author: Takeshi Yamamuro <ya...@apache.org>
AuthorDate: Tue Mar 17 00:52:45 2020 -0700
[MINOR][SQL] Update the DataFrameWriter.bucketBy comment
### What changes were proposed in this pull request?
This PR updates the `DataFrameWriter.bucketBy` comment to state clearly that the bucketBy scheme is Spark-specific.
I saw questions from users about the current bucketing compatibility with Hive in [SPARK-31162](https://issues.apache.org/jira/browse/SPARK-31162?focusedCommentId=17060408&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17060408) and [SPARK-17495](https://issues.apache.org/jira/browse/SPARK-17495?focusedCommentId=17059847&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17059847), and IMHO the comment is a bit confusing [...]
### Why are the changes needed?
To help users understand the bucketing behavior without confusion.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27930 from maropu/UpdateBucketByComment.
Authored-by: Takeshi Yamamuro <ya...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 22b26ca..6946c1f 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -198,7 +198,8 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
/**
* Buckets the output by the given columns. If specified, the output is laid out on the file
- * system similar to Hive's bucketing scheme.
+ * system similar to Hive's bucketing scheme, but with a different bucket hash function
+ * and is not compatible with Hive's bucketing.
*
* This is applicable for all file-based data sources (e.g. Parquet, JSON) starting with Spark
* 2.1.0.
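To illustrate why the same keys can land in different buckets under Spark and Hive, here is a minimal sketch comparing the two bucket-id computations for a single INT column. It relies on assumptions taken from general knowledge of the two systems, not from this commit: Spark derives a bucket id as `pmod(murmur3(key), numBuckets)` using Murmur3 x86_32 with seed 42, and Hive's original (v1) bucketing uses an INT value as its own hash code and computes `(hash & Integer.MAX_VALUE) % numBuckets`. `BucketIdSketch` and its methods are hypothetical names. Multi-column keys, other data types, and Hive's newer bucketing version are not modeled.

```scala
// Hypothetical sketch: Spark-style vs Hive-v1-style bucket ids for one INT key.
// Assumption: Spark = pmod(Murmur3_x86_32(key, seed = 42), numBuckets).
// Assumption: Hive v1 = (key & Int.MaxValue) % numBuckets (INT hashes to itself).
object BucketIdSketch {
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Mix the 4-byte input block.
  private def mixK1(k: Int): Int = {
    var k1 = k * C1
    k1 = Integer.rotateLeft(k1, 15)
    k1 * C2
  }

  // Fold the mixed block into the running hash state.
  private def mixH1(h: Int, k1: Int): Int = {
    var h1 = h ^ k1
    h1 = Integer.rotateLeft(h1, 13)
    h1 * 5 + 0xe6546b64
  }

  // Finalization mix so every input bit affects every output bit.
  private def fmix(h: Int, length: Int): Int = {
    var h1 = h ^ length
    h1 ^= h1 >>> 16
    h1 *= 0x85ebca6b
    h1 ^= h1 >>> 13
    h1 *= 0xc2b2ae35
    h1 ^= h1 >>> 16
    h1
  }

  /** Murmur3 x86_32 hash of a single Int (a 4-byte input). */
  def murmur3HashInt(input: Int, seed: Int = 42): Int =
    fmix(mixH1(seed, mixK1(input)), 4)

  /** Spark-style bucket id: positive modulo of the Murmur3 hash. */
  def sparkBucketId(key: Int, numBuckets: Int): Int = {
    val m = murmur3HashInt(key) % numBuckets
    if (m < 0) m + numBuckets else m
  }

  /** Hive-v1-style bucket id for an INT column. */
  def hiveBucketId(key: Int, numBuckets: Int): Int =
    (key & Int.MaxValue) % numBuckets

  def main(args: Array[String]): Unit =
    for (key <- 0 until 8)
      println(s"key=$key spark=${sparkBucketId(key, 4)} hive=${hiveBucketId(key, 4)}")
}
```

Because the two schemes hash the same key differently, a table bucketed by one engine cannot be assumed to satisfy the other engine's bucket layout, which is the incompatibility the updated comment points out.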