Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2016/01/25 03:02:39 UTC
[jira] [Created] (SPARK-12975) Eliminate Bucketing Columns that are part of Partitioning Columns
Xiao Li created SPARK-12975:
-------------------------------
Summary: Eliminate Bucketing Columns that are part of Partitioning Columns
Key: SPARK-12975
URL: https://issues.apache.org/jira/browse/SPARK-12975
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li
When users call partitionBy and bucketBy together, some of the bucketing columns may also be partitioning columns. For example:
{code}
df.write
.format(source)
.partitionBy("i")
.bucketBy(8, "i", "k")
.sortBy("k")
.saveAsTable("bucketed_table")
{code}
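In the above case, including column `i` in the bucketing columns is redundant: every row within a given partition has the same value of `i`, so it does not help distinguish rows inside that partition. It only costs extra CPU when reading or writing bucketed tables. We can therefore automatically remove such overlapping columns from the bucketing columns.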
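The proposed elimination could be sketched roughly as follows. This is only an illustration and not Spark's actual implementation: the object `BucketColumnPruning` and the helper `pruneBucketColumns` are hypothetical names, and the sketch compares column names exactly, ignoring any case-insensitive resolution a real implementation would need.

```scala
// Hypothetical sketch, not part of Spark: drop bucketing columns that
// are already partitioning columns.
object BucketColumnPruning {

  // Returns the bucketing columns that are not also partitioning columns,
  // preserving their original order. Names are compared exactly
  // (an assumption made to keep the sketch simple).
  def pruneBucketColumns(
      partitionCols: Seq[String],
      bucketCols: Seq[String]): Seq[String] = {
    val partitionSet = partitionCols.toSet
    bucketCols.filterNot(partitionSet.contains)
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the example above: partitionBy("i"), bucketBy(8, "i", "k").
    val remaining = pruneBucketColumns(Seq("i"), Seq("i", "k"))
    println(remaining.mkString(", ")) // prints: k
  }
}
```

If every bucketing column is also a partitioning column, this helper returns an empty sequence. How that case should be handled (for example, by rejecting the write or by skipping bucketing) is not specified here and would have to be decided by the actual change.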