Posted to issues@spark.apache.org by "John Zhuge (JIRA)" <ji...@apache.org> on 2018/07/26 23:58:00 UTC
[jira] [Updated] (SPARK-24940) Coalesce Hint for SQL Queries
[ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge updated SPARK-24940:
-------------------------------
Summary: Coalesce Hint for SQL Queries (was: Coalesce Hint for SQL)
> Coalesce Hint for SQL Queries
> -----------------------------
>
> Key: SPARK-24940
> URL: https://issues.apache.org/jira/browse/SPARK-24940
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.1.1
> Reporter: John Zhuge
> Priority: Major
>
> Many Spark SQL users in my company have asked for a way to control the number of output files in Spark SQL. They prefer not to use the functions repartition(n) or coalesce(n, shuffle), which require them to write and deploy Scala/Java/Python code.
>
> There are use cases for both reducing and increasing the number of output files.
>
> The DataFrame API has had repartition/coalesce for a long time. However, SQL queries have no equivalent functionality. We propose adding the following Hive-style Coalesce hint to Spark SQL:
> {noformat}
> /*+ COALESCE(n, shuffle) */
> /*+ REPARTITION(n) */
> {noformat}
> REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).
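To illustrate the hint semantics described above, here is a minimal Python sketch that runs without Spark. It only renders the proposed hint text and models the resulting partition count. The helper names (hint_comment, resulting_partitions, repartition), the table name events, and the rendering of the shuffle flag as true/false are assumptions made for illustration; they are not part of the proposal. The partition-count rule follows the existing RDD coalesce behavior, where skipping the shuffle can only merge partitions; whether the proposed hint would behave identically is likewise an assumption.

```python
def hint_comment(n, shuffle=None):
    """Render the proposed hint text (hypothetical helper).

    shuffle=None selects REPARTITION(n); otherwise COALESCE(n, shuffle).
    Writing the flag as true/false is an assumption, not part of the proposal.
    """
    if shuffle is None:
        return f"/*+ REPARTITION({n}) */"
    return f"/*+ COALESCE({n}, {str(shuffle).lower()}) */"


def resulting_partitions(current, n, shuffle):
    """Model the partition count after coalesce(n, shuffle).

    Without a shuffle, existing partitions can only be merged, so the
    count cannot grow. With a shuffle, any positive target is reachable.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return n if shuffle else min(current, n)


def repartition(current, n):
    """REPARTITION(n) modeled as COALESCE(n, shuffle=true)."""
    return resulting_partitions(current, n, shuffle=True)


# Hive-style hints sit directly after SELECT; `events` is a made-up table.
query = f"SELECT {hint_comment(200)} * FROM events"
print(query)

print(resulting_partitions(100, 10, shuffle=False))   # reduce: shuffle not needed
print(resulting_partitions(100, 400, shuffle=False))  # cannot grow without a shuffle
print(repartition(100, 400))                          # growing requires the shuffle
```

Under this model, reducing the number of output files works with either form, while increasing it requires REPARTITION(n) or COALESCE(n, true).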
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)