Posted to issues@spark.apache.org by "Fernando Pereira (JIRA)" <ji...@apache.org> on 2018/01/30 17:17:00 UTC

[jira] [Comment Edited] (SPARK-19256) Hive bucketing support

    [ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345397#comment-16345397 ] 

Fernando Pereira edited comment on SPARK-19256 at 1/30/18 5:16 PM:
-------------------------------------------------------------------

Thanks a lot for this great contribution to Spark.

I was just wondering: would it make sense to apply this to direct outputs (e.g. write.parquet()), so that we could keep the bucketing information and again avoid reshuffling data before a merge? I believe this is mostly what saveAsTable() does by default in Spark, but to my mind it would improve the DataFrame write API and make these performance benefits more accessible.
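(Illustrative only, to make the shuffle-avoidance point concrete: a minimal self-contained sketch of Hive-style bucketing, not actual Spark code. It assumes the hash-mod scheme where a row lands in bucket hash(key) % numBuckets; for integer keys Hive's hash is the value itself. All names here are made up for the example.)

```python
def bucket_id(key: int, num_buckets: int) -> int:
    # Hive-style bucket assignment: hash the bucket column, mod the bucket
    # count. For integer keys the hash is the value itself.
    return key % num_buckets

def bucketize(rows, num_buckets):
    """Group (key, payload) rows into buckets, as a bucketed write would."""
    buckets = {b: [] for b in range(num_buckets)}
    for key, payload in rows:
        buckets[bucket_id(key, num_buckets)].append((key, payload))
    return buckets

# Two tables bucketed identically can be joined bucket-by-bucket with no
# shuffle: equal keys are guaranteed to land in the same bucket number.
left = bucketize([(1, "a"), (2, "b"), (9, "c")], num_buckets=4)
right = bucketize([(1, "x"), (9, "y")], num_buckets=4)
joined = [
    (k1, p1, p2)
    for b in range(4)
    for k1, p1 in left[b]
    for k2, p2 in right[b]
    if k1 == k2
]
print(sorted(joined))  # [(1, 'a', 'x'), (9, 'c', 'y')]
```

This is the property a plain write.parquet() loses today: without the bucket metadata, Spark cannot know the files were written this way and has to shuffle both sides again before the join.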



> Hive bucketing support
> ----------------------
>
>                 Key: SPARK-19256
>                 URL: https://issues.apache.org/jira/browse/SPARK-19256
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Tejas Patil
>            Priority: Minor
>
> JIRA to track design discussions and tasks related to Hive bucketing support in Spark.
> Proposal : https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org