Posted to commits@hudi.apache.org by "Nishith Agarwal (Jira)" <ji...@apache.org> on 2020/11/24 01:30:00 UTC

[jira] [Commented] (HUDI-55) Investigate support for bucketed tables ala Hive #74

    [ https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237775#comment-17237775 ] 

Nishith Agarwal commented on HUDI-55:
-------------------------------------

Blurb from the Slack channel:

```
I have a requirement to compact datalake but need bucketing on top of compaction so that during query time, only the files relevant to the "id" in query would be scanned. Is that supported in Hudi? If not, is it possible to extend Hudi to support it?

Hello Team - we have a need for bucketing our datasets (primarily to keep the parquet file size optimized for faster read). We see that Hudi doesn't support bucketing now. Are there any plans to support bucketing in the future?

Following up on the email "Bucketing in Hudi", we would like to schedule a meeting to understand and estimate the code changes needed to achieve bucketing in Hudi. The high level requirements are as detailed in email but we could chat further in the meeting to get into specifics. When would be the earliest we could have this discussion?
```

> Investigate support for bucketed tables ala Hive #74
> ----------------------------------------------------
>
>                 Key: HUDI-55
>                 URL: https://issues.apache.org/jira/browse/HUDI-55
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Hive Integration
>            Reporter: Vinoth Chandar
>            Priority: Major
>
> https://github.com/uber/hudi/issues/74



--
This message was sent by Atlassian Jira
(v8.3.4#803005)