Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/08 23:45:00 UTC
[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

    [ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412239#comment-17412239 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 03451904a20123ca27eaa4e9773b94c0532fd342 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=0345190 ]

KUDU-2671 allow a range to have empty hash schema

With this patch, the semantics of an empty hash schema for a range
change.  Now an empty per-range hash schema means no hash bucketing for
the range.  Prior to this patch, an empty hash schema for a range meant
using the table-wide hash schema.

The new semantics are better because:
  * they allow for having ranges with no hash bucketing even if there is
    a non-trivial hash bucketing at the table-wide level
  * they are less surprising to a user of the client API

This patch updates several test cases to account for the change and
adds a new test case to cover the new functionality.
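
The change in semantics described above can be illustrated with a small,
self-contained model. This is only a sketch and does not use the Kudu client
API: the names HashSchema, effective_schema_before, effective_schema_after and
bucket_count are hypothetical, and a hash schema is assumed to be a list of
(columns, number of buckets) dimensions attached to a range that carries its
own per-range schema.

```python
from math import prod
from typing import List, Tuple

# Hypothetical model: one hash dimension is (hashed columns, number of buckets).
HashDimension = Tuple[Tuple[str, ...], int]
HashSchema = List[HashDimension]


def effective_schema_before(table_wide: HashSchema,
                            per_range: HashSchema) -> HashSchema:
    """Pre-patch semantics: an empty per-range schema falls back to the
    table-wide hash schema."""
    return per_range if per_range else table_wide


def effective_schema_after(table_wide: HashSchema,
                           per_range: HashSchema) -> HashSchema:
    """Post-patch semantics: an empty per-range schema is taken literally,
    that is, the range has no hash bucketing."""
    return per_range


def bucket_count(schema: HashSchema) -> int:
    """Number of hash buckets a range is split into under a schema.
    An empty schema yields 1: the range is not split by hash at all."""
    return prod(num_buckets for _, num_buckets in schema)


table_wide = [(("id",), 4)]   # table-wide: 4 hash buckets on "id"
empty_per_range = []          # per-range schema left empty

print(bucket_count(effective_schema_before(table_wide, empty_per_range)))
print(bucket_count(effective_schema_after(table_wide, empty_per_range)))
```

In this model the first call reports 4 buckets (the table-wide schema is
inherited) and the second reports 1 (no hash bucketing), which is the case the
first bullet above refers to.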

Change-Id: Ia43df69ecd7040e285e098fde49d84a7a00d1fbb
Reviewed-on: http://gerrit.cloudera.org:8080/17825
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Mahesh Reddy <mr...@cloudera.com>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our use case, the Kudu schema design isn't flexible enough.
> We create our tables with a range partition per day, such as dt='20181112', as in a Hive table.
> But our data size changes a lot from day to day: on one day it may be 50G, but on another day it may be 500G. In this case, it is hard to set the hash schema. If the hash number is too big, it is wasteful in most cases. But if it is too small, there is a performance problem when the amount of data is large.
>  
> So we suggest a solution in which the hash number can be changed based on the historical data of a table.
> For example:
>  # We create the schema with one estimated value.
>  # We collect the data size for each day range.
>  # We create each new day range partition based on the collected daily sizes.
> We have used this feature for half a year, and it works well. We hope this feature will be useful for the community. The solution may not be complete, so please help us make it better.
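
The three steps in the description above could be sketched roughly as follows.
The reporter does not say how the hash number is derived from the collected
sizes, so everything here is an assumption made for illustration: the function
name estimate_hash_buckets, the 10G target per bucket, the clamping limits, and
the choice of the largest recently observed daily size as the estimate.

```python
import math
from typing import Sequence


def estimate_hash_buckets(recent_daily_sizes_gb: Sequence[float],
                          target_gb_per_bucket: float = 10.0,
                          min_buckets: int = 1,
                          max_buckets: int = 64) -> int:
    """Pick a hash bucket count for a new day range from historical sizes.

    All parameters and defaults are hypothetical; they are not taken from
    the issue or from Kudu itself.
    """
    if not recent_daily_sizes_gb:
        raise ValueError("need at least one historical data size")
    if target_gb_per_bucket <= 0:
        raise ValueError("target_gb_per_bucket must be positive")

    # Assumption: size the new range for the largest recently observed day.
    expected_gb = max(recent_daily_sizes_gb)

    # Enough buckets that each one holds at most the target amount of data.
    needed = math.ceil(expected_gb / target_gb_per_bucket)

    # Clamp to the allowed interval.
    return max(min_buckets, min(max_buckets, needed))


# The two daily sizes mentioned in the description (50G and 500G).
print(estimate_hash_buckets([50.0]))
print(estimate_hash_buckets([500.0]))
```

With the assumed 10G target, a 50G day maps to 5 buckets and a 500G day to 50,
so a small day no longer pays for the bucket count a large day needs.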



--
This message was sent by Atlassian Jira
(v8.3.4#803005)