Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/12/01 01:33:00 UTC

[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

    [ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241181#comment-17241181 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 17575e0b693cf97a6ec5d74e78d89343de4781eb in kudu's branch refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=17575e0 ]

[partitioning] KUDU-2671: Support for range specific HashSchemas.

This patch updates PartitionSchema::CreatePartitions() to support
adding a different hash schema for each range. If no hash schema is
specified for a range, the table-wide hash schema is used. Currently,
this only works if no split_rows are specified.

Since split_rows only exists for backwards-compatibility reasons, this
feature will not be supported with split_rows. Instead, returning an
error message telling the user to specify both upper and lower bounds,
either at table creation or at alteration time, should suffice.
split_rows is also more syntactically ambiguous when specifying bounds.

Currently, range_hash_schemas holds the HashBucketSchemas for each range.
Its order corresponds to the bounds in range_bounds, so that when the
bounds are sorted the corresponding hash schemas stay aligned with them.

Inspiration from Vlad: https://gerrit.cloudera.org/c/15758/

Change-Id: I8725f4bd072a81b05b36dfc7df0c074c172b4ce8
Reviewed-on: http://gerrit.cloudera.org:8080/16596
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>
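
As a sketch of the fallback rule described in this commit message, here is a
simplified, self-contained C++ illustration: each range may carry its own hash
schema, and a range without one falls back to the table-wide schema. The type
and function names (HashBucketSchema, SchemaForRange) are illustrative
stand-ins, not Kudu's actual internal API.

    // Simplified sketch of the per-range fallback rule: each range may carry
    // its own hash schema; ranges without one use the table-wide schema.
    // These types are illustrative stand-ins, not Kudu's internal types.
    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct HashBucketSchema {
      std::vector<std::string> columns;  // columns fed into the hash function
      int32_t num_buckets;               // hash buckets for this dimension
    };

    // The i-th entry of range_hash_schemas corresponds to the i-th range
    // bound, mirroring the ordering contract in the commit message. An empty
    // entry means "no range-specific schema", so the table-wide schema applies.
    const std::vector<HashBucketSchema>& SchemaForRange(
        size_t i,
        const std::vector<std::vector<HashBucketSchema>>& range_hash_schemas,
        const std::vector<HashBucketSchema>& table_wide_schema) {
      if (i < range_hash_schemas.size() && !range_hash_schemas[i].empty()) {
        return range_hash_schemas[i];
      }
      return table_wide_schema;
    }

    int main() {
      std::vector<HashBucketSchema> table_wide = {{{"host"}, 4}};
      // Range 0 overrides the hash schema with 16 buckets; range 1 does not.
      std::vector<std::vector<HashBucketSchema>> per_range = {{{{"host"}, 16}}, {}};
      for (size_t i = 0; i < per_range.size(); ++i) {
        std::printf("range %zu -> %d buckets\n", i,
                    SchemaForRange(i, per_range, table_wide).front().num_buckets);
      }
      return 0;
    }

The ordering contract matters here: because the i-th hash schema travels with
the i-th range bound, sorting the bounds must keep the schemas aligned, which
is what the range_hash_schemas/range_bounds correspondence above guarantees.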


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the Kudu schema design isn't flexible enough.
> We create our tables with day-range partitions such as dt='20181112', like a Hive table.
> But our data size changes a lot from day to day: one day it may be 50 GB, while another day it may be 500 GB. This makes it hard to choose a hash schema. If the hash number is too big, it is wasteful on most days; if it is too small, there is a performance problem on days with a large amount of data.
>  
> So we suggest a solution in which the hash number can be changed based on a table's historical data,
> for example:
>  # we create the schema with an estimated value.
>  # we collect the data size for each day range.
>  # we create the new day-range partition using the collected daily size.
> We have used this feature for half a year, and it works well. We hope it will be useful for the community. The solution may not be complete; please help us make it better.
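
The sizing step the reporter describes (pick a hash number for the next
day-range partition from the observed daily data size) can be sketched roughly
as follows. The 10 GiB-per-bucket target and the 2-to-64 bucket clamp are
made-up tuning values for illustration only, not Kudu recommendations or the
reporter's actual numbers.

    // Hypothetical sizing helper: choose a hash bucket count for the next
    // day-range partition from the observed size of recent days.
    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    int32_t SuggestHashBuckets(double observed_day_size_gib,
                               double target_gib_per_bucket = 10.0) {
      const int32_t kMinBuckets = 2;   // illustrative lower clamp
      const int32_t kMaxBuckets = 64;  // illustrative upper clamp
      const int32_t raw = static_cast<int32_t>(
          std::ceil(observed_day_size_gib / target_gib_per_bucket));
      return std::clamp(raw, kMinBuckets, kMaxBuckets);
    }

    int main() {
      // A 50 GiB day suggests 5 buckets; a 500 GiB day suggests 50.
      std::printf("50 GiB -> %d buckets\n", SuggestHashBuckets(50.0));
      std::printf("500 GiB -> %d buckets\n", SuggestHashBuckets(500.0));
      return 0;
    }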



--
This message was sent by Atlassian Jira
(v8.3.4#803005)