You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/03/12 01:06:00 UTC
[jira] [Commented] (KUDU-2671) Change hash number for range partitioning

    [ https://issues.apache.org/jira/browse/KUDU-2671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17299959#comment-17299959 ] 

ASF subversion and git services commented on KUDU-2671:
-------------------------------------------------------

Commit 8ebe032847f618d76459b59c84b150c997016ee9 in kudu's branch refs/heads/master from Mahesh Reddy
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=8ebe032 ]

KUDU-2671: No-op ClientServerMapping if only one schema exists.

For the new field of PartitionSchema, a client schema is necessary
for RowOperationsPBDecoder to populate the new field. Not all call
sites for PartitionSchema::FromPB() have access to an explicit client
schema.

This patch allows the client schema to have column IDs if it's equal
to the tablet schema and no-ops the ClientServerMapping in this case.
If we don't have access to a client schema, we don't need a mapping
since we only have one schema.

Change-Id: I836f157201509a9b38a3ad42d653236f240dda5e
Reviewed-on: http://gerrit.cloudera.org:8080/17161
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>


> Change hash number for range partitioning
> -----------------------------------------
>
>                 Key: KUDU-2671
>                 URL: https://issues.apache.org/jira/browse/KUDU-2671
>             Project: Kudu
>          Issue Type: Improvement
>          Components: client, java, master, server
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Assignee: Mahesh Reddy
>            Priority: Major
>              Labels: feature, roadmap-candidate, scalability
>         Attachments: 屏幕快照 2019-01-24 下午12.03.41.png
>
>
> For our usage, the kudu schema design isn't flexible enough.
> We create our table for day range such as dt='20181112' as hive table.
> But our data size change a lot every day, for one day it will be 50G， but for some other day it will be 500G. For this case, it be hard to set the hash schema. If too big, for most case, it will be too wasteful. But too small, there is a performance problem in the case of a large amount of data.
>  
> So we suggest a solution we can change the hash number by the history data of a table.
> for example
>  # we create schema with one estimated value.
>  # we collect the data size by day range
>  # we create new day range partition by our collected day size.
> We use this feature for half a year, and it work well. We hope this feature will be useful for the community. Maybe the solution isn't so complete. Please help us make it better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)