Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2019/08/28 15:27:00 UTC

[jira] [Resolved] (KUDU-2224) Kudu Partition Dynamic Creation on Insertion

     [ https://issues.apache.org/jira/browse/KUDU-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke resolved KUDU-2224.
-------------------------------
    Fix Version/s: NA
       Resolution: Duplicate

> Kudu Partition Dynamic Creation on Insertion
> --------------------------------------------
>
>                 Key: KUDU-2224
>                 URL: https://issues.apache.org/jira/browse/KUDU-2224
>             Project: Kudu
>          Issue Type: New Feature
>    Affects Versions: 1.4.0
>            Reporter: Sailesh Patel
>            Assignee: HeLifu
>            Priority: Minor
>             Fix For: NA
>
>
> Option to specify a simpler partitioning directive whereby Kudu creates partitions on the fly, instead of requiring manual creation of additional partitions as described in:
>   https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html
>   
>   https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables
>        "Non-Covering Range Partitions"
>   
> +Requirement:+
>    When the table is created, a partitioning rule specifies the granularity, and a new partition is created at insert time when none exists for the inserted value.
> e.g. proposal:
> CREATE TABLE sample_table (ts TIMESTAMP, eventid BIGINT, somevalue STRING, PRIMARY KEY(ts,eventid) )
> PARTITION BY 
> RANGE(ts) GRANULARITY= 86400000000000 START = 1104537600000000 
> STORED AS KUDU;
>    - Optionally, an END could also be specified.
>    - START indicates the point from which the partition granularity is built.
> -----    
> Use case
> - Time series data where timestamps arrive out of order, are unpredictable, and can lag by as much as years. Each event carries a timestamp (say epoch nanoseconds or epoch milliseconds), and partitions are based on a range of that timestamp (typically day or hour granularity).
> Currently, we script the creation of partitions in advance of the data we receive, but if that script fails for any reason, the inserts fail. Also, if we receive unexpected data with a timestamp far in the past and no partition exists for it, the insert fails.
> Opening this Jira enhancement for discussion.
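
The GRANULARITY/START rule proposed in the description can be illustrated with a little arithmetic. The following is only a sketch under stated assumptions, not Kudu or Impala behavior: the function names are hypothetical, and all values are assumed to share one unit (microseconds here). Note that in the quoted proposal the GRANULARITY value looks like one day in nanoseconds while the START value looks like an epoch timestamp in microseconds, so a real implementation would need to settle on a single unit.

```python
# Hypothetical sketch of the proposed GRANULARITY/START rule.
# None of these names are Kudu or Impala APIs; the unit (microseconds)
# is an assumption made for illustration.

MICROS_PER_DAY = 86_400 * 1_000_000


def partition_bounds(ts, start, granularity):
    """Return the [lower, upper) bounds of the range partition containing ts.

    Partitions are aligned to `start` and are `granularity` wide.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Floor division also handles timestamps earlier than `start`.
    index = (ts - start) // granularity
    lower = start + index * granularity
    return lower, lower + granularity


def missing_partitions(timestamps, start, granularity, existing):
    """Return the sorted bounds required by `timestamps` but absent from `existing`."""
    needed = {partition_bounds(ts, start, granularity) for ts in timestamps}
    return sorted(needed - set(existing))
```

With this rule, a late-arriving row from years in the past maps to exactly one aligned range, so the partitions that would need to be created for a batch can be derived from the batch itself rather than scripted in advance.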



--
This message was sent by Atlassian Jira
(v8.3.2#803003)