Posted to issues@kudu.apache.org by "Sailesh Patel (JIRA)" <ji...@apache.org> on 2017/11/23 23:49:01 UTC

[jira] [Created] (KUDU-2224) Kudu Partition Dynamic Creation on Insertion

Sailesh Patel created KUDU-2224:
-----------------------------------

             Summary: Kudu Partition Dynamic Creation on Insertion
                 Key: KUDU-2224
                 URL: https://issues.apache.org/jira/browse/KUDU-2224
             Project: Kudu
          Issue Type: New Feature
    Affects Versions: 1.4.0
            Reporter: Sailesh Patel
            Priority: Minor


Add an option to specify a simpler partitioning directive, whereby Kudu creates range partitions on the fly instead of requiring additional partitions to be created manually as described in:

  https://kudu.apache.org/2016/08/23/new-range-partitioning-features.html
  
  https://kudu.apache.org/docs/kudu_impala_integration.html#partitioning_tables
       "Non-Covering Range Partitions"
  
+Requirement:+
   When the table is created, a partitioning rule specifies the granularity (the size of each range partition). A new partition is then created at insert time whenever none exists for the inserted value.

Example of the proposed syntax:
CREATE TABLE sample_table (ts TIMESTAMP, eventid BIGINT, somevalue STRING, PRIMARY KEY(ts,eventid) )
PARTITION BY 
RANGE(ts) GRANULARITY = 86400000000000 START = 1104537600000000
STORED AS KUDU;

   - Maybe an optional END
   - START indicates where the partition granularity builds from
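
To make the intended semantics concrete, here is a minimal Python sketch of how a GRANULARITY and START pair could map an inserted value to the range partition that would need to exist. This is an illustration of the proposal, not Kudu behavior: the function name partition_bounds is hypothetical, and the sketch assumes that GRANULARITY and START use the same unit (microseconds in the example values) and that values earlier than START also get partitions.

```python
def partition_bounds(value, granularity, start):
    """Return the half-open range [lower, upper) of the partition covering value.

    Hypothetical helper: value, granularity and start are assumed to share one unit.
    """
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # Floor division also handles values earlier than start (negative index).
    index = (value - start) // granularity
    lower = start + index * granularity
    return lower, lower + granularity


DAY_US = 86_400_000_000            # one day in microseconds (assumed unit)
START_US = 1_104_537_600_000_000   # 2005-01-01 00:00:00 UTC in microseconds

# An event 36 hours after START falls in the second daily partition.
event_ts = START_US + 36 * 3_600_000_000
lower, upper = partition_bounds(event_ts, DAY_US, START_US)
```

Under this reading, an insert would trigger creation of the partition [lower, upper) only if no existing partition already covers the value.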
-----    

Use case
- Time series data where timestamps arrive out of order, sometimes catching up from years in the past, and with unpredictable values. Each event carries a timestamp (say epoch nanoseconds or epoch milliseconds), and partitions are based on a range of that timestamp (typically day or hour granularity).

Currently, we script the creation of partitions in advance of the data we receive, but if that script fails for any reason, the insert fails. Also, if we receive unexpected data with a timestamp far in the past and no partition covers it, the insert fails.

Opening this Jira enhancement for discussion.



