Posted to dev@bahir.apache.org by "Balazs Varga (Jira)" <ji...@apache.org> on 2020/06/15 12:44:00 UTC

[jira] [Created] (BAHIR-237) Hash and range partitioning support in Flink-Kudu connector SQL DDL

Balazs Varga created BAHIR-237:
----------------------------------

             Summary: Hash and range partitioning support in Flink-Kudu connector SQL DDL
                 Key: BAHIR-237
                 URL: https://issues.apache.org/jira/browse/BAHIR-237
             Project: Bahir
          Issue Type: Improvement
          Components: Flink Streaming Connectors
            Reporter: Balazs Varga


The current version of the Flink-Kudu connector's SQL DDL supports only a limited set of properties. With regard to partitioning, only the 'kudu.hash-columns' option is available, and it does not allow setting the number of hash partitions. Range partitioning is currently not supported.

Since partitioning cannot be altered later, it should be possible to set the number of hash buckets for each hash column in the DDL. A simple way to achieve this is to use additional properties. Here are some ways I can think of to specify it:
 * 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
 * 'kudu.hash-partitioning'='col1,4;col2,8'
 * 'kudu.hash-buckets.col1' = '4', 'kudu.hash-buckets.col2' = '8'

I'd appreciate your input regarding which approach would be the best.
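
To make the comparison more concrete, here is a minimal Java sketch of how each of the three formats listed above could be parsed into the same column-to-bucket-count map. The class and method names (HashPartitionOptions, fromParallelLists, fromCombined, fromPrefixedKeys) are hypothetical and not part of the connector, and the minimum of 2 buckets is an assumption based on Kudu's hash partitioning rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Bahir connector. */
final class HashPartitionOptions {

    /** Format 1: parallel lists, e.g. columns "col1,col2" and buckets "4,8". */
    static Map<String, Integer> fromParallelLists(String columns, String buckets) {
        String[] cols = columns.split(",");
        String[] counts = buckets.split(",");
        if (cols.length != counts.length) {
            throw new IllegalArgumentException("Expected one bucket count per hash column");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            add(result, cols[i], counts[i]);
        }
        return result;
    }

    /** Format 2: a single combined value, e.g. "col1,4;col2,8". */
    static Map<String, Integer> fromCombined(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            String[] parts = entry.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'column,buckets' but got: " + entry);
            }
            add(result, parts[0], parts[1]);
        }
        return result;
    }

    /** Format 3: one property per column, e.g. 'kudu.hash-buckets.col1' = '4'. */
    static Map<String, Integer> fromPrefixedKeys(Map<String, String> properties) {
        final String prefix = "kudu.hash-buckets.";
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                add(result, e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    /** Validates one column/bucket pair and stores it, rejecting duplicates. */
    private static void add(Map<String, Integer> target, String column, String count) {
        String name = column.trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Empty hash column name");
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int buckets = Integer.parseInt(count.trim());
        if (buckets < 2) { // assumed lower bound, see the note above
            throw new IllegalArgumentException("Need at least 2 buckets for column " + name);
        }
        if (target.put(name, buckets) != null) {
            throw new IllegalArgumentException("Duplicate hash column: " + name);
        }
    }
}
```

All three parsers are of similar size. The first format needs a cross-check between two properties, the second keeps each column next to its bucket count in one value, and the third relies on scanning property keys by prefix. Each resulting entry could then presumably be passed to the Kudu client's CreateTableOptions.addHashPartitions.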

For range partitioning, I recommend adding a property to set the range partitioning columns: 'kudu.range-columns'='col1,col2'

If this is correctly set for a table, the partitions themselves can be added later using ALTER TABLE. Specifying the ranges here would add a lot of complex (parsing) logic. 
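
By contrast, handling only the range column list stays small. The following Java sketch parses a 'kudu.range-columns' value and checks it against the table's primary key columns. RangeColumnsOption and its parse method are hypothetical names, and the primary-key check assumes Kudu's rule that range partition columns must be a subset of the primary key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Bahir connector. */
final class RangeColumnsOption {

    /** Parses a value such as "col1,col2" and validates it against the primary key. */
    static List<String> parse(String value, List<String> primaryKeyColumns) {
        List<String> result = new ArrayList<>();
        for (String raw : value.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty range column name");
            }
            if (!primaryKeyColumns.contains(name)) {
                // Assumption: range columns must belong to the primary key.
                throw new IllegalArgumentException("Not a primary key column: " + name);
            }
            if (result.contains(name)) {
                throw new IllegalArgumentException("Duplicate range column: " + name);
            }
            result.add(name);
        }
        return result;
    }
}
```

The resulting list could presumably be handed to the Kudu client's CreateTableOptions.setRangePartitionColumns, leaving the range bounds themselves to be added later.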



--
This message was sent by Atlassian Jira
(v8.3.4#803005)