Posted to dev@bahir.apache.org by "Balazs Varga (Jira)" <ji...@apache.org> on 2020/06/15 12:44:00 UTC
[jira] [Created] (BAHIR-237) Hash and range partitioning support in Flink-Kudu connector SQL DDL
Balazs Varga created BAHIR-237:
----------------------------------
Summary: Hash and range partitioning support in Flink-Kudu connector SQL DDL
Key: BAHIR-237
URL: https://issues.apache.org/jira/browse/BAHIR-237
Project: Bahir
Issue Type: Improvement
Components: Flink Streaming Connectors
Reporter: Balazs Varga
The current version of the Flink-Kudu connector's SQL DDL supports only a limited set of properties. With regard to partitioning, only the 'kudu.hash-columns' option is available, and it does not allow setting the number of hash partitions. Range partitioning is currently not supported.
Since partitioning cannot be altered later, it should be possible to set the number of hash buckets for each hash column in the DDL. A simple way to achieve this is using additional properties. Here are some ways I can think of specifying it:
* 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8'
* 'kudu.hash-partitioning'='col1,4;col2,8'
* 'kudu.hash-buckets.col1' = '4', 'kudu.hash-buckets.col2' = '8'
I'd appreciate your input regarding which approach would be the best.
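To make the comparison concrete, here is a minimal Java sketch of what parsing each of the three formats above might look like. The class and method names are hypothetical and are not part of the connector. The lower bound of 2 buckets is an assumption based on my understanding of Kudu's hash partitioning rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Flink-Kudu connector. */
final class HashPartitionProps {

    /** Option 1: 'kudu.hash-columns'='col1,col2', 'kudu.hash-buckets'='4,8' */
    public static Map<String, Integer> fromParallelLists(String columns, String buckets) {
        String[] cols = columns.split(",");
        String[] counts = buckets.split(",");
        if (cols.length != counts.length) {
            throw new IllegalArgumentException("Expected one bucket count per hash column");
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < cols.length; i++) {
            result.put(cols[i].trim(), parseBuckets(counts[i]));
        }
        return result;
    }

    /** Option 2: 'kudu.hash-partitioning'='col1,4;col2,8' */
    public static Map<String, Integer> fromCombined(String value) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String entry : value.split(";")) {
            String[] parts = entry.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'column,buckets' but got: " + entry);
            }
            result.put(parts[0].trim(), parseBuckets(parts[1]));
        }
        return result;
    }

    /** Option 3: 'kudu.hash-buckets.col1'='4', 'kudu.hash-buckets.col2'='8' */
    public static Map<String, Integer> fromPrefixedKeys(Map<String, String> props) {
        String prefix = "kudu.hash-buckets.";
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                String column = e.getKey().substring(prefix.length());
                result.put(column, parseBuckets(e.getValue()));
            }
        }
        return result;
    }

    // Assumption: a hash level needs at least 2 buckets.
    private static int parseBuckets(String raw) {
        int n = Integer.parseInt(raw.trim());
        if (n < 2) {
            throw new IllegalArgumentException("Bucket count must be at least 2, got " + n);
        }
        return n;
    }
}
```

One difference the sketch surfaces: options 1 and 2 keep the declared column order, while option 3 takes its order from however the properties map is iterated. Option 1 also needs a length check between the two lists, which the other two avoid.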
For range partitioning, I recommend adding a property to set the range partitioning columns: 'kudu.range-columns'='col1,col2'
If this is set correctly for a table, the partitions themselves can be added later using ALTER TABLE. Specifying the ranges themselves in the DDL properties would require a lot of complex parsing logic.
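By contrast, handling only the column list stays simple. The following is a rough Java sketch of parsing and validating a 'kudu.range-columns' value. The class name and the way the primary key columns are passed in are hypothetical. The primary key check reflects Kudu's rule that partition columns must be part of the primary key.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the Flink-Kudu connector. */
final class RangeColumnsProp {

    /** Parses e.g. 'col1,col2' and checks every column against the primary key columns. */
    public static List<String> parse(String value, List<String> primaryKeyColumns) {
        Set<String> columns = new LinkedHashSet<>();
        for (String raw : value.split(",")) {
            String column = raw.trim();
            if (column.isEmpty()) {
                throw new IllegalArgumentException("Empty column name in 'kudu.range-columns'");
            }
            if (!primaryKeyColumns.contains(column)) {
                throw new IllegalArgumentException(
                        "Range column is not a primary key column: " + column);
            }
            if (!columns.add(column)) {
                throw new IllegalArgumentException("Duplicate range column: " + column);
            }
        }
        // Declared order is preserved.
        return new ArrayList<>(columns);
    }
}
```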
--
This message was sent by Atlassian Jira
(v8.3.4#803005)