Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/07/04 10:41:42 UTC

[GitHub] [flink-table-store] JingsongLi opened a new pull request, #193: [FLINK-28017] Introduce bucket-key to table store

JingsongLi opened a new pull request, #193:
URL: https://github.com/apache/flink-table-store/pull/193

   Specifies the table store distribution policy. Data is assigned to each bucket according to the hash value of the bucket-key.
   - When the table has a primary key, the bucket-key is the primary key. The user can instead specify a bucket-key, which must be a subset of the primary key fields.
   - When the table has no primary key, the bucket-key is all fields.
   If queries filter on specific fields, a well-chosen bucket-key can give the table a big performance boost, but care must be taken to avoid data skew.
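
The distribution rule described above can be illustrated with a minimal sketch. It is not the table store's actual bucketing code: the class `BucketAssignmentSketch`, the method `assignBucket`, the map-based row representation, and the use of `Arrays.hashCode` as the hash function are all assumptions made for illustration. The point it shows is that only the bucket-key fields feed the hash, so rows that agree on those fields always land in the same bucket.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and hash function are assumptions, not the table store API.
class BucketAssignmentSketch {

    /** Returns a bucket in [0, numBuckets) computed from the bucket-key fields only. */
    static int assignBucket(Map<String, Object> row, List<String> bucketKeys, int numBuckets) {
        Object[] keyValues = new Object[bucketKeys.size()];
        for (int i = 0; i < keyValues.length; i++) {
            keyValues[i] = row.get(bucketKeys.get(i));
        }
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(keyValues), numBuckets);
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("user_id", 42);
        first.put("item", "book");

        Map<String, Object> second = new HashMap<>();
        second.put("user_id", 42);
        second.put("item", "pen");

        List<String> bucketKeys = Arrays.asList("user_id");
        // Both rows share user_id, so they are assigned to the same bucket.
        System.out.println(assignBucket(first, bucketKeys, 4));
        System.out.println(assignBucket(second, bucketKeys, 4));
    }
}
```

A filter on `user_id` then only needs to read one bucket. The skew warning follows from the same rule: if most rows share a few bucket-key values, most rows end up in a few buckets.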


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [flink-table-store] JingsongLi merged pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
JingsongLi merged PR #193:
URL: https://github.com/apache/flink-table-store/pull/193




[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
LadyForest commented on code in PR #193:
URL: https://github.com/apache/flink-table-store/pull/193#discussion_r913340759


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/FileStoreOptions.java:
##########
@@ -155,6 +155,14 @@ public class FileStoreOptions implements Serializable {
                             "Open file cost of a source file. It is used to avoid reading"
                                     + " too many files with a source split, which can be very slow.");
 
+    public static final ConfigOption<String> BUCKET_KEY =
+            ConfigOptions.key("bucket-key")
+                    .stringType()

Review Comment:
   Add a description of how to specify the bucket key? Users do not know whether the delimiter is `,` or `;` unless they refer to the source code.





[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
LadyForest commented on code in PR #193:
URL: https://github.com/apache/flink-table-store/pull/193#discussion_r913339588


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/FileStoreOptions.java:
##########
@@ -155,6 +155,14 @@ public class FileStoreOptions implements Serializable {
                             "Open file cost of a source file. It is used to avoid reading"
                                     + " too many files with a source split, which can be very slow.");
 
+    public static final ConfigOption<String> BUCKET_KEY =
+            ConfigOptions.key("bucket-key")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription(
+                            "Specifies the table store distribution policy. Data is assigned"
+                                    + " to each bucket according to the hash value of bucket-key.");
+
     private final Configuration options;
 
     public static Set<ConfigOption<?>> allOptions() {

Review Comment:
   Add `BUCKET_KEY` to `allOptions`?





[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
LadyForest commented on code in PR #193:
URL: https://github.com/apache/flink-table-store/pull/193#discussion_r913340834


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/TableSchema.java:
##########
@@ -129,6 +134,33 @@ public Map<String, String> options() {
         return options;
     }
 
+    public List<String> bucketKeys() {
+        String key = options.get(BUCKET_KEY.key());
+        if (StringUtils.isNullOrWhitespaceOnly(key)) {
+            return Collections.emptyList();
+        }
+        List<String> bucketKeys = Arrays.asList(key.split(","));

Review Comment:
   Split and trim?
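
For illustration, a minimal sketch of what a split-and-trim variant might look like. The class `BucketKeyParsingSketch` and the method `parseBucketKeys` are hypothetical names, and skipping empty segments is an extra assumption; none of this is code from the PR. It uses only the JDK, so the blank check stands in for `StringUtils.isNullOrWhitespaceOnly` from the diff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the reviewer's suggestion; not part of the PR.
class BucketKeyParsingSketch {

    /** Splits a comma-delimited bucket-key option and trims each field name. */
    static List<String> parseBucketKeys(String key) {
        if (key == null || key.trim().isEmpty()) {
            return Collections.emptyList();
        }
        List<String> bucketKeys = new ArrayList<>();
        for (String part : key.split(",")) {
            String trimmed = part.trim();
            // Assumption: empty segments such as the middle of "a,,b" are skipped.
            if (!trimmed.isEmpty()) {
                bucketKeys.add(trimmed);
            }
        }
        return bucketKeys;
    }

    public static void main(String[] args) {
        System.out.println(parseBucketKeys("user_id, item_id"));
    }
}
```

With plain `key.split(",")`, the value `"user_id, item_id"` yields a second element with a leading space, which would not match the field name `item_id`.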





[GitHub] [flink-table-store] LadyForest commented on a diff in pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
LadyForest commented on code in PR #193:
URL: https://github.com/apache/flink-table-store/pull/193#discussion_r913339347


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/FileStoreOptions.java:
##########
@@ -155,6 +155,14 @@ public class FileStoreOptions implements Serializable {
                             "Open file cost of a source file. It is used to avoid reading"
                                     + " too many files with a source split, which can be very slow.");
 
+    public static final ConfigOption<String> BUCKET_KEY =
+            ConfigOptions.key("bucket-key")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription(
+                            "Specifies the table store distribution policy. Data is assigned"

Review Comment:
   Specify





[GitHub] [flink-table-store] JingsongLi commented on a diff in pull request #193: [FLINK-28017] Introduce bucket-key to table store

Posted by GitBox <gi...@apache.org>.
JingsongLi commented on code in PR #193:
URL: https://github.com/apache/flink-table-store/pull/193#discussion_r913460665


##########
flink-table-store-core/src/main/java/org/apache/flink/table/store/file/schema/TableSchema.java:
##########
@@ -129,6 +134,33 @@ public Map<String, String> options() {
         return options;
     }
 
+    public List<String> bucketKeys() {
+        String key = options.get(BUCKET_KEY.key());
+        if (StringUtils.isNullOrWhitespaceOnly(key)) {
+            return Collections.emptyList();
+        }
+        List<String> bucketKeys = Arrays.asList(key.split(","));

Review Comment:
   Let it go; the user can see the wrong field in the exception.
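
As a rough sketch of the behavior this reply relies on, the check below rejects a bucket key that is not a table field, or not part of the primary key, and quotes the offending name in the exception message. The class `BucketKeyValidationSketch`, the method `validate`, and the message wording are assumptions for illustration, not the validation code in the PR.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names and messages are assumptions, not the PR's code.
class BucketKeyValidationSketch {

    /** Throws if a bucket key is not a table field or falls outside the primary key. */
    static void validate(List<String> bucketKeys, List<String> fieldNames, List<String> primaryKeys) {
        for (String key : bucketKeys) {
            if (!fieldNames.contains(key)) {
                // Quoting the key makes stray whitespace visible to the user.
                throw new IllegalArgumentException(
                        "Bucket key '" + key + "' is not a field of the table: " + fieldNames);
            }
            if (!primaryKeys.isEmpty() && !primaryKeys.contains(key)) {
                throw new IllegalArgumentException(
                        "Bucket key '" + key + "' is not part of the primary key: " + primaryKeys);
            }
        }
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("a", "b", "c");
        try {
            // " b" comes from splitting "a, b" on "," without trimming.
            validate(Arrays.asList("a", " b"), fields, Collections.<String>emptyList());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without trimming, an untrimmed name such as `' b'` fails the field lookup, and the quoted name in the message shows the user where the extra space is.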


