Posted to commits@hudi.apache.org by "Wally Tang (Jira)" <ji...@apache.org> on 2023/03/25 10:28:00 UTC

[jira] [Created] (HUDI-5982) When the user's primary key data contains commas, BucketIdentifier cannot be used

Wally Tang created HUDI-5982:
--------------------------------

             Summary: When the user's primary key data contains commas, BucketIdentifier cannot be used
                 Key: HUDI-5982
                 URL: https://issues.apache.org/jira/browse/HUDI-5982
             Project: Apache Hudi
          Issue Type: Bug
          Components: index
    Affects Versions: 0.12.0
            Reporter: Wally Tang


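When a Hudi table uses a composite primary key together with the bucket index, BucketIdentifier splits the recordKey on commas. If a primary key value itself contains a comma, the split produces a fragment that is not a "field:value" pair, and the lookup fails with an exception. For example, the record key "id:1,name:a,b" splits into "id:1", "name:a" and "b"; splitting "b" on ":" yields a one-element array, so p[1] throws ArrayIndexOutOfBoundsException.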
{code:java}
// BucketIdentifier.java
private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
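  // Splits on every comma, so a comma inside a key value yields an extra fragment
  Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
      // A fragment without ':' becomes a one-element array, so p[1] below throws
      .map(p -> p.split(":"))
      .collect(Collectors.toMap(p -> p[0], p -> p[1]));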
  return indexKeyFields.stream()
      .map(recordKeyPairs::get).collect(Collectors.toList());
} {code}
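The standalone Java sketch below reproduces the failure and contrasts it with one possible alternative. It assumes the composite record key is built as "field:value" pairs joined by "," in record key field order. The class and method names (BucketKeySplitDemo, splitNaively, splitUsingKnownFields) are hypothetical and not part of Hudi; splitNaively copies the quoted logic, and splitUsingKnownFields is only an illustration, not the fix adopted by the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BucketKeySplitDemo {

  // Copy of the logic quoted in the issue: splits on every ',' and then on every ':'.
  static List<String> splitNaively(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  // Hypothetical alternative. Assumption: the key lists every field of
  // recordKeyFields, in order, as "field:value" joined by ','. Each value ends
  // where ",<nextField>:" begins, so commas inside values are preserved.
  // A value that itself contains ",<nextField>:" is still ambiguous.
  static List<String> splitUsingKnownFields(String recordKey, List<String> recordKeyFields,
                                            List<String> indexKeyFields) {
    Map<String, String> pairs = new HashMap<>();
    int pos = 0;
    for (int i = 0; i < recordKeyFields.size(); i++) {
      String prefix = recordKeyFields.get(i) + ":";
      if (!recordKey.startsWith(prefix, pos)) {
        throw new IllegalArgumentException("Expected '" + prefix + "' at offset " + pos);
      }
      int valueStart = pos + prefix.length();
      int valueEnd;
      if (i + 1 < recordKeyFields.size()) {
        // The current value stops right before the next "<field>:" pair.
        valueEnd = recordKey.indexOf("," + recordKeyFields.get(i + 1) + ":", valueStart);
        if (valueEnd < 0) {
          throw new IllegalArgumentException("Missing field " + recordKeyFields.get(i + 1));
        }
        pos = valueEnd + 1; // skip the separating ','
      } else {
        valueEnd = recordKey.length(); // last field runs to the end of the key
      }
      pairs.put(recordKeyFields.get(i), recordKey.substring(valueStart, valueEnd));
    }
    List<String> result = new ArrayList<>();
    for (String field : indexKeyFields) {
      result.add(pairs.get(field));
    }
    return result;
  }

  public static void main(String[] args) {
    String recordKey = "id:1,name:a,b";
    List<String> recordKeyFields = List.of("id", "name");
    List<String> indexKeyFields = List.of("id", "name");

    try {
      splitNaively(recordKey, indexKeyFields);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("naive split failed: " + e);
    }
    // prints [1, a,b]
    System.out.println(splitUsingKnownFields(recordKey, recordKeyFields, indexKeyFields));
  }
}
```

With the record key "id:1,name:a,b", the naive split throws ArrayIndexOutOfBoundsException, while the alternative returns the values "1" and "a,b". Delimiting on ",<nextField>:" still breaks if a value contains that exact substring, so escaping values when the key is built would be a more robust design.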



--
This message was sent by Atlassian Jira
(v8.20.10#820010)