Posted to commits@hudi.apache.org by "Wally Tang (Jira)" <ji...@apache.org> on 2023/03/25 10:28:00 UTC
[jira] [Created] (HUDI-5982) When the user's primary key data contains commas, BucketIdentifier cannot be used
Wally Tang created HUDI-5982:
--------------------------------
Summary: When the user's primary key data contains commas, BucketIdentifier cannot be used
Key: HUDI-5982
URL: https://issues.apache.org/jira/browse/HUDI-5982
Project: Apache Hudi
Issue Type: Bug
Components: index
Affects Versions: 0.12.0
Reporter: Wally Tang
When a Hudi table uses a composite primary key together with the bucket index, BucketIdentifier splits the recordKey on commas to recover its field:value pairs. If the user's primary key data itself contains a comma, that split yields a fragment with no colon, and p[1] then throws an ArrayIndexOutOfBoundsException.
{code:java}
// BucketIdentifier.java
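private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
  // The record key is expected to look like "field1:value1,field2:value2".
  // Splitting on every ',' also breaks apart any value that contains a comma,
  // and a resulting fragment without ':' makes p[1] throw ArrayIndexOutOfBoundsException.
  Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
      .map(p -> p.split(":"))
      .collect(Collectors.toMap(p -> p[0], p -> p[1]));
  return indexKeyFields.stream()
      .map(recordKeyPairs::get).collect(Collectors.toList());
}
{code}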
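A minimal standalone reproduction, assuming record keys in the "field:value,field:value" form that the parsing in BucketIdentifier expects. The method body is copied from the snippet above; the class name BucketKeyParsingDemo and the field names id and name are hypothetical, chosen only for illustration.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; only the parsing method mirrors BucketIdentifier.
public class BucketKeyParsingDemo {

  // Same logic as BucketIdentifier#getHashKeysUsingIndexFields quoted in this issue.
  static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hypothetical index fields of a composite primary key.
    List<String> fields = Arrays.asList("id", "name");

    // Well-formed key: every comma-separated fragment is one field:value pair.
    System.out.println(getHashKeysUsingIndexFields("id:1,name:alice", fields)); // [1, alice]

    // The value of "name" is "smith,bob": the fragment "bob" has no ':',
    // so p[1] in the toMap value mapper is out of bounds.
    try {
      getHashKeysUsingIndexFields("id:2,name:smith,bob", fields);
      System.out.println("no exception");
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("ArrayIndexOutOfBoundsException");
    }

    // The value of "name" is "smith,x:y": no exception, but "x:y" is parsed as an
    // extra pair and the hash keys use the truncated value "smith".
    System.out.println(getHashKeysUsingIndexFields("id:3,name:smith,x:y", fields)); // [3, smith]
  }
}
{code}

The third case shows that a comma does not always surface as an exception: when the text after the comma happens to contain a colon, the bucket is silently computed from a truncated key value.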
--
This message was sent by Atlassian Jira
(v8.20.10#820010)