Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/04/19 00:17:00 UTC
[jira] [Updated] (HUDI-2954) Code cleanup: HFileDataBlock - using integer keys is never used
[ https://issues.apache.org/jira/browse/HUDI-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Guo updated HUDI-2954:
----------------------------
Priority: Blocker (was: Major)
> Code cleanup: HFileDataBlock - using integer keys is never used
> ----------------------------------------------------------------
>
> Key: HUDI-2954
> URL: https://issues.apache.org/jira/browse/HUDI-2954
> Project: Apache Hudi
> Issue Type: Improvement
> Components: code-quality, metadata
> Reporter: Manoj Govindassamy
> Assignee: Ethan Guo
> Priority: Blocker
> Fix For: 0.12.0
>
>
>
> The key field can never be empty for an HFile. Given that, there is no need to fall back to sequential integer keys in the HFileDataBlock::serializeRecords() code path.
>
> {noformat}
> // Build the record key
> final Field schemaKeyField = records.get(0).getSchema().getField(this.keyField);
> if (schemaKeyField == null) {
>   // Missing key metadata field. Use an integer sequence key instead.
>   useIntegerKey = true;
>   keySize = (int) Math.ceil(Math.log(records.size())) + 1;
> }
> while (itr.hasNext()) {
>   IndexedRecord record = itr.next();
>   String recordKey;
>   if (useIntegerKey) {
>     recordKey = String.format("%" + keySize + "s", key++);
>   } else {
>     recordKey = record.get(schemaKeyField.pos()).toString();
>   }
> {noformat}
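
For illustration only, the cleanup described above could look roughly like the sketch below. It is not the actual Hudi change: RecordKeySketch and extractKeys are hypothetical names, and each record is modeled as a plain Map instead of an Avro IndexedRecord so that the example runs without external dependencies. It also assumes that a missing key field should be reported as an error, which the issue does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the key-building loop; not Hudi's actual code.
final class RecordKeySketch {

  // Each record is modeled as a Map from field name to value (an assumption
  // made to keep the sketch dependency-free; Hudi uses Avro IndexedRecord).
  static List<String> extractKeys(List<Map<String, Object>> records, String keyField) {
    List<String> keys = new ArrayList<>(records.size());
    for (Map<String, Object> record : records) {
      Object value = record.get(keyField);
      if (value == null) {
        // No integer-sequence fallback: a missing key field is treated as an error.
        throw new IllegalStateException("Record is missing key field: " + keyField);
      }
      keys.add(value.toString());
    }
    return keys;
  }

  public static void main(String[] args) {
    Map<String, Object> first = Map.of("key", "file-001", "size", 10L);
    Map<String, Object> second = Map.of("key", "file-002", "size", 20L);
    // Prints the keys in record order.
    System.out.println(extractKeys(List.of(first, second), "key"));
  }
}
```

With the fallback gone, the useIntegerKey, keySize, and key counters from the quoted snippet would no longer be needed, and every record key comes from a single code path.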
--
This message was sent by Atlassian Jira
(v8.20.1#820001)