Posted to issues@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/03/21 03:50:00 UTC
[jira] [Assigned] (ORC-1132) [C++] EncodedStringVectorBatch allocates used buffers
[ https://issues.apache.org/jira/browse/ORC-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang reassigned ORC-1132:
-----------------------------------
> [C++] EncodedStringVectorBatch allocates used buffers
> -----------------------------------------------------
>
> Key: ORC-1132
> URL: https://issues.apache.org/jira/browse/ORC-1132
> Project: ORC
> Issue Type: Improvement
> Affects Versions: 1.6.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
>
> The constructor of EncodedStringVectorBatch invokes the constructor of StringVectorBatch with batch capacity:
> {code:cpp}
> EncodedStringVectorBatch::EncodedStringVectorBatch(uint64_t _capacity,
>                                                    MemoryPool& pool)
>     : StringVectorBatch(_capacity, pool),
>       dictionary(),
>       index(pool, _capacity) {
>   // PASS
> }
> {code}
> This allocates unused `data` and `length` buffers in StringVectorBatch, each sized to the batch capacity:
> {code:cpp}
> StringVectorBatch::StringVectorBatch(uint64_t _capacity, MemoryPool& pool)
>     : ColumnVectorBatch(_capacity, pool),
>       data(pool, _capacity),
>       length(pool, _capacity),
>       blob(pool) {
>   // PASS
> }
> {code}
> However, reading dictionary-encoded strings only uses the `index` buffer and the `dictionary` of EncodedStringVectorBatch:
> {code:cpp}
> void StringDictionaryColumnReader::nextEncoded(ColumnVectorBatch& rowBatch,
>                                                uint64_t numValues,
>                                                char* notNull) {
>   ColumnReader::next(rowBatch, numValues, notNull);
>   notNull = rowBatch.hasNulls ? rowBatch.notNull.data() : nullptr;
>   rowBatch.isEncoded = true;
>   EncodedStringVectorBatch& batch = dynamic_cast<EncodedStringVectorBatch&>(rowBatch);
>   batch.dictionary = this->dictionary;
>   // Dictionary entry ids are written to the index buffer; the length buffer is not used
>   rle->next(batch.index.data(), numValues, notNull);
> }
> {code}
> Thus EncodedStringVectorBatch should avoid allocating the unused `data` and `length` buffers in its StringVectorBatch base class.
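The waste can be illustrated with a small self-contained model of the constructor chain. This is a sketch, not ORC code. CountingPool, Buffer, ColumnBatch, StringBatch and the two EncodedStringBatch* structs are hypothetical stand-ins. They only mirror the element types of the real buffers (char for notNull, char* for data, int64_t for length and index) and count how many bytes each constructor requests. The "proposed" variant shows one possible way to skip the unused buffers. It assumes a hypothetical protected StringBatch constructor that takes a separate capacity for data/length, and it is not necessarily the fix adopted upstream. Simply passing a capacity of 0 to the StringVectorBatch constructor would likely not work, assuming ColumnVectorBatch sizes its notNull buffer from the same capacity argument, since nextEncoded() still reads that buffer.

{code:cpp}
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical stand-in for orc::MemoryPool: it only counts requested bytes.
struct CountingPool {
  std::size_t bytes = 0;
};

// Hypothetical stand-in for orc::DataBuffer<T>: it allocates `capacity`
// elements up front and charges them to the pool.
template <typename T>
struct Buffer {
  std::vector<T> storage;
  Buffer(CountingPool& pool, std::uint64_t capacity) : storage(capacity) {
    pool.bytes += capacity * sizeof(T);
  }
};

struct ColumnBatch {
  std::uint64_t capacity;
  Buffer<char> notNull;  // still read by the reader, so it keeps the full capacity
  ColumnBatch(std::uint64_t cap, CountingPool& pool)
      : capacity(cap), notNull(pool, cap) {}
};

struct StringBatch : ColumnBatch {
  Buffer<char*> data;
  Buffer<std::int64_t> length;
  StringBatch(std::uint64_t cap, CountingPool& pool)
      : StringBatch(cap, cap, pool) {}

 protected:
  // Hypothetical extra constructor that decouples the row capacity from the
  // capacity of the data/length buffers.
  StringBatch(std::uint64_t cap, std::uint64_t valueCap, CountingPool& pool)
      : ColumnBatch(cap, pool), data(pool, valueCap), length(pool, valueCap) {}
};

// Mirrors the current behaviour: data/length are sized to the full capacity.
struct EncodedStringBatchCurrent : StringBatch {
  Buffer<std::int64_t> index;
  EncodedStringBatchCurrent(std::uint64_t cap, CountingPool& pool)
      : StringBatch(cap, pool), index(pool, cap) {}
};

// One possible fix: request empty data/length buffers from the base class.
struct EncodedStringBatchProposed : StringBatch {
  Buffer<std::int64_t> index;
  EncodedStringBatchProposed(std::uint64_t cap, CountingPool& pool)
      : StringBatch(cap, 0, pool), index(pool, cap) {}
};

// Builds one batch of the given type and returns the bytes its constructors requested.
template <typename Batch>
std::size_t bytesAllocated(std::uint64_t capacity) {
  CountingPool pool;
  Batch batch(capacity, pool);
  assert(batch.index.storage.size() == capacity);  // index is always full-size
  return pool.bytes;
}

int main() {
  const std::uint64_t capacity = 1024;
  const std::size_t current = bytesAllocated<EncodedStringBatchCurrent>(capacity);
  const std::size_t proposed = bytesAllocated<EncodedStringBatchProposed>(capacity);
  std::cout << "current:  " << current << " bytes\n";
  std::cout << "proposed: " << proposed << " bytes\n";
  std::cout << "saved:    " << (current - proposed) << " bytes\n";
  return 0;
}
{code}

With a capacity of 1024 on a typical 64-bit build, the difference is 1024 * (8 + 8) bytes, i.e. 16 KiB of unused data/length buffers per batch. A real fix would presumably also need to keep the batch's resize path from growing the skipped buffers again.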
--
This message was sent by Atlassian Jira
(v8.20.1#820001)