Posted to issues@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/03/21 03:50:00 UTC

[jira] [Assigned] (ORC-1132) [C++] EncodedStringVectorBatch allocates unused buffers

     [ https://issues.apache.org/jira/browse/ORC-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned ORC-1132:
-----------------------------------


> [C++] EncodedStringVectorBatch allocates unused buffers
> -------------------------------------------------------
>
>                 Key: ORC-1132
>                 URL: https://issues.apache.org/jira/browse/ORC-1132
>             Project: ORC
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> The constructor of EncodedStringVectorBatch invokes the constructor of StringVectorBatch with the full batch capacity:
> {code:cpp}
>   EncodedStringVectorBatch::EncodedStringVectorBatch(uint64_t _capacity,
>                                                      MemoryPool& pool)
>                       : StringVectorBatch(_capacity, pool),
>                         dictionary(),
>                         index(pool, _capacity) {
>     // PASS
>   }
> {code}
> This allocates the `data` and `length` buffers in StringVectorBatch, which an encoded batch never uses:
> {code:cpp}
>   StringVectorBatch::StringVectorBatch(uint64_t _capacity, MemoryPool& pool
>                ): ColumnVectorBatch(_capacity, pool),
>                   data(pool, _capacity),
>                   length(pool, _capacity),
>                   blob(pool) {
>     // PASS
>   }
> {code}
> When reading an encoded batch, only the `index` buffer and the `dictionary` of EncodedStringVectorBatch are used:
> {code:cpp}
>   void StringDictionaryColumnReader::nextEncoded(ColumnVectorBatch& rowBatch,
>                                                   uint64_t numValues,
>                                                   char* notNull) {
>     ColumnReader::next(rowBatch, numValues, notNull);
>     notNull = rowBatch.hasNulls ? rowBatch.notNull.data() : nullptr;
>     rowBatch.isEncoded = true;
>     EncodedStringVectorBatch& batch = dynamic_cast<EncodedStringVectorBatch&>(rowBatch);
>     batch.dictionary = this->dictionary;
>     // Length buffer is reused to save dictionary entry ids
>     rle->next(batch.index.data(), numValues, notNull);
>   }
> {code}
> Thus we should avoid allocating the `data` and `length` buffers in the base class when constructing an EncodedStringVectorBatch.
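
One possible shape for such a change is sketched below. This is only an illustration under stated assumptions, not the ORC implementation or the actual patch: `CountingPool`, `Buffer`, `StringBatch`, `EncodedStringBatch`, and `bytesForEncodedBatch` are hypothetical stand-ins for `MemoryPool`, the ORC buffer type, and the two batch classes. The idea is to let the base-class constructor take a separate buffer capacity, so that the encoded subclass can pass zero for `data` and `length` while still sizing `index` to the row capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory pool: it only counts requested bytes.
struct CountingPool {
  std::size_t bytesAllocated = 0;
};

// Hypothetical stand-in for a pool-backed buffer of `capacity` elements.
template <typename T>
class Buffer {
 public:
  Buffer(CountingPool& pool, uint64_t capacity)
      : storage_(static_cast<std::size_t>(capacity)) {
    pool.bytesAllocated += static_cast<std::size_t>(capacity) * sizeof(T);
  }
  T* data() { return storage_.data(); }
  uint64_t capacity() const { return storage_.size(); }

 private:
  std::vector<T> storage_;
};

// Simplified base batch: the row capacity and the capacity of the
// `data`/`length` buffers are passed separately (an assumption of this sketch).
struct StringBatch {
  StringBatch(uint64_t rowCapacity, uint64_t bufferCapacity, CountingPool& pool)
      : capacity(rowCapacity),
        data(pool, bufferCapacity),
        length(pool, bufferCapacity) {}
  virtual ~StringBatch() = default;

  uint64_t capacity;
  Buffer<char*> data;
  Buffer<int64_t> length;
};

// Simplified encoded batch: only `index` needs room for every row.
struct EncodedStringBatch : StringBatch {
  EncodedStringBatch(uint64_t rowCapacity, CountingPool& pool,
                     bool allocateBaseBuffers)
      : StringBatch(rowCapacity, allocateBaseBuffers ? rowCapacity : 0, pool),
        index(pool, rowCapacity) {}

  Buffer<int64_t> index;
};

// Returns the number of bytes requested when constructing one encoded batch.
std::size_t bytesForEncodedBatch(uint64_t rowCapacity, bool allocateBaseBuffers) {
  CountingPool pool;
  EncodedStringBatch batch(rowCapacity, pool, allocateBaseBuffers);
  assert(batch.index.capacity() == rowCapacity);  // index is always fully sized
  return pool.bytesAllocated;
}
```

In this sketch, `bytesForEncodedBatch(n, true)` models the current behaviour and `bytesForEncodedBatch(n, false)` the proposed one; the difference is `n * (sizeof(char*) + sizeof(int64_t))` bytes per batch. Any real change would also need to make sure that no code path reads `data` or `length` on an encoded batch.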



--
This message was sent by Atlassian Jira
(v8.20.1#820001)