Posted to issues@orc.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/03/21 03:44:00 UTC
[jira] [Assigned] (ORC-1131) [C++] getMemoryUsage() is incorrect on string vector batches
[ https://issues.apache.org/jira/browse/ORC-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang reassigned ORC-1131:
-----------------------------------
> [C++] getMemoryUsage() is incorrect on string vector batches
> -------------------------------------------------------------
>
> Key: ORC-1131
> URL: https://issues.apache.org/jira/browse/ORC-1131
> Project: ORC
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
>
> The C++ client produces two kinds of string vector batches: StringVectorBatch and EncodedStringVectorBatch. getMemoryUsage() currently returns incorrect results for both.
> ORC-501 moved the blob from StringColumnReader to StringVectorBatch, but StringVectorBatch::getMemoryUsage() was not updated to account for it:
> {code:cpp}
> uint64_t StringVectorBatch::getMemoryUsage() {
>   return ColumnVectorBatch::getMemoryUsage()
>       + static_cast<uint64_t>(data.capacity() * sizeof(char*)
>       + length.capacity() * sizeof(int64_t));
> } {code}
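One possible shape for the StringVectorBatch side of the fix is sketched below. It is a self-contained model, not the ORC sources and not necessarily the patch that resolves this issue: `blob_sketch::Buffer`, `ColumnBatch` and `StringBatch` are hypothetical stand-ins for `DataBuffer`, `ColumnVectorBatch` and `StringVectorBatch`, and the `blob` member name is assumed from the description above. The only point is that the blob capacity is added to the total.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace blob_sketch {

// Hypothetical stand-in for orc::DataBuffer<T>; only capacity() matters here.
template <typename T>
class Buffer {
 public:
  explicit Buffer(uint64_t capacity) : storage_(capacity) {}
  uint64_t capacity() const { return storage_.size(); }

 private:
  std::vector<T> storage_;
};

// Hypothetical stand-in for ColumnVectorBatch: counts only a null mask.
struct ColumnBatch {
  explicit ColumnBatch(uint64_t capacity) : notNull(capacity) {}
  virtual ~ColumnBatch() = default;
  virtual uint64_t getMemoryUsage() {
    return notNull.capacity() * sizeof(char);
  }
  Buffer<char> notNull;
};

// Hypothetical stand-in for StringVectorBatch after the blob moved into it.
struct StringBatch : ColumnBatch {
  StringBatch(uint64_t capacity, uint64_t blobBytes)
      : ColumnBatch(capacity),
        data(capacity),
        length(capacity),
        blob(blobBytes) {}

  uint64_t getMemoryUsage() override {
    return ColumnBatch::getMemoryUsage()
        + data.capacity() * sizeof(char*)
        + length.capacity() * sizeof(int64_t)
        + blob.capacity() * sizeof(char);  // the term missing in the report
  }

  Buffer<char*> data;      // pointers into the blob
  Buffer<int64_t> length;  // length of each string
  Buffer<char> blob;       // the string bytes themselves
};

// Usage: two batches that differ only in blob size differ by that size.
inline void demoBlobIsCounted() {
  StringBatch withoutBlob(4, 0);
  StringBatch withBlob(4, 100);
  assert(withBlob.getMemoryUsage() - withoutBlob.getMemoryUsage() == 100);
}

}  // namespace blob_sketch
```

With the original formula both batches in `demoBlobIsCounted()` would report the same size, which is the under-counting described above.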
> EncodedStringVectorBatch inherits from StringVectorBatch but does not override getMemoryUsage(), so the members it adds (the index buffer and the dictionary) are not counted:
> {code:cpp}
> struct EncodedStringVectorBatch : public StringVectorBatch {
>   EncodedStringVectorBatch(uint64_t capacity, MemoryPool& pool);
>   virtual ~EncodedStringVectorBatch();
>   std::string toString() const;
>   void resize(uint64_t capacity);
>   std::shared_ptr<StringDictionary> dictionary;
>   // index for dictionary entry
>   DataBuffer<int64_t> index;
> };{code}
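A minimal sketch of what an override for EncodedStringVectorBatch might look like follows. Again this is a simplified model under stated assumptions rather than the ORC code: every type in `encoded_sketch` is a hypothetical stand-in, the dictionary's two buffers and its `getMemoryUsage()` helper are invented for illustration, and whether a dictionary held through a `shared_ptr` should be charged to each batch that references it is a design choice this sketch does not settle.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

namespace encoded_sketch {

// Hypothetical stand-in for orc::DataBuffer<T>; only capacity() matters here.
template <typename T>
class Buffer {
 public:
  explicit Buffer(uint64_t capacity) : storage_(capacity) {}
  uint64_t capacity() const { return storage_.size(); }

 private:
  std::vector<T> storage_;
};

// Hypothetical dictionary: one offset per entry plus one, and a byte blob.
struct Dictionary {
  Dictionary(uint64_t entries, uint64_t blobBytes)
      : offsets(entries + 1), blob(blobBytes) {}
  uint64_t getMemoryUsage() const {
    return offsets.capacity() * sizeof(int64_t) + blob.capacity();
  }
  Buffer<int64_t> offsets;
  Buffer<char> blob;
};

// Hypothetical, reduced stand-in for StringVectorBatch.
struct StringBatch {
  explicit StringBatch(uint64_t capacity) : data(capacity), length(capacity) {}
  virtual ~StringBatch() = default;
  virtual uint64_t getMemoryUsage() {
    return data.capacity() * sizeof(char*)
        + length.capacity() * sizeof(int64_t);
  }
  Buffer<char*> data;
  Buffer<int64_t> length;
};

// Hypothetical stand-in for EncodedStringVectorBatch with the override added.
struct EncodedStringBatch : StringBatch {
  explicit EncodedStringBatch(uint64_t capacity)
      : StringBatch(capacity), index(capacity) {}

  uint64_t getMemoryUsage() override {
    uint64_t total = StringBatch::getMemoryUsage()
        + index.capacity() * sizeof(int64_t);
    // Assumption: charge the dictionary to this batch when one is attached.
    if (dictionary) {
      total += dictionary->getMemoryUsage();
    }
    return total;
  }

  std::shared_ptr<Dictionary> dictionary;
  Buffer<int64_t> index;  // index for dictionary entry
};

// Usage: attaching a dictionary raises the reported total by its size.
inline void demoEncodedIsCounted() {
  EncodedStringBatch batch(8);
  const uint64_t before = batch.getMemoryUsage();
  batch.dictionary = std::make_shared<Dictionary>(3, 20);
  assert(batch.getMemoryUsage() - before == 4 * sizeof(int64_t) + 20);
}

}  // namespace encoded_sketch
```

Because the dictionary is reached through a `shared_ptr`, summing this value over several batches that share one dictionary would count it more than once; a caller that needs an exact total would have to deduplicate.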
--
This message was sent by Atlassian Jira
(v8.20.1#820001)