Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/05/11 16:38:04 UTC
[jira] [Created] (IMPALA-5304) Parquet scanner transfers decompression buffers when not needed
Tim Armstrong created IMPALA-5304:
-------------------------------------
Summary: Parquet scanner transfers decompression buffers when not needed
Key: IMPALA-5304
URL: https://issues.apache.org/jira/browse/IMPALA-5304
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 2.9.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
The Parquet scanner always transfers decompression buffers to the scratch batch:
{code}
Status BaseScalarColumnReader::ReadDataPage() {
  // We're about to move to the next data page. The previous data page is
  // now complete, pass along the memory allocated for it.
  parent_->scratch_batch_->mem_pool()->AcquireData(decompressed_data_pool_.get(), false);
{code}
These buffers are in turn passed along with the row batch. This is safe but unnecessary in the many cases where the batch does not hold pointers into the decompression buffer: when the column has only fixed-length data, or when the data page is dictionary-encoded.
The unnecessary transfer can make problems like IMPALA-4923 worse than they would otherwise be, because extra data is transferred across threads.
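
One possible shape for the improvement is sketched below. It is a minimal, self-contained illustration and not Impala code: ToyMemPool, PageInfo, BatchReferencesPage and FinishPage are hypothetical names, and the toy pool only tracks chunk sizes. It assumes that a batch can point into the decompressed page only when the column is variable-length and the page is not dictionary-encoded, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memory pool. It only records chunk sizes.
class ToyMemPool {
 public:
  void Allocate(std::size_t bytes) { chunks_.push_back(bytes); }

  // Moves every chunk from 'src' into this pool, leaving 'src' empty.
  void AcquireData(ToyMemPool* src) {
    for (std::size_t chunk : src->chunks_) chunks_.push_back(chunk);
    src->chunks_.clear();
  }

  // Drops all chunks so the memory can be reused for the next page.
  void Clear() { chunks_.clear(); }

  std::size_t total_bytes() const {
    std::size_t total = 0;
    for (std::size_t chunk : chunks_) total += chunk;
    return total;
  }

 private:
  std::vector<std::size_t> chunks_;
};

// Hypothetical description of the data page that was just completed.
struct PageInfo {
  bool var_len_column;      // e.g. a string column
  bool dictionary_encoded;  // values resolve to dictionary entries
};

// Assumption: rows can only point into the decompressed page when the
// column is variable-length and the page is not dictionary-encoded.
bool BatchReferencesPage(const PageInfo& page) {
  return page.var_len_column && !page.dictionary_encoded;
}

// Called when the reader is about to move to the next data page.
void FinishPage(const PageInfo& page, ToyMemPool* decompressed_pool,
                ToyMemPool* scratch_pool) {
  assert(decompressed_pool != nullptr && scratch_pool != nullptr);
  if (BatchReferencesPage(page)) {
    // The batch may hold pointers into the page, so hand the memory over.
    scratch_pool->AcquireData(decompressed_pool);
  } else {
    // Nothing references the page, so keep the memory with the reader.
    decompressed_pool->Clear();
  }
}
```

In this sketch only the variable-length, non-dictionary case attaches memory to the scratch pool. In the other cases the buffer stays with the column reader, so no extra data would travel with the row batch.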
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)