Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/05/11 16:38:04 UTC

[jira] [Created] (IMPALA-5304) Parquet scanner transfers decompression buffers when not needed

Tim Armstrong created IMPALA-5304:
-------------------------------------

             Summary: Parquet scanner transfers decompression buffers when not needed
                 Key: IMPALA-5304
                 URL: https://issues.apache.org/jira/browse/IMPALA-5304
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 2.9.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


The Parquet scanner always transfers decompression buffers to the scratch batch:
{code}
Status BaseScalarColumnReader::ReadDataPage() {
  // We're about to move to the next data page.  The previous data page is
  // now complete, pass along the memory allocated for it.
  parent_->scratch_batch_->mem_pool()->AcquireData(decompressed_data_pool_.get(), false);
  // ... (rest of the function omitted)
}
{code}

These buffers are in turn attached to the row batch and passed along with it. This is safe, but it is unnecessary in the many cases where the batch holds no pointers into the decompression buffer: when the column contains only fixed-length data, or when the data page is dictionary-encoded.
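The condition described above can be sketched as a small predicate. This is only an illustration under stated assumptions: PageEncoding and MayReferenceDecompressedData are hypothetical names that do not exist in Impala, and the real scanner may need to consider more cases than the two named in this issue.

```cpp
// Hypothetical sketch: these names are illustrative, not Impala's API.
enum class PageEncoding { PLAIN, DICTIONARY };

// Returns true if rows materialized from a data page may hold pointers into
// that page's decompression buffer, in which case the buffer has to be
// transferred along with the batch.
//
// Assumption taken from the issue text: columns with only fixed-length data
// and dictionary-encoded pages never leave such pointers behind.
constexpr bool MayReferenceDecompressedData(bool column_is_var_len,
                                            PageEncoding encoding) {
  return column_is_var_len && encoding != PageEncoding::DICTIONARY;
}
```

A reader could check a predicate like this before the AcquireData() call shown above and skip the transfer when it returns false, although what should happen to the buffer in that case is left open here.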

This can make problems like IMPALA-4923 worse than they would be otherwise because extra data is transferred across threads.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)