Posted to commits@impala.apache.org by st...@apache.org on 2022/03/18 23:31:53 UTC

[impala] branch master updated (21ce4fb -> 1739edf)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 21ce4fb  IMPALA-11182: catch exceptions of orc::RowReader::createRowBatch
     new 4d32ab7  IMPALA-11185: Reuse orc row batch in the scanner life-cycle
     new 1739edf  IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/hdfs-orc-scanner.cc     | 7 ++-----
 be/src/runtime/client-cache-test.cc | 2 +-
 2 files changed, 3 insertions(+), 6 deletions(-)

[impala] 02/02: IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS

Posted by st...@apache.org.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 1739edf2d97009062cb339a3c276f01dbd4d33bd
Author: Yida Wu <wy...@gmail.com>
AuthorDate: Thu Mar 17 12:40:13 2022 -0700

    IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS
    
    The assertion failure happens on CentOS because the test case
    reads the virtual memory usage of the current thread from
    /proc/thread-self/stat. However, CentOS may not provide the
    /proc/thread-self symlink, so the test case cannot open the path
    and the assertion fires.
    
    This fix changes the path to /proc/self/stat. /proc/self is a symlink
    to the current process and is available on CentOS. Because the test
    case doesn't involve multithreading, it is safe to replace
    /proc/thread-self with /proc/self when measuring memory usage.
    
    Tests:
    Passed core tests in CentOS.
    
    Change-Id: I045e91aa9b7d8e1b731e3261f0f18cc932c16f43
    Reviewed-on: http://gerrit.cloudera.org:8080/18332
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/client-cache-test.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/runtime/client-cache-test.cc b/be/src/runtime/client-cache-test.cc
index 22f54ed..dbfce83 100644
--- a/be/src/runtime/client-cache-test.cc
+++ b/be/src/runtime/client-cache-test.cc
@@ -70,7 +70,7 @@ class ClientCacheTest : public testing::Test {
   uint64_t GetProcessVMSize() {
     // vm size, https://man7.org/linux/man-pages/man5/proc.5.html
     const int vm_size_pos = 22;
-    ifstream stream("/proc/thread-self/stat");
+    ifstream stream("/proc/self/stat");
     string line;
     string space_delimiter = " ";
     vector<string> words{};
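
For illustration only, here is a minimal standalone sketch of reading the
virtual memory size from /proc/self/stat. It is not the code in
client-cache-test.cc: the helper names ParseVmSizeFromStatLine and
ReadSelfVmSize are hypothetical. It assumes the layout described in proc(5),
where vsize is field 23 (index 22 when counting from zero, matching
vm_size_pos above). Unlike a plain split on spaces, the sketch skips past the
last ')' first, because the comm field may itself contain spaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Impala): extracts vsize, field 23 in
// proc(5), from one line of /proc/<pid>/stat. Returns false if the line
// cannot be parsed.
bool ParseVmSizeFromStatLine(const std::string& line, std::uint64_t* vm_size) {
  // Field 2 (comm) is wrapped in parentheses and may contain spaces, so
  // start tokenizing after the last ')'.
  const std::size_t close_paren = line.rfind(')');
  if (close_paren == std::string::npos) return false;

  std::istringstream rest(line.substr(close_paren + 1));
  std::string token;
  // The first token after comm is field 3 (state), so field 23 (vsize) is
  // the 21st token from here.
  const int kTokensToVsize = 21;
  for (int i = 0; i < kTokensToVsize; ++i) {
    if (!(rest >> token)) return false;
  }
  try {
    *vm_size = std::stoull(token);
  } catch (const std::exception&) {
    return false;
  }
  return true;
}

// Hypothetical helper: reads the vsize of the current process. Returns false
// if /proc/self/stat is missing or malformed, so the caller can decide how
// to fail.
bool ReadSelfVmSize(std::uint64_t* vm_size) {
  std::ifstream stream("/proc/self/stat");
  std::string line;
  if (!stream.is_open() || !std::getline(stream, line)) return false;
  return ParseVmSizeFromStatLine(line, vm_size);
}
```

Returning false instead of asserting inside the helper lets a test report a
missing /proc entry separately from a genuine memory leak.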

[impala] 01/02: IMPALA-11185: Reuse orc row batch in the scanner life-cycle

Posted by st...@apache.org.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4d32ab7122557ca3336354301a3a467a206913a9
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Wed Mar 16 13:12:13 2022 +0800

    IMPALA-11185: Reuse orc row batch in the scanner life-cycle
    
    In HdfsOrcScanner::AssembleRows(), we always re-create an
    orc::ColumnVectorBatch. The ideal pattern is to reuse the batch and
    destroy it only when the scanner is closed.
    
    This saves half of the scanner time in some TPC-H queries. See the flame
    graph in the JIRA description.
    
    Tests:
     - Ran core tests
    
    Change-Id: I03887ed94af2ff03d67cd00c79375c734a75af62
    Reviewed-on: http://gerrit.cloudera.org:8080/18325
    Reviewed-by: Quanlong Huang <hu...@gmail.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/hdfs-orc-scanner.cc | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc
index 5809ac7..bc81fc4 100644
--- a/be/src/exec/hdfs-orc-scanner.cc
+++ b/be/src/exec/hdfs-orc-scanner.cc
@@ -437,6 +437,8 @@ Status HdfsOrcScanner::Open(ScannerContext* context) {
     }
     orc_root_reader_ = this->obj_pool_.Add(
         new OrcStructReader(root_type, scan_node_->tuple_desc(), this));
+    orc_root_batch_ = tmp_row_reader->createRowBatch(state_->batch_size());
+    DCHECK_EQ(orc_root_batch_->numElements, 0);
   } RETURN_ON_ORC_EXCEPTION(
       "Encountered parse error during schema selection in ORC file $0: $1");
 
@@ -934,11 +936,6 @@ Status HdfsOrcScanner::AssembleRows(RowBatch* row_batch) {
   // We're going to free the previous batch. Clear the reference first.
   RETURN_IF_ERROR(orc_root_reader_->UpdateInputBatch(nullptr));
 
-  try {
-    orc_root_batch_ = row_reader_->createRowBatch(row_batch->capacity());
-    DCHECK_EQ(orc_root_batch_->numElements, 0);
-  } RETURN_ON_ORC_EXCEPTION("Encounter error in creating ORC row batch for file $0: $1.");
-
   int64_t num_rows_read = 0;
   while (continue_execution) {  // one ORC batch (ColumnVectorBatch) in a round
     if (orc_root_reader_->EndOfBatch()) {
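
The reuse pattern in this change can be sketched in isolation as follows.
This is an illustration under stated assumptions, not Impala or ORC code:
FakeBatch, FakeRowReader and FakeScanner are hypothetical stand-ins for
orc::ColumnVectorBatch, orc::RowReader and HdfsOrcScanner, and the counter
exists only to make the number of batch allocations visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for orc::ColumnVectorBatch.
struct FakeBatch {
  explicit FakeBatch(std::size_t capacity) : values(capacity), num_elements(0) {}
  std::vector<std::int64_t> values;
  std::size_t num_elements;
};

// Hypothetical stand-in for a row reader that fills a caller-owned batch.
class FakeRowReader {
 public:
  explicit FakeRowReader(std::size_t total_rows)
    : remaining_(total_rows), next_value_(0), batches_created_(0) {}

  std::unique_ptr<FakeBatch> CreateRowBatch(std::size_t capacity) {
    ++batches_created_;
    return std::make_unique<FakeBatch>(capacity);
  }

  // Overwrites the batch contents in place. Returns false once the input is
  // exhausted.
  bool Next(FakeBatch* batch) {
    const std::size_t n = std::min(remaining_, batch->values.size());
    for (std::size_t i = 0; i < n; ++i) batch->values[i] = next_value_++;
    batch->num_elements = n;
    remaining_ -= n;
    return n > 0;
  }

  int batches_created() const { return batches_created_; }

 private:
  std::size_t remaining_;
  std::int64_t next_value_;
  int batches_created_;
};

// Hypothetical scanner: the batch is allocated once in Open(), refilled on
// every read, and destroyed only in Close().
class FakeScanner {
 public:
  FakeScanner(std::size_t total_rows, std::size_t batch_size)
    : reader_(total_rows), batch_size_(batch_size) {}

  void Open() { batch_ = reader_.CreateRowBatch(batch_size_); }

  // Assumes Open() has been called. Returns the number of rows consumed.
  std::size_t AssembleAllRows() {
    std::size_t rows = 0;
    while (reader_.Next(batch_.get())) rows += batch_->num_elements;
    return rows;
  }

  void Close() { batch_.reset(); }

  int batches_created() const { return reader_.batches_created(); }

 private:
  FakeRowReader reader_;
  std::size_t batch_size_;
  std::unique_ptr<FakeBatch> batch_;
};
```

With 10 rows and a batch size of 4, the sketch fills the same batch three
times but allocates it once; re-creating the batch inside the read loop would
allocate once per round instead.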