Posted to commits@impala.apache.org by st...@apache.org on 2022/03/18 23:31:53 UTC
[impala] branch master updated (21ce4fb -> 1739edf)
This is an automated email from the ASF dual-hosted git repository.
stigahuang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.
from 21ce4fb IMPALA-11182: catch exceptions of orc::RowReader::createRowBatch
new 4d32ab7 IMPALA-11185: Reuse orc row batch in the scanner life-cycle
new 1739edf IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS
The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
Summary of changes:
be/src/exec/hdfs-orc-scanner.cc | 7 ++-----
be/src/runtime/client-cache-test.cc | 2 +-
2 files changed, 3 insertions(+), 6 deletions(-)
[impala] 02/02: IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS
Posted by st...@apache.org.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 1739edf2d97009062cb339a3c276f01dbd4d33bd
Author: Yida Wu <wy...@gmail.com>
AuthorDate: Thu Mar 17 12:40:13 2022 -0700
IMPALA-11193: Fix assertion failure of ClientCacheTest.MemLeak in CentOS
The assertion failure happens on CentOS because the testcase
reads the virtual memory usage of the current thread from
/proc/thread-self. However, CentOS may not provide this symlink,
so the testcase cannot open the path and
triggers the assertion.
This fix changes the path to /proc/self, which is a symlink to the
current process and is available on CentOS. Because the testcase
doesn't involve multithreading, it is safe to replace
/proc/thread-self with /proc/self to detect the memory usage.
Tests:
Passed core tests on CentOS.
Change-Id: I045e91aa9b7d8e1b731e3261f0f18cc932c16f43
Reviewed-on: http://gerrit.cloudera.org:8080/18332
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
be/src/runtime/client-cache-test.cc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/be/src/runtime/client-cache-test.cc b/be/src/runtime/client-cache-test.cc
index 22f54ed..dbfce83 100644
--- a/be/src/runtime/client-cache-test.cc
+++ b/be/src/runtime/client-cache-test.cc
@@ -70,7 +70,7 @@ class ClientCacheTest : public testing::Test {
uint64_t GetProcessVMSize() {
// vm size, https://man7.org/linux/man-pages/man5/proc.5.html
const int vm_size_pos = 22;
- ifstream stream("/proc/thread-self/stat");
+ ifstream stream("/proc/self/stat");
string line;
string space_delimiter = " ";
vector<string> words{};
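
The /proc/self/stat read that this test helper relies on can be sketched in isolation. This is a minimal illustration, not the code in client-cache-test.cc: the function names ParseVmSizeFromStatLine and ReadSelfVmSizeBytes are hypothetical, and the sketch assumes the field layout documented in proc(5), where vsize is field 23 and comm (field 2) is parenthesised and may contain spaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper (not Impala's code): extracts vsize (field 23 in
// proc(5)) from one line of /proc/<pid>/stat. Returns 0 if parsing fails.
uint64_t ParseVmSizeFromStatLine(const std::string& line) {
  // Field 2 (comm) is wrapped in parentheses and may contain spaces, so
  // resume tokenising after the last ')'.
  const std::size_t close_paren = line.rfind(')');
  if (close_paren == std::string::npos) return 0;
  std::istringstream rest(line.substr(close_paren + 1));
  std::string token;
  // The first token after comm is field 3 (state); stop at field 23.
  for (int field = 3; field <= 23; ++field) {
    if (!(rest >> token)) return 0;
  }
  try {
    return std::stoull(token);
  } catch (const std::exception&) {
    return 0;
  }
}

// Hypothetical helper: virtual memory size of the calling process in bytes,
// or 0 if /proc/self/stat cannot be read.
uint64_t ReadSelfVmSizeBytes() {
  std::ifstream stream("/proc/self/stat");
  std::string line;
  if (!stream || !std::getline(stream, line)) return 0;
  return ParseVmSizeFromStatLine(line);
}
```

Splitting the parsing from the file read keeps the field arithmetic checkable against a fixed input line, without depending on which /proc entries a given kernel provides.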
[impala] 01/02: IMPALA-11185: Reuse orc row batch in the scanner life-cycle
Posted by st...@apache.org.
stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit 4d32ab7122557ca3336354301a3a467a206913a9
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Wed Mar 16 13:12:13 2022 +0800
IMPALA-11185: Reuse orc row batch in the scanner life-cycle
In HdfsOrcScanner::AssembleRows(), we always re-create an
orc::ColumnVectorBatch. The ideal pattern is to reuse the batch and only
destroy it when the scanner is closed.
This saves half of the scanner time in some TPCH queries. See the flame
graph in the JIRA description.
Tests:
- Ran CORE tests
Change-Id: I03887ed94af2ff03d67cd00c79375c734a75af62
Reviewed-on: http://gerrit.cloudera.org:8080/18325
Reviewed-by: Quanlong Huang <hu...@gmail.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
be/src/exec/hdfs-orc-scanner.cc | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/be/src/exec/hdfs-orc-scanner.cc b/be/src/exec/hdfs-orc-scanner.cc
index 5809ac7..bc81fc4 100644
--- a/be/src/exec/hdfs-orc-scanner.cc
+++ b/be/src/exec/hdfs-orc-scanner.cc
@@ -437,6 +437,8 @@ Status HdfsOrcScanner::Open(ScannerContext* context) {
}
orc_root_reader_ = this->obj_pool_.Add(
new OrcStructReader(root_type, scan_node_->tuple_desc(), this));
+ orc_root_batch_ = tmp_row_reader->createRowBatch(state_->batch_size());
+ DCHECK_EQ(orc_root_batch_->numElements, 0);
} RETURN_ON_ORC_EXCEPTION(
"Encountered parse error during schema selection in ORC file $0: $1");
@@ -934,11 +936,6 @@ Status HdfsOrcScanner::AssembleRows(RowBatch* row_batch) {
// We're going to free the previous batch. Clear the reference first.
RETURN_IF_ERROR(orc_root_reader_->UpdateInputBatch(nullptr));
- try {
- orc_root_batch_ = row_reader_->createRowBatch(row_batch->capacity());
- DCHECK_EQ(orc_root_batch_->numElements, 0);
- } RETURN_ON_ORC_EXCEPTION("Encounter error in creating ORC row batch for file $0: $1.");
-
int64_t num_rows_read = 0;
while (continue_execution) { // one ORC batch (ColumnVectorBatch) in a round
if (orc_root_reader_->EndOfBatch()) {
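
The allocate-once, refill-many pattern described in the commit message for IMPALA-11185 can be sketched with toy types. This is an illustration under stated assumptions, not Impala's scanner or the ORC library's API: ToyBatch and ToyScanner are hypothetical stand-ins, and the allocation counter exists only to make the reuse observable.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a columnar batch; not the ORC library's type.
struct ToyBatch {
  explicit ToyBatch(std::size_t capacity) : values(capacity), num_elements(0) {}
  std::vector<int64_t> values;
  std::size_t num_elements;
};

// Hypothetical scanner that allocates its batch once and refills it.
class ToyScanner {
 public:
  explicit ToyScanner(std::size_t batch_size) : batch_size_(batch_size) {}

  // Allocate the batch once, when the scanner is opened.
  void Open() {
    batch_ = std::make_unique<ToyBatch>(batch_size_);
    ++allocations_;
  }

  // Refill the existing batch in place; returns the number of values written.
  std::size_t NextBatch(int64_t first_value) {
    batch_->num_elements = 0;
    for (std::size_t i = 0; i < batch_->values.size(); ++i) {
      batch_->values[i] = first_value + static_cast<int64_t>(i);
      ++batch_->num_elements;
    }
    return batch_->num_elements;
  }

  // Release the batch only when the scanner is closed.
  void Close() { batch_.reset(); }

  int allocations() const { return allocations_; }
  const ToyBatch* batch() const { return batch_.get(); }

 private:
  std::size_t batch_size_;
  std::unique_ptr<ToyBatch> batch_;
  int allocations_ = 0;
};
```

In this sketch, calling NextBatch() repeatedly after a single Open() leaves allocations() at 1 and batch() pointing at the same object, which is the property the commit relies on to avoid per-call allocation cost.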