You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2017/05/02 23:37:04 UTC

[jira] [Created] (IMPALA-5275) DeserializeThriftMsg prints a stack trace even where is no errors, a single occurrence costs 130ms of CPU

Mostafa Mokhtar created IMPALA-5275:
---------------------------------------

             Summary: DeserializeThriftMsg prints a stack trace even where is no errors, a single occurrence costs 130ms of CPU
                 Key: IMPALA-5275
                 URL: https://issues.apache.org/jira/browse/IMPALA-5275
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Mostafa Mokhtar
            Assignee: Alexander Behm


When reading Parquet data attempts to find the Parquet header prints a stack trace to the log even if the error was found successfully.
Also printing the stack trace without logging which Parquet file was being read makes the log message useless. 

{code}
Status::Status(const string& error_msg)
  : msg_(new ErrorMsg(TErrorCode::GENERAL, error_msg)) {
  VLOG(1) << msg_->msg() << "\n" << GetStackTrace();
}
{code}

Vtune data showed that capturing the stack trace is very expensive and spends 130ms CPU per invocation.

{code}
This is the expensive call stack btw 
CPU Time
1 of 8: 77.9% (0.106s of 0.136s)

vmlinux ! KSTK_ESP - [unknown source file]
vmlinux ! vm_is_stack + 0xa1 - [unknown source file]
vmlinux ! show_map_vma + 0x268 - [unknown source file]
vmlinux ! show_map + 0x20 - [unknown source file]
vmlinux ! show_pid_map + 0x12 - [unknown source file]
vmlinux ! seq_read + 0x24f - [unknown source file]
vmlinux ! vfs_read + 0x9d - [unknown source file]
vmlinux ! SyS_read + 0x7e - [unknown source file]
vmlinux ! tracesys + 0xdc - [unknown source file]
libpthread-2.17.so ! __read + 0x2c - [unknown source file]
impalad ! google::OpenObjectFileContainingPcAndGetStartAddress + 0x2cf - [unknown source file]
impalad ! google::SymbolizeAndDemangle + 0x488 - [unknown source file]
impalad ! google::DumpStackTrace.constprop.9 + 0x6e - [unknown source file]
impalad ! impala::GetStackTrace + 0x1e - debug-util.cc:321
impalad ! impala::Status::Status + 0x68 - status.cc:114
impalad ! impala::DeserializeThriftMsg<parquet::PageHeader> + 0x2ce - thrift-util.h:128
impalad ! impala::BaseScalarColumnReader::ReadPageHeader + 0x182 - parquet-column-readers.cc:787
impalad ! impala::BaseScalarColumnReader::ReadDataPage + 0x9b - parquet-column-readers.cc:966
impalad ! impala::BaseScalarColumnReader::NextPage + 0x6e - parquet-column-readers.cc:1093
impalad ! ReadValueBatch<false> + 0x196 - parquet-column-readers.cc:306
impalad ! impala::ScalarColumnReader<long, (bool)1>::ReadNonRepeatedValueBatch + 0x60 - parquet-column-readers.cc:253
impalad ! impala::HdfsParquetScanner::AssembleRows + 0x313 - hdfs-parquet-scanner.cc:950
impalad ! impala::HdfsParquetScanner::GetNextInternal + 0x277 - hdfs-parquet-scanner.cc:482
impalad ! impala::HdfsParquetScanner::ProcessSplit + 0x73 - hdfs-parquet-scanner.cc:409
impalad ! impala::HdfsScanNode::ProcessSplit + 0x395 - hdfs-scan-node.cc:526
impalad ! impala::HdfsScanNode::ScannerThread + 0xbb8 - hdfs-scan-node.cc:417
impalad ! boost::function0<void>::operator() + 0x1a - function_template.hpp:767
{code}

What gets printed to the log
{code}
I0501 14:13:24.154422 21797 status.cc:119] couldn't deserialize thrift msg:
No more data to read.
   @           0x838a89  impala::Status::Status()
   @           0xc955af  impala::DeserializeThriftMsg<>()
   @           0xc8c973  impala::BaseScalarColumnReader::ReadPageHeader()
   @           0xc8e0fc  impala::BaseScalarColumnReader::ReadDataPage()
   @           0xc8ee6f  impala::BaseScalarColumnReader::NextPage()
   @           0xc9d5a8  impala::ScalarColumnReader<>::ReadNonRepeatedValueBatch()
   @           0xc72764  impala::HdfsParquetScanner::AssembleRows()
   @           0xc77238  impala::HdfsParquetScanner::GetNextInternal()
   @           0xc71e74  impala::HdfsParquetScanner::ProcessSplit()
   @           0xc4eab6  impala::HdfsScanNode::ProcessSplit()
   @           0xc50719  impala::HdfsScanNode::ScannerThread()
   @           0xbdd739  impala::Thread::SuperviseThread()
   @           0xbde184  boost::detail::thread_data<>::run()
   @           0xe4f3ba  (unknown)
   @       0x3797407aa1  (unknown)
   @       0x37970e893d  (unknown)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)