Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/21 22:37:00 UTC

[jira] [Commented] (IMPALA-10257) Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build

    [ https://issues.apache.org/jira/browse/IMPALA-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218636#comment-17218636 ] 

ASF subversion and git services commented on IMPALA-10257:
----------------------------------------------------------

Commit 9384a18180cc567a6c66af7a30d574cc2cb7f696 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9384a18 ]

IMPALA-10257: Relax check for page filtering

HdfsParquetScanner::CheckPageFiltering() is a bit too strict. It checks
that all column readers agree on the top level rows. Column readers
use different strategies to read columns. One strategy reads ahead
the Parquet def/rep levels; the other strategy reads levels and
values simultaneously, i.e. with no readahead of levels.

We calculate the ordinal of the top level row based on the repetition
level. This means that when we read ahead the rep level, the top level
row might point to the value to be processed next, while the top level
row in the other strategy always points to the row that was completely
processed last.
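
To illustrate the difference, here is a minimal sketch of the two counting
strategies. It is a simplified model, not Impala's column reader code: the
function names and the flat vector of repetition levels are assumptions made
for the example. A repetition level of 0 marks the start of a new top level
row.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a reader that reads levels and values together.
// Its row ordinal only reflects the values it has fully consumed.
// Returns -1 if no top level row has been started yet.
int64_t RowWithoutReadahead(const std::vector<int>& rep_levels,
                            size_t values_consumed) {
  int64_t row = -1;
  size_t levels_seen = std::min(values_consumed, rep_levels.size());
  for (size_t i = 0; i < levels_seen; ++i) {
    if (rep_levels[i] == 0) ++row;  // rep level 0 starts a new top level row
  }
  return row;
}

// Hypothetical model of a reader that reads levels ahead of values.
// It has already seen the rep level of the next value, so its row ordinal
// may already point to the row of the value to be processed next.
int64_t RowWithLevelReadahead(const std::vector<int>& rep_levels,
                              size_t values_consumed) {
  int64_t row = -1;
  // One extra level is visible because of the readahead, if one exists.
  size_t levels_seen = std::min(values_consumed + 1, rep_levels.size());
  for (size_t i = 0; i < levels_seen; ++i) {
    if (rep_levels[i] == 0) ++row;
  }
  return row;
}
```

With rep levels {0, 1, 1, 0, 1} and three values consumed, the first function
reports row 0 and the second reports row 1, although both readers have
processed exactly the same values.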

Because of this, CheckPageFiltering() can allow a difference of
one between the 'current_row_' values of the column readers.
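
The relaxed condition can be sketched as follows. This is only an
illustration under assumptions: RowsAgreeWithinOne() is a hypothetical helper
that takes the readers' row ordinals as a plain vector, and it is not the
code in hdfs-parquet-scanner.cc.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if the largest and smallest row
// ordinals reported by the column readers differ by at most one.
// An empty list of readers trivially agrees.
bool RowsAgreeWithinOne(const std::vector<int64_t>& current_rows) {
  if (current_rows.empty()) return true;
  auto min_max =
      std::minmax_element(current_rows.begin(), current_rows.end());
  return *min_max.second - *min_max.first <= 1;
}
```

A strict equality check would reject {41, 42, 41}, which is what a mix of
readahead and non-readahead readers can legitimately produce; the relaxed
check accepts it and still rejects {41, 43}.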

I also got rid of the DCHECK in CheckPageFiltering() and replaced it
with a more informative error report.
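
As a rough sketch of what reporting instead of aborting could look like:
the CheckResult struct, the ReportRowMismatch() function, and the message
wording below are all hypothetical and are not taken from the Impala source,
which has its own status and error types.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a status object.
struct CheckResult {
  bool ok;
  std::string message;  // Empty when ok is true.
};

// Returns an error describing each reader's position when the row ordinals
// differ by more than one, instead of terminating the process.
CheckResult ReportRowMismatch(const std::vector<int64_t>& current_rows,
                              const std::string& file_name) {
  if (current_rows.empty()) return {true, ""};
  int64_t lowest = current_rows[0];
  int64_t highest = current_rows[0];
  for (int64_t row : current_rows) {
    lowest = std::min(lowest, row);
    highest = std::max(highest, row);
  }
  if (highest - lowest <= 1) return {true, ""};

  std::ostringstream msg;
  msg << "Top-level rows out of sync in file " << file_name << ": ";
  for (size_t i = 0; i < current_rows.size(); ++i) {
    if (i > 0) msg << ", ";
    msg << "reader " << i << " at row " << current_rows[i];
  }
  return {false, msg.str()};
}
```

Unlike a bare "Check failed: false", a message of this kind names the file and
the diverging readers, and the caller can fail the query rather than crash.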

Testing:
* added a test to nested-types-parquet-page-index.test

Change-Id: I01a570c09eeeb9580f4aa4f6f0de2fe6c7aeb806
Reviewed-on: http://gerrit.cloudera.org:8080/16619
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Hit DCHECK in HdfsParquetScanner::CheckPageFiltering in a CORE S3 build
> -----------------------------------------------------------------------
>
>                 Key: IMPALA-10257
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10257
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Zoltán Borók-Nagy
>            Priority: Critical
>
> Saw the crash in a CORE S3 build:
> {code:java}
> F1018 06:14:23.631407 12990 hdfs-parquet-scanner.cc:1170] cf49030f4bbe0736:84de19aa00000002] Check failed: false 
> {code}
> The query is a tpch-nested query:
> {code:java}
> I1018 06:14:22.352707 12712 Frontend.java:1522] cf49030f4bbe0736:84de19aa00000000] Analyzing query: select
>   l_shipmode,
>   sum(case
>     when o_orderpriority = '1-URGENT'
>       or o_orderpriority = '2-HIGH'
>     then 1
>     else 0
>   end) as high_line_count,
>   sum(case
>     when o_orderpriority <> '1-URGENT'
>       and o_orderpriority <> '2-HIGH'
>     then 1
>     else 0
>   end) as low_line_count
> from
>   customer.c_orders o,
>   o.o_lineitems l
> where
>   l_shipmode in ('MAIL', 'SHIP')
>   and l_commitdate < l_receiptdate
>   and l_shipdate < l_commitdate
>   and l_receiptdate >= '1994-01-01'
>   and l_receiptdate < '1995-01-01'
> group by
>   l_shipmode
> order by
>   l_shipmode db: tpch_nested_parquet
> {code}
> The test is
> {code:java}
> query_test.test_tpch_nested_queries.TestTpchNestedQuery.test_tpch_q12[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code}
> A similar test also failed in the same build:
> {code:java}
> authorization.test_ranger.TestRangerColumnMaskingTpchNested.test_tpch_nested_column_masking[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]{code}
> Backtrace:
> {code:java}
> #0  0x00007f32b548c1f7 in raise () from /lib64/libc.so.6
> #1  0x00007f32b548d8e8 in abort () from /lib64/libc.so.6
> #2  0x000000000521cce4 in google::DumpStackTraceAndExit() ()
> #3  0x00000000052120dd in google::LogMessage::Fail() ()
> #4  0x00000000052139cd in google::LogMessage::SendToLog() ()
> #5  0x0000000005211a3b in google::LogMessage::Flush() ()
> #6  0x0000000005215639 in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x0000000002d87f54 in impala::HdfsParquetScanner::CheckPageFiltering (this=0xb6f0000) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1170
> #8  0x0000000002d87860 in impala::HdfsParquetScanner::AssembleRows (this=0xb6f0000, column_readers=..., row_batch=0x10bc95a0, skip_row_group=0xb6f01d0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:1150
> #9  0x0000000002d82453 in impala::HdfsParquetScanner::GetNextInternal (this=0xb6f0000, row_batch=0x10bc95a0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:458
> #10 0x0000000002d803e2 in impala::HdfsParquetScanner::ProcessSplit (this=0xb6f0000) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/parquet/hdfs-parquet-scanner.cc:350
> #11 0x0000000002986f4d in impala::HdfsScanNode::ProcessSplit (this=0x11ade800, filter_ctxs=..., expr_results_pool=0x7f31da200480, scan_range=0x16a08b20, scanner_thread_reservation=0x7f31da2003a8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:500
> #12 0x00000000029862ce in impala::HdfsScanNode::ScannerThread (this=0x11ade800, first_thread=true, scanner_thread_reservation=25165824) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:418
> #13 0x0000000002985636 in impala::HdfsScanNode::<lambda()>::operator()(void) const (__closure=0x7f31da200ba8) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/exec/hdfs-scan-node.cc:339
> #14 0x00000000029879ef in boost::detail::function::void_function_obj_invoker0<impala::HdfsScanNode::ThreadTokenAvailableCb(impala::ThreadResourcePool*)::<lambda()>, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #15 0x00000000021467d6 in boost::function0<void>::operator() (this=0x7f31da200ba0) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #16 0x0000000002727552 in impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., functor=..., parent_thread_info=0x7f31dc204840, thread_started=0x7f31dc202e70) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/repos/Impala/be/src/util/thread.cc:360
> #17 0x000000000272f4ef in boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> >::operator()<void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list0&, int) (this=0x15915340, f=@0x15915338: 0x272720c <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*)>, a=...) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #18 0x000000000272f413 in boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >::operator()() (this=0x15915338) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #19 0x000000000272f3d4 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() (this=0x15915180) at /data/jenkins/workspace/impala-cdpd-master-staging-core-s3/Impala-Toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
> #20 0x0000000003f18c12 in thread_proxy ()
> #21 0x00007f32b8ab8e25 in start_thread () from /lib64/libpthread.so.0
> #22 0x00007f32b554f34d in clone () from /lib64/libc.so.6 {code}


