Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2022/11/28 12:50:00 UTC

[jira] [Updated] (IMPALA-11751) Crash in processing string partition columns of Avro table with MT_DOP>1

     [ https://issues.apache.org/jira/browse/IMPALA-11751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-11751:
------------------------------------
    Description: 
We saw a crash in a query that aggregates the string partition column of an Avro table with MT_DOP set to 4. The query is quite simple:
{code:sql}
create external table date_str_avro (v int)
  partitioned by (date_str string)
  stored as avro;
-- Import the files attached in this JIRA, then repeat the following query.
-- It usually crashes within 10 runs.
set MT_DOP=2;
select count(*), date_str from date_str_avro group by date_str;
{code}
A specific data set is needed to reproduce the crash; the files and steps are given below.
Disabling codegen (by "set disable_codegen=1") also reproduces the crash. The stacktrace is:
{noformat}
Crash reason:  SIGSEGV /SEGV_MAPERR
Crash address: 0x0
Process uptime: not available

Thread 512 (crashed)
 0  impalad!impala::HashTableCtx::Hash(void const*, int, unsigned int) const [sse-util.h : 227 + 0x2]
 1  impalad!impala::HashTableCtx::HashVariableLenRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 306 + 0x8]
 2  impalad!impala::HashTableCtx::HashRow(unsigned char const*, unsigned char const*) const [hash-table.cc : 255 + 0x5]
 3  impalad!void impala::GroupingAggregator::EvalAndHashPrefetchGroup<false>(impala::RowBatch*, int, impala::TPrefetchMode::type, impala::HashTableCtx*) [hash-table.inline.h : 39 + 0xe]
 4  impalad!impala::GroupingAggregator::AddBatchStreamingImpl(int, bool, impala::TPrefetchMode::type, impala::RowBatch*, impala::RowBatch*, impala::HashTableCtx*, int*) [grouping-aggregator-ir.cc : 185 + 0x1c]
 5  impalad!impala::GroupingAggregator::AddBatchStreaming(impala::RuntimeState*, impala::RowBatch*, impala::RowBatch*, bool*) [grouping-aggregator.cc : 520 + 0x2d]
 6  impalad!impala::StreamingAggregationNode::GetRowsStreaming(impala::RuntimeState*, impala::RowBatch*) [streaming-aggregation-node.cc : 120 + 0x3]
 7  impalad!impala::StreamingAggregationNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*) [streaming-aggregation-node.cc : 77 + 0x19]
 8  impalad!impala::FragmentInstanceState::ExecInternal() [fragment-instance-state.cc : 446 + 0x3]
 9  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc : 104 + 0xb]
10  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) [query-state.cc : 950 + 0x19]
11  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 763 + 0x3]
12  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() [bind.hpp : 531 + 0x3]
13  impalad!thread_proxy + 0x67
14  libpthread.so.0 + 0x76ba
15  libc.so.6 + 0x1074dd
{noformat}
This is reproduced on commit 2733d039a of the master branch.

Reproducing the bug requires the following conditions:
 * Avro table with string partition columns
 * MT_DOP is set larger than 1
 * The query performs follow-up processing on the string partition values, e.g. GROUP BY or JOIN on them
 * Number of files (blocks) > MT_DOP * (number of impalads)
 * There are both small files and large files, so some scan node instances can finish earlier than others.

*Steps to import the attached Avro data files*
{code:bash}
$ tar zxf date_str_avro.tar.gz
$ hdfs dfs -put date_str_avro/* hdfs_location_of_table_dir
impala-shell> alter table date_str_avro recover partitions;
{code}
*RCA*
This is a bug introduced by IMPALA-9655.

Each Avro file requires at least two scan ranges. The initial range reads the file header and initializes the template tuple. The initial scanner then issues follow-up scan ranges to read the file content. Memory of the template tuple is transferred to the ScanNode. Note that partition values are materialized into the template tuple.

After IMPALA-9655, the ranges of a file can be scheduled to different ScanNode instances when MT_DOP > 1. In the following sequence, there is an illegal memory access (heap-use-after-free), which can cause a crash.

t0:
The scanner of ScanNode-1 reads the header of a large Avro file.
The scanner of ScanNode-2 reads the header of a small Avro file.
The varlen memory of each template_tuple is transferred to the corresponding ScanNode.
t1:
The scanner of ScanNode-1 reads the content of the small Avro file.
The scanner of ScanNode-2 reads the content of the large Avro file.
Content scanners reuse the template_tuple created by the header scanners [1], so RowBatches produced by ScanNode-2 actually reference memory owned by ScanNode-1.
t2:
ScanNode-1 finishes first and closes (assuming it has no more files to read).
The downstream consumer of ScanNode-2 will crash when accessing the partition string values.

[1] [https://github.com/apache/impala/blob/2733d039ad4a830a1ea34c1a75d2b666788e39a9/be/src/exec/avro/hdfs-avro-scanner.cc#L478]


> Crash in processing string partition columns of Avro table with MT_DOP>1
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-11751
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11751
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.1.1
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org