Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/08/25 21:42:00 UTC

[jira] [Commented] (IMPALA-9957) Impalad crashes when serializing large rows in aggregation spilling

    [ https://issues.apache.org/jira/browse/IMPALA-9957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184749#comment-17184749 ] 

ASF subversion and git services commented on IMPALA-9957:
---------------------------------------------------------

Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ]

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

The minimum reservation requirement for a spillable operator is
((min_buffers - 2) * default_buffer_size) + 2 * max_row_size. In the min
reservation, we only reserve space for two large pages: one for reading,
the other for writing. However, to make the non-streaming
GroupingAggregator work correctly, we have to manage these extra
reservations carefully, so that it won't run out of the min reservation
when it actually needs to spill a large page or to read a large page.
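
For illustration only, here is a minimal sketch of that formula with assumed
example values: a 2 MB default buffer size, the 3.5 MB MAX_ROW_SIZE used in
the reproduction below, and a placeholder min_buffers of 16. None of these
are asserted to be the operator's actual constants.

{code:cpp}
#include <cstdint>
#include <iostream>

int main() {
  // Assumed example values; not authoritative Impala defaults.
  const int64_t min_buffers = 16;                         // placeholder constant
  const int64_t default_buffer_size = 2LL * 1024 * 1024;  // 2 MB
  const int64_t max_row_size = 7LL * 512 * 1024;          // 3.5 MB, as in the repro below

  // ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size:
  // all but two buffers are default-sized, plus one large page for reading
  // and one for writing.
  const int64_t min_reservation =
      (min_buffers - 2) * default_buffer_size + 2 * max_row_size;
  std::cout << min_reservation << " bytes\n";  // (14 * 2 MB) + (2 * 3.5 MB) = 35 MB
  return 0;
}
{code}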

To be specific, the large write page reservation is managed differently
depending on whether needs_serialize is true or false (a hypothetical
sketch of this bookkeeping follows the list below):
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to save a large page's worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored once all the partitions are spilled,
  since the serialize stream is then not needed until we
  build/repartition a spilled partition and thus have pinned partitions
  again. If the large write page reservation has been used, we save it
  back whenever possible after we spill or close a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation whenever we fail to add a
  large row, before spilling any partitions. We reclaim it whenever
  possible after we spill or close a partition.
A special case is when we are processing a large row that is the last
row in building/repartitioning a spilled partition: the large write page
reservation can be restored for it regardless of whether we need the
serialize stream, because the partitions will be read out afterwards, so
no further spilling is needed.
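
Below is a minimal, hypothetical sketch of that write-path bookkeeping. The
class and member names (LargeWritePageReservationSketch, needs_serialize_,
saved_bytes_, ...) are illustrative stand-ins, not the actual
GroupingAggregator members or BufferPool APIs:

{code:cpp}
#include <cstdint>

// Sketch only: models where the large write page reservation lives, not how
// the real BufferPool reservation calls are made.
class LargeWritePageReservationSketch {
 public:
  LargeWritePageReservationSketch(int64_t max_row_size, bool needs_serialize)
      : max_row_size_(max_row_size),
        needs_serialize_(needs_serialize),
        saved_bytes_(max_row_size) {}  // held back from the min reservation up front

  // Called when adding a large row fails for lack of reservation. Returns the
  // saved bytes that may be released to retry the row, or 0 if a partition
  // must be spilled first.
  int64_t BytesToRestoreForLargeRow(bool all_partitions_spilled,
                                    bool is_last_row_of_spilled_build) const {
    if (saved_bytes_ == 0) return 0;
    // Without a serialize stream, or once no further spilling can happen, the
    // saved large page can safely back the incoming large row.
    if (!needs_serialize_ || all_partitions_spilled || is_last_row_of_spilled_build) {
      return saved_bytes_;
    }
    return 0;
  }

  // The caller handed the saved bytes back to the write path.
  void MarkRestored() { saved_bytes_ = 0; }

  // Called after a partition is spilled or closed: save the large page's
  // worth of reservation back as soon as enough reservation is free again.
  void TryReclaim(int64_t unused_reservation_bytes) {
    if (saved_bytes_ == 0 && unused_reservation_bytes >= max_row_size_) {
      saved_bytes_ = max_row_size_;
    }
  }

 private:
  const int64_t max_row_size_;
  const bool needs_serialize_;
  int64_t saved_bytes_;  // reservation held back for one large write page
};
{code}

The invariant the sketch tries to capture is that the large page's worth of
reservation is either held back for the serialize stream or lent out to the
write path, never both at once.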

For the large read page reservation, it's transferred to the spilled
BufferedTupleStream that we are reading while building/repartitioning a
spilled partition. The stream will restore some of it when reading a
large page, and reclaim it when the output row batch is reset. Note that
because the stream is read in attach_on_read mode, the large page will
be attached to the row batch's buffers and only gets freed when the row
batch is reset.
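
A comparable hypothetical sketch of that read-path handover follows; again
the names are illustrative stand-ins, not the real BufferedTupleStream or
RowBatch interfaces:

{code:cpp}
#include <cstdint>

// Sketch only: models who holds the large read page reservation at each step.
class LargeReadPageReservationSketch {
 public:
  // When we start reading a spilled stream while building/repartitioning a
  // spilled partition, the aggregator hands its saved large-read-page
  // reservation over to the stream.
  void TransferFromAggregator(int64_t bytes) { available_bytes_ += bytes; }

  // When the stream is about to pin a page larger than a default buffer, it
  // restores just the extra bytes that the large page needs.
  int64_t RestoreForLargePage(int64_t page_size, int64_t default_buffer_size) {
    int64_t extra =
        page_size > default_buffer_size ? page_size - default_buffer_size : 0;
    int64_t granted = extra < available_bytes_ ? extra : available_bytes_;
    available_bytes_ -= granted;
    in_use_bytes_ += granted;
    return granted;
  }

  // In attach_on_read mode the large page's buffer is attached to the output
  // row batch and only freed when that batch is reset, so the reservation can
  // be reclaimed only at that point.
  void ReclaimOnRowBatchReset() {
    available_bytes_ += in_use_bytes_;
    in_use_bytes_ = 0;
  }

 private:
  int64_t available_bytes_ = 0;  // saved large-read-page reservation
  int64_t in_use_bytes_ = 0;     // backing a large page attached to the row batch
};
{code}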

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the UDF allocate larger strings so it
  passes consistently.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>


> Impalad crashes when serializing large rows in aggregation spilling
> -------------------------------------------------------------------
>
>                 Key: IMPALA-9957
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9957
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Queries to reproduce the crash using the testdata:
> {code:sql}
> create table bigstrs stored as parquet as
>   select *, repeat(uuid(), cast(random() * 100000 as int)) as bigstr
>   from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> create table my_str_group stored as parquet as
>   select group_concat(string_col) as ss, bigstr
>   from bigstrs group by bigstr;
> {code}
> The last query 1) has large rows, 2) needs spilling in the aggregation, and 3) uses aggregate functions that need serialization (e.g. group_concat, appx_median, min(string), etc.). With these 3 conditions met, it triggers this bug.
> The crash stacktraces differ across build modes. Crash stacktrace in a RELEASE build with codegen enabled:
> {code:java}
> Thread 316 (crashed)
>  0  impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0]
>  1  impalad!impala::GroupingAggregator::Partition::Spill(bool) [grouping-aggregator-partition.cc : 180 + 0x9]
>  2  impalad!impala::GroupingAggregator::SpillPartition(bool) [grouping-aggregator.cc : 904 + 0x10]
>  3  0x7f5fba83db3c
>  4  impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, impala::RowBatch*) [grouping-aggregator.cc : 437 + 0x2]
>  5  impalad!impala::AggregationNode::Open(impala::RuntimeState*) [aggregation-node.cc : 70 + 0x6]
>  6  libstdc++.so.6.0.24 + 0x120b28
>  7  impalad!apache::hive::service::cli::thrift::TColumnValue::printTo(std::ostream&) const [converter_lexical_streams.hpp : 161 + 0x8]
>  8  impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc : 396 + 0x11]
>  9  impalad!tc_newarray + 0x171
> {code}
> Crash stacktrace in RELEASE build with codegen disabled (set DISABLE_CODEGEN=true):
> {code:java}
> Thread 320 (crashed)
>  0  impalad!impala::HashTable::Close() [hash-table.cc : 512 + 0x0]
>  1  impalad!impala::GroupingAggregator::Partition::Spill(bool) [grouping-aggregator-partition.cc : 180 + 0x9]
>  2  impalad!impala::GroupingAggregator::SpillPartition(bool) [grouping-aggregator.cc : 904 + 0x10]
>  3  impalad!impala::Status impala::GroupingAggregator::AddBatchImpl<false>(impala::RowBatch*, impala::TPrefetchMode::type, impala::HashTableCtx*) [grouping-aggregator-ir.cc : 148 + 0x11]
>  4  impalad!impala::GroupingAggregator::AddBatch(impala::RuntimeState*, impala::RowBatch*) [grouping-aggregator.cc : 439 + 0x5]
>  5  impalad!impala::AggregationNode::Open(impala::RuntimeState*) [aggregation-node.cc : 70 + 0x6]
>  6  impalad!impala::FragmentInstanceState::Open() [fragment-instance-state.cc : 396 + 0x11]
>  7  impalad!impala::FragmentInstanceState::Exec() [fragment-instance-state.cc : 97 + 0x12]
>  8  impalad!impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) [query-state.cc : 815 + 0x19]
>  9  impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [function_template.hpp : 770 + 0x7]
> 10  impalad!boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() [bind.hpp : 531 + 0xc]
> 11  impalad!thread_proxy + 0x72
> 12  libpthread-2.23.so + 0x76ba
> 13  libc-2.23.so + 0x1074dd
> {code}
> The crash in a DEBUG build with codegen disabled happens a bit earlier - it hits a DCHECK:
> {code:java}
> F0715 20:29:24.389505 16868 grouping-aggregator-partition.cc:125] 1d4b40df02e6ad76:433ed57400000003] Check failed: !status.ok() Stream was unpinned - AddRow() only fails on error
> *** Check failure stack trace: ***
>     @          0x513f31c  google::LogMessage::Fail()
>     @          0x5140c0c  google::LogMessage::SendToLog()
>     @          0x513ec7a  google::LogMessage::Flush()
>     @          0x5142878  google::LogMessageFatal::~LogMessageFatal()
>     @          0x28b2ca7  impala::GroupingAggregator::Partition::SerializeStreamForSpilling()
>     @          0x28b360f  impala::GroupingAggregator::Partition::Spill()
>     @          0x28a4122  impala::GroupingAggregator::SpillPartition()
>     @          0x28b169c  impala::GroupingAggregator::AddIntermediateTuple<>()
>     @          0x28b09d9  impala::GroupingAggregator::ProcessRow<>()
>     @          0x28af535  impala::GroupingAggregator::AddBatchImpl<>()
>     @          0x289f6ad  impala::GroupingAggregator::AddBatch()
>     @          0x28db463  impala::AggregationNode::Open()
>     @          0x22598bd  impala::FragmentInstanceState::Open()
>     @          0x22562f4  impala::FragmentInstanceState::Exec()
>     @          0x22801ed  impala::QueryState::ExecFInstance()
>     @          0x227e5ef  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x2281d8e  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x204d7d1  boost::function0<>::operator()()
>     @          0x26702d5  impala::Thread::SuperviseThread()
>     @          0x2678272  boost::_bi::list5<>::operator()<>()
>     @          0x2678196  boost::_bi::bind_t<>::operator()()
>     @          0x2678157  boost::detail::thread_data<>::run()
>     @          0x3e45d71  thread_proxy
>     @     0x7ff1bfdc46b9  start_thread
>     @     0x7ff1bc98f4dc  clone
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org