Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/08/25 21:42:00 UTC

[jira] [Commented] (IMPALA-9955) Internal error for a query with large rows and spilling

    [ https://issues.apache.org/jira/browse/IMPALA-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17184747#comment-17184747 ] 

ASF subversion and git services commented on IMPALA-9955:
---------------------------------------------------------

Commit e0a6e942b28909baa0f56e21e3d33adfb5eb19b7 in impala's branch refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e0a6e94 ]

IMPALA-9955,IMPALA-9957: Fix not enough reservation for large pages in GroupingAggregator

The minimum reservation requirement for a spillable operator is
((min_buffers - 2) * default_buffer_size) + 2 * max_row_size. In other
words, the min reservation only has room for two large pages: one for
reading and one for writing. To make the non-streaming
GroupingAggregator work correctly, these two extra reservations have to
be managed carefully, so that the aggregator does not run out of its
min reservation at the moment it actually needs to spill or read a
large page.
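
As a rough illustration of the formula (a sketch only; the buffer count
and sizes below are assumed example values, not Impala's actual planner
constants):

{code:c++}
// Illustrative sketch: computes the spillable-operator minimum
// reservation described above. The concrete values (16 buffers, 2 MB
// default page, 4 MB max row size) are assumptions for this example.
#include <cstdint>
#include <iostream>

int64_t MinReservation(int64_t min_buffers, int64_t default_buffer_size,
                       int64_t max_row_size) {
  // (min_buffers - 2) default-sized pages plus two large pages, one for
  // the read side and one for the write side.
  return (min_buffers - 2) * default_buffer_size + 2 * max_row_size;
}

int main() {
  const int64_t kDefaultBufferSize = 2LL << 20;  // 2 MB
  const int64_t kMaxRowSize = 4LL << 20;         // 4 MB large page
  // 14 * 2 MB + 2 * 4 MB = 36 MB (37748736 bytes).
  std::cout << MinReservation(16, kDefaultBufferSize, kMaxRowSize) << "\n";
}
{code}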

Specifically, the large write page reservation is managed differently
depending on whether needs_serialize is true or false (a sketch of this
logic follows the special case below):
- If the aggregator needs to serialize the intermediate results when
  spilling a partition, we have to hold back a large page worth of
  reservation for the serialize stream, in case it needs to write large
  rows. This space can be restored once all partitions are spilled,
  because the serialize stream is not needed again until we
  build/repartition a spilled partition and thus have pinned partitions
  again. If the large write page reservation has been used, we save it
  back whenever possible after spilling or closing a partition.
- If the aggregator doesn't need the serialize stream at all, we can
  restore the large write page reservation as soon as adding a large
  row fails, before spilling any partitions, and reclaim it whenever
  possible after spilling or closing a partition.
A special case is when the large row being processed is the last row in
building/repartitioning a spilled partition: the large write page
reservation can be restored for it regardless of whether we need the
serialize stream, because the partitions will be read out afterwards
and no further spilling is needed.
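
Below is a minimal sketch of the write-side policy above. The class and
method names are hypothetical stand-ins for illustration; they are not
the actual GroupingAggregator members.

{code:c++}
// Sketch of the large-write-page reservation policy. Hypothetical
// names; not the real GroupingAggregator interface.
class WriteReservationPolicy {
 public:
  explicit WriteReservationPolicy(bool needs_serialize)
      : needs_serialize_(needs_serialize) {}

  // Called when appending a large row fails for lack of reservation:
  // may the saved large-write-page reservation be handed out now?
  bool CanRestoreLargeWritePage(bool all_partitions_spilled,
                                bool is_last_row_of_spilled_build) const {
    // Special case: the last row of building/repartitioning a spilled
    // partition may always use it, since nothing will spill afterwards.
    if (is_last_row_of_spilled_build) return true;
    // Without a serialize stream the reservation is only needed for the
    // row itself, so it can be restored before spilling any partition.
    if (!needs_serialize_) return true;
    // With a serialize stream, hold the reservation until every
    // partition is spilled (the stream may still write a large row).
    return all_partitions_spilled;
  }

  // After spilling or closing a partition, save the reservation back
  // whenever it is currently handed out.
  bool ShouldSaveBackAfterSpillOrClose(bool reservation_in_use) const {
    return reservation_in_use;
  }

 private:
  const bool needs_serialize_;
};
{code}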

The large read page reservation is transferred to the spilled
BufferedTupleStream that we are reading when building/repartitioning a
spilled partition. The stream restores some of it when reading a large
page and reclaims it when the output row batch is reset. Note that
because the stream is read in attach_on_read mode, the large page is
attached to the row batch's buffers and only gets freed when the row
batch is reset.
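
A minimal sketch of this read-side hand-off, assuming hypothetical
method names (the actual BufferedTupleStream interface differs):

{code:c++}
// Sketch of the large-read-page reservation transferred to the stream.
// Hypothetical helper; not the real BufferedTupleStream API.
#include <algorithm>
#include <cstdint>

class LargeReadPageReservation {
 public:
  explicit LargeReadPageReservation(int64_t bytes) : saved_bytes_(bytes) {}

  // When the stream hits a page larger than the default page length,
  // hand out enough of the saved reservation to pin it.
  int64_t RestoreForLargePage(int64_t page_len, int64_t default_page_len) {
    if (page_len <= default_page_len) return 0;
    int64_t restored = std::min(saved_bytes_, page_len - default_page_len);
    saved_bytes_ -= restored;
    return restored;
  }

  // In attach_on_read mode the large page stays attached to the output
  // row batch, so the reservation is only reclaimed when the batch is
  // reset and the buffer is actually freed.
  void ReclaimOnRowBatchReset(int64_t bytes) { saved_bytes_ += bytes; }

 private:
  int64_t saved_bytes_;
};
{code}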

Tests:
- Add tests in test_spilling_large_rows (test_spilling.py) with
  different row sizes to reproduce the issue.
- One test in test_spilling_no_debug_action becomes flaky after this
  patch. Revise the query to make the UDF allocate larger strings so it
  passes consistently.
- Run CORE tests.

Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Reviewed-on: http://gerrit.cloudera.org:8080/16240
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>


> Internal error for a query with large rows and spilling
> -------------------------------------------------------
>
>                 Key: IMPALA-9955
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9955
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: impalad.INFO, impalad_node1.INFO, impalad_node2.INFO
>
>
> Encountered a query failure due to an internal error:
> {code:java}
> create table bigstrs stored as parquet as select *, repeat(uuid(), cast(random() * 100000 as int)) as bigstr from functional.alltypes;
> set MAX_ROW_SIZE=3.5MB;
> set MEM_LIMIT=4GB;
> set DISABLE_CODEGEN=true;
> create table my_cnt stored as parquet as select count(*) as cnt, bigstr from bigstrs group by bigstr;
> {code}
> The error is
> {code:java}
> ERROR: Internal error: couldn't pin large page of 4194304 bytes, client only had 2097152 bytes of unused reservation:
> <BufferPool::Client> 0xcf9dae0 internal state: {<BufferPool::Client> 0xbdf6ac0 name: GroupingAggregator id=3 ptr=0xcf9d900 write_status:  buffers allocated 2097152 num_pages: 2094 pinned_bytes: 41943040 dirty_unpinned_bytes: 0 in_flight_write_bytes: 0 reservation: {<ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 44040192 child_reservations 0 parent:
> <ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
> <ReservationTracker>: reservation_limit 9223372036854775807 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
> <ReservationTracker>: reservation_limit 3435973836 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
> <ReservationTracker>: reservation_limit 6647046144 reservation 46137344 used_reservation 0 child_reservations 46137344 parent:
> NULL}
>   12 pinned pages: <BufferPool::Page> 0xc9160a0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0xc916118 client: 0xcf9dae0/0xbdf6ac0 data: 0x13200000 len: 2097152
> <BufferPool::Page> 0xc919d40 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xc919db8 client: 0xcf9dae0/0xbdf6ac0 data: 0x124600000 len: 4194304
> <BufferPool::Page> 0xd42aaa0 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42ab18 client: 0xcf9dae0/0xbdf6ac0 data: 0x12b200000 len: 4194304
> <BufferPool::Page> 0xd42b900 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42b978 client: 0xcf9dae0/0xbdf6ac0 data: 0x132a00000 len: 4194304
> <BufferPool::Page> 0xd42d3e0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42d458 client: 0xcf9dae0/0xbdf6ac0 data: 0xc6a00000 len: 2097152
> <BufferPool::Page> 0xd42dd40 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42ddb8 client: 0xcf9dae0/0xbdf6ac0 data: 0x132e00000 len: 4194304
> <BufferPool::Page> 0xd42de80 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0xd42def8 client: 0xcf9dae0/0xbdf6ac0 data: 0x137c00000 len: 4194304
> <BufferPool::Page> 0x12d48320 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d48398 client: 0xcf9dae0/0xbdf6ac0 data: 0x102c00000 len: 4194304
> <BufferPool::Page> 0x12d483c0 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d48438 client: 0xcf9dae0/0xbdf6ac0 data: 0x108a00000 len: 4194304
> <BufferPool::Page> 0x12d48780 len: 4194304 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d487f8 client: 0xcf9dae0/0xbdf6ac0 data: 0x108e00000 len: 4194304
> <BufferPool::Page> 0x12d492c0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d49338 client: 0xcf9dae0/0xbdf6ac0 data: 0x127600000 len: 2097152
> <BufferPool::Page> 0x12d4a9e0 len: 2097152 pin_count: 1 buf: <BufferPool::BufferHandle> 0x12d4aa58 client: 0xcf9dae0/0xbdf6ac0 data: 0x12d200000 len: 2097152
>   0 dirty unpinned pages: 
>   0 in flight write pages: }
> {code}
> Found this stack trace in the log:
> {code}
>     @          0x1c9dfbe  impala::Status::Status()
>     @          0x1ca5a78  impala::Status::Status()
>     @          0x2bfe4ec  impala::BufferedTupleStream::NextReadPage()
>     @          0x2c04b72  impala::BufferedTupleStream::GetNextInternal<>()
>     @          0x2c029e6  impala::BufferedTupleStream::GetNextInternal<>()
>     @          0x2bffd19  impala::BufferedTupleStream::GetNext()
>     @          0x28aa43f  impala::GroupingAggregator::ProcessStream<>()
>     @          0x28a2e17  impala::GroupingAggregator::BuildSpilledPartition()
>     @          0x28a2401  impala::GroupingAggregator::NextPartition()
>     @          0x289df5a  impala::GroupingAggregator::GetRowsFromPartition()
>     @          0x289db20  impala::GroupingAggregator::GetNext()
>     @          0x28dbfc7  impala::AggregationNode::GetNext()
>     @          0x2259dfc  impala::FragmentInstanceState::ExecInternal()
>     @          0x22564a0  impala::FragmentInstanceState::Exec()
>     @          0x22801ed  impala::QueryState::ExecFInstance()
>     @          0x227e5ef  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x2281d8e  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x204d7d1  boost::function0<>::operator()()
>     @          0x26702d5  impala::Thread::SuperviseThread()
>     @          0x2678272  boost::_bi::list5<>::operator()<>()
>     @          0x2678196  boost::_bi::bind_t<>::operator()()
>     @          0x2678157  boost::detail::thread_data<>::run()
>     @          0x3e45d71  thread_proxy
>     @     0x7fc8339656b9  start_thread
>     @     0x7fc8305314dc  clone
> {code}
> From the code, the comment indicates that this is a bug:
> {code:c++}
> Status BufferedTupleStream::NextReadPage(ReadIterator* read_iter) {
>   ...
>   int64_t read_page_len = read_iter->read_page_->len();
>   if (!pinned_ && read_page_len > default_page_len_
>       && buffer_pool_client_->GetUnusedReservation() < read_page_len) {
>     // If we are iterating over an unpinned stream and encounter a page that is larger
>     // than the default page length, then unpinning the previous page may not have
>     // freed up enough reservation to pin the next one. The client is responsible for
>     // ensuring the reservation is available, so this indicates a bug.
>     return Status(TErrorCode::INTERNAL_ERROR, Substitute("Internal error: couldn't pin "
>           "large page of $0 bytes, client only had $1 bytes of unused reservation:\n$2",
>           read_page_len, buffer_pool_client_->GetUnusedReservation(),
>           buffer_pool_client_->DebugString()));
>   }
> {code}
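
For reference, an editorial illustration (not code from the patch) of
the shortfall this check guards against: unpinning the previous page
may free less reservation than the next, larger page needs. With the
numbers from the error message, a 4 MB large page against a 2 MB
default page leaves the client exactly one default page (2 MB) short.

{code:c++}
// Editorial illustration of the shortfall, using the numbers from the
// error message above (not code from the patch).
#include <algorithm>
#include <cstdint>
#include <iostream>

int64_t ExtraReadReservationNeeded(int64_t large_page_len,
                                   int64_t default_page_len) {
  // How much more than a default page the large page requires.
  return std::max<int64_t>(0, large_page_len - default_page_len);
}

int main() {
  // Prints 2097152: the client's unused reservation was one default
  // page, so it was short by exactly this amount.
  std::cout << ExtraReadReservationNeeded(4194304, 2097152) << "\n";
}
{code}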


