Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2022/02/04 16:11:00 UTC
[jira] [Commented] (IMPALA-11068) Query hit OOM under high decompression activity
[ https://issues.apache.org/jira/browse/IMPALA-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487151#comment-17487151 ]
Riza Suminto commented on IMPALA-11068:
---------------------------------------
Initial patch submitted at: https://gerrit.cloudera.org/#/c/18126/
> Query hit OOM under high decompression activity
> -----------------------------------------------
>
> Key: IMPALA-11068
> URL: https://issues.apache.org/jira/browse/IMPALA-11068
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Riza Suminto
> Assignee: Riza Suminto
> Priority: Major
>
> A customer reported a query hitting OOM on a wide table under heavy decompression activity. The Impala cluster was running with scanner thread parallelism (MT_DOP=0).
> The following is the error message shown:
> {code:java}
> Errors: Memory limit exceeded: ParquetColumnChunkReader::InitDictionary() failed to allocate 969825 bytes for dictionary.
> HDFS_SCAN_NODE (id=0) could not allocate 947.09 KB without exceeding limit.
> Error occurred on backend [redacted]:22000 by fragment d346730dc3a3771e:c24e3ccf00000008
> Memory left in process limit: 233.77 GB
> Memory left in query limit: 503.51 KB
> Query(d346730dc3a3771e:c24e3ccf00000000): Limit=4.13 GB Reservation=3.30 GB ReservationLimit=3.30 GB OtherMemory=849.17 MB Total=4.13 GB Peak=4.13 GB
> Fragment d346730dc3a3771e:c24e3ccf00000008: Reservation=3.30 GB OtherMemory=849.59 MB Total=4.13 GB Peak=4.13 GB{code}
>
> I looked at the corresponding profile of the fragment and noticed some key counters, as follows:
> {code:java}
> Instance d346730dc3a3771e:c24e3ccf00000008 (host=[redacted]:22000)
> ...
> HDFS_SCAN_NODE (id=0)
> ...
> - AverageHdfsReadThreadConcurrency: 8.00 (8.0)
> - AverageScannerThreadConcurrency: 23.00 (23.0)
> - BytesRead: 2.4 GiB (2619685502)
> ...
> - NumScannerThreadMemUnavailable: 1 (1)
> - NumScannerThreadReservationsDenied: 0 (0)
> - NumScannerThreadsStarted: 23 (23)
> - NumScannersWithNoReads: 12 (12)
> - NumStatsFilteredPages: 4,032 (4032)
> - NumStatsFilteredRowGroups: 1 (1)
> - PeakMemoryUsage: 4.1 GiB (4431745197)
> - PeakScannerThreadConcurrency: 23 (23)
> - PerReadThreadRawHdfsThroughput: 842.1 MiB/s (882954163)
> - RemoteScanRanges: 11 (11)
> - RowBatchBytesEnqueued: 1.1 GiB (1221333486)
> - RowBatchQueueGetWaitTime: 1.83s (1833499080)
> - RowBatchQueuePeakMemoryUsage: 599.3 MiB (628430704)
> - RowBatchQueuePutWaitTime: 1ms (1579356)
> - RowBatchesEnqueued: 124 (124)
> - RowsRead: 2,725,888 (2725888)
> - RowsReturned: 0 (0){code}
>
> Based on these counters, I assume the following scenario happened:
> # The concurrent scanner thread count peaked at 23 (NumScannerThreadsStarted, PeakScannerThreadConcurrency).
> # The scan node seems to have tried to schedule a 24th thread, but the backend denied it, as indicated by NumScannerThreadMemUnavailable=1.
> # The running threads had been producing output row batches (RowBatchesEnqueued=124), but the exec node above had not fetched any yet (RowsReturned=0). So the active scanner threads kept consuming their memory reservation, including for the decompression activity that happens in [parquet-column-chunk-reader.cc|https://github.com/apache/impala/blob/df42225/be/src/exec/parquet/parquet-column-chunk-reader.cc#L155-L177].
> # Just before the scan node failed, it had consumed Reservation=3.30 GB and OtherMemory=849.59 MB, i.e., about Reservation=146.92 MB and OtherMemory=36.94 MB per thread. This is close to, but slightly higher than, the planner's initial mem-reservation=128.00 MB for the scan node plus the 32 MB of [hdfs_scanner_thread_max_estimated_bytes|https://github.com/apache/impala/blob/df42225/be/src/exec/hdfs-scan-node.cc#L57-L63] for decompression usage per thread.
> Note that the 32 MB of hdfs_scanner_thread_max_estimated_bytes is non-reserved: it is only allocated as needed during column chunk decompression, on the assumption that in most cases a thread won't require more than 32 MB.
> From these insights, I suspect that when the scan node scheduled the 23rd thread, the remaining memory reservation just barely fit the per-thread consumption estimate (128.00 MB + 32 MB), so the backend allowed it to start. As decompression proceeded, one of the scanner threads tried to allocate more memory than was left in the reservation at ParquetColumnChunkReader::InitDictionary(). If the 23rd thread had not been launched, there might have been enough memory to serve the decompression requirement.
> One solution to avoid this OOM is to change our per-thread memory estimate in [scanner-mem-limiter.cc|https://github.com/apache/impala/blob/df42225/be/src/runtime/scanner-mem-limiter.cc#L59]. Perhaps we should deny the reservation once spare memory capacity cannot fit two consecutive per-thread allocations (i.e., always leave headroom of one thread's allocation).
--
This message was sent by Atlassian Jira
(v8.20.1#820001)