Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2022/02/04 16:11:00 UTC

[jira] [Commented] (IMPALA-11068) Query hit OOM under high decompression activity

    [ https://issues.apache.org/jira/browse/IMPALA-11068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487151#comment-17487151 ] 

Riza Suminto commented on IMPALA-11068:
---------------------------------------

Initial patch submitted at: https://gerrit.cloudera.org/#/c/18126/

> Query hit OOM under high decompression activity
> -----------------------------------------------
>
>                 Key: IMPALA-11068
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11068
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>
> A customer reported a query hitting OOM on a wide table with heavy decompression activity. The Impala cluster was running with scanner thread parallelism (MT_DOP=0).
> The following error message was shown:
> {code:java}
> Errors: Memory limit exceeded: ParquetColumnChunkReader::InitDictionary() failed to allocate 969825 bytes for dictionary.
> HDFS_SCAN_NODE (id=0) could not allocate 947.09 KB without exceeding limit.
> Error occurred on backend [redacted]:22000 by fragment d346730dc3a3771e:c24e3ccf00000008
> Memory left in process limit: 233.77 GB
> Memory left in query limit: 503.51 KB
> Query(d346730dc3a3771e:c24e3ccf00000000): Limit=4.13 GB Reservation=3.30 GB ReservationLimit=3.30 GB OtherMemory=849.17 MB Total=4.13 GB Peak=4.13 GB
> Fragment d346730dc3a3771e:c24e3ccf00000008: Reservation=3.30 GB OtherMemory=849.59 MB Total=4.13 GB Peak=4.13 GB{code}
>  
> I looked at the corresponding profile of the fragment and noticed some key counters as follows:
> {code:java}
>       Instance d346730dc3a3771e:c24e3ccf00000008 (host=[redacted]:22000)
>       ...
>           HDFS_SCAN_NODE (id=0)
>           ...
>             - AverageHdfsReadThreadConcurrency: 8.00 (8.0)
>             - AverageScannerThreadConcurrency: 23.00 (23.0)
>             - BytesRead: 2.4 GiB (2619685502)
>             ...
>             - NumScannerThreadMemUnavailable: 1 (1)
>             - NumScannerThreadReservationsDenied: 0 (0)
>             - NumScannerThreadsStarted: 23 (23)
>             - NumScannersWithNoReads: 12 (12)
>             - NumStatsFilteredPages: 4,032 (4032)
>             - NumStatsFilteredRowGroups: 1 (1)
>             - PeakMemoryUsage: 4.1 GiB (4431745197)
>             - PeakScannerThreadConcurrency: 23 (23)
>             - PerReadThreadRawHdfsThroughput: 842.1 MiB/s (882954163)
>             - RemoteScanRanges: 11 (11)
>             - RowBatchBytesEnqueued: 1.1 GiB (1221333486)
>             - RowBatchQueueGetWaitTime: 1.83s (1833499080)
>             - RowBatchQueuePeakMemoryUsage: 599.3 MiB (628430704)
>             - RowBatchQueuePutWaitTime: 1ms (1579356)
>             - RowBatchesEnqueued: 124 (124)
>             - RowsRead: 2,725,888 (2725888)
>             - RowsReturned: 0 (0){code}
>  
> Based on these counters, I assume the following scenario happened:
>  # The concurrent scanner thread count peaked at 23 (NumScannerThreadsStarted, PeakScannerThreadConcurrency).
>  # The scanner node seems to have tried to schedule a 24th thread, but the backend denied it, as indicated by NumScannerThreadMemUnavailable=1.
>  # The running threads have been producing output row batches (RowBatchesEnqueued=124), but the exec node above has not fetched any yet (RowsReturned=0). So the active scanner threads have been consuming the node's memory reservation, including for the decompression activity that happens in [parquet-column-chunk-reader.cc|https://github.com/apache/impala/blob/df42225/be/src/exec/parquet/parquet-column-chunk-reader.cc#L155-L177].
>  # Just before the scanner node failed, it had consumed Reservation=3.30 GB and OtherMemory=849.59 MB. Divided across 23 threads, that is around Reservation=146.92 MB and OtherMemory=36.94 MB per thread (see the arithmetic sketch right after this list). This is close to, but slightly higher than, the planner's initial mem-reservation=128.00 MB for the scanner node plus the 32 MB of [hdfs_scanner_thread_max_estimated_bytes|https://github.com/apache/impala/blob/df42225/be/src/exec/hdfs-scan-node.cc#L57-L63] estimated for decompression usage per thread.
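> The per-thread numbers above come from simple division; here is a minimal arithmetic sketch (not Impala code, just the calculation spelled out):
> {code:java}
> // Per-thread memory just before the failure, assuming the 23 started scanner
> // threads (PeakScannerThreadConcurrency) share the fragment's memory roughly equally.
> #include <cstdio>
> 
> int main() {
>   const double reservation_mb = 3.30 * 1024;  // Reservation=3.30 GB from the profile
>   const double other_memory_mb = 849.59;      // OtherMemory=849.59 MB from the profile
>   const int threads = 23;                     // PeakScannerThreadConcurrency
>   std::printf("Reservation per thread: %.2f MB\n", reservation_mb / threads);   // ~146.92 MB
>   std::printf("OtherMemory per thread: %.2f MB\n", other_memory_mb / threads);  // ~36.94 MB
>   // Compare with the planner's per-thread estimate:
>   // 128 MB initial mem-reservation + 32 MB hdfs_scanner_thread_max_estimated_bytes = 160 MB.
>   return 0;
> }
> {code}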
> Note that the 32 MB of hdfs_scanner_thread_max_estimated_bytes is non-reserved memory. Meaning, it is only allocated as needed during column chunk decompression, but we think that in most cases a thread won't require more than 32 MB.
> From these insights, I suspect that when the scanner node scheduled the 23rd thread, the remaining memory only barely fit the per-thread consumption estimate (128.00 MB + 32 MB), and the backend allowed it to start. As decompression progressed, one of the scanner threads then tried to allocate more memory than what was left in the reservation at ParquetColumnChunkReader::InitDictionary(). If the 23rd thread had not been launched, there might have been enough memory to serve the decompression requirement.
> One solution to avoid this OOM is to change our per-thread memory estimation in [scanner-mem-limiter.cc|https://github.com/apache/impala/blob/df42225/be/src/runtime/scanner-mem-limiter.cc#L59]. Maybe we should deny the reservation once the spare memory capacity cannot fit 2 consecutive thread allocations (i.e., always leave headroom of 1 thread's allocation). A rough sketch of this idea is below.
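> To illustrate the headroom idea only (this is not the actual ScannerMemLimiter API; the function name and parameters here are made up for the example):
> {code:java}
> #include <cstdint>
> 
> // Sketch: admit a new scanner thread only if the spare capacity could also cover one
> // additional thread's estimated consumption, i.e. keep one thread's worth of headroom
> // for decompression spikes in the threads that are already running.
> bool CanStartExtraScannerThread(int64_t spare_capacity_bytes,
>                                 int64_t per_thread_reservation_bytes,    // e.g. 128 MB
>                                 int64_t per_thread_estimated_extra_bytes /* e.g. 32 MB */) {
>   const int64_t per_thread_estimate =
>       per_thread_reservation_bytes + per_thread_estimated_extra_bytes;
>   // Current behavior (roughly): start the thread if a single estimate still fits.
>   //   return spare_capacity_bytes >= per_thread_estimate;
>   // Proposed behavior: require room for 2 estimates before admitting the new thread.
>   return spare_capacity_bytes >= 2 * per_thread_estimate;
> }
> {code}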



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org