Posted to issues-all@impala.apache.org by "Riza Suminto (Jira)" <ji...@apache.org> on 2022/01/04 22:16:00 UTC

[jira] [Created] (IMPALA-11068) Query hit OOM under high decompression activity

Riza Suminto created IMPALA-11068:
-------------------------------------

             Summary: Query hit OOM under high decompression activity
                 Key: IMPALA-11068
                 URL: https://issues.apache.org/jira/browse/IMPALA-11068
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Riza Suminto
            Assignee: Riza Suminto


A customer reported a query hitting OOM over a wide table with heavy decompression activity. The Impala cluster was running with scanner thread parallelism (MT_DOP=0).

The following is the error message shown:
{code:java}
Errors: Memory limit exceeded: ParquetColumnChunkReader::InitDictionary() failed to allocate 969825 bytes for dictionary.
HDFS_SCAN_NODE (id=0) could not allocate 947.09 KB without exceeding limit.
Error occurred on backend [redacted]:22000 by fragment d346730dc3a3771e:c24e3ccf00000008
Memory left in process limit: 233.77 GB
Memory left in query limit: 503.51 KB
Query(d346730dc3a3771e:c24e3ccf00000000): Limit=4.13 GB Reservation=3.30 GB ReservationLimit=3.30 GB OtherMemory=849.17 MB Total=4.13 GB Peak=4.13 GB
 Fragment d346730dc3a3771e:c24e3ccf00000008: Reservation=3.30 GB OtherMemory=849.59 MB Total=4.13 GB Peak=4.13 GB
{code}

I looked at the corresponding profile of the fragment and noticed some key counters, as follows:
{code:java}
      Instance d346730dc3a3771e:c24e3ccf00000008 (host=[redacted]:22000)
      ...
          HDFS_SCAN_NODE (id=0)
          ...
            - AverageHdfsReadThreadConcurrency: 8.00 (8.0)
            - AverageScannerThreadConcurrency: 23.00 (23.0)
            - BytesRead: 2.4 GiB (2619685502)
            ...
            - NumScannerThreadMemUnavailable: 1 (1)
            - NumScannerThreadReservationsDenied: 0 (0)
            - NumScannerThreadsStarted: 23 (23)
            - NumScannersWithNoReads: 12 (12)
            - NumStatsFilteredPages: 4,032 (4032)
            - NumStatsFilteredRowGroups: 1 (1)
            - PeakMemoryUsage: 4.1 GiB (4431745197)
            - PeakScannerThreadConcurrency: 23 (23)
            - PerReadThreadRawHdfsThroughput: 842.1 MiB/s (882954163)
            - RemoteScanRanges: 11 (11)
            - RowBatchBytesEnqueued: 1.1 GiB (1221333486)
            - RowBatchQueueGetWaitTime: 1.83s (1833499080)
            - RowBatchQueuePeakMemoryUsage: 599.3 MiB (628430704)
            - RowBatchQueuePutWaitTime: 1ms (1579356)
            - RowBatchesEnqueued: 124 (124)
            - RowsRead: 2,725,888 (2725888)
            - RowsReturned: 0 (0){code}
 

Based on these counters, I assume the following scenario happened:
 # The concurrent scanner thread count peaked at 23 (NumScannerThreadsStarted, PeakScannerThreadConcurrency).
 # The scanner node seems to have tried to schedule a 24th thread, but the backend denied it, as indicated by NumScannerThreadMemUnavailable=1.
 # The running threads have been producing output row batches (RowBatchesEnqueued=124), but the exec node above has not fetched any yet (RowsReturned=0). So the active scanner threads have been consuming their memory reservation, including for the decompression activity that happens in [parquet-column-chunk-reader.cc|https://github.com/apache/impala/blob/df42225/be/src/exec/parquet/parquet-column-chunk-reader.cc#L155-L177].
 # Just before the scanner node failed, it had consumed Reservation=3.30 GB and OtherMemory=849.59 MB. Per thread, that is roughly Reservation=146.92 MB and OtherMemory=36.94 MB (see the arithmetic sketch after this list). This is close to, but slightly higher than, the planner's initial mem-reservation=128.00 MB for the scanner node plus the 32 MB of [hdfs_scanner_thread_max_estimated_bytes|https://github.com/apache/impala/blob/df42225/be/src/exec/hdfs-scan-node.cc#L57-L63] estimated for decompression usage per thread.
Note that the 32 MB of hdfs_scanner_thread_max_estimated_bytes is non-reserved: it is only allocated as needed during column chunk decompression, on the assumption that in most cases a thread won't require more than 32 MB.
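
As a sanity check on the per-thread arithmetic in point 4, here is a minimal sketch (not Impala code; the byte counts come from the fragment breakdown above, the pretty-printed GB/MB are treated as binary units, and the 128 MB / 32 MB constants are the planner estimates mentioned):
{code:cpp}
// Minimal arithmetic sketch (not Impala code): reproduce the per-thread
// numbers quoted in point 4 from the fragment's memory breakdown.
#include <cstdio>

int main() {
  const double kMiB = 1024.0 * 1024.0;
  const double kGiB = 1024.0 * kMiB;

  const double reservation_bytes = 3.30 * kGiB;    // Fragment Reservation=3.30 GB
  const double other_memory_bytes = 849.59 * kMiB; // Fragment OtherMemory=849.59 MB
  const int num_scanner_threads = 23;              // PeakScannerThreadConcurrency

  // Planner-side per-thread estimates referenced in the description.
  const double est_reservation_per_thread = 128.0 * kMiB; // initial mem-reservation
  const double est_other_per_thread = 32.0 * kMiB;        // hdfs_scanner_thread_max_estimated_bytes

  printf("Reservation per thread: %.2f MB (estimate %.2f MB)\n",
         reservation_bytes / num_scanner_threads / kMiB,
         est_reservation_per_thread / kMiB);
  printf("OtherMemory per thread: %.2f MB (estimate %.2f MB)\n",
         other_memory_bytes / num_scanner_threads / kMiB,
         est_other_per_thread / kMiB);
  return 0;
}
{code}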

From these insights, I suspect that when the scanner node scheduled the 23rd thread, the memory left just barely fit the per-thread consumption estimate (128.00 MB + 32 MB), so the backend allowed it to start. As decompression progressed, one of the scanner threads tried to allocate more memory than what was left in the reservation at ParquetColumnChunkReader::InitDictionary(). If the 23rd thread had not been launched, there might have been enough memory to serve the decompression requirement. The back-of-the-envelope model below illustrates this sequence.
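
To make that concrete, here is a rough model (not Impala code; the query limit, the per-thread averages, and the 969825-byte dictionary allocation come from the profile and error message above, everything else is simplified for illustration):
{code:cpp}
// Back-of-the-envelope model (not Impala code) of the suspected sequence:
// the 23rd scanner thread is admitted because spare capacity still covers
// one per-thread *estimate*, but actual per-thread usage ends up higher,
// so a later non-reserved allocation in InitDictionary() fails.
#include <cstdio>

int main() {
  const double kMiB = 1024.0 * 1024.0;

  // Figures taken from the error message and the per-thread averages above.
  const double query_limit = 4.13 * 1024 * kMiB;             // Limit=4.13 GB
  const double actual_per_thread = (146.92 + 36.94) * kMiB;  // observed average usage
  const double estimate_per_thread = (128.0 + 32.0) * kMiB;  // reservation + non-reserved estimate
  const int threads = 23;

  // Admission of the 23rd thread: the spare capacity at that point still
  // covered one per-thread estimate, so the backend let it start.
  const double spare_before_23rd = query_limit - 22 * actual_per_thread;
  printf("spare before 23rd thread: %.2f MB (estimate needed: %.2f MB)\n",
         spare_before_23rd / kMiB, estimate_per_thread / kMiB);

  // Once all 23 threads ramped up past their estimates, almost nothing was
  // left for further non-reserved allocations such as the dictionary buffer.
  const double spare_after_ramp_up = query_limit - threads * actual_per_thread;
  const double dict_alloc = 969825;  // bytes, from the error message
  printf("spare after ramp-up: %.2f KB, InitDictionary() needs %.2f KB -> %s\n",
         spare_after_ramp_up / 1024.0, dict_alloc / 1024.0,
         spare_after_ramp_up < dict_alloc ? "OOM" : "ok");
  return 0;
}
{code}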

One solution to avoid this OOM is to change the per-thread memory estimation in [scanner-mem-limiter.cc|https://github.com/apache/impala/blob/df42225/be/src/runtime/scanner-mem-limiter.cc#L59]. Maybe we should deny the reservation once the spare memory capacity cannot fit 2 consecutive per-thread allocations (i.e., always leave headroom of 1 thread allocation), along the lines of the sketch below.
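
A rough sketch of the idea (illustrative only, not a patch against scanner-mem-limiter.cc; the function names and the spare/estimate values in main() are made up from the back-of-the-envelope figures above):
{code:cpp}
// Illustrative sketch of the proposed headroom rule (not a patch against
// scanner-mem-limiter.cc; function names here are made up).
#include <cstdint>
#include <cstdio>

// Current behavior (simplified): admit another scanner thread as long as the
// spare capacity covers one per-thread estimate.
bool AdmitScannerThreadCurrent(int64_t spare_bytes, int64_t per_thread_estimate) {
  return spare_bytes >= per_thread_estimate;
}

// Proposed behavior: deny the new thread unless two consecutive per-thread
// estimates still fit, i.e. always keep one thread's worth of headroom so the
// running threads can overshoot their non-reserved estimate (e.g. the
// dictionary buffers allocated in ParquetColumnChunkReader::InitDictionary()).
bool AdmitScannerThreadProposed(int64_t spare_bytes, int64_t per_thread_estimate) {
  return spare_bytes >= 2 * per_thread_estimate;
}

int main() {
  const int64_t kMiB = 1024LL * 1024;
  const int64_t estimate = 160 * kMiB;  // 128 MB reservation + 32 MB non-reserved
  const int64_t spare = 184 * kMiB;     // rough spare before the 23rd thread (see above)
  printf("current rule admits 23rd thread:  %d\n", AdmitScannerThreadCurrent(spare, estimate));
  printf("proposed rule admits 23rd thread: %d\n", AdmitScannerThreadProposed(spare, estimate));
  return 0;
}
{code}
With those rough numbers, the proposed rule would have denied the 23rd thread (about 184 MB spare < 2 x 160 MB), leaving roughly one thread's worth of estimate as headroom for decompression overshoot.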


