Posted to issues@impala.apache.org by "Bikramjeet Vig (JIRA)" <ji...@apache.org> on 2017/12/18 22:11:03 UTC

[jira] [Resolved] (IMPALA-6222) Make it easier to root-cause "failed to get minimum memory reservation" error

     [ https://issues.apache.org/jira/browse/IMPALA-6222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikramjeet Vig resolved IMPALA-6222.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.12.0

https://github.com/apache/impala/commit/de29925912703ba73139ca9b34ab0e28712af45e

This patch adds the following details to the error message encountered
on failure to get minimum memory reservation:
- which ReservationTracker hit its limit
- the top 5 admitted queries consuming the most memory under that
ReservationTracker
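
As a rough illustration of the idea (not the actual patch), a message carrying
both pieces of information could be assembled along these lines; the names
QueryReservation and FormatReservationError below are hypothetical stand-ins
for the real ReservationTracker plumbing in the Impala backend:

{noformat}
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one admitted query and the reservation it currently holds.
struct QueryReservation {
  std::string query_id;
  int64_t reserved_bytes;
};

// Hypothetical: build a message naming the tracker that hit its limit and
// listing the top 5 admitted queries holding the most reservation under it.
std::string FormatReservationError(const std::string& tracker_name,
                                   int64_t limit_bytes,
                                   std::vector<QueryReservation> queries) {
  // Sort queries by reservation, largest first.
  std::sort(queries.begin(), queries.end(),
            [](const QueryReservation& a, const QueryReservation& b) {
              return a.reserved_bytes > b.reserved_bytes;
            });
  std::ostringstream msg;
  msg << "Failed to get minimum memory reservation: limit of " << limit_bytes
      << " bytes reached on tracker '" << tracker_name << "'.\n"
      << "Top 5 queries by reservation under this tracker:\n";
  size_t n = std::min<size_t>(5, queries.size());
  for (size_t i = 0; i < n; ++i) {
    msg << "  " << queries[i].query_id << ": " << queries[i].reserved_bytes
        << " bytes\n";
  }
  return msg.str();
}
{noformat}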

Testing:
- added tests to reservation-tracker-test.cc that verify the error
message returned for different cases.
- tested "initial reservation failed" condition manually to verify
the error message returned.
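
A test along the lines of the new reservation-tracker-test.cc cases might
assert that the message names the limit-hitting tracker and the heaviest
query; this sketch reuses the hypothetical FormatReservationError helper
above rather than the real Impala test fixtures:

{noformat}
#include <string>
#include <gtest/gtest.h>

TEST(ReservationErrorMessage, NamesTrackerAndTopQueries) {
  std::string msg = FormatReservationError(
      "Process", 96LL * 1024 * 1024 * 1024,
      {{"464a9afdbf2646cf:d9e2d41100000000", 76LL * 1024 * 1024 * 1024},
       {"c94288312d6d4055:bbfa166500000000", 1LL * 1024 * 1024}});
  // The message should name the tracker that hit its limit...
  EXPECT_NE(msg.find("Process"), std::string::npos);
  // ...and list the query holding the most reservation.
  EXPECT_NE(msg.find("464a9afdbf2646cf:d9e2d41100000000"), std::string::npos);
}
{noformat}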

Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74
Reviewed-on: http://gerrit.cloudera.org:8080/8781
Reviewed-by: Bikramjeet Vig <bi...@cloudera.com>
Tested-by: Impala Public Jenkins

> Make it easier to root-cause "failed to get minimum memory reservation" error
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-6222
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6222
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>              Labels: resource-management
>             Fix For: Impala 2.12.0
>
>
> A user reported this error message:
> {noformat}
>  ExecQueryFInstances rpc query_id=c94288312d6d4055:bbfa166500000000 failed: Failed to get minimum memory reservation of 26.69 MB on daemon hodor-030.edh.cloudera.com:22000 for query c94288312d6d4055:bbfa166500000000 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error. Memory usage:
> Process: Limit=96.00 GB Total=16.54 GB Peak=83.37 GB
> {noformat}
> It turns out that a query was using up a lot of reservation, but it wasn't immediately apparent that the process reservation was mostly allocated to that query.
> {noformat}
> Process: Limit=96.00 GB Total=12.20 GB Peak=83.37 GB
>   Buffer Pool: Free Buffers: Total=208.00 MB
>   Buffer Pool: Clean Pages: Total=7.19 GB
>   Buffer Pool: Unused Reservation: Total=-76.79 GB
>   Free Disk IO Buffers: Total=1.37 GB Peak=1.37 GB
>   RequestPool=root.default: Total=76.81 GB Peak=77.56 GB
>     Query(464a9afdbf2646cf:d9e2d41100000000): Reservation=76.80 GB ReservationLimit=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.93 GB
>       Fragment 464a9afdbf2646cf:d9e2d4110000003f: Reservation=76.80 GB OtherMemory=6.69 MB Total=76.81 GB Peak=76.81 GB
>         SELECT_NODE (id=3): Total=20.00 KB Peak=9.02 MB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         ANALYTIC_EVAL_NODE (id=2): Reservation=4.00 MB OtherMemory=6.64 MB Total=10.64 MB Peak=15.04 MB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         SORT_NODE (id=1): Reservation=76.79 GB OtherMemory=16.00 KB Total=76.79 GB Peak=76.80 GB
>         EXCHANGE_NODE (id=4): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=10.19 MB
>         DataStreamSender (dst_id=5): Total=1.48 KB Peak=1.48 KB
>         CodeGen: Total=1.68 KB Peak=710.00 KB
>       Fragment 464a9afdbf2646cf:d9e2d41100000016: Reservation=0 OtherMemory=0 Total=0 Peak=389.69 MB
>         HDFS_SCAN_NODE (id=0): Total=0 Peak=388.54 MB
>         DataStreamSender (dst_id=4): Total=0 Peak=1.23 MB
>         CodeGen: Total=0 Peak=49.00 KB
>   Untracked Memory: Total=3.42 GB
> {noformat}
> When a user or admin sees this problem, they want to know immediately:
> * Which resource is exhausted (here, the process-wide reservation)?
> * Which queries are using it and how to kill them (i.e. the query IDs and coordinators of those queries).
> We should think through the error messages and diagnostics and improve them.
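
Before the fix, an admin had to pull that answer out of the memory dump by
hand. As a rough illustration of that manual step (not part of the patch),
the sketch below scans a dump like the one quoted above for the query holding
the largest reservation:

{noformat}
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

// Convert "<value> <unit>" (e.g. "76.80 GB") to bytes.
static double ToBytes(double value, const std::string& unit) {
  if (unit == "KB") return value * 1024;
  if (unit == "MB") return value * 1024 * 1024;
  if (unit == "GB") return value * 1024 * 1024 * 1024;
  return value;
}

int main() {
  // Read a full memory dump (like the one quoted above) from stdin.
  std::string dump((std::istreambuf_iterator<char>(std::cin)),
                   std::istreambuf_iterator<char>());

  // Matches lines such as:
  //   Query(464a9afdbf2646cf:d9e2d41100000000): Reservation=76.80 GB ...
  std::regex query_re(
      R"(Query\(([0-9a-f:]+)\): Reservation=([0-9.]+) (KB|MB|GB))");

  std::string top_query;
  double top_bytes = 0;
  for (std::sregex_iterator it(dump.begin(), dump.end(), query_re), end;
       it != end; ++it) {
    double bytes = ToBytes(std::stod(it->str(2)), it->str(3));
    if (bytes > top_bytes) { top_bytes = bytes; top_query = it->str(1); }
  }

  if (!top_query.empty()) {
    std::cout << "Largest reservation held by query " << top_query << std::endl;
  }
  return 0;
}
{noformat}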



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)