You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/08/22 05:18:00 UTC

[jira] [Resolved] (IMPALA-5158) Account for difference between process memory consumption and memory used by queries

     [ https://issues.apache.org/jira/browse/IMPALA-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5158.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0


  IMPALA-5158,IMPALA-5236: account for unused buffer pool reservations

We were missing accounting for this, since it is part of the expected
difference between query and process memory consumption. The identity
that applies is is:

  buffers allocated from system =
      reservation + cached buffers - unused reservation

Where "cached buffers" includes free buffers and buffers attached to
clean pages. The reservation is accounted against queries and "buffers
allocated from system" is accounted against the process MemTracker.

Reporting this in a direct way required adding a MemTracker with a
negative consumption consumption, which fortunately did not require
any major changes to the MemTracker code.

Example output when applied to buffer pool branch:

  Process: Limit=8.35 GB Total=579.18 MB Peak=590.41 MB
    Buffer Pool: Free Buffers: Total=268.25 MB
    Buffer Pool: Clean Pages: Total=172.25 MB
    Buffer Pool: Unused Reservation: Total=-8.25 MB
    Free Disk IO Buffers: Total=21.98 MB Peak=21.98 MB
    RequestPool=default-pool: Total=12.07 MB Peak=71.58 MB
      ... <snip> ...
    RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
    Untracked Memory: Total=112.88 MB

Testing:
Added a basic test for MemTrackers with negative metrics.

Change-Id: Idb1fa3110dc893321f9f4e8ced6b7ede12194dad
Reviewed-on: http://gerrit.cloudera.org:8080/7380
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Account for difference between process memory consumption and memory used by queries
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5158
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5158
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Tim Armstrong
>            Priority: Critical
>              Labels: observability
>             Fix For: Impala 2.10.0
>
>         Attachments: Screen Shot 2017-04-04 at 10.11.12 AM.png, tpcds_q78_memory_discrepancy.txt
>
>
> There is discrepancy between process wide memory usage and memory used by the query, in the example below the query is using 26.54 GB while the process is reporting 11.63 GB. We should make sure that all the memory is accounted for:
> * Used/unused reservations
> * Cached free buffers
> * Clean pages that haven't been evicted
> * True untracked memory (e.g. LLVM, etc)
> * Make the TCMalloc memory report less prominent, since it is less important now.
> Process memory usage
> {code}
> Memory Usage
> Memory consumption / limit: 11.63 GB / 220.00 GB
> {code}
> Breakdown 
> {code}
> Process: Limit=220.00 GB Total=11.63 GB Peak=183.69 GB
>   Free Disk IO Buffers: Total=1.71 GB Peak=1.71 GB
>   RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
>   RequestPool=root.mmokhtar: Total=26.54 GB Peak=90.13 GB
>     Query(4b403f9f6fe2ecf6:4f81291100000000): Limit=55.00 GB Total=26.54 GB Peak=44.26 GB
>       Fragment 4b403f9f6fe2ecf6:4f81291100000000: BufferPoolUsed/Reservation=0/0 OtherMemory=84.80 KB Total=84.80 KB Peak=1.53 MB
>         EXCHANGE_NODE (id=36): Total=4.00 KB Peak=4.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         DataStreamRecvr: Total=67.38 KB Peak=67.38 KB
>         PLAN_ROOT_SINK: Total=0 Peak=0
>         CodeGen: Total=5.42 KB Peak=1.52 MB
>       Fragment 4b403f9f6fe2ecf6:4f81291100000045: BufferPoolUsed/Reservation=0/26.54 GB OtherMemory=243.20 KB Total=26.54 GB Peak=26.54 GB
>         SORT_NODE (id=20): Total=56.00 KB Peak=56.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         HASH_JOIN_NODE (id=19): BufferPoolUsed/Reservation=6.62 GB/16.14 GB OtherMemory=56.25 KB Total=16.14 GB Peak=16.14 GB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>           Hash Join Builder (join_node_id=19): Total=14.12 KB Peak=22.12 KB
>         HASH_JOIN_NODE (id=18): BufferPoolUsed/Reservation=0/7.31 GB OtherMemory=66.25 KB Total=7.31 GB Peak=7.31 GB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>           Hash Join Builder (join_node_id=18): Total=23.12 KB Peak=31.12 KB
>         AGGREGATION_NODE (id=25): BufferPoolUsed/Reservation=0/3.08 GB OtherMemory=31.12 KB Total=3.08 GB Peak=3.08 GB
>           Exprs: Total=8.00 KB Peak=8.00 KB
>         EXCHANGE_NODE (id=24): Total=0 Peak=0
>         EXCHANGE_NODE (id=30): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=2.47 MB
>         EXCHANGE_NODE (id=35): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=22.96 MB
>         DataStreamSender (dst_id=36): Total=1008.00 B Peak=1008.00 B
>         CodeGen: Total=24.59 KB Peak=4.67 MB
> {code}
> This was captured while running TPC-DS Q78
> {code}
> with ws as
>   (select d_year AS ws_sold_year, ws_item_sk,
>     ws_bill_customer_sk ws_customer_sk,
>     sum(ws_quantity) ws_qty,
>     sum(ws_wholesale_cost) ws_wc,
>     sum(ws_sales_price) ws_sp
>    from web_sales
>    left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk
>    join date_dim on ws_sold_date_sk = d_date_sk
>    where wr_order_number is null
>    group by d_year, ws_item_sk, ws_bill_customer_sk
>    ),
> cs as
>   (select d_year AS cs_sold_year, cs_item_sk,
>     cs_bill_customer_sk cs_customer_sk,
>     sum(cs_quantity) cs_qty,
>     sum(cs_wholesale_cost) cs_wc,
>     sum(cs_sales_price) cs_sp
>    from catalog_sales
>    left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
>    join date_dim on cs_sold_date_sk = d_date_sk
>    where cr_order_number is null
>    group by d_year, cs_item_sk, cs_bill_customer_sk
>    ),
> ss as
>   (select d_year AS ss_sold_year, ss_item_sk,
>     ss_customer_sk,
>     sum(ss_quantity) ss_qty,
>     sum(ss_wholesale_cost) ss_wc,
>     sum(ss_sales_price) ss_sp
>    from store_sales
>    left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk
>    join date_dim on ss_sold_date_sk = d_date_sk
>    where sr_ticket_number is null
>    group by d_year, ss_item_sk, ss_customer_sk
>    )
>  select 
> ss_sold_year, ss_item_sk, ss_customer_sk,
> round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) ratio,
> ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
> coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
> coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
> coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
> from ss
> left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk)
> left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=cs_item_sk and cs_customer_sk=ss_customer_sk)
> where coalesce(ws_qty,0)>0 and coalesce(cs_qty, 0)>0 and ss_sold_year=2002
> order by 
>   ss_sold_year, ss_item_sk, ss_customer_sk,
>   ss_qty desc, ss_wc desc, ss_sp desc,
>   other_chan_qty,
>   other_chan_wholesale_cost,
>   other_chan_sales_price,
>   round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2)
> limit 100;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)