Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2015/11/09 07:01:10 UTC

[jira] [Assigned] (HAWQ-139) Out of memory with 10 concurrent TPC-H workload in YARN mode

     [ https://issues.apache.org/jira/browse/HAWQ-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruilong Huo reassigned HAWQ-139:
--------------------------------

    Assignee: Ruilong Huo  (was: Lei Chang)

> Out of memory with 10 concurrent TPC-H workload in YARN mode
> ------------------------------------------------------------
>
>                 Key: HAWQ-139
>                 URL: https://issues.apache.org/jira/browse/HAWQ-139
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>
> On an 18-node HAWQ cluster with YARN configured, a 10-concurrent TPC-H workload (10 GB of data per node) errors out with "out of memory".
> Further analysis shows that one TPC-H query 9 session ran out of memory after using about 1.7 GB, while the query is expected to use only about 1 GB.
> For a long-term fix, we need to investigate the resource manager and the executor to identify action items. For a short-term fix, we give HAWQ an 8 GB memory buffer by default instead of 2 GB.
> {code}
> 91265 [2015-11-03 12:31:15] select
>  nation,
>  o_year,
>  sum(amount) as sum_profit
> from
>  (
>  select
>  n_name as nation,
>  extract(year from o_orderdate) as o_year,
>  l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
>  from
>  part,
>  supplier,
>  lineitem,
>  partsupp,
>  orders,
>  nation
>  where
>  s_suppkey = l_suppkey
>  and ps_suppkey = l_suppkey
>  and ps_partkey = l_partkey
>  and p_partkey = l_partkey
>  and o_orderkey = l_orderkey
>  and s_nationkey = n_nationkey
>  and p_name like '%aquamarine%'
>  ) as profit
> group by
>  nation,
>  o_year
> order by
>  nation,
>  o_year desc;
> 91272 [2015-11-03 12:31:21] psql:/data1/gpadmin/pulse2-agent/agents/agent1/work/HAWQ-main-SystemTest-yarn/rhel5_x86_64/lsp/report/20151103-114720/performance_tpch_concurrent/tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin/tmp/1_8_TPCH_Query_09.sql:32: ERROR:  Canceling query because of high VMEM usage. Used: 1748MB, available 480MB, red zone: 9216MB (runaway_cleaner.c:135)  (seg74 bcn-w3:5532 pid=33619) (dispatcher.c:1681)
> ***|tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin_1_8_TPCH_Query_09.sql|127665
> {code}
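The analysis above rests on the figures in the runaway-cleaner error (used 1748 MB against an expected footprint of about 1 GB). A minimal sketch for pulling those figures out of such log lines, written in Python rather than HAWQ's own C and assuming the message keeps the exact "Used: NMB, available NMB, red zone: NMB" layout shown above. The names `parse_vmem_error`, `overrun_ratio`, `VmemReport` and the 1024 MB baseline are hypothetical illustrations, not part of HAWQ or its tooling.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the runaway-cleaner message quoted above;
# assumes the "Used: ...MB, available ...MB, red zone: ...MB" layout is fixed.
VMEM_PATTERN = re.compile(
    r"Used: (?P<used>\d+)MB, available (?P<available>\d+)MB, "
    r"red zone: (?P<red_zone>\d+)MB"
)


class VmemReport(NamedTuple):
    used_mb: int
    available_mb: int
    red_zone_mb: int


def parse_vmem_error(line: str) -> Optional[VmemReport]:
    """Return the VMEM figures from a 'high VMEM usage' error line, or None."""
    match = VMEM_PATTERN.search(line)
    if match is None:
        return None
    return VmemReport(
        int(match["used"]), int(match["available"]), int(match["red_zone"])
    )


def overrun_ratio(report: VmemReport, expected_mb: int = 1024) -> float:
    """Observed usage divided by the expected per-session footprint (~1 GB assumed)."""
    return report.used_mb / expected_mb


if __name__ == "__main__":
    log_line = (
        "ERROR:  Canceling query because of high VMEM usage. "
        "Used: 1748MB, available 480MB, red zone: 9216MB "
        "(runaway_cleaner.c:135)"
    )
    report = parse_vmem_error(log_line)
    if report is not None:
        print(report)
        print(f"{overrun_ratio(report):.2f}x the expected ~1 GB")
```

Applied to the seg74 error above, this yields roughly 1.7 times the expected usage, matching the "about 1.7G versus about 1G" figure in the description. Keying the regex on the literal message text keeps the sketch simple, but it will silently return `None` if the wording of the error changes.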
