Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2015/11/09 07:01:10 UTC
[jira] [Assigned] (HAWQ-139) Out of memory with 10 concurrent TPC-H workload in YARN mode
[ https://issues.apache.org/jira/browse/HAWQ-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruilong Huo reassigned HAWQ-139:
--------------------------------
Assignee: Ruilong Huo (was: Lei Chang)
> Out of memory with 10 concurrent TPC-H workload in YARN mode
> ------------------------------------------------------------
>
> Key: HAWQ-139
> URL: https://issues.apache.org/jira/browse/HAWQ-139
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Resource Manager
> Reporter: Ruilong Huo
> Assignee: Ruilong Huo
>
> On an 18-node HAWQ cluster configured with YARN, queries fail with "out of memory" errors during a 10-way concurrent TPC-H workload (10 GB of data per node).
> Further analysis shows that one TPC-H Query 9 session ran out of memory after using about 1.7 GB, while the query is expected to use only about 1 GB.
> For a long-term fix, we need to investigate the resource manager and the executor to identify action items. As a short-term fix, we raise HAWQ's default memory buffer from 2 GB to 8 GB.
> {code}
> 91265 [2015-11-03 12:31:15] select
> nation,
> o_year,
> sum(amount) as sum_profit
> from
> (
> select
> n_name as nation,
> extract(year from o_orderdate) as o_year,
> l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
> from
> part,
> supplier,
> lineitem,
> partsupp,
> orders,
> nation
> where
> s_suppkey = l_suppkey
> and ps_suppkey = l_suppkey
> and ps_partkey = l_partkey
> and p_partkey = l_partkey
> and o_orderkey = l_orderkey
> and s_nationkey = n_nationkey
> and p_name like '%aquamarine%'
> ) as profit
> group by
> nation,
> o_year
> order by
> nation,
> o_year desc;
> 91272 [2015-11-03 12:31:21] psql:/data1/gpadmin/pulse2-agent/agents/agent1/work/HAWQ-main-SystemTest-yarn/rhel5_x86_64/lsp/report/20151103-114720/performance_tpch_concurrent/tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin/tmp/1_8_TPCH_Query_09.sql:32: ERROR: Canceling query because of high VMEM usage. Used: 1748MB, available 480MB, red zone: 9216MB (runaway_cleaner.c:135) (seg74 bcn-w3:5532 pid=33619) (dispatcher.c:1681)
> ***|tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin_1_8_TPCH_Query_09.sql|127665
> {code}
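When triaging similar failures across many concurrent-run logs, the figures in the runaway-cleaner cancellation message can be pulled out mechanically and compared with the expected per-query usage. The following is a minimal Python sketch, assuming the message format matches the single log line quoted above; `parse_vmem_error` and `overrun_ratio` are hypothetical helpers, not part of HAWQ, and treating "about 1 GB" as 1024 MB is also an assumption.

```python
import re

# Pattern for the VMEM cancellation message quoted in the log above.
# ASSUMPTION: the wording is inferred from one log line; other HAWQ builds may differ.
VMEM_RE = re.compile(
    r"high VMEM usage\. Used: (?P<used>\d+)MB, "
    r"available (?P<available>\d+)MB, "
    r"red zone: (?P<red_zone>\d+)MB"
)


def parse_vmem_error(line):
    """Return the used/available/red-zone figures (in MB) from a log line, or None if absent."""
    m = VMEM_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def overrun_ratio(used_mb, expected_mb):
    """Return how many times the observed usage exceeds the expected per-query usage."""
    return used_mb / expected_mb


if __name__ == "__main__":
    line = ("ERROR: Canceling query because of high VMEM usage. "
            "Used: 1748MB, available 480MB, red zone: 9216MB "
            "(runaway_cleaner.c:135)")
    stats = parse_vmem_error(line)
    print(stats)
    # ASSUMPTION: "about 1 GB" expected usage is taken as 1024 MB.
    print(f"{overrun_ratio(stats['used'], 1024):.2f}x expected usage")
```

For the quoted Query 9 failure this reports 1748 MB used against roughly 1 GB expected, i.e. about 1.7 times the expected footprint, which is the discrepancy the report asks the resource manager and executor investigation to explain.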
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)