Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2015/11/09 07:01:10 UTC
[jira] [Created] (HAWQ-139) Out of memory with 10 concurrent TPC-H workload in YARN mode
Ruilong Huo created HAWQ-139:
--------------------------------
Summary: Out of memory with 10 concurrent TPC-H workload in YARN mode
Key: HAWQ-139
URL: https://issues.apache.org/jira/browse/HAWQ-139
Project: Apache HAWQ
Issue Type: Bug
Components: Resource Manager
Reporter: Ruilong Huo
Assignee: Lei Chang
On an 18-node HAWQ cluster with YARN configured, a workload of 10 concurrent TPC-H streams (10 GB of data per node) fails with an "out of memory" error.
Further analysis shows that one TPC-H query 9 session ran out of memory after using about 1.7 GB, while the query is supposed to use about 1 GB.
For a long-term fix, we need to investigate the resource manager and executor to identify action items. For a short-term fix, we give HAWQ an 8 GB memory buffer by default instead of 2 GB.
{code}
91265 [2015-11-03 12:31:15] select
nation,
o_year,
sum(amount) as sum_profit
from
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%aquamarine%'
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
91272 [2015-11-03 12:31:21] psql:/data1/gpadmin/pulse2-agent/agents/agent1/work/HAWQ-main-SystemTest-yarn/rhel5_x86_64/lsp/report/20151103-114720/performance_tpch_concurrent/tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin/tmp/1_8_TPCH_Query_09.sql:32: ERROR: Canceling query because of high VMEM usage. Used: 1748MB, available 480MB, red zone: 9216MB (runaway_cleaner.c:135) (seg74 bcn-w3:5532 pid=33619) (dispatcher.c:1681)
***|tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin_1_8_TPCH_Query_09.sql|127665
{code}
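When triaging many such failures across concurrent sessions, it can help to pull the memory figures and the segment identity out of the error text. The following is a minimal Python sketch, not part of HAWQ: the regular expression is derived only from the single error line quoted above, and the function and field names (parse_vmem_error, used, available, red_zone and so on) are hypothetical. Other HAWQ versions may format the message differently.

```python
import re

# Pattern derived from the one error line quoted in this report; the layout
# is an assumption and may not hold for other HAWQ versions or messages.
VMEM_ERROR = re.compile(
    r"Used: (?P<used>\d+)MB, available (?P<available>\d+)MB, "
    r"red zone: (?P<red_zone>\d+)MB"
    r".*?\((?P<segment>seg\d+) (?P<host>[^:\s]+):(?P<port>\d+) pid=(?P<pid>\d+)\)"
)


def parse_vmem_error(line):
    """Return the fields of a 'high VMEM usage' error, or None if no match."""
    match = VMEM_ERROR.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the numeric fields from strings to integers (sizes are in MB).
    for key in ("used", "available", "red_zone", "port", "pid"):
        info[key] = int(info[key])
    return info


# The error text from the log excerpt above, without the psql path prefix.
SAMPLE = (
    "ERROR: Canceling query because of high VMEM usage. "
    "Used: 1748MB, available 480MB, red zone: 9216MB "
    "(runaway_cleaner.c:135) (seg74 bcn-w3:5532 pid=33619) (dispatcher.c:1681)"
)

info = parse_vmem_error(SAMPLE)
used_gb = round(info["used"] / 1024, 1)  # 1748 MB is about 1.7 GB
print(info["segment"], info["host"], info["pid"], used_gb)
```

For the line in this report, the sketch identifies segment seg74 on host bcn-w3 (pid 33619) and converts the 1748 MB figure to roughly 1.7 GB, which matches the usage described in the analysis.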