Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2015/10/13 08:57:05 UTC

[jira] [Comment Edited] (HAWQ-12) "Cannot allocate memory" in parquet_compression test in installcheck-good with hawq dbg build

    [ https://issues.apache.org/jira/browse/HAWQ-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952756#comment-14952756 ] 

Ruilong Huo edited comment on HAWQ-12 at 10/13/15 6:56 AM:
-----------------------------------------------------------

After evaluation, the memory consumption of the query looks reasonable:
1) The "internal statistics" and "external monitoring" views of the query's memory usage are consistent with each other.
2) No single operator stands out as consuming excessive memory.

Details below:

1. From the "internal" perspective, the explain analyze result shows that:
1) The memory quota for the query is 2G.
2) There is one segment for the query, and it has 6 slices. The memory consumption of all slices is about 115564K bytes (about 112.8M bytes).
{noformat}
slice0   Executor memory:  712K bytes.
slice1   Executor memory:  11980K bytes (seg0:localhost.localdomain)
slice2   Executor memory:  11980K bytes (seg0:localhost.localdomain)
slice3   Executor memory:  11980K bytes (seg0:localhost.localdomain)
slice4   Executor memory:  15733K bytes (seg0:localhost.localdomain)
slice5   Executor memory:  63179K bytes (seg0:localhost.localdomain)
--------------------------------------------------------------------
Query    Executor memory: 115564K bytes (112.8M bytes)
{noformat}
3) No single operator stands out as consuming excessive memory.

For details, please refer to attached parquet_compression_explain_analyze.out and parquet_compression_explain_analyze.gif
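
The slice total above can be re-derived with a short Python sketch. The per-slice figures are copied from the explain analyze output; the variable names are illustrative only, and the conversions assume 1M = 1024K and 1G = 1024M.

```python
# Per-slice executor memory in K bytes, copied from the explain analyze output.
slice_kb = {
    "slice0": 712,
    "slice1": 11980,
    "slice2": 11980,
    "slice3": 11980,
    "slice4": 15733,
    "slice5": 63179,
}

# 2G query memory quota, expressed in K bytes (assumes 1G = 1024 * 1024 K).
QUOTA_KB = 2 * 1024 * 1024

total_kb = sum(slice_kb.values())
total_mb = total_kb / 1024                   # assumes 1M = 1024K
largest = max(slice_kb, key=slice_kb.get)    # slice with the highest usage
largest_share = slice_kb[largest] / total_kb
quota_share = total_kb / QUOTA_KB

print(f"total: {total_kb}K bytes ({total_mb:.2f}M bytes)")
print(f"largest slice: {largest} ({largest_share:.0%} of the total)")
print(f"share of the 2G quota: {quota_share:.1%}")
```

Under these assumptions the total comes to roughly 112.86M bytes, which the comment reports as 112.8M, and it stays far below the 2G quota.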

2. From the "external" perspective, monitoring the rough memory usage during query execution shows:
{noformat}
       Memory usage                          Available memory on OS during query execution
Run 1: 317936K bytes   (310.5M bytes)        Min: 3943444K bytes Max: 4261380K bytes
Run 2: 241860K bytes   (236.2M bytes)        Min: 4062304K bytes Max: 4304164K bytes
{noformat}
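
The "Memory usage" column appears to be the difference between the maximum and minimum available memory observed during each run. This is inferred from the numbers rather than stated in the comment, so the Python sketch below is only a consistency check under that assumption, with illustrative variable names.

```python
# (min, max) available OS memory in K bytes during each run, from the table above.
available_kb = {
    "Run 1": (3943444, 4261380),
    "Run 2": (4062304, 4304164),
}

# Assumption: usage is estimated as the drop in available memory (max - min).
usage_kb = {run: hi - lo for run, (lo, hi) in available_kb.items()}

for run, kb in usage_kb.items():
    print(f"{run}: {kb}K bytes ({kb / 1024:.1f}M bytes)")
```

Because this estimate is based on OS-wide available memory, it would also pick up anything else running on the box at the time, which is why the comment calls it rough.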

3. There is a gap of about 124M to 198M between the "internal" and "external" views of the query's memory consumption. The gap comes from:
1) There are 1 QD and 6 QEs. Each of them consumes 12M+ bytes; in total, they take about 100M bytes.
2) The memory consumed by some library functions (e.g., strcoll) is not covered by memory monitoring, because these functions bypass gp_malloc and call malloc/free directly.
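
As a rough check of the gap, the Python sketch below subtracts the internal total from each external measurement and then removes the estimated 100M of per-process overhead. The figures are taken from this comment; the variable names are illustrative, 1M = 1024K is assumed, and treating the remainder as untracked malloc usage is an interpretation rather than a measurement.

```python
INTERNAL_KB = 115564          # sum of slice executor memory from explain analyze
PROCESS_OVERHEAD_MB = 100     # the comment's estimate for 1 QD + 6 QEs

# Externally observed memory usage per run, in K bytes.
external_kb = {"Run 1": 317936, "Run 2": 241860}

# Gap between the external and internal views, in M bytes.
gap_mb = {run: (kb - INTERNAL_KB) / 1024 for run, kb in external_kb.items()}

# What is left after the per-process overhead; under the comment's explanation
# this would be memory allocated outside gp_malloc (for example by strcoll).
remainder_mb = {run: gap - PROCESS_OVERHEAD_MB for run, gap in gap_mb.items()}

for run in external_kb:
    print(f"{run}: gap {gap_mb[run]:.1f}M, remainder {remainder_mb[run]:.1f}M")
```

The computed gaps are close to the 124M to 198M range quoted above; the small difference comes from rounding the intermediate values.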


was (Author: huor):
The explain analyze result shows that it uses about 2G memory. For details, please refer to attached parquet_compression_explain_analyze.out and parquet_compression_explain_analyze.gif

> "Cannot allocate memory" in parquet_compression test in installcheck-good with hawq dbg build
> ---------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-12
>                 URL: https://issues.apache.org/jira/browse/HAWQ-12
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Storage
>         Environment: Red Hat Enterprise Linux Server release 5.5 (Tikanga)
> Linux pbld3 2.6.18-194.el5 #1 SMP Tue Mar 16 21:52:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>         Attachments: parquet_compression_explain_analyze.gif, parquet_compression_explain_analyze.out
>
>
> When running installcheck-good with hawq dbg build on a Linux box (RHEL 5.5, 12G Memory, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz with 4 processors), the parquet_compression test fails with "Cannot allocate memory" from time to time.
> Initial investigation shows that strcoll fails to allocate memory to complete string comparison with locale considered during outer join of two partitioned parquet tables with gzip compression.
> We need to: 1) understand the amount of memory used by the outer join query and conclude whether it is expected; 2) fix the OOM if there are issues with either a memory leak or memory protection/enforcement.
> {noformat}
> 2015-09-25 00:31:22.852771 PDT,"gpadmin","regression",p9703,th-1437302464,"127.0.0.1","39230",2015-09-25 00:31:16 PDT,4502,con368,cmd50,seg-1,,,x4502,sx1,"ERROR","XX000","Unable to compare strings.  Error: Cannot allocate memory.  First string has length 1145620 and value (limited to 100 characters): 'large data value for text data typelarge data value for text data typelarge data value for text data'.  Second string has length 1145620 and value (limited to 100 characters): 'large data value for text data typelarge data value for text data typelarge data value for text data' (string_wrapper.h:58)  (seg0 pbld3:23011 pid=9715) (dispatcher.c:1681)",,,,,,"select count(*) from parquet_gzip_part c1 full outer join parquet_gzip_part_unc c2 on c1.p1=c2.p1 and c1.document=c2.document and c1.vch1=c2.vch1 and c1.bta1=c2.bta1 and c1.bitv1=c2.bitv1;",0,,"dispatcher.c",1681,"Stack trace:
> 1    0x9de185 postgres errstart (elog.c:473)
> 2    0xb856f2 postgres <symbol not found> (dispatcher.c:1679)
> 3    0xb84c45 postgres dispatch_catch_error (dispatcher.c:1342)
> 4    0x7384e0 postgres mppExecutorCleanup (execUtils.c:2267)
> 5    0x718b21 postgres ExecutorRun (execMain.c:1230)
> 6    0x900648 postgres <symbol not found> (pquery.c:1642)
> 7    0x900225 postgres PortalRun (pquery.c:1466)
> 8    0x8f6276 postgres <symbol not found> (postgres.c:1728)
> 9    0x8faec8 postgres PostgresMain (postgres.c:4693)
> 10   0x89db5a postgres <symbol not found> (postmaster.c:5846)
> 11   0x89cfe4 postgres <symbol not found> (postmaster.c:5438)
> 12   0x897702 postgres <symbol not found> (postmaster.c:2146)
> 13   0x8967d8 postgres PostmasterMain (postmaster.c:1432)
> 14   0x7b095e postgres main (main.c:226)
> 15   0x336e21d994 libc.so.6 __libc_start_main (??:0)
> 16   0x4b9109 postgres <symbol not found> (??:0)
> {noformat}


