Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2017/06/07 20:56:18 UTC

[jira] [Updated] (DRILL-5576) OutOfMemoryException when some CPU cores are taken offline while concurrent queries are under execution

     [ https://issues.apache.org/jira/browse/DRILL-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-5576:
----------------------------------
    Description: 
When we reduce the number of online CPU cores while concurrent queries are executing, we see an OutOfMemoryException (OOM).

Drill 1.11.0, commit ID d11aba2
Three-node CentOS 6.8 cluster
On each node, Drill's direct memory was set as follows:
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"16G"}

There are 24 cores on the node where the Foreman Drillbit is running.
{noformat}
[root@centos-01 logs]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0,2,4,5,8,9,12,14,15,18,20,22
Off-line CPU(s) list:  1,3,6,7,10,11,13,16,17,19,21,23
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 44
Model name:            Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
Stepping:              2
CPU MHz:               1600.000
BogoMIPS:              4799.86
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0,2,4,5,12,14,15
NUMA node1 CPU(s):     8,9,18,20,22
{noformat}

Java code snippet that creates the threads and executes TPC-DS query 11 concurrently:
{noformat}
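        // Fixed pool of 48 threads, one per concurrent query.
        ExecutorService executor = Executors.newFixedThreadPool(48);
        try {
            // Submit 48 tasks; each ConcurrentQuery is handed the same connection.
            for (int i = 1; i <= 48; i++) {
                executor.submit(new ConcurrentQuery(conn));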
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
            e.printStackTrace();
        }
{noformat}
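
A note on the snippet above: an exception thrown inside a task passed to executor.submit(...) is captured in the returned Future and does not reach the surrounding try/catch. The following is a minimal sketch, not the reporter's actual test program, of a driver that keeps the futures and counts the failed tasks. QueryTask is a hypothetical stand-in for ConcurrentQuery (whose source is not shown here); it simulates a failure on every third task instead of running a real query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentDriverSketch {

    // Hypothetical stand-in for ConcurrentQuery: fails for every third id so
    // the failure-counting path can be exercised without a Drill cluster.
    static class QueryTask implements Callable<Integer> {
        private final int id;

        QueryTask(int id) {
            this.id = id;
        }

        @Override
        public Integer call() {
            if (id % 3 == 0) {
                throw new IllegalStateException("simulated failure in task " + id);
            }
            return id;
        }
    }

    // Submits `tasks` QueryTask instances to a fixed pool and returns how many failed.
    static int runAndCountFailures(int tasks) {
        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        List<Future<Integer>> futures = new ArrayList<>();
        try {
            for (int i = 1; i <= tasks; i++) {
                futures.add(executor.submit(new QueryTask(i)));
            }
            int failures = 0;
            for (Future<Integer> f : futures) {
                try {
                    f.get(); // rethrows the task's exception wrapped in ExecutionException
                } catch (ExecutionException e) {
                    failures++;
                }
            }
            return failures;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            executor.shutdown(); // let the worker threads exit once the tasks are done
        }
    }

    public static void main(String[] args) {
        System.out.println("failed tasks: " + runAndCountFailures(48));
    }
}
```

With a driver of this shape, a per-query failure such as the DATA_READ error would be visible on the client side as an ExecutionException rather than only in drillbit.log.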

While TPC-DS query 11 is executing under the above program, we take half of the CPU cores offline:
{noformat}
[root@centos-01 ~]# sh turnCPUCoresOffline.sh
OFFLINE cores are :
1,3,6-7,10-11,13,16-17,19,21,23
ONLINE cores are :
0,2,4-5,8-9,12,14-15,18,20,22
{noformat}
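
The lscpu output lists the offline CPUs one by one, while the script output uses ranges such as 6-7. As a quick check that both describe the same set of cores, here is a small sketch that expands the range notation. The class and method names are illustrative only; they are not part of turnCPUCoresOffline.sh, whose contents are not shown.

```java
import java.util.ArrayList;
import java.util.List;

class CpuListSketch {

    // Expands a CPU list such as "1,3,6-7" into [1, 3, 6, 7].
    static List<Integer> expand(String cpuList) {
        List<Integer> cpus = new ArrayList<>();
        for (String part : cpuList.trim().split(",")) {
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0]);
            // A single CPU id has no "-", so first and last are the same element.
            int last = Integer.parseInt(bounds[bounds.length - 1]);
            for (int cpu = first; cpu <= last; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) {
        List<Integer> offline = expand("1,3,6-7,10-11,13,16-17,19,21,23");
        List<Integer> online = expand("0,2,4-5,8-9,12,14-15,18,20,22");
        System.out.println("offline: " + offline.size() + ", online: " + online.size());
    }
}
```

Both lists expand to 12 CPUs each, which matches "half of the cores" on a 24-CPU node.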

As a result, we see an OutOfMemoryException; the drillbit.log files are attached.

Stack trace from drillbit.log; the failure appears to originate at AsyncPageReader.java:437:
{noformat}
2017-06-07 20:13:42,450 [scan-1] INFO  o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Exception occurred while reading from disk. (Unable to allocate buffer of size 524288 (rounded from 404674) due to memory limit. Current allocation: 4784176)
org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception occurred while reading from disk.

File:  /drill/testdata/tpcds_sf1/parquet/customer/0_0_0.parquet
Column:  c_birth_country
Row Group Start:  4638444

[Error Id: 9684da4b-c601-4ae4-a66d-4253ab42035f ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:199) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$600(AsyncPageReader.java:81) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:483) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:392) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_65]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_65]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 524288 (rounded from 404674) due to memory limit. Current allocation: 4784176
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:231) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:206) ~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.util.filereader.DirectBufInputStream.getNext(DirectBufInputStream.java:108) ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437) [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
        ... 5 common frames omitted
{noformat}
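
The message reports a request for 404674 bytes rounded to 524288 bytes. Since 524288 is 2^19, the numbers are consistent with the allocator rounding each request up to the next power of two. The sketch below reproduces only that arithmetic; it is an illustration under that assumption, not Drill's allocator code, and the class and method names are made up for the example.

```java
class RoundingSketch {

    // Rounds a positive size up to the nearest power of two.
    // Assumes 0 < size <= 2^30 so the shift cannot overflow an int.
    static int roundUpToPowerOfTwo(int size) {
        int highest = Integer.highestOneBit(size);
        return (highest == size) ? size : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(404674));
    }
}
```

Under this assumption, the failed request was for a 512 KiB buffer, roughly 30% more than the 404674 bytes actually needed for the page.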

> OutOfMemoryException when some CPU cores are taken offline while concurrent queries are under execution
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5576
>                 URL: https://issues.apache.org/jira/browse/DRILL-5576
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.11.0
>         Environment: 3 nodes CentOS cluster
>            Reporter: Khurram Faraaz


