Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2016/01/26 04:41:39 UTC

[jira] [Commented] (DRILL-4310) Memory leak in hash partition sender when query is cancelled

    [ https://issues.apache.org/jira/browse/DRILL-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116609#comment-15116609 ] 

Deneche A. Hakim commented on DRILL-4310:
-----------------------------------------

Looking at the Foreman's log (drillbit.log.133), it seems the query failed because the RPC connection between the Foreman node and the client timed out; this is what caused the remaining fragments to be cancelled:
{noformat}
2016-01-26 00:45:16,276 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59875 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,278 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
2016-01-26 00:45:16,279 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59882 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,280 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
2016-01-26 00:45:16,338 [UserServer-1] INFO  o.a.drill.exec.rpc.user.UserServer - RPC connection /10.10.88.133:31010 <--> /10.10.88.133:59885 (user client) timed out.  Timeout was set to 30 seconds. Closing connection.
2016-01-26 00:45:16,340 [UserServer-1] INFO  o.a.d.e.w.fragment.FragmentExecutor - 295940a8-3662-16ba-4c63-b28acb67e0a6:0:0: State change requested FAILED --> FAILED
{noformat}
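
To check when user-client connections timed out in each attached drillbit log, a small script can pull the timestamp, endpoints and timeout value out of the UserServer messages. This is a minimal sketch under stated assumptions: the regular expression is inferred only from the three UserServer lines above and assumes Drill writes this message in the same format elsewhere, and `parse_timeouts` and `RpcTimeout` are hypothetical helpers, not part of Drill.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Hypothetical helper: the pattern is inferred from the UserServer lines in the
# excerpt above and assumes Drill logs this message in the same format.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*UserServer - "
    r"RPC connection (?P<server>\S+) <--> (?P<client>\S+) \(user client\) "
    r"timed out\.\s+Timeout was set to (?P<secs>\d+) seconds\."
)


class RpcTimeout(NamedTuple):
    timestamp: str
    server: str
    client: str
    timeout_secs: int


def parse_timeouts(lines: Iterable[str]) -> Iterator[RpcTimeout]:
    """Yield one RpcTimeout per user-client timeout message; skip other lines."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield RpcTimeout(m["ts"], m["server"], m["client"], int(m["secs"]))
```

For example, iterating over `parse_timeouts(open("drillbit.log.133"))` would list each timed-out client connection with its timestamp, which can then be lined up against the fragment failure times in the other drillbit logs.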

> Memory leak in hash partition sender when query is cancelled
> ------------------------------------------------------------
>
>                 Key: DRILL-4310
>                 URL: https://issues.apache.org/jira/browse/DRILL-4310
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.5.0
>            Reporter: Victoria Markman
>         Attachments: 29593ea8-88d2-612e-7c58-aa11652c4072.sys.drill, drillbit.log.133, drillbit.log.134, drillbit.log.135, drillbit.log.136
>
>
> The query was cancelled (still investigating what caused the cancellation).
> Here is an excerpt from drillbit.log:
> {code}
> 2016-01-26 00:46:29,627 [29593ea8-88d2-612e-7c58-aa11652c4072:frag:2:2] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 1000000/10240/2140160/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
>     ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
>         DrillBuf[13122380], udle: [7140398 0..4096]
>     ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
>         DrillBuf[13122381], udle: [7140399 0..1024]
>     ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
>         DrillBuf[13122379], udle: [7140397 0..4096]
>     ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
>         DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> Fragment 2:2
> [Error Id: 043c5d25-c4de-4a70-9cb1-d4987822ee3b on atsqa4-134.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Allocator[op:2:2:0:HashPartitionSender] closed with outstanding buffers allocated (4).
> Allocator(op:2:2:0:HashPartitionSender) 1000000/10240/2140160/10000000000 (res/actual/peak/limit)
>   child allocators: 0
>   ledgers: 4
>     ledger[10892635] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310917183..0, allocatorManager: [7140397, life: 23697371310913697..0] holds 1 buffers.
>         DrillBuf[13122380], udle: [7140398 0..4096]
>     ledger[10892636] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371311045504..0, allocatorManager: [7140398, life: 23697371311041789..0] holds 1 buffers.
>         DrillBuf[13122381], udle: [7140399 0..1024]
>     ledger[10892634] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 4096, references: 1, life: 23697371310795164..0, allocatorManager: [7140396, life: 23697371310789988..0] holds 1 buffers.
>         DrillBuf[13122379], udle: [7140397 0..4096]
>     ledger[10892513] allocator: op:2:2:0:HashPartitionSender), isOwning: true, size: 1024, references: 1, life: 23697371288488073..0, allocatorManager: [7140275, life: 23697371288484282..0] holds 1 buffers.
>         DrillBuf[13122245], udle: [7140276 0..1024]
>   reservations: 0
> {code}
> Reproduced twice by running: ./run.sh -s Advanced/tpcds/tpcds_sf100/original -g smoke -t 600 -n 10 -i 100 -m
> Cluster configuration: vanilla, 48GB of memory, 4GB heap.
> Attaching query profile and logs. 
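
The allocator summary in the excerpt can be cross-checked against its ledgers: the four outstanding buffers (4096 + 1024 + 4096 + 1024 bytes) add up to exactly the 10240 bytes reported as "actual". The sketch below models that accounting with a hypothetical `ToyAllocator` class. It is not Drill's allocator API, only an illustration, under that assumption, of why closing an operator's allocator while buffers are still referenced (as a cancelled sender may do) produces an error like the one above.

```python
class IllegalStateError(Exception):
    """Stand-in for Java's IllegalStateException."""


class ToyAllocator:
    """Hypothetical model of per-operator buffer accounting (not Drill's API)."""

    def __init__(self, name: str, limit: int):
        self.name = name
        self.limit = limit
        self.actual = 0          # bytes currently allocated
        self.peak = 0            # high-water mark of `actual`
        self._outstanding = {}   # buffer id -> size in bytes
        self._next_id = 0

    def allocate(self, size: int) -> int:
        if self.actual + size > self.limit:
            raise MemoryError(f"{self.name}: limit {self.limit} exceeded")
        buf_id = self._next_id
        self._next_id += 1
        self._outstanding[buf_id] = size
        self.actual += size
        self.peak = max(self.peak, self.actual)
        return buf_id

    def release(self, buf_id: int) -> None:
        self.actual -= self._outstanding.pop(buf_id)

    def close(self) -> None:
        # Closing while buffers are still held is reported as a leak.
        if self._outstanding:
            raise IllegalStateError(
                f"Allocator[{self.name}] closed with outstanding buffers "
                f"allocated ({len(self._outstanding)})."
            )


# Replay the four outstanding ledgers from the excerpt: the buffers are
# allocated but never released before the allocator is closed.
alloc = ToyAllocator("op:2:2:0:HashPartitionSender", limit=10_000_000_000)
for size in (4096, 1024, 4096, 1024):
    alloc.allocate(size)

print(alloc.actual)  # 10240, matching "actual" in the summary line
try:
    alloc.close()
except IllegalStateError as e:
    print(e)
```

The toy model only tracks "actual" and a peak for its own allocations; it does not reproduce the 1000000-byte reservation or the 2140160-byte peak in the real summary, which depend on the sender's full allocation history.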


