Posted to issues@kylin.apache.org by "Gabor Arki (Jira)" <ji...@apache.org> on 2021/10/04 13:09:00 UTC

[jira] [Comment Edited] (KYLIN-4500) Timeout waiting for connection from pool

    [ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423927#comment-17423927 ] 

Gabor Arki edited comment on KYLIN-4500 at 10/4/21, 1:08 PM:
-------------------------------------------------------------

This happened in our production environment today, this time with Kylin 3.1.0 running on EMR 5.28. Restarting the query server released the connections and resolved the issue.


was (Author: arkigabor):
This has happened to our production environment today, now with Kylin 3.1.0 running on ERM 5.28. Restarting the query server released the connections again and resolved the issue.

> Timeout waiting for connection from pool
> ----------------------------------------
>
>                 Key: KYLIN-4500
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4500
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.0, v3.1.0
>            Reporter: Gabor Arki
>            Priority: Major
>         Attachments: kylin-connection-timeout.txt
>
>
> h4. Environment
>  * Kylin server 3.0.0
>  * EMR 5.28
> h4. Issue
> After an extended uptime, both the Kylin query server and the jobs running on EMR stop working. The root cause in both cases is:
> {noformat}
> Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>         at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat}
> Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/], increasing the fs.s3.maxConnections setting to 10000 only delays the issue, so the underlying cause is likely a connection leak. The fact that restarting the Kylin service resolves the problem also points to a leak.
> A full stack trace from the QueryService is attached.
>  
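
The reasoning in the description (a larger pool only postpones the timeout, and a restart clears it) can be illustrated with a minimal sketch. This is not Kylin, EMRFS, or AWS SDK code: LeakSketch, TinyPool and requestsUntilTimeout are hypothetical names, and a Semaphore stands in for the HTTP connection pool whose size fs.s3.maxConnections controls.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a fixed-size connection pool; not EMRFS or AWS SDK code.
class LeakSketch {

    /** Minimal pool: each permit represents one pooled connection. */
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int maxConnections) {
            this.permits = new Semaphore(maxConnections);
        }

        /** Returns false if no connection becomes free within the timeout. */
        boolean acquire(long timeoutMillis) {
            try {
                return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }

        /** Hands one connection back to the pool. */
        void release() {
            permits.release();
        }
    }

    /**
     * Issues up to `attempts` requests against a fresh pool and returns how
     * many succeeded before the first timeout. With leak == true the caller
     * never releases, which models a leaked connection per request.
     */
    static int requestsUntilTimeout(int maxConnections, boolean leak, int attempts) {
        TinyPool pool = new TinyPool(maxConnections);
        for (int i = 0; i < attempts; i++) {
            if (!pool.acquire(1)) {
                return i; // pool exhausted: "timeout waiting for connection from pool"
            }
            if (!leak) {
                pool.release(); // a well-behaved caller returns the connection
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Leaking caller: the pool runs dry after exactly maxConnections requests.
        System.out.println(requestsUntilTimeout(50, true, 1000));   // prints 50
        // A ten times larger pool only moves the failure ten times further out.
        System.out.println(requestsUntilTimeout(500, true, 1000));  // prints 500
        // Without the leak, even the small pool serves every request.
        System.out.println(requestsUntilTimeout(50, false, 1000));  // prints 1000
    }
}
```

Each call builds a fresh pool, which is the sketch's analogue of restarting the service: all permits are available again until the leak drains them.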



--
This message was sent by Atlassian Jira
(v8.3.4#803005)