You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/03/24 03:45:00 UTC

[jira] [Commented] (KYLIN-4432) duplicated queries with sytax error take unexpect long time when lazy query enabled

    [ https://issues.apache.org/jira/browse/KYLIN-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065293#comment-17065293 ] 

ASF GitHub Bot commented on KYLIN-4432:
---------------------------------------

xiacongling commented on pull request #1167: KYLIN-4432 duplicated queries with sytax error take unexpect long time when lazy query enabled
URL: https://github.com/apache/kylin/pull/1167
 
 
   ## Proposed changes
   This PR fix the bug described in [KYLIN-4432](https://issues.apache.org/jira/browse/KYLIN-4432) that duplicated queries with sytax error take unexpect long time when lazy query enabled. 
   
   ## Types of changes
   
   What types of changes does your code introduce to Kylin?
   _Put an `x` in the boxes that apply_
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code._
   
   - [x] I have create an issue on [Kylin's jira](https://issues.apache.org/jira/browse/KYLIN), and have described the bug/feature there in detail
   - [x] Commit messages in my PR start with the related jira ID, like "KYLIN-0000 Make Kylin project open-source"
   - [x] Compiling and unit tests pass locally with my changes
   - [x] I have added tests that prove my fix is effective or that my feature works
   - [x] If this change need a document change, I will prepare another pr against the `document` branch
   - [x] Any dependent changes have been merged
   
   ## Further comments
   N/A
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> duplicated queries with sytax error take unexpect long time when lazy query enabled
> -----------------------------------------------------------------------------------
>
>                 Key: KYLIN-4432
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4432
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Congling Xia
>            Priority: Major
>
> Hi! Our Kylin server becomes unresponsive recently. All query threads were being TIMED_WAITING so that no more thread in the thread pool of Tomcat could response new requests. The server logs are full of "Duplicate SQL request"s, and many requests retrying for so long time:
> {code:java}
> $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | head -3
> 2020-03-23 12:15:32,279 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> 2020-03-23 12:15:32,379 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> 2020-03-23 12:15:32,479 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> $ cat kylin.log.25 | grep 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991 | tail -3
> 2020-03-23 12:26:56,920 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> 2020-03-23 12:26:57,020 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> 2020-03-23 12:26:57,121 INFO [Query 14db5fa0-78e0-4ca6-5d04-92995b2b8367-3133991] service.QueryService:571 : Duplicated SQL request is running, waiting...
> {code}
> This can be re-produced easily:
>  * enable lazy query (KYLIN-2897) by setting environment variable kylin.query.cache-enabled=true and kylin.query.lazy-query-enabled=true
>  * send the same query with syntax error many times in several minutes
> The first query will quickly response with error message, and the following queries will executing for long time and will repeatly output "Duplicated SQL request is running, waiting...".
> The code in org.apache.kylin.rest.service.QueryService#searchQueryInCache indicates duplicated query will wait at most one minute with default setting of LazyQueryWaitingTimeoutMilliSeconds but in fact it waits quite longer than that.
> {code:java}
> while (response.isRunning()) {
>     // Wait at most one minute
>     if (System.currentTimeMillis() - response.getLazyQueryStartTime() >= getConfig()
>             .getLazyQueryWaitingTimeoutMilliSeconds()) {
>         cache.evict(sqlRequest.getCacheKey());
>         return null;
>     }
>     logger.info("Duplicated SQL request is running, waiting...");
>     try {
>         Thread.sleep(100L);
>     } catch (InterruptedException e) {
>     }
>     wrapper = cache.get(sqlRequest.getCacheKey());
>     if (wrapper == null) {
>         return null;
>     }
>     response = (SQLResponse) wrapper.get();
>     if (response == null) {
>         return null;
>     }
> }
> {code}
> After some work-around, I find that the dummy response status remains unchanged after execution with exception. So the 2nd query will wait LazyQueryWaitingTimeoutMilliSeconds, evict the dummy response created by the 1st query from cache, and put a new dummy response into the cache. And the 3rd query may use the new dummy response's start time to compute timeout, and will wait for another LazyQueryWaitingTimeoutMilliSeconds.
> Suppose we send _n_ identical bad queries to the server quickly. The last query may wait for nearly _(n-1)_ times of LazyQueryWaitingTimeoutMilliSeconds. That's why the Web container's thread pool was exhausted – user program uses bad queries to check availability of Kylin service and retries many times in a short time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)