Posted to jira@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2022/06/08 07:36:00 UTC

[jira] [Comment Edited] (JENA-2328) Query timeouts failing when plan phase is long

    [ https://issues.apache.org/jira/browse/JENA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17551440#comment-17551440 ] 

Andy Seaborne edited comment on JENA-2328 at 6/8/22 7:35 AM:
-------------------------------------------------------------

[~der] I'd be grateful if you or colleagues could try this out in your environment.

Your test case worked for me. The new mechanism is covered by existing tests.

(The test isn't included directly because it would be unstable on a loaded CI server, where a thread can pause for several seconds. That makes it tricky to detect why a timeout occurred, and the JUnit timeout may go off first - the timeouts don't necessarily go off "in order".)

 


was (Author: andy.seaborne):
[~der] I'd be grateful if you or colleagues could try this out in your environment.

Your test case worked for me. The new mechanism is covered by existing tests.

(The test included directly because it will be unstable on loaded CI where pauses of several seconds can occur making detecting why a timeout occurred somewhat tricky because the JUnit timeout may go off first.)

 

 

> Query timeouts failing when plan phase is long
> ----------------------------------------------
>
>                 Key: JENA-2328
>                 URL: https://issues.apache.org/jira/browse/JENA-2328
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>            Reporter: Dave Reynolds
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.6.0
>
>         Attachments: TestQueryExecutionTimeout3.java
>
>
> In a production service with a large TDB store (around 500MT) we find that some complex queries evade the query timeouts (set to 90s first result, 120s total) and then run for hours, soaking up all available CPU cores. While the queries show no clear pattern, and it has been hard to replicate in a controlled setting, we do now have one example which is expressible as a test case. See attached.
> The behaviour is that the abort() call from the alarm timeout is received by QueryExecDataset before there is an iterator to cancel - the QueryExecDataset instance is deep in getPlan(), which itself executes part of the query. In the specific example it's OpSlice which is iterating through the offset while still in the planning phase, though not all queries which cause this sort of behaviour use offsets.
> Sorry, but I have no PR to offer at this stage. I have looked at whether it's possible to have getPlan() return some future or deferrable plan so that the top level exec has a handle on something that it can abort. However, the changes look far reaching and I don't yet have a satisfactory approach to offer.
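
The race described in the report can be illustrated with a small toy model. This is only a sketch under stated assumptions: none of the classes below are Jena classes, all names (PlanPhaseAbortSketch, LossyExec, StickyExec, runLossy, runSticky) are hypothetical, and it is not a description of the actual fix in Jena. Planning is modelled as a loop of steps, and the "alarm" fires at a chosen step while planning is still running. The lossy variant can only cancel an iterator that already exists, so an early abort is dropped; the sticky variant records the request and checks it during planning.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model of an abort arriving during the plan phase. Not Jena code. */
public class PlanPhaseAbortSketch {

    /** Lossy: abort() can only cancel an iterator that already exists. */
    static class LossyExec {
        private boolean hasIterator = false;
        private boolean iteratorCancelled = false;

        void abort() {
            // No iterator yet means there is nothing to cancel: the request is dropped.
            if (hasIterator) iteratorCancelled = true;
        }

        String execute(int planSteps, int abortAtStep) {
            for (int step = 0; step < planSteps; step++) {
                if (step == abortAtStep) abort(); // the alarm fires mid-planning
                // ... stand-in for expensive planning work, e.g. skipping an offset ...
            }
            hasIterator = true; // the iterator only exists once planning is over
            return iteratorCancelled ? "cancelled" : "ran to completion";
        }
    }

    /** Sticky: abort() records the request and the planning loop checks it. */
    static class StickyExec {
        private final AtomicBoolean cancelRequested = new AtomicBoolean(false);

        void abort() {
            cancelRequested.set(true);
        }

        String execute(int planSteps, int abortAtStep) {
            for (int step = 0; step < planSteps; step++) {
                if (step == abortAtStep) abort(); // the alarm fires mid-planning
                if (cancelRequested.get()) return "cancelled during planning at step " + step;
            }
            return "ran to completion";
        }
    }

    public static String runLossy(int planSteps, int abortAtStep) {
        return new LossyExec().execute(planSteps, abortAtStep);
    }

    public static String runSticky(int planSteps, int abortAtStep) {
        return new StickyExec().execute(planSteps, abortAtStep);
    }

    public static void main(String[] args) {
        System.out.println("lossy:  " + runLossy(1000, 10));
        System.out.println("sticky: " + runSticky(1000, 10));
    }
}
```

In this model the lossy variant ignores an abort at step 10 and finishes all 1000 planning steps, which mirrors the reported symptom of queries evading their timeouts. The sticky flag is just one possible approach; the report itself suggests another (a future or deferrable plan returned by getPlan()).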


