Posted to issues@drill.apache.org by "Bridget Bevens (JIRA)" <ji...@apache.org> on 2019/04/18 19:58:00 UTC
[jira] [Updated] (DRILL-6879) Indicate a warning in the WebUI when
a query makes little to no progress for a while
[ https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bridget Bevens updated DRILL-6879:
----------------------------------
Labels: doc-complete ready-to-commit (was: doc-impacting ready-to-commit)
> Indicate a warning in the WebUI when a query makes little to no progress for a while
> ------------------------------------------------------------------------------------
>
> Key: DRILL-6879
> URL: https://issues.apache.org/jira/browse/DRILL-6879
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Monitoring, Web Server
> Affects Versions: 1.14.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
> Attachments: image-2018-12-04-11-54-54-247.png, image-2018-12-06-11-19-00-339.png, image-2018-12-06-11-27-14-719.png
>
>
> When running a very large query on a cluster with limited resources, we noticed that the VM thread on one of the nodes freezes the fragment threads as it tries to do some work (GC perhaps?). This is a clear indication that the query is stuck in a weird state from which it might not recover.
> Under such circumstances, it makes sense to cancel the query, or at least warn the user in the WebUI that the query has exceeded a certain threshold.
> To detect this, the user can check the {{Last Progress}} column in the Fragments Overview section, which will show large times.
> !image-2018-12-04-11-54-54-247.png|width=969,height=336!
> In addition, there are instances where a query might have buffered operators spilling to disk, which also hurts performance (and, consequently, leads to longer run times). Calling this out can be very useful.
> !image-2018-12-06-11-27-14-719.png|width=969,height=256!
> Or there might be cases where a single fragment takes much longer than the average (indicated by an extreme skew in the Gantt chart).
> !image-2018-12-06-11-19-00-339.png|width=969,height=150!
>
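For illustration only, the two checks described above (a large {{Last Progress}} time, and a fragment that runs much longer than the average) could be sketched roughly as follows. This is not Drill's implementation: the function names, the threshold values, and the fragment ids are all hypothetical, and the issue itself does not state what thresholds should be used.

```python
from statistics import mean

# Hypothetical thresholds; the issue does not specify actual values.
STALL_THRESHOLD_SEC = 300.0  # warn if a fragment reports no progress for this long
SKEW_RATIO = 2.0             # warn if a fragment runs this many times the average


def stalled_fragments(last_progress_sec, threshold=STALL_THRESHOLD_SEC):
    """Return ids of fragments whose time since last progress exceeds the threshold."""
    return sorted(fid for fid, idle in last_progress_sec.items() if idle > threshold)


def skewed_fragments(runtime_sec, ratio=SKEW_RATIO):
    """Return ids of fragments whose runtime exceeds `ratio` times the average runtime."""
    if not runtime_sec:
        return []
    avg = mean(runtime_sec.values())
    return sorted(fid for fid, t in runtime_sec.items() if avg > 0 and t > ratio * avg)


# Made-up sample data: seconds since last progress, and total runtime, per fragment.
last_progress = {"00-00": 2.0, "01-03": 840.0, "02-01": 5.5}
runtimes = {"01-00": 10, "01-01": 12, "01-02": 11, "01-03": 95}

print(stalled_fragments(last_progress))  # ['01-03']
print(skewed_fragments(runtimes))        # ['01-03']
```

Note that a single slow fragment also raises the average it is compared against, so a mean-based ratio is a fairly conservative test; a median-based comparison would be one alternative.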
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)