You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/11 22:44:00 UTC

[jira] [Commented] (DRILL-6879) Indicate a warning in the WebUI when a query makes little to no progress for a while

    [ https://issues.apache.org/jira/browse/DRILL-6879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718117#comment-16718117 ] 

ASF GitHub Bot commented on DRILL-6879:
---------------------------------------

kkhatua opened a new pull request #1572: DRILL-6879: Show warnings for potential performance issues
URL: https://github.com/apache/drill/pull/1572
 
 
   1. Introduced warning for non-progressive fragments. Based on a threshold (`drill.exec.http.profile.warning.progress.threshold`), if all fragments have not made progress within that time, a warning is issued. The default is 5 minutes (300sec)
   
   2. Introduced a warning if any of the buffered operators spill to disk.
   
   3. Introduced a warning for operators where the longest running fragment runs beyond a minimum threshold (`drill.exec.http.profile.warning.time.skew.min`), and runs atleast 2 times longer than the average (`drill.exec.http.profile.warning.time.skew.ratio.process`). The _clock_ symbol with a tooltip indicates the extent of the skew. A similar comparison is made for the wait time of a fragment, but with a max wait time exceeding the average by a separate ratio (`drill.exec.http.profile.warning.time.skew.ratio.wait`)
   
   4. Introduced a warning for operators where the average wait time of a scan operator exceeds its processing time, for a minimum threshold (`drill.exec.http.profile.warning.scan.wait.min`). The _turtle_ symbol with a tooltip indicates which scan operator spent more time waiting than processing.
   
   5. `TableBuilder` class refactored
    a. Using attribute map instead of String arguments, eg. for 'title'
    b. Removed APIs that pass a hyperlink since that is never used.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Indicate a warning in the WebUI when a query makes little to no progress for a while
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-6879
>                 URL: https://issues.apache.org/jira/browse/DRILL-6879
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Monitoring, Web Server
>    Affects Versions: 1.14.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Major
>              Labels: user-experience
>             Fix For: 1.16.0
>
>         Attachments: image-2018-12-04-11-54-54-247.png, image-2018-12-06-11-19-00-339.png, image-2018-12-06-11-27-14-719.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When running a very large query on a cluster with limited resource, we noticed that one of the node's VM thread freezes the fragment threads as it tries to do some work (GC perhaps?). This is a clear indication that the query is stuck in a weird state where it might not recover from.
>  Under such circumstances, it makes sense to cancel or atleast warn the user on that page of the query exceeding a certain threshold. 
>  For detecting this, the user will find that the {{Last Progress}} column in the Fragments Overview section will show large times.
> !image-2018-12-04-11-54-54-247.png|width=969,height=336!
> In addition, there are instances where a query might have buffered operators spilling to disk, which also hits performance (and, subsequently, longer run times). Calling out this skew can be very useful.
> !image-2018-12-06-11-27-14-719.png|width=969,height=256!  
> Or there might be cases where a single fragment takes much longer than the average (indicated by an extreme skew in the Gantt chart).
> !image-2018-12-06-11-19-00-339.png|width=969,height=150!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)