Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/04/15 17:46:00 UTC

[jira] [Commented] (DRILL-2362) Drill should manage Query Profiling archiving

    [ https://issues.apache.org/jira/browse/DRILL-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16818213#comment-16818213 ] 

ASF GitHub Bot commented on DRILL-2362:
---------------------------------------

kkhatua commented on pull request #1750: DRILL-2362: Profile Mgmt
URL: https://github.com/apache/drill/pull/1750
 
 
   This PR is a WIP for managing a large number of profiles. It includes the following features:
   
   1. Write profiles to indexed partitions (created on the fly; by default organized in nested directories by year, month and day).
   2. Read chronologically (newest first) from the partitioned directories above. This improves performance by scanning and retrieving only the most recent profiles.
   3. Leverage a Guava Cache to avoid the cost of deserializing a profile multiple times from disk. (Even a single attempt at rendering a profile leads to at least two deserializations.)
   4. Infer which partitioned directory holds a profile from the query ID alone. Rather than scanning all the directories, we reverse-engineer the query ID to determine the approximate start time of the query and narrow down the profile's location.
   5. Trace exceptions for bad profiles [qId: 259432dc-7f8e-8fc5-af69-16a1ca817689 is a sample bad profile] and make the UI more robust in handling profiles that can't be deserialized.
   6. Auto-index on first use (in batches of 10,000) from the root directory, synchronized when distributed: using ZooKeeper (ZK), synchronization is maintained when multiple Drillbits share the same profile location.
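Items 1, 2 and 4 above can be illustrated with a minimal Java sketch. It assumes a UTC `yyyy/MM/dd` directory layout, and it assumes the query-ID scheme Drill's `UserWorker` used at the time (verify against your version): the first four bytes of the ID hold `Integer.MAX_VALUE` minus the submission time in epoch seconds, so newer IDs sort first and the last twelve bytes are random. `ProfilePathSketch` and its methods are hypothetical names, not Drill APIs.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper illustrating the PR's idea; not Drill's actual class.
class ProfilePathSketch {
    // Assumed nested year/month/day layout in UTC, e.g. "2018/01/27".
    private static final DateTimeFormatter DAY_DIR =
        DateTimeFormatter.ofPattern("uuuu/MM/dd").withZone(ZoneOffset.UTC);

    /** Partition directory (relative to the profile root) for a query start time. */
    static String partitionFor(Instant startTime) {
        return DAY_DIR.format(startTime);
    }

    /**
     * Approximate start time recovered from a query ID string.
     * Assumes the first 8 hex digits hold (Integer.MAX_VALUE - epochSeconds);
     * the result can be off by a second because a random int is added to the low bits.
     */
    static Instant approxStartTime(String queryId) {
        long inverted = Long.parseLong(queryId.substring(0, 8), 16);
        return Instant.ofEpochSecond(Integer.MAX_VALUE - inverted);
    }

    /** Orders day directories newest first, so a scan can stop after N profiles. */
    static List<String> newestFirst(List<String> dayDirs) {
        List<String> sorted = new ArrayList<>(dayDirs);
        // Zero-padded yyyy/MM/dd strings sort chronologically as plain text.
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        String qid = "259432dc-7f8e-8fc5-af69-16a1ca817689"; // sample ID from item 5
        Instant start = approxStartTime(qid);
        System.out.println(start + " -> " + partitionFor(start)); // 2018-01-27T00:51:47Z -> 2018/01/27
        System.out.println(newestFirst(List.of("2018/01/27", "2019/04/15", "2018/12/31")));
        // [2019/04/15, 2018/12/31, 2018/01/27]
    }
}
```

Because of the random low bits and time-zone handling, a real lookup would probably also probe the neighbouring day directory before falling back to a full scan.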
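Item 3 amounts to memoizing the read-and-deserialize step. With Guava this would typically be a `LoadingCache` built via `CacheBuilder.newBuilder().maximumSize(n).build(loader)`; to keep the sketch dependency-free, the Java below swaps in a plain LRU map built on `LinkedHashMap` in access order. The loader function stands in for a hypothetical profile read + deserialize routine, and `ProfileCacheSketch` is an illustrative name only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Stand-in for a Guava Cache: a bounded LRU map so each profile is deserialized once.
class ProfileCacheSketch<V> {
    private final Function<String, V> loader; // hypothetical: read + deserialize a profile
    private final Map<String, V> lru;
    private int loads = 0;

    ProfileCacheSketch(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder=true keeps the least recently used entry first in iteration order
        this.lru = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries; // evict the LRU entry once over capacity
            }
        };
    }

    /** Returns the cached profile, loading (deserializing) it only on a miss. */
    synchronized V get(String queryId) {
        V value = lru.get(queryId);
        if (value == null) {
            value = loader.apply(queryId);
            loads++;
            lru.put(queryId, value);
        }
        return value;
    }

    /** Number of times the loader actually ran. */
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        ProfileCacheSketch<String> cache = new ProfileCacheSketch<>(100, id -> "deserialized " + id);
        cache.get("q1"); // first rendering step: loads from "disk"
        cache.get("q1"); // second rendering step: served from memory
        System.out.println(cache.loadCount()); // prints 1
    }
}
```

An LRU bound keeps memory predictable when the UI lists many profiles; Guava's cache adds expiry and concurrency features this sketch omits.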
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Drill should manage Query Profiling archiving
> ---------------------------------------------
>
>                 Key: DRILL-2362
>                 URL: https://issues.apache.org/jira/browse/DRILL-2362
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Storage - Other
>    Affects Versions: 0.7.0
>            Reporter: Chris Westin
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.17.0
>
>
> We collect query profile information for analysis purposes, but we keep it forever. With only a few queries this isn't a problem yet, but as users start putting Drill into production, automated use by other applications will make this data grow quickly. We need a retention policy mechanism, with suitable settings administrators can use, and an implementation that cleans up this data.
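A retention policy like the one requested could start from the Java sketch below. It assumes the `yyyy/MM/dd` day partitions described in the PR and a hypothetical administrator-set `retentionDays` value (not an existing Drill option). It only selects expired partitions; deleting them, and coordinating deletion across Drillbits, is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical retention helper; retentionDays is an assumed setting, not a Drill option.
class RetentionSketch {
    private static final DateTimeFormatter DAY_DIR = DateTimeFormatter.ofPattern("uuuu/MM/dd");

    /** Returns the day partitions dated strictly before (today - retentionDays). */
    static List<String> expired(List<String> dayDirs, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> out = new ArrayList<>();
        for (String dir : dayDirs) {
            try {
                if (LocalDate.parse(dir, DAY_DIR).isBefore(cutoff)) {
                    out.add(dir);
                }
            } catch (DateTimeParseException e) {
                // Not a day partition (e.g. a stray file); leave it alone.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("2019/04/15", "2019/03/16", "2019/03/15", "2018/01/27", "README");
        System.out.println(expired(dirs, LocalDate.of(2019, 4, 15), 30));
        // [2019/03/15, 2018/01/27]
    }
}
```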



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)