Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/04 20:43:00 UTC

[jira] [Commented] (DRILL-5270) Improve loading of profiles listing in the WebUI

    [ https://issues.apache.org/jira/browse/DRILL-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464369#comment-16464369 ] 

ASF GitHub Bot commented on DRILL-5270:
---------------------------------------

kkhatua opened a new pull request #1250: DRILL-5270: Improve loading of profiles listing in the WebUI
URL: https://github.com/apache/drill/pull/1250
 
 
   When Drill displays profiles stored on the file system (local or distributed), it loads the entire list of `.sys.drill` files in the profile directory, then sorts and deserializes them. This can get expensive, since only a single CPU thread does this work.
   As an example, for a directory of 120K profiles, fetching the list of files alone takes over 6 seconds. After that, the time varies with the number of profiles being rendered. Deserializing a standard profile takes about 30ms on average, which translates to an additional 3 seconds for rendering the default 100 profiles.
   
   A user-reported issue confirms this:
   DRILL-5028 Opening profiles page from web ui gets very slow when a lot of history files have been stored in HDFS or Local FS
   
   Additional JIRAs ask for management of these profiles:
   DRILL-2362 Drill should manage Query Profiling archiving
   DRILL-2861 enhance drill profile file management
   
   This PR brings the following enhancements to achieve that:
   1. Mimic the in-memory persistence of profiles (DRILL-5481) by keeping only a predefined `max-capacity` number of profiles in the directory and moving the oldest to an 'archived' sub-directory.
   2. Improve loading times by pinning the deserialized list in memory (a TreeSet, which keeps the profiles sorted in a memory-efficient way). That way, if we do not detect any new profiles in the profileStore (i.e. the profile directory) since the last web-request for rendering the profiles, we can re-serve the same listing and skip the trip to the filesystem to re-fetch all the profiles.
   
   The profiles in the TreeSet are reloaded and rebuilt if any of the following holds:
     i.   The modification time of the profile dir has changed
     ii.  The number of profiles in the profile dir has changed
     iii. The number of profiles requested exceeds the currently available list
   
   3. When 2 or more web-requests for rendering arrive, the WebServer code already processes the requests sequentially. As a result, the earliest request triggers the reconstruction of the in-memory profile-set, and the last-modified timestamp of the profileStore is tracked. The remaining blocked requests can then re-use the freshly reconstructed profile-set for rendering, provided the underlying profileStore has not been modified. This assumes that profiles are not added to the profileStore so quickly that every queued request triggers a reconstruction.
   4. To prevent frequent archiving, a threshold (max-capacity) is defined for triggering the archive. Once it is crossed, enough profiles are archived that the number remaining is 90% of the threshold.
   5. To prevent the archiving process from taking too long, an archival rate (`drill.exec.profiles.store.archive.rate`) is defined so that at most that many profiles are archived in one go before re-rendering resumes.
   6. On a Distributed FileSystem (e.g. HDFS), multiple Drillbits might attempt to archive. To mitigate that, if a Drillbit detects that it is unable to archive a profile, it assumes that another Drillbit is also archiving and stops archiving.
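
The sizing rule in items 4 and 5 can be sketched as below. This is a minimal illustration rather than the code in the pull request: the class and method names (`ArchivePlanner`, `profilesToArchive`) are hypothetical, and it assumes that archiving starts only once the profile count exceeds `max-capacity`, that the retained target is 90% of that capacity, and that each pass is capped by the archive rate.

```java
// Hypothetical helper, not part of Drill: computes one archiving batch size.
class ArchivePlanner {

    /**
     * Returns how many of the oldest profiles to archive in a single pass.
     *
     * @param profileCount profiles currently in the profile directory
     * @param maxCapacity  threshold that triggers archiving (assumed: strictly exceeded)
     * @param archiveRate  upper bound on profiles archived in one go
     */
    static int profilesToArchive(int profileCount, int maxCapacity, int archiveRate) {
        if (profileCount <= maxCapacity) {
            return 0; // threshold not crossed, nothing to archive
        }
        // Leave 90% of the threshold behind so archiving is not re-triggered immediately.
        int retainTarget = maxCapacity * 9 / 10;
        int excess = profileCount - retainTarget;
        // Cap the batch so that rendering can resume before archiving finishes.
        return Math.min(excess, archiveRate);
    }
}
```

With a capacity of 1000 and an archive rate of 500, a directory of 1050 profiles would yield a batch of 150, while a directory of 5000 profiles would be capped at 500 per pass and need several passes to reach the retained target.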

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Improve loading of profiles listing in the WebUI
> ------------------------------------------------
>
>                 Key: DRILL-5270
>                 URL: https://issues.apache.org/jira/browse/DRILL-5270
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Web Server
>    Affects Versions: 1.9.0
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Currently, as the number of profiles increases, we reload the same list of profiles from the FS.
> An ideal improvement would be to detect if there are any new profiles and only reload from the disk then. Otherwise, a cached list is sufficient.
> For a directory of 280K profiles, the load time is close to 6 seconds on a 32 core server. With the caching, we can get it down to as little as a few milliseconds.
> To invalidate the cache, we inspect the last-modified time of the directory to determine whether a reload is needed.
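
The invalidation check described in this issue can be sketched roughly as follows. This is an assumption-laden illustration, not Drill's implementation: `CachedProfileListing` and `reloadCount` are hypothetical names, `java.io.File` stands in for whatever file system abstraction the profile store actually uses, and only file names are cached (no deserialization) to keep the example short.

```java
import java.io.File;
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: caches the sorted profile listing and reloads it only
// when the directory's last-modified time differs from the one last seen.
class CachedProfileListing {
    private final File profileDir;
    private final TreeSet<String> cached = new TreeSet<>();
    private long cachedModTime = -1L; // -1 forces a load on first use
    int reloadCount = 0;              // illustration only: counts trips to the file system

    CachedProfileListing(File profileDir) {
        this.profileDir = profileDir;
    }

    // synchronized so that queued requests reuse the set rebuilt by the first one
    synchronized SortedSet<String> list() {
        long modTime = profileDir.lastModified();
        if (modTime != cachedModTime) {
            // Directory changed since the last listing: rebuild the cached set.
            String[] names = profileDir.list((dir, name) -> name.endsWith(".sys.drill"));
            cached.clear();
            if (names != null) {
                Collections.addAll(cached, names);
            }
            cachedModTime = modTime;
            reloadCount++;
        }
        // Hand out a copy so callers cannot mutate the cache.
        return Collections.unmodifiableSortedSet(new TreeSet<>(cached));
    }
}
```

Repeated calls to `list()` on an unchanged directory return the cached names without listing the directory again. A check on the modification time alone can miss changes on file systems with coarse timestamp granularity, which may be one reason to also track the number of profiles.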



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)