You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/19 02:10:30 UTC

[GitHub] [druid] jasonk000 opened a new pull request #11272: perf: improve concurrency and improve perf for task query in HeapMemoryTaskStorage

jasonk000 opened a new pull request #11272:
URL: https://github.com/apache/druid/pull/11272


   This PR is easiest viewed in split! (imo).
   
   Fixes #11140 (
   the task view only).
   
   ### Description
   
   See #11140.
   
   This PR unlocks concurrency and improves performance of the `HeapMemoryTaskStorage` task storage component.
   
   The patch introduces two key changes:
   1. Replace `giant.lock` with (a) a ConcurrentHashMap for the task list queries and (b) synchronized blocks for remaining task audit log and task locks.
   2. Simplify algorithm for getting tasks (as used by overlord /tasks API) so that tasks are filtered _before_ they are sorted, and limited _before_ they are materialized. As screenshots in #11140 show this results in big improvements on our services.
   
   ### Design decisions
   
   - I did note that both task locks and task audits are (mostly) distinct APIs, sharing only the task ID and the task removal function. It is possible to also make these lock-free, however in the interest of minimizing surface area of change I opted not to do this.
   - Part of the reason for this is that it is possible to create audit logs and lock entries for tasks that do not exist in the tasks list. I am not sure if this is intended so I opted to retain existing behaviour. Alternatively, if this is not intended, and locks / logs can only exist for previously created tasks, then I would bring the locks & logs into the ConcurrentHashMap which would allow elimination of all locking.
   - I did look at creating a ReaderWriterLock to replace the existing giant lock, however in our internal reviews the code was actually less clear to read than a simpler lock-free implementation.
   
   - I opted for a slight reorder of `getLocks` and `removeTasksOlderThan` since it keeps their related locks map codepaths closer in the file. I could have left this but I find it easier to keep the same synchronized sections on the same page.
   
   ### Checklist
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
         - see below!
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   
   
   ### Key questions
   
   - Would appreciate input on testing scope
       - what/ specific additional concurrency tests would be desired. OverlordTest?
       - It seems like from the `diff-test-coverage` report that I'd be well served by adding tests to cover items such as whether the comparator works or filter/map rules that were not previously covered. I can add these if desired. Where do you think the most value is?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 merged pull request #11272: perf: improve concurrency and improve perf for task query in HeapMemoryTaskStorage

Posted by GitBox <gi...@apache.org>.
FrankChen021 merged pull request #11272:
URL: https://github.com/apache/druid/pull/11272


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #11272: perf: improve concurrency and improve perf for task query in HeapMemoryTaskStorage

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #11272:
URL: https://github.com/apache/druid/pull/11272#issuecomment-850036315


   Hi @nishantmonu51 , I'm not sure how to take this forward. It looks like everything passed except for `perfect rollup parallel batch index integration test`, but I do not think this is related to task storage. Could you point me to best plan of attack? (I could perform a git empty-commit and retrigger to retry?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org