Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/19 03:02:44 UTC

[GitHub] [incubator-druid] himanshug commented on issue #7919: disable all compression in intermediate segment persists while ingestion

himanshug commented on issue #7919: disable all compression in intermediate segment persists while ingestion
URL: https://github.com/apache/incubator-druid/pull/7919#issuecomment-503387212
 
 
   @clintropolis all the PRs you linked are independent improvements to indexing, index merging, compression, freeing buffers as soon as possible, etc., which are great and will happen at some point.
   This PR is an immediate solution to the problem. So, I am glad that you agree that having a different IndexSpec for intermediate persisted segments makes sense.
   
   Now the real question that affects this PR is whether or not to change the default IndexSpec for intermediate persisted segments.
   
   From a code perspective it is fairly trivial to retain the existing default behavior, and I can make that change if that is what most people want. However, here is my rationale for changing the default.
   
   My observation across different clusters has been that the indexing task process's peak memory usage is much higher than its average utilization during ingestion, because of the merge process at publish time. As a result, users plan for the task process to have the "peak" memory available to it throughout its lifetime.
   When compression is disabled on intermediate segments, average memory utilization would increase (more page cache used), but overall peak memory usage would decrease because no decompression buffers are allocated at merge time. Also, queries would run faster because the data is stored uncompressed.
   All said, you are right that the above assumptions would not hold on some clusters, due to the specifics of the datasets that these configurations depend upon. There is no single choice that is good for everyone (or else we wouldn't have those configs :) ).
   If we keep the existing behavior as the default, then I'm afraid very few cluster operators will use the config introduced here to disable compression on intermediate segments. OTOH, with the changed default behavior it would improve things in most cases, and where it does not, operators can use the config to get back the older behavior.
   Or maybe I am totally wrong, problems will show up on different test clusters upgrading to RCs, and we will reinstate the old behavior as the default in a follow-up PR.
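
For illustration, here is a rough sketch of the kind of override discussed above: a tuningConfig fragment that keeps compression on the final published segments but turns it off for intermediate persists. The field name `indexSpecForIntermediatePersists`, the keys inside each spec, and the values `lz4` / `uncompressed` are assumptions made for this example and are not spelled out in this comment; check them against the merged PR and the docs before relying on them. The helper `build_tuning_config` is hypothetical.

```python
import json


def build_tuning_config(compress_intermediate: bool) -> dict:
    """Build a hypothetical tuningConfig fragment (field names are assumptions)."""
    # Spec for the final, published segment: compressed columns.
    final_spec = {
        "bitmap": {"type": "roaring"},
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
    }

    # Start from the same spec for intermediate persists, then optionally
    # disable compression so merging needs no decompression buffers.
    intermediate_spec = dict(final_spec)
    if not compress_intermediate:
        intermediate_spec["dimensionCompression"] = "uncompressed"
        intermediate_spec["metricCompression"] = "uncompressed"

    return {
        "indexSpec": final_spec,
        # Assumed name of the config introduced by this PR.
        "indexSpecForIntermediatePersists": intermediate_spec,
    }


if __name__ == "__main__":
    # Print the fragment with compression disabled on intermediate persists.
    print(json.dumps(build_tuning_config(compress_intermediate=False), indent=2))
```

Passing `compress_intermediate=True` yields an intermediate spec identical to the final one, which corresponds to keeping the older behavior.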
