Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/03/22 06:28:38 UTC

[GitHub] [incubator-druid] lxqfy opened a new issue #7324: Automatic segment compaction segment size not matching targetCompactionSizeBytes

URL: https://github.com/apache/incubator-druid/issues/7324
 
 
   ### Affected Version
   0.13.0-incubating
   
   ### Description
   I am trying to use the "Automatic segment compaction". My auto-compaction config is as follows:
   ```json
   {
     "dataSource": "ds",
     "inputSegmentSizeBytes": 524288000,
     "targetCompactionSizeBytes": 524288000,
     "skipOffsetFromLatest": "PT3H",
     "keepSegmentGranularity": false
   }
   ```
   
   However, after the compaction task finishes, the resulting segments (shards) are only around 130+ MB each. Those segments are then picked up again in the next round of compaction tasks and produce segments of the same size as before, so the cycle repeats indefinitely.
   
   For example, a compaction task tries to compact 2 segment shards with targetCompactionSizeBytes=500 MB:
   2019-03-06T04:00:00.000Z/2019-03-08T01:00:00.000Z_1 (130 MB)
   2019-03-06T04:00:00.000Z/2019-03-08T01:00:00.000Z_2 (130 MB)
   After the compaction task, the 2 segment shards are not merged into 1 shard; they remain roughly the same size, just with a new version. The coordinator then tries to compact those shards again and again without ever producing a single shard of ~260 MB.
   
   After some investigation, I found that:
   
   The compaction task generates an internal index task. The targetPartitionSize is calculated from targetCompactionSizeBytes and avgRowsPerByte:
   
   targetPartitionSize = avgRowsPerByte * targetCompactionSizeBytes
   
   The index task has another configuration, maxTotalRows, whose default value is 20000000. If the estimated targetPartitionSize is larger than maxTotalRows, the task does not behave as expected: maxTotalRows caps the partition first.
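   The interaction can be sketched numerically. This is an illustrative calculation only; the function name and the avgRowsPerByte value are assumptions, not taken from Druid's source:
   
   ```python
   # Default maxTotalRows for the index task, per the issue description.
   DEFAULT_MAX_TOTAL_ROWS = 20_000_000
   
   def estimated_target_partition_size(avg_rows_per_byte, target_compaction_size_bytes):
       # targetPartitionSize = avgRowsPerByte * targetCompactionSizeBytes
       return int(avg_rows_per_byte * target_compaction_size_bytes)
   
   # Hypothetical datasource averaging 0.15 rows per byte, 500 MB target:
   target = estimated_target_partition_size(0.15, 524_288_000)
   print(target)                           # 78643200 rows requested per partition
   print(target > DEFAULT_MAX_TOTAL_ROWS)  # True: maxTotalRows caps the segment first
   ```
   
   With any realistic row density, a 500 MB target translates into far more rows than the 20M-row default, so segments are cut at maxTotalRows long before reaching targetCompactionSizeBytes.
   
   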
   
   The workaround is to set maxTotalRows to a larger value. By default, users have no way to know about this interaction, so the compaction task should override maxTotalRows automatically.
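   If the coordinator compaction config on your version accepts a tuningConfig, the workaround might look like the following (the tuningConfig field, its placement, and the 100M value are assumptions; verify against the docs for your Druid release):
   
   ```json
   {
     "dataSource": "ds",
     "inputSegmentSizeBytes": 524288000,
     "targetCompactionSizeBytes": 524288000,
     "skipOffsetFromLatest": "PT3H",
     "keepSegmentGranularity": false,
     "tuningConfig": {
       "maxTotalRows": 100000000
     }
   }
   ```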
   
