Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/24 18:05:30 UTC

[GitHub] [druid] jihoonson opened a new issue #9768: Allow minor compaction for non-consecutive segments

jihoonson opened a new issue #9768:
URL: https://github.com/apache/druid/issues/9768


   Each segment has a partitionId that uniquely identifies it within a time chunk. Currently, minor compaction (which uses the segment lock) cannot compact segments whose partitionIds are not consecutive. For example, segments with partitionIds 0, 1, and 10 cannot be compacted together because partitionIds 1 and 10 are not consecutive.
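
   To make the restriction concrete, here is a minimal sketch of the consecutiveness check described above. The function name `are_consecutive` is hypothetical and is not taken from the Druid code base; it only illustrates the rule on plain integers.

```python
def are_consecutive(partition_ids):
    """Return True if the partitionIds, once sorted, form a gap-free run."""
    ids = sorted(partition_ids)
    # Every neighbouring pair must differ by exactly 1.
    return all(b - a == 1 for a, b in zip(ids, ids[1:]))


print(are_consecutive([0, 1, 2]))   # True
print(are_consecutive([0, 1, 10]))  # False: there is a gap between 1 and 10
```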
   
   This limitation is expected: minor compaction was designed this way to reduce the memory footprint (see https://github.com/apache/druid/issues/7491 for more details). In practice, however, it would be useful if minor compaction could compact non-consecutive segments. This matters especially when streaming ingestion hits expected but transient task failures, because those failures can produce non-consecutive partitionIds.
   
   Minor compaction can support this if it is guaranteed that no new segment will be given a partitionId that falls within the overlapping root partition range of the existing segments.
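
   The condition above could be sketched roughly as follows. This is an illustration only: it assumes the root partition range is the half-open range spanned by the input partitionIds, and `possibly_allocated_ids` is a hypothetical stand-in for whatever partitionIds other tasks might still publish. Neither name comes from Druid.

```python
def root_range(partition_ids):
    """Half-open range [start, end) spanned by the input partitionIds (assumed definition)."""
    return min(partition_ids), max(partition_ids) + 1


def safe_to_compact(partition_ids, possibly_allocated_ids):
    """True if no partitionId that may still appear falls into a gap of the range."""
    start, end = root_range(partition_ids)
    existing = set(partition_ids)
    return not any(
        start <= pid < end and pid not in existing
        for pid in possibly_allocated_ids
    )


print(safe_to_compact([0, 1, 10], [11, 12]))  # True: new IDs land outside the range
print(safe_to_compact([0, 1, 10], [5]))       # False: ID 5 would land inside the gap
```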


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9768: Allow minor compaction for non-consecutive segments

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9768:
URL: https://github.com/apache/druid/issues/9768#issuecomment-904200393


   Assuming that we keep the current segment ID allocation protocol, which monotonically increases the partition ID on task failures, the problem we want to solve is this: given a missing partitionId, how do we know whether the segment with that ID really doesn't exist or is being created by some other task? One way to do this is to modify the compaction task as follows.
   
   1) When some missing partitionIds are found, the compaction task tries to lock them using the regular locking mechanism.
   2) If the locking succeeds, the compaction task can safely assume that those partitionIds will never be used since there is no ingestion task creating segments of those partitionIds. In this case, the compaction task can simply ignore those missing partitionIds and compact the given segments all together.
   3) If the locking fails, there should be some ingestion task creating the segments of those partitionIds. In this case, the compaction task can split the input segments into multiple groups, where each group has only consecutive partitionIds, and compact each group separately. The segments that are being created by another task can be compacted later by another compaction task.
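
   The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Druid code: `try_lock` is a hypothetical callback standing in for the regular locking mechanism, and `missing_ids`, `split_consecutive`, and `plan_compaction` are names invented for this example.

```python
def missing_ids(partition_ids):
    """partitionIds absent between the smallest and largest input ID (needs >= 1 input)."""
    ids = set(partition_ids)
    return [p for p in range(min(ids), max(ids) + 1) if p not in ids]


def split_consecutive(partition_ids):
    """Split the IDs into groups that each contain only consecutive partitionIds."""
    groups = []
    for pid in sorted(partition_ids):
        if groups and pid == groups[-1][-1] + 1:
            groups[-1].append(pid)
        else:
            groups.append([pid])
    return groups


def plan_compaction(partition_ids, try_lock):
    """Return the groups of partitionIds to compact together.

    try_lock(ids) is a hypothetical callback: it returns True if the missing
    partitionIds could be locked, meaning no ingestion task is creating them.
    """
    missing = missing_ids(partition_ids)
    # Steps 1 and 2: no gaps, or the gaps were locked, so compact everything together.
    if not missing or try_lock(missing):
        return [sorted(partition_ids)]
    # Step 3: locking failed, so fall back to groups of consecutive IDs.
    return split_consecutive(partition_ids)


print(plan_compaction([0, 1, 10], lambda ids: True))   # [[0, 1, 10]]
print(plan_compaction([0, 1, 10], lambda ids: False))  # [[0, 1], [10]]
```

   In this sketch a group may end up containing a single segment, as with partitionId 10 in the second call. Whether such a group is worth compacting on its own is left open here.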

