Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2023/01/10 12:50:21 UTC

[GitHub] [doris] freemandealer opened a new pull request, #15785: [fix](compaction) segcompaction coredump if the rowset starts with a …

freemandealer opened a new pull request, #15785:
URL: https://github.com/apache/doris/pull/15785

   …big segment (#14174)
   
   Signed-off-by: freemandealer <fr...@gmail.com>
   
   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem summary
   
   The check _segid_statistics_map.find(_num_segcompacted) == _segid_statistics_map.end() fails.
   This check ensures that _segid_statistics_map has no existing entry keyed by _num_segcompacted.
   
   When does _segid_statistics_map gain an entry?

   - after a segment is flushed, or
   - after segcompaction, or
   - after renaming a big segment that does not need segcompaction.

   We call a segment that has never been processed by segcompaction a 'raw_seg'. When raw segments are compacted, their records are erased from _segid_statistics_map and a 'new_seg' (the compacted result) is added to the map as a replacement.
   
   For example:
   
   For 7 raw segments 'oooOOoo' ('O' for a big segment, 'o' for a small one), we break them into four groups: 1) ooo, 2) O, 3) O, 4) oo.
   Group 1 will be compacted to form 'new_seg_1-3', and raw_seg_1, raw_seg_2, and raw_seg_3 are wiped out.
   Group 2 will be renamed from 'raw_seg_4' to 'new_seg_2' and added to the map.
   Group 3 will be renamed from 'raw_seg_5' to 'new_seg_3' and added to the map.
   Group 4 will be compacted to form 'new_seg_6-7', and raw_seg_6 and raw_seg_7 are wiped out.
   Finally, we rename 'new_seg_1-3' to 'new_seg_1' and 'new_seg_6-7' to 'new_seg_4'. So we end up with new_seg_1, new_seg_2, new_seg_3, and new_seg_4.
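
   The grouping in this example can be sketched as follows. This is only an illustration of the 'o'/'O' notation used above, not the Doris implementation: split_into_groups is a hypothetical helper, and it assumes every run of consecutive small segments forms exactly one group, ignoring any batch-size limits the real code may apply.

   ```cpp
   #include <string>
   #include <vector>

   // Hypothetical helper, not part of Doris. Splits a layout string into groups:
   // 'o' marks a small segment, any other character is treated as a big segment.
   // Consecutive small segments share one group; each big segment is its own group.
   std::vector<std::string> split_into_groups(const std::string& layout) {
       std::vector<std::string> groups;
       std::string small_run;
       for (char c : layout) {
           if (c == 'o') {
               small_run.push_back(c);
               continue;
           }
           // A big segment ends the current run of small segments, if any.
           if (!small_run.empty()) {
               groups.push_back(small_run);
               small_run.clear();
           }
           groups.push_back(std::string(1, c));
       }
       // Flush a trailing run of small segments.
       if (!small_run.empty()) {
           groups.push_back(small_run);
       }
       return groups;
   }
   ```

   Calling split_into_groups("oooOOoo") yields the four groups "ooo", "O", "O", "oo" listed above.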
   
   But for rowsets that start with one or more big segments, a problem occurs.
   Take 'OOoooo' as an example. We break it into 3 groups: 1) O, 2) O, 3) oooo.
   Group 1 will be renamed from 'raw_seg_1' to 'new_seg_1' and added to the map. Because the first segment is big, the filenames line up: the source and destination filenames are the same (ignore the raw/new prefixes, which are only used to tell the two apart in this description).
   
   This case must be handled carefully. We do not need to actually rename the file, but we must still count it as handled. If we fail to count it (that is, fail to increment _num_segcompacted), group 2 will also try to be renamed to 'new_seg_1', but 'new_seg_1' is already in the map, so the check fails.
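
   The bookkeeping described above can be modelled with a small simulation. This is a simplified sketch and not the actual Doris code: simulate, SimResult, and count_in_place_rename are hypothetical names, ids are 1-based to match this description, a std::map stands in for _segid_statistics_map, next_id stands in for _num_segcompacted, and a compacted group is given its final id directly instead of going through an intermediate name such as 'new_seg_1-3'.

   ```cpp
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical result type for this sketch.
   struct SimResult {
       bool check_passed;           // false if a destination id was already in the map
       std::vector<int> final_ids;  // ids left in the map when the simulation stopped
   };

   // Hypothetical simulation. 'o' is a small segment, anything else is a big one.
   // count_in_place_rename selects the behaviour when a big segment already sits
   // at its destination id: true counts it as handled, false forgets to count it.
   SimResult simulate(const std::string& layout, bool count_in_place_rename) {
       std::map<int, char> stats;  // stand-in for _segid_statistics_map
       const int n = static_cast<int>(layout.size());
       for (int id = 1; id <= n; ++id) {
           stats[id] = layout[id - 1];  // one entry per flushed raw segment
       }
       int next_id = 1;  // stand-in for _num_segcompacted

       auto ids = [&stats] {
           std::vector<int> out;
           for (const auto& kv : stats) out.push_back(kv.first);
           return out;
       };

       int i = 1;
       while (i <= n) {
           if (layout[i - 1] == 'o') {
               // Compact a run of small segments: erase the raw entries,
               // then add one entry for the compacted result.
               int j = i;
               while (j <= n && layout[j - 1] == 'o') {
                   stats.erase(j);
                   ++j;
               }
               if (stats.count(next_id) != 0) return {false, ids()};
               stats[next_id] = 'o';
               ++next_id;
               i = j;
           } else {
               if (i == next_id) {
                   // Source and destination coincide: nothing to rename,
                   // but the segment must still be counted as handled.
                   if (count_in_place_rename) ++next_id;
               } else {
                   // The destination id must not be in the map yet.
                   if (stats.count(next_id) != 0) return {false, ids()};
                   stats[next_id] = stats[i];
                   stats.erase(i);
                   ++next_id;
               }
               ++i;
           }
       }
       return {true, ids()};
   }
   ```

   Under these assumptions, simulate("OOoooo", false) stops with check_passed == false when the second big segment targets id 1, while simulate("OOoooo", true) passes and leaves ids 1, 2, 3. The layout 'oooOOoo' passes in both modes and leaves ids 1 to 4, because no big segment there starts at its destination id.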
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [doris] github-actions[bot] commented on pull request #15785: [fix](compaction) segcompaction coredump if the rowset starts with a …

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15785:
URL: https://github.com/apache/doris/pull/15785#issuecomment-1377215759

   PR approved by anyone and no changes requested.




[GitHub] [doris] github-actions[bot] commented on pull request #15785: [fix](compaction) segcompaction coredump if the rowset starts with a …

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #15785:
URL: https://github.com/apache/doris/pull/15785#issuecomment-1377215727

   PR approved by at least one committer and no changes requested.




[GitHub] [doris] dataroaring merged pull request #15785: [fix](compaction) segcompaction coredump if the rowset starts with a …

Posted by GitBox <gi...@apache.org>.
dataroaring merged PR #15785:
URL: https://github.com/apache/doris/pull/15785

