Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/15 01:24:34 UTC

[GitHub] gianm commented on issue #6849: [Proposal] Consolidated segment metadata management

URL: https://github.com/apache/incubator-druid/issues/6849#issuecomment-454232720
 
 
   This is probably a good idea. The descriptors aren't used for anything besides the `insert-segment-to-db` tool and the DataSegmentFinders. But those are flawed anyway: they assume that any segment in deep storage is valid. That's not the case, since segments can get pushed to deep storage but _not_ published for a variety of reasons (mostly related to tasks running partway and then dying). And that's not the only way the descriptor.json in deep storage and the payload in the metadata store can get out of sync: it also happens if you do segment moves (the MoveTask) or if you manually edit the metadata store for some reason (like for a deep storage migration).
   
   A question: how much of the `descriptor.json` could be re-created from the segment path & index.zip, if needed? I'm wondering about a disaster recovery scenario: let's say you _did_ lose your metadata store and you wanted to try to recover whatever metadata you could from deep storage. How much could you get back?
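   
   To make the question concrete, here is a rough sketch (not Druid's actual code; the path layout and field names are assumptions based on the conventional `dataSource/intervalStart_intervalEnd/version/partitionNum/index.zip` structure) of what the path alone could plausibly give back:
   
   ```java
   // Illustrative sketch only: how much descriptor-like metadata could be
   // rebuilt from a conventional segment path, assuming the layout
   //   <base>/<dataSource>/<start>_<end>/<version>/<partitionNum>/index.zip
   import java.util.Arrays;
   import java.util.List;
   
   public class PathMetadataSketch {
       public static void main(String[] args) {
           String path = "base/wikipedia/2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z"
                   + "/2019-01-14T20:00:00.000Z/3/index.zip";
           List<String> parts = Arrays.asList(path.split("/"));
           int n = parts.size();
   
           // Recoverable from the path itself:
           String dataSource = parts.get(n - 5);
           String[] interval = parts.get(n - 4).split("_", 2);
           String version = parts.get(n - 3);
           int partitionNum = Integer.parseInt(parts.get(n - 2));
   
           System.out.println("dataSource   = " + dataSource);
           System.out.println("interval     = " + interval[0] + "/" + interval[1]);
           System.out.println("version      = " + version);
           System.out.println("partitionNum = " + partitionNum);
   
           // Not recoverable from the path: dimensions, metrics, the full shardSpec,
           // binaryVersion, and size -- those would have to be read out of index.zip
           // itself, or are lost along with the metadata store.
       }
   }
   ```
   
   In other words, the identifying pieces (dataSource, interval, version, partition number) come back from the path, and the loadSpec could presumably be reconstructed from wherever the zip was found; the column-level metadata and shardSpec details would have to be pulled out of index.zip itself, if they can be recovered at all.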
