Posted to commits@pinot.apache.org by "cbalci (via GitHub)" <gi...@apache.org> on 2024/02/14 21:17:52 UTC

[I] Rename "Deleted Segments" feature to "Archived Segments" [pinot]

cbalci opened a new issue, #12424:
URL: https://github.com/apache/pinot/issues/12424

   Proposing a somewhat cosmetic change and some related small features.
   
   Pinot already has a mechanism for keeping 'deleted' segments around for a configurable period under the `<deep-store>/Deleted_segments` directory.
   
   I'm proposing to rename this feature from **Deleted_Segments** to **Archived_Segments** and provide more configuration options:
   - Retention period (already exists)
   - Archival location (not hard-coded to '<dir>/Deleted_segments')
   - Archival format (ability to use different storage formats, e.g. psf, parquet, etc.)
   - ...(Open to more ideas)
   
   This would make the archival feature more useful for recovery purposes. Users could specify external locations with different durability and quota properties. With customized archival formats, users could optimize for their own needs: data size, queryability, etc.
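
   To make the options above concrete, here is a minimal sketch in Python (illustration only, not Pinot code). Every name in it (`ArchiveConfig`, `archive_path`, `is_expired`, the field names, the format identifiers and the 7-day default) is hypothetical, and the `<base>/<table>/<segment>` layout is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ArchiveConfig:
    """Hypothetical settings for the proposed 'Archived Segments' feature."""
    retention_days: int = 7         # illustrative default, not taken from Pinot
    location: Optional[str] = None  # None means: keep using <deep-store>/Deleted_segments
    archive_format: str = "pinot"   # assumed identifiers, e.g. "pinot" or "parquet"


def archive_path(cfg: ArchiveConfig, deep_store: str, table: str, segment: str) -> str:
    """Resolve where a segment would be archived (assumed layout: base/table/segment)."""
    if cfg.location is not None:
        base = cfg.location
    else:
        base = f"{deep_store.rstrip('/')}/Deleted_segments"
    return f"{base.rstrip('/')}/{table}/{segment}"


def is_expired(cfg: ArchiveConfig, archived_at: datetime, now: datetime) -> bool:
    """True once an archived segment has outlived the retention period."""
    return now - archived_at > timedelta(days=cfg.retention_days)
```

   With no location set, the sketch falls back to the existing directory, so today's behavior would stay the default and the new options would be opt-in.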
   
   I'm willing to get the implementation going if I can get some alignment on the idea. Please let me know what you think.
   
   cc @chenboat @ankitsultana @Jackie-Jiang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] Rename "Deleted Segments" feature to "Archived Segments" [pinot]

Posted by "hpvd (via GitHub)" <gi...@apache.org>.
hpvd commented on issue #12424:
URL: https://github.com/apache/pinot/issues/12424#issuecomment-1946196006

   Regarding "Archival format":
   Is archiving a one-way thing, or should there also be a "restore" option, which would then need to handle the different formats (with all of these conversions being lossless)?




Re: [I] Rename "Deleted Segments" feature to "Archived Segments" [pinot]

Posted by "hpvd (via GitHub)" <gi...@apache.org>.
hpvd commented on issue #12424:
URL: https://github.com/apache/pinot/issues/12424#issuecomment-1946208585

   From a functional point of view:
   Is there an intersection between "archive" and the mass export topic?
   https://github.com/apache/pinot/issues/12315
   
   E.g. is "archive" an "export with delete (move)"?




Re: [I] Rename "Deleted Segments" feature to "Archived Segments" [pinot]

Posted by "cbalci (via GitHub)" <gi...@apache.org>.
cbalci commented on issue #12424:
URL: https://github.com/apache/pinot/issues/12424#issuecomment-1946957794

   > Regarding "Archival format":
   > Is archiving a one-way thing, or should there also be a "restore" option, which would then need to handle the different formats (with all of these conversions being lossless)?
   
   Good question. Of course, any restore functionality needs to recognize the archival format. We can add one for the standard Pinot format, but others (say, Parquet) can be left as an exercise for the user. After all, there are many ways to read Parquet and generate Pinot segments.
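
   One way to picture that split is a small registry keyed by archival format, sketched below in Python (illustration only, not Pinot code). The names `register_restorer` and `restore` are hypothetical, and the built-in "pinot" restorer is a placeholder that only returns a marker string instead of reloading a segment.

```python
from typing import Callable, Dict

# A restorer takes the archived location and returns a handle to the restored
# segment. Both sides are plain strings in this sketch.
Restorer = Callable[[str], str]

_RESTORERS: Dict[str, Restorer] = {}


def register_restorer(fmt: str, fn: Restorer) -> None:
    """Register restore logic for one archival format (hypothetical hook)."""
    _RESTORERS[fmt] = fn


def restore(fmt: str, archived_path: str) -> str:
    """Dispatch to the restorer for `fmt`; fail loudly if none is registered."""
    try:
        restorer = _RESTORERS[fmt]
    except KeyError:
        raise ValueError(f"no restorer registered for archival format {fmt!r}") from None
    return restorer(archived_path)


# Only the standard format ships with a restorer here. The lambda is a
# placeholder standing in for real segment-loading logic.
register_restorer("pinot", lambda path: f"restored:{path}")
```

   A user who archives to Parquet would then register their own restorer for that format, or read the files directly with another tool.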
   
   > is there an intersection between "archive" and the "mass export" #12315
   
   I'm not entirely clear on the proposal there, but I suppose there is at least one common theme: offloading unnecessary data from Pinot servers.
   In this case (archival), there is no aggregation or filtering other than retention. I imagine that if you archive in a suitable format such as Parquet, you can make older segments accessible through another query engine such as Trino. You may even find a way to merge data from archived and non-archived segments, again using Trino.
   
   

