Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/24 19:17:24 UTC

[GitHub] [pinot] walterddr opened a new issue #8064: Adding batch operation APIs for PinotFS

walterddr opened a new issue #8064:
URL: https://github.com/apache/pinot/issues/8064


   Currently, PinotFS does not make it easy to extend with batch-operation APIs.
   
   For some use cases, such as SegmentDeletionManager, one has to iterate over all files, checking each one for deletion and then performing the actual delete. Many cloud file systems, however, offer more efficient batch APIs that significantly reduce remote request overhead.
   
   Proposal:
   1. Add `delete` / `copy` / `move` `(List<URI> segmentUris, ...)` APIs alongside their single-URI variants.
   2. Make the default implementation of these batch APIs fall back to looping over each URI individually.
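A minimal sketch of the proposal, assuming a simplified, hypothetical `BatchFS` interface in place of the real PinotFS (whose exact method signatures differ): the batch variants are Java default methods that loop over the single-URI calls, so existing implementations keep compiling and a cloud-backed FS can override them with a native bulk request. The `targetUnder` helper and the choice of a destination directory for batch copy/move are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;

// Hypothetical, simplified stand-in for a PinotFS-like abstraction (not the real PinotFS API).
interface BatchFS {
  // Single-URI operations that every implementation already provides.
  boolean delete(URI uri) throws IOException;

  boolean copy(URI src, URI dst) throws IOException;

  boolean move(URI src, URI dst) throws IOException;

  // Batch variants. The defaults simply loop over the single-URI methods, so existing
  // implementations keep working; a cloud-backed FS could override them with a bulk request.
  // Each default attempts every URI and returns true only if all calls succeeded.
  default boolean delete(List<URI> uris) throws IOException {
    boolean allSucceeded = true;
    for (URI uri : uris) {
      allSucceeded &= delete(uri);
    }
    return allSucceeded;
  }

  default boolean copy(List<URI> srcs, URI dstDir) throws IOException {
    boolean allSucceeded = true;
    for (URI src : srcs) {
      allSucceeded &= copy(src, targetUnder(dstDir, src));
    }
    return allSucceeded;
  }

  default boolean move(List<URI> srcs, URI dstDir) throws IOException {
    boolean allSucceeded = true;
    for (URI src : srcs) {
      allSucceeded &= move(src, targetUnder(dstDir, src));
    }
    return allSucceeded;
  }

  // Hypothetical helper: places the last path segment of src under dstDir.
  // dstDir is expected to end with '/', otherwise resolve() replaces its last segment.
  static URI targetUnder(URI dstDir, URI src) {
    String path = src.getPath();
    return dstDir.resolve(path.substring(path.lastIndexOf('/') + 1));
  }
}
```

Making the batch methods `default` is what keeps such a change backward compatible: only file systems with a real bulk API would need to override anything.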


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1021486361


   @mcvsubbu Which config are you referring to?
   I found a config for the retention days before deleting a segment from the deleted-segments dir, but didn't find one to skip moving the segment to the deleted-segments dir. In `SegmentDeletionManager.removeSegmentFromStore()`, the segment is always moved.


[GitHub] [pinot] Jackie-Jiang commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1021441130


   Discussed offline. Another issue for remote FS is that `move` might not be a cheap operation (e.g. an S3 move is actually a copy + delete). To solve this, we should add a config that allows deleting the segment directly without moving it to a different directory.
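One way the proposed config could look, as a hedged sketch: `SegmentStore`, `SegmentRemover`, and the `skipDeletedSegmentsDir` flag are hypothetical names, not existing Pinot classes or config keys. With the flag on, a segment is deleted in place with one call; with it off, the segment is moved to the deleted-segments directory as today.

```java
import java.io.IOException;
import java.net.URI;

// Hypothetical minimal store interface for this sketch (not the real PinotFS API).
interface SegmentStore {
  boolean delete(URI uri) throws IOException;

  boolean move(URI src, URI dst) throws IOException;
}

// Hypothetical deletion helper. The skipDeletedSegmentsDir flag only illustrates the proposed
// config; it is not an existing Pinot config key.
final class SegmentRemover {
  private final SegmentStore store;
  private final URI deletedSegmentsDir; // expected to end with '/'
  private final boolean skipDeletedSegmentsDir;

  SegmentRemover(SegmentStore store, URI deletedSegmentsDir, boolean skipDeletedSegmentsDir) {
    this.store = store;
    this.deletedSegmentsDir = deletedSegmentsDir;
    this.skipDeletedSegmentsDir = skipDeletedSegmentsDir;
  }

  boolean remove(URI segmentUri) throws IOException {
    if (skipDeletedSegmentsDir) {
      // Delete in place: a single remote operation, and no copy is kept for retention.
      return store.delete(segmentUri);
    }
    // Existing behavior: keep a copy in the deleted-segments directory. On object stores
    // such as S3 this "move" is a copy followed by a delete, so it costs more than the branch above.
    String path = segmentUri.getPath();
    URI target = deletedSegmentsDir.resolve(path.substring(path.lastIndexOf('/') + 1));
    return store.move(segmentUri, target);
  }
}
```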


[GitHub] [pinot] walterddr commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1022578578


   > This raises the question of how the state of the deletion operation is maintained; currently it is managed by the client. Some databases do the wrong thing and let you delete as many rows as possible before a timeout is reached, because they treat deletion as a stateless operation.
   
   I don't think it is currently "managed" by the client. If some segments were not deleted (moved), it simply logs a warning and moves on to the next segment; it doesn't even return them to the client.
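As a hedged illustration of the alternative to log-and-continue, a batch delete could still log each failure but also hand the failed URIs back to the caller. `UriDeleter` and `ReportingBatchDelete` are hypothetical names, and `java.util.logging` stands in for whatever logger the real code uses.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical single-URI delete operation that may fail with an IOException.
@FunctionalInterface
interface UriDeleter {
  boolean delete(URI uri) throws IOException;
}

// Hypothetical sketch: still logs a warning per failure, but also returns the
// failed URIs so the caller can report them or retry instead of silently moving on.
final class ReportingBatchDelete {
  private static final Logger LOG = Logger.getLogger(ReportingBatchDelete.class.getName());

  static List<URI> deleteAll(List<URI> uris, UriDeleter deleter) {
    List<URI> failed = new ArrayList<>();
    for (URI uri : uris) {
      boolean deleted;
      try {
        deleted = deleter.delete(uri);
      } catch (IOException e) {
        deleted = false; // treat an I/O error the same as a reported failure
      }
      if (!deleted) {
        LOG.warning("Failed to delete " + uri);
        failed.add(uri);
      }
    }
    return failed; // an empty list means every URI was deleted
  }
}
```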


[GitHub] [pinot] richardstartin commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
richardstartin commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1021554989


   > For this issue, it is more about the multiple round trips per segment when we operate on the entire table.
   > 
   > Even if deleteSegment only deletes without the move, on a large enough table it will still take tens of minutes to complete, since Pinot has to issue `delete(URI segmentUri)` sequentially, segment after segment, instead of leveraging the underlying PinotFS implementation, which might have a much more efficient way to batch-operate on a list of segment URIs.
   
   This raises the question of how the state of the deletion operation is maintained; currently it is managed by the client. Some databases do the wrong thing and let you delete as many rows as possible before a timeout is reached, because they treat deletion as a stateless operation.


[GitHub] [pinot] mcvsubbu commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1021451477


   Do you mean not saving the deleted segments? I believe we already have a config for that.


[GitHub] [pinot] walterddr commented on issue #8064: Adding batch operation APIs for PinotFS

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #8064:
URL: https://github.com/apache/pinot/issues/8064#issuecomment-1021529488


   For this issue, it is more about the multiple round trips per segment when we operate on the entire table.
   
   Even if deleteSegment only deletes without the move, on a large enough table it will still take tens of minutes to complete, since Pinot has to issue `delete(URI segmentUri)` sequentially, segment after segment, instead of leveraging the underlying PinotFS implementation, which might have a much more efficient way to batch-operate on a list of segment URIs.
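To make the round-trip argument concrete, here is a minimal sketch assuming a hypothetical `bulkDelete` callback that wraps some FS-specific batch request: the URI list is split into fixed-size chunks and one request is issued per chunk. The chunk limit depends on the backend; S3's DeleteObjects, for instance, accepts up to 1,000 keys per request.

```java
import java.net.URI;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: issue one bulk request per chunk of URIs instead of one request per URI.
// bulkDelete stands in for an FS-specific batch call (an assumption, not a PinotFS method).
final class ChunkedBatchDelete {
  static int deleteInChunks(List<URI> uris, int maxPerRequest, Consumer<List<URI>> bulkDelete) {
    if (maxPerRequest <= 0) {
      throw new IllegalArgumentException("maxPerRequest must be positive");
    }
    int requests = 0;
    for (int start = 0; start < uris.size(); start += maxPerRequest) {
      int end = Math.min(start + maxPerRequest, uris.size());
      // subList is a view of the input; copy it if the callee retains it.
      bulkDelete.accept(uris.subList(start, end));
      requests++;
    }
    return requests; // number of remote round trips issued
  }
}
```

With 2,500 segment URIs and a chunk size of 1,000, this issues 3 bulk requests instead of 2,500 sequential `delete(URI segmentUri)` calls.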

