Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/25 05:14:07 UTC

[GitHub] [druid] pjain1 opened a new issue #11297: Using deep storage as intermediate store for shuffle tasks

pjain1 opened a new issue #11297:
URL: https://github.com/apache/druid/issues/11297


   Using deep storage as intermediate store for shuffle tasks
   
   ### Description
   If autoscaling is enabled for MiddleManagers (MMs), the MM that generated an intermediate index might no longer be available because it may have been scaled down. So it would be useful to have an option to use deep storage for intermediate data.
   
   ### Changes
   
   #### For pushing partial segments
   `ShuffleDataSegmentPusher` uses `IntermediaryDataManager`. It can be converted into an interface with the following methods:
   1. `long addSegment(String supervisorTaskId, String subTaskId, DataSegment segment, URI segmentLocation)`
   2. `Optional<ByteSource> findPartitionFile(String supervisorTaskId, String subTaskId, Interval interval, int bucketId)`
   3. `void deletePartitions(String supervisorTaskId)`
   
   The default implementation of `IntermediaryDataManager` can be `LocalIntermediaryDataManager`, which manages partial segments locally on the MM. Optional implementations can be added via extensions to support different deep storages or other locations.
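   A rough Java sketch of what this could look like, using the method signatures listed above (the layout, imports, and comments are illustrative assumptions, not existing Druid code):

```java
// Sketch only: IntermediaryDataManager as an interface, using the signatures
// proposed above. LocalIntermediaryDataManager would keep today's local behavior;
// deep-storage implementations could be contributed by extensions.
import com.google.common.io.ByteSource;
import org.apache.druid.timeline.DataSegment;
import org.joda.time.Interval;

import java.net.URI;
import java.util.Optional;

public interface IntermediaryDataManager
{
  // Push a partial segment produced by a sub task; the current local code returns
  // the size of the pushed data, so the long return value is kept here.
  long addSegment(String supervisorTaskId, String subTaskId, DataSegment segment, URI segmentLocation);

  // Look up the partial segment for the given interval and bucket, if it exists.
  Optional<ByteSource> findPartitionFile(String supervisorTaskId, String subTaskId, Interval interval, int bucketId);

  // Remove all partial segments belonging to the given supervisor task.
  void deletePartitions(String supervisorTaskId);
}
```

   Guice could then bind `LocalIntermediaryDataManager` as the default and let extensions swap in a deep-storage implementation.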
   
   #### For pulling partial segments
   `ShuffleClient` is already an interface with a default implementation, `HttpShuffleClient`, so implementations only need to be added for other storage types. The interface method needs to be changed to `File fetchSegmentFile(URI partitionDir, String supervisorTaskId, P location)`. It may also be necessary to check whether a different implementation of `PartitionLocation` is needed.
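   For illustration, the pull side for deep storage might look roughly like the following; `DeepStoragePartitionLocation` and `IntermediaryDataStorage` are hypothetical names, Druid-specific imports are omitted, and the exact generics would have to match however `PartitionLocation` is generalized:

```java
// Sketch only: a ShuffleClient that copies partial segments from deep storage to a
// local file instead of fetching them over HTTP from the producing MiddleManager.
// DeepStoragePartitionLocation and IntermediaryDataStorage are hypothetical types.
import org.apache.commons.io.FileUtils;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class DeepStorageShuffleClient implements ShuffleClient<DeepStoragePartitionLocation>
{
  private final IntermediaryDataStorage storage; // e.g. an S3-, GCS-, or HDFS-backed reader

  public DeepStorageShuffleClient(IntermediaryDataStorage storage)
  {
    this.storage = storage;
  }

  @Override
  public File fetchSegmentFile(URI partitionDir, String supervisorTaskId, DeepStoragePartitionLocation location)
      throws IOException
  {
    // Assumes partitionDir is a local file URI, as with the HTTP client today.
    File localFile = new File(new File(partitionDir), location.getSubTaskId() + ".zip");
    try (InputStream in = storage.openPartition(supervisorTaskId, location)) {
      // Materialize the object locally so the downstream merge code can read it unchanged.
      FileUtils.copyInputStreamToFile(in, localFile);
    }
    return localFile;
  }
}
```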
   
   ### Motivation
   To make shuffle work with MiddleManager autoscaling.
   


[GitHub] [druid] jihoonson commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

jihoonson commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847542636


   @pjain1 thanks for the proposal! I like the idea. One question I have is, how are you thinking of handling auto cleanup of intermediate data that is no longer necessary? For example, how will you clean up intermediate data if a parallel task fails during shuffle? The task might not be able to clean up before it exits.


[GitHub] [druid] pjain1 edited a comment on issue #11297: Using deep storage as intermediate store for shuffle tasks

pjain1 edited a comment on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847585532


   Hmm, I see that `IntermediaryDataManager` will be started on the MiddleManager and Indexer, as it is injected into `ShuffleResource`. However, I also see that `IntermediaryDataManager` and `ShuffleDataSegmentPusher` are used in the `Appenderator` code to push segments, which runs in the peon process, so will `IntermediaryDataManager` also be running the cleanup code in the peon process? This confused me.
   > The simplest idea I can think of is having an auto cleanup process in the coordinator (or the overlord?).
   
   This sounds like a good idea; it could be a standalone coordinator duty.
   


[GitHub] [druid] pjain1 commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

pjain1 commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847547882


   @jihoonson since you implemented this feature originally, it would be good to hear any ideas you have.


[GitHub] [druid] jihoonson commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

jihoonson commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-879273572


   >  For the first version, I wonder if it would be ok to suggest the Druid Operator to set up a separate Lifecycle policy rule on the objects in the intermediary path to auto-expire objects for the boundary cases.
   
   @nishantmonu51 I agree that auto cleanup can be developed separately from storing intermediary data in deep storage. Using auto-expiry also seems OK to me, as long as it is clearly documented how to set the retention and what the impact could be when it is set wrong. The problem with auto-expiry is that it will be hard to set the retention correctly, since it should be longer than your longest job, which depends on the input data and the resources available while the job is running. Perhaps the feature should be marked as experimental (or alpha might be a better term) until auto cleanup is developed in Druid.


[GitHub] [druid] zachjsh closed issue #11297: Using deep storage as intermediate store for shuffle tasks

zachjsh closed issue #11297:
URL: https://github.com/apache/druid/issues/11297


   


[GitHub] [druid] jihoonson commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

jihoonson commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847567539


   `IntermediaryDataManager` currently runs on MiddleManagers and Indexers. Do you mean making them able to discover and clean up shuffle data in deep storage? I'm not sure how they would work in parallel. Can you elaborate more on your idea?
   
   > @jihoonson since you have implemented this feature originally it would be good to hear any ideas you have.
   
   The simplest idea I can think of is having an auto cleanup process in the coordinator (or the overlord?). This single process periodically scans deep storage and cleans up all shuffle data that is no longer necessary to keep. This can be expensive, but it may be OK since it's a periodic process that can have enough time between runs. 
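   A very rough sketch of that shape, whether it ends up as a coordinator duty or an overlord-side job; `listSupervisorTaskIds`, `isTaskRunningOrPending`, and `TaskStatusClient` are hypothetical placeholders, not existing Druid APIs:

```java
// Sketch only: a single periodic job that scans intermediary shuffle data in deep
// storage and deletes everything whose supervisor task is no longer running.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IntermediaryDataCleanupJob
{
  private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
  private final IntermediaryDataManager intermediaryDataManager; // deep-storage implementation
  private final TaskStatusClient taskStatusClient;               // hypothetical view of running tasks

  public IntermediaryDataCleanupJob(IntermediaryDataManager manager, TaskStatusClient client)
  {
    this.intermediaryDataManager = manager;
    this.taskStatusClient = client;
  }

  public void start()
  {
    // Scanning deep storage can be expensive, so leave plenty of time between runs.
    exec.scheduleAtFixedRate(this::cleanupOnce, 10, 60, TimeUnit.MINUTES);
  }

  private void cleanupOnce()
  {
    try {
      // listSupervisorTaskIds() stands in for listing the top-level shuffle directories.
      for (String supervisorTaskId : intermediaryDataManager.listSupervisorTaskIds()) {
        if (!taskStatusClient.isTaskRunningOrPending(supervisorTaskId)) {
          intermediaryDataManager.deletePartitions(supervisorTaskId);
        }
      }
    }
    catch (Exception e) {
      // Swallow and retry on the next run so one bad scan does not cancel future cleanups.
    }
  }

  public void stop()
  {
    exec.shutdownNow();
  }
}
```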


[GitHub] [druid] suneet-s commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

suneet-s commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-887041292


   +1


[GitHub] [druid] pjain1 edited a comment on issue #11297: Using deep storage as intermediate store for shuffle tasks

pjain1 edited a comment on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847547238


   @jihoonson A method similar to `deleteExpiredSuprevisorTaskPartitionsIfNotRunning` in `IntermediaryDataManager` can be implemented for the other managers as well. If we want to enforce this for all managers, we could probably make `IntermediaryDataManager` an abstract class with a default implementation of the start method, where an executor calls this method periodically. The cleanup method itself can be declared abstract in that class so that it is always implemented. Does this make sense?
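   A minimal sketch of that shape, assuming the interface proposed earlier in this issue; the schedule and lifecycle hooks are placeholders, and the abstract method simply mirrors the cleanup method mentioned above:

```java
// Sketch only: an abstract base class that owns the periodic expiry schedule so
// every IntermediaryDataManager implementation cleans up expired supervisor data
// the same way. Subclasses decide how to find and delete the expired data.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public abstract class AbstractIntermediaryDataManager implements IntermediaryDataManager
{
  private ScheduledExecutorService cleanupExec;

  // Storage-specific managers implement the actual expiry check and deletion.
  protected abstract void deleteExpiredSupervisorTaskPartitionsIfNotRunning() throws Exception;

  public void start()
  {
    cleanupExec = Executors.newSingleThreadScheduledExecutor();
    cleanupExec.scheduleAtFixedRate(
        () -> {
          try {
            deleteExpiredSupervisorTaskPartitionsIfNotRunning();
          }
          catch (Exception e) {
            // Log and keep going; a single failed run should not cancel the schedule.
          }
        },
        5, 5, TimeUnit.MINUTES
    );
  }

  public void stop()
  {
    if (cleanupExec != null) {
      cleanupExec.shutdownNow();
    }
  }
}
```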


[GitHub] [druid] maytasm commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

maytasm commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-886969584


   This proposal sounds good to me. I took a stab at it at https://github.com/apache/druid/pull/11492


[GitHub] [druid] pjain1 commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

pjain1 commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-847581686


   Hmm, I see that `IntermediaryDataManager` will be started on the MiddleManager and Indexer, as it is injected into `ShuffleResource`. However, I also see that `IntermediaryDataManager` and `ShuffleDataSegmentPusher` are used in the `Appenderator` code to push segments, which runs in the peon process, not the MiddleManager process. So I was under the impression that expired supervisor data is cleaned up by peons when they run tasks.


[GitHub] [druid] nishantmonu51 commented on issue #11297: Using deep storage as intermediate store for shuffle tasks

nishantmonu51 commented on issue #11297:
URL: https://github.com/apache/druid/issues/11297#issuecomment-879120931


   @jihoonson @pjain1: Although it would be nice to implement cleanup of the deep storage files, for the first version I wonder if it would be OK to suggest that the Druid operator set up a separate lifecycle policy rule on the objects in the intermediary path to auto-expire them for the boundary cases, e.g.:
   
   - https://docs.amazonaws.cn/en_us/AmazonS3/latest/userguide/lifecycle-expire-general-considerations.html
   - https://cloud.google.com/storage/docs/lifecycle
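   For example, with the AWS SDK for Java v2 an expiration rule scoped to the intermediary prefix could be set up roughly like this (bucket name, prefix, and retention are placeholders; GCS has an equivalent bucket lifecycle rule):

```java
// Sketch only: add an S3 lifecycle rule that auto-expires intermediary shuffle
// objects after a fixed number of days. The retention must be longer than the
// longest-running ingestion job, as discussed above.
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.BucketLifecycleConfiguration;
import software.amazon.awssdk.services.s3.model.ExpirationStatus;
import software.amazon.awssdk.services.s3.model.LifecycleExpiration;
import software.amazon.awssdk.services.s3.model.LifecycleRule;
import software.amazon.awssdk.services.s3.model.LifecycleRuleFilter;
import software.amazon.awssdk.services.s3.model.PutBucketLifecycleConfigurationRequest;

public class IntermediaryDataLifecycleRule
{
  public static void main(String[] args)
  {
    try (S3Client s3 = S3Client.create()) {
      s3.putBucketLifecycleConfiguration(
          PutBucketLifecycleConfigurationRequest.builder()
              .bucket("my-druid-deep-storage")                      // placeholder bucket
              .lifecycleConfiguration(
                  BucketLifecycleConfiguration.builder()
                      .rules(
                          LifecycleRule.builder()
                              .id("expire-intermediary-shuffle-data")
                              .filter(LifecycleRuleFilter.builder()
                                          .prefix("druid/shuffle-data/") // placeholder prefix
                                          .build())
                              .expiration(LifecycleExpiration.builder().days(7).build())
                              .status(ExpirationStatus.ENABLED)
                              .build())
                      .build())
              .build());
    }
  }
}
```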

