Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/01/04 00:29:07 UTC

[GitHub] jihoonson commented on issue #6803: [Proposal] Kill Hadoop MR task on kill of ingestion task and resume ability for Hadoop ingestion tasks

URL: https://github.com/apache/incubator-druid/issues/6803#issuecomment-451321623
 
 
   Thanks @ankit0811. Sounds useful!
   
   For phase 1, I think we need a unified way to support various platforms like Hadoop, Spark, and so on. So, I would suggest changing the way Druid tasks are killed. Currently, the overlord sends a kill request to the middleManager where the task is running, and the middleManager simply destroys the task process. As a result, the task never gets a chance to prepare for stopping, e.g., by cleaning up its resources or killing its Hadoop jobs. Instead, I think the task could clean up its resources before stopping if we change how the middleManager kills it. This approach makes more sense to me because the Hadoop job is then started and killed in the same place (the Druid Hadoop task).
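   
   For illustration, here is a rough sketch of such a cleanup hook. This is hypothetical code, not the current implementation; it assumes the task keeps a reference to the `org.apache.hadoop.mapreduce.Job` it submitted (`Job.killJob()` is the standard Hadoop API for this).
   
   ```java
   // Hypothetical sketch only, not existing Druid code. Assumes the task
   // stores a reference to the MR job it submitted.
   import java.io.IOException;
   
   import org.apache.hadoop.mapreduce.Job;
   
   class HadoopJobCleanup
   {
     private volatile Job runningJob; // set when the task submits the MR job
   
     // Called on graceful stop: ask the Hadoop cluster to kill the MR job so
     // it isn't left orphaned after the task process exits.
     void cleanUpBeforeStop()
     {
       final Job job = runningJob;
       if (job != null) {
         try {
           job.killJob();
         }
         catch (IOException e) {
           // Best effort: if this fails or hangs, the runner will still
           // terminate the task process after a timeout.
         }
       }
     }
   }
   ```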
   
   Fortunately, some of this logic is already implemented. First, there's `stopGracefully` in `Task`:
   
   ```java
     /**
      * Asks a task to arrange for its "run" method to exit promptly. This method will only be called if
      * {@link #canRestore()} returns true. Tasks that take too long to stop gracefully will be terminated with
      * extreme prejudice.
      */
     void stopGracefully();
   ```
   
   This is currently only called for restorable tasks, so you may want to make the Hadoop task restorable (maybe related to phase 2?). The `stopGracefully` method is called from `SingleTaskBackgroundRunner.stop()`, which in turn is called when the output stream of the task process is closed (see `ForkingTaskRunner.stop()` and `ExecutorLifecycle.start()`). So it would work if you made the Hadoop task restorable and changed the kill mechanism to close the output stream of the task process instead of destroying the process directly. What do you think?
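   
   To make the mechanics concrete, here is a hedged sketch of the two pieces; method names other than `canRestore` and `stopGracefully` are illustrative, not existing code. First, the Hadoop task would opt into graceful shutdown:
   
   ```java
   // Illustrative only: these are the overrides the proposal would add to
   // the Hadoop task.
   @Override
   public boolean canRestore()
   {
     return true; // stopGracefully() is only called for restorable tasks
   }
   
   @Override
   public void stopGracefully()
   {
     killHadoopJobIfRunning(); // hypothetical helper, e.g. Job.killJob() as sketched above
   }
   ```
   
   Second, the middleManager-side kill would change from destroying the forked process to closing the stream connected to its stdin, so `ExecutorLifecycle` can notice the EOF and begin a graceful shutdown:
   
   ```java
   // Sketch under the same assumptions; taskProcess is the forked task's
   // java.lang.Process. getOutputStream() is the parent's handle on the
   // child's stdin.
   void shutdown(Process taskProcess) throws IOException
   {
     // Old behavior: taskProcess.destroy(); // no chance for cleanup
     taskProcess.getOutputStream().close(); // task sees stdin EOF and stops gracefully
   }
   ```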
