
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/11/26 19:42:00 UTC

[GitHub] [incubator-druid] sascha-coenen edited a comment on issue #8801: KubernetesTaskRunner for running druid tasks as kubernetes jobs

URL: https://github.com/apache/incubator-druid/issues/8801#issuecomment-558787375

We are also very interested in this topic and have spent quite some time investigating good approaches.

With regard to the current limitations of scaling down in Kubernetes:
* the respective SIG (special interest group) has been considering different approaches for handling this in future K8s versions. The current idea seems to be a property named "evictionCost" within a pod's status object, which would help feed back to Kubernetes a preference about which pod of a deployment to kill in a scale-down operation.
* the least hacky workaround currently known to the community does indeed seem to be to use Kubernetes Jobs.

With regard to the autoscaling mechanism in Druid being tied to AWS EC2:
* we wrote an autoscaler for Druid which delegates the execution of provision/terminate/ip2idLookup/id2ipLookup to an external REST endpoint. This is more lightweight than having the platform-specific code sit inside a Druid extension, which makes quick adjustments slow and difficult. We are willing to contribute this.
* we are currently working towards another contribution that improves the autoscaling mechanism in Druid to support different tiers of middlemanagers ( #8695 ).
That feature proposal will soon receive a code contribution, and we will also update its description. It seems to me that, in general, either the approach in the scope of this proposal should be aligned with the existing concept of the autoscaling extension point in Druid, or the current autoscaling mechanism in Druid should be clearly marked as deprecated, so that the community can align on working on future-safe contributions.
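
To make the REST-delegating autoscaler idea more concrete, here is a minimal sketch in Python. It is not the code referred to above. The class name, the URL layout (one path per operation under a base URL), and the JSON field names `nodeIds` and `ips` are all assumptions made for illustration; only the four operation names come from the comment. The transport is injectable so the delegation logic can be exercised without a network.

```python
import json
from typing import Callable, List
from urllib import request

# Hypothetical transport signature: (method, url, payload) -> decoded JSON reply.
Transport = Callable[[str, str, dict], dict]


def http_json(method: str, url: str, payload: dict) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


class RestDelegatingAutoscaler:
    """Forwards every autoscaling operation to an external REST endpoint.

    All platform-specific logic is assumed to live behind that endpoint,
    so it can change without touching this class.
    """

    def __init__(self, base_url: str, transport: Transport = http_json):
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _call(self, operation: str, payload: dict) -> dict:
        # Assumed convention: POST <base_url>/<operation> with a JSON body.
        return self.transport("POST", f"{self.base_url}/{operation}", payload)

    def provision(self) -> List[str]:
        # Assumed reply shape: {"nodeIds": [...]}
        return self._call("provision", {})["nodeIds"]

    def terminate(self, ips: List[str]) -> List[str]:
        return self._call("terminate", {"ips": list(ips)})["nodeIds"]

    def ip_to_id_lookup(self, ips: List[str]) -> List[str]:
        return self._call("ip2idLookup", {"ips": list(ips)})["nodeIds"]

    def id_to_ip_lookup(self, ids: List[str]) -> List[str]:
        # Assumed reply shape: {"ips": [...]}
        return self._call("id2ipLookup", {"nodeIds": list(ids)})["ips"]
```

With this shape, moving to another platform would only mean changing what sits behind the endpoint, which is the kind of quick adjustment described above.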

Open points I did not think too much about yet:
* how would volume management work in the case of many autoscaled peon/indexer pods
* how to design this approach such that it integrates well with the peer project that works towards a Druid operator for k8s
* how to handle task failover in cases where Kubernetes relocates pods to free up physical nodes. Such compactions happen, and statefulsets/deployments deal well with such planned operational disruptions via PodDisruptionBudgets and via their internal controller logic. I don't recall the specifics of k8s Jobs in this context, but it seems that one needs to think explicitly about how to handle such cases. Jobs don't get restarted indefinitely often in response to a pod relocation, and task failover would need to be supported for native batch tasks so that such planned disruptions do not make a whole multi-task batch ingestion task fail.
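
As a rough illustration of the bounded-retry point, the sketch below builds a `batch/v1` Job manifest as a plain Python dict. `backoffLimit` and `restartPolicy` are standard Job fields; the helper name, the `druid-task-` naming scheme, the label key, and the image are hypothetical. Whether a pod evicted during a node drain counts against `backoffLimit` depends on the cluster version and configuration and is not verified here.

```python
import json


def build_task_job(task_id: str, image: str, backoff_limit: int = 3) -> dict:
    """Return a Job manifest that runs one task with a bounded number of retries.

    Assumes task_id is already a valid Kubernetes name fragment
    (lowercase alphanumerics and '-'); real Druid task ids may need mapping.
    """
    labels = {"druid-task-id": task_id}  # hypothetical label key
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"druid-task-{task_id}", "labels": labels},
        "spec": {
            # The Job controller stops retrying once this many pods have failed.
            "backoffLimit": backoff_limit,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # "Never": a failed pod is replaced by a new pod instead of
                    # the container being restarted in place.
                    "restartPolicy": "Never",
                    "containers": [{"name": "task", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_task_job("abc123", "example/druid:latest"), indent=2))
```

Because the retry budget is finite, a long-running multi-task ingestion would still need failover handling above the Job level, as noted in the last open point.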

By the way: would it make sense for this proposal to target only Druid's new ingestion pipeline, consisting of the Druid indexer and native ingestion tasks such as index_parallel, or would it be required to support different modes, i.e. middlemanager mode vs. indexer mode?
I would certainly hope that the new native indexing mechanism would be supported along with the new indexer. I am not sure, though, whether it is necessary to support the middlemanager/peon lane as well. What do you think?
