Posted to commits@druid.apache.org by "nlippis (via GitHub)" <gi...@apache.org> on 2023/03/06 15:14:47 UTC

[GitHub] [druid] nlippis commented on pull request #13880: Use base task dir in kubernetes task runner

nlippis commented on PR #13880:
URL: https://github.com/apache/druid/pull/13880#issuecomment-1456322496

   > I don't understand. is there any material impact this change will have? For sake of consistency, we prefer k8s task runner use the same interface as other runners.
   
   I see the benefits of this change to be twofold.
   
   **Reduce Confusion**
   As a user, I don't know what the behavior of specifying multiple base task dir paths will be without reading the code within the context of the `KubernetesTaskRunner`.  Given that only one of the directories will be used per task, this seems like odd behavior.
   
   As a Druid contributor, I would be confused about why this concept exists within the `KubernetesTaskRunner`: every task runs within its own container, so there is no benefit from using it here.
   
   **Reduce state that needs to be tracked by the KubernetesTaskRunner**
   After this PR, I will be introducing a series of changes that allow the user to specify a specific Druid task <-> K8s Job adapter as well as a K8s pod template file based adapter. In those changes, the only information passed to the adapter is the task itself. If we were to continue to use the `TaskStorageDirTracker`, we would need to pass more state information into the adapter. While implementing that would be trivial, there is no reason to do so, since only one base task dir is used per task.
   
   Alternatively, we could add a new concept to the `TaskStorageDirTracker` that just returns the first directory and doesn't track allocated directories per task; however, IMHO that would make the purpose of the class unclear.
   
   Another alternative would be to introduce a new interface with two different implementations (one that tracks directories and another that returns the base task dir); however, this seemed to be overkill.
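   
   For illustration only, that two-implementation alternative could look roughly like the sketch below. The names `TaskDirProvider`, `TrackingTaskDirProvider`, and `SingleBaseTaskDirProvider` are hypothetical and do not exist in Druid, and the round-robin assignment is an assumption made to keep the example short; it is not a description of how `TaskStorageDirTracker` actually behaves.
   
   ```java
   import java.io.File;
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   
   // Hypothetical interface (not part of Druid): resolves the working dir for a task.
   interface TaskDirProvider {
       File getTaskDir(String taskId);
   }
   
   // Hypothetical tracking variant: remembers which base dir each task was given.
   // Round-robin selection is an assumption for this sketch.
   class TrackingTaskDirProvider implements TaskDirProvider {
       private final List<File> baseDirs;
       private final Map<String, File> assigned = new ConcurrentHashMap<>();
       private int next = 0;
   
       TrackingTaskDirProvider(List<File> baseDirs) {
           if (baseDirs.isEmpty()) {
               throw new IllegalArgumentException("at least one base dir is required");
           }
           this.baseDirs = List.copyOf(baseDirs);
       }
   
       @Override
       public synchronized File getTaskDir(String taskId) {
           // Reuse the earlier assignment if the task has been seen before.
           File base = assigned.computeIfAbsent(
               taskId, id -> baseDirs.get(next++ % baseDirs.size()));
           return new File(base, taskId);
       }
   }
   
   // Hypothetical stateless variant: one base dir, nothing to track per task.
   class SingleBaseTaskDirProvider implements TaskDirProvider {
       private final File baseDir;
   
       SingleBaseTaskDirProvider(File baseDir) {
           this.baseDir = baseDir;
       }
   
       @Override
       public File getTaskDir(String taskId) {
           return new File(baseDir, taskId);
       }
   }
   ```
   
   Even in this reduced form, the stateless variant is a one-line lookup, which is why an extra interface felt heavier than letting the runner read the base task dir directly.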
   
   In the end, I decided to leave the task dir management policy up to each `TaskRunner` implementation, which IMHO is the simplest approach for Druid users and contributors to understand and use.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

