Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/27 17:49:03 UTC

[GitHub] [druid] averma111 opened a new issue #10442: Why druid expects data files to be available on all data servers

averma111 opened a new issue #10442:
URL: https://github.com/apache/druid/issues/10442


   Druid 0.18.1.
   
   Hi Team,
   
   I have a general question. I have a cluster set up with 1 master, 2 data, and 1 query node. Why do I need to place the CSV files on both data nodes for Druid to read them during ingestion?
   
   1. Does Druid read the data from both copies of the file?
   2. Or does only one data node read the data from the file?
   
   
   Thanks,
   Ashish


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] FrankChen021 commented on issue #10442: Why druid expects data files to be available on all data servers

Posted by GitBox <gi...@apache.org>.

FrankChen021 commented on issue #10442:
URL: https://github.com/apache/druid/issues/10442#issuecomment-700382915


   The overlord uses a worker select strategy to choose which worker executes a task; see **[Worker Select Strategy](https://druid.apache.org/docs/latest/configuration/index.html#overlord-dynamic-configuration)**. The default strategy is `equalDistribution`, which means tasks are load balanced among middle managers, so a given task may land on any of them.
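
   As a rough illustration of what load balancing by equal distribution implies for file placement, here is a simplified sketch. It is not Druid's actual implementation: the `Worker` class, the `pick_worker` and `assign` helpers, and the host names are all hypothetical, and the only rule modeled is "send the next task to the worker with the most free task slots".

```python
from dataclasses import dataclass


@dataclass
class Worker:
    host: str      # hypothetical middle manager host name
    capacity: int  # maximum number of concurrent task slots
    used: int = 0  # slots currently occupied

    @property
    def free(self) -> int:
        return self.capacity - self.used


def pick_worker(workers):
    """Return the worker with the most free slots, or None if all are full."""
    candidates = [w for w in workers if w.free > 0]
    if not candidates:
        return None
    # max() returns the first worker among those tied for most free slots.
    return max(candidates, key=lambda w: w.free)


def assign(workers, n_tasks):
    """Assign up to n_tasks one at a time; return the host chosen for each."""
    chosen = []
    for _ in range(n_tasks):
        worker = pick_worker(workers)
        if worker is None:
            break  # no free slot anywhere; remaining tasks would wait
        worker.used += 1
        chosen.append(worker.host)
    return chosen


if __name__ == "__main__":
    cluster = [Worker("mm1", capacity=2), Worker("mm2", capacity=2)]
    print(assign(cluster, 4))
```

   Under this simplified rule, consecutive tasks alternate between the two middle managers, so a task that reads a local file cannot assume which host it will run on.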


[GitHub] [druid] averma111 commented on issue #10442: Why druid expects data files to be available on all data servers

Posted by GitBox <gi...@apache.org>.

averma111 commented on issue #10442:
URL: https://github.com/apache/druid/issues/10442#issuecomment-700147906


   @FrankChen021, in cluster mode each data node runs both a historical and a middle manager, which is why I wrote "data node".
   Also, if I have 2 middle managers, which one will be used? Is that decided by ZooKeeper acting as a load balancer?
   
   
   Thanks,
   Ashish


[GitHub] [druid] FrankChen021 commented on issue #10442: Why druid expects data files to be available on all data servers

Posted by GitBox <gi...@apache.org>.

FrankChen021 commented on issue #10442:
URL: https://github.com/apache/druid/issues/10442#issuecomment-699920028


   I don't think the files need to be placed on a query or a historical node. But they are needed on both the middle manager and the coordinator/overlord nodes.
   
   That is because creating a task from the web console involves a sampling step, and sampling is executed on the coordinator/overlord node. Once the ingestion task is submitted, it can be scheduled on any middle manager node, which means we have to place the files on all middle manager nodes.
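
   To make the dependency on the local filesystem concrete, here is a minimal sketch that builds a native batch ingestion spec with a `local` input source. The helper name `local_csv_ingestion_spec`, the datasource name, the path, and the column names are hypothetical. The field names follow the native batch ingestion docs as I understand them, so please check them against the docs for your Druid version. The spec is deliberately incomplete (no `granularitySpec` or `tuningConfig`).

```python
import json


def local_csv_ingestion_spec(datasource, base_dir, file_filter,
                             timestamp_column, dimensions):
    """Build a partial index_parallel spec that reads CSV files from base_dir.

    base_dir is resolved on the local filesystem of whichever process runs
    the sampler or the task, not on the machine that submits the spec.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,    # must exist on every candidate host
                    "filter": file_filter,  # wildcard applied to file names
                },
                "inputFormat": {"type": "csv", "findColumnsFromHeader": True},
            },
        },
    }


if __name__ == "__main__":
    spec = local_csv_ingestion_spec(
        datasource="example_events",   # hypothetical
        base_dir="/data/druid/input",  # hypothetical path
        file_filter="*.csv",
        timestamp_column="ts",
        dimensions=["country", "device"],
    )
    print(json.dumps(spec, indent=2))
```

   Because `baseDir` is just a path, the same spec only works if that path holds the files on every host that may run it. A shared input source such as HDFS or S3 would avoid copying the files to each node.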


[GitHub] [druid] averma111 commented on issue #10442: Why druid expects data files to be available on all data servers

Posted by GitBox <gi...@apache.org>.

averma111 commented on issue #10442:
URL: https://github.com/apache/druid/issues/10442#issuecomment-701123090


   @FrankChen021 Thank you for the prompt response. I got what I wanted to know.

