Posted to user@drill.apache.org by LittleCho <li...@littlecho.tw> on 2017/11/07 14:21:29 UTC

Question about how Drill optimizes the queries and splits the loads in HDFS cluster?

Hello Sir,

   I have been studying how to install Drill on the data nodes of a Hadoop
   cluster. According to Drill's online documentation, we can install
   Drill on each HDFS datanode and then point the connection setting in
   the file storage plugin at the HDFS namenode to finish the setup.
   Here comes my question: as we know, a file is split into several
   blocks based on the block size setting, so will the query also be
   split and assigned to the Drill instance on each datanode? I would
   like to know more about how Drill works in distributed mode with an
   HDFS cluster. Thank you!!

-- 
BR, LittleCho

Re: Question about how Drill optimizes the queries and splits the loads in HDFS cluster?

Posted by Chun Chang <cc...@mapr.com>.
Yes, data locality is taken into account when deciding which drillbit works on which part of the data.
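To picture what locality-aware assignment means, here is a minimal Python sketch. It is a simplified illustration, not Drill's actual scheduler: the `Block` model, the `assign_blocks` function, the host names, and the greedy "least-loaded local drillbit" rule are all assumptions made for this example. Each HDFS block lists the datanodes holding a replica; a block goes to a drillbit running on one of those datanodes when possible, otherwise to the least-loaded drillbit overall, which would then have to read the block over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical model of one HDFS block of a file."""
    path: str
    offset: int               # byte offset of the block within the file
    length: int               # block length in bytes
    hosts: tuple[str, ...]    # datanodes holding a replica of this block


def assign_blocks(blocks, drillbit_hosts):
    """Hypothetical greedy assignment: prefer a drillbit on a host that
    stores a replica (data locality), and balance by bytes assigned."""
    load = {host: 0 for host in drillbit_hosts}
    assignment = {host: [] for host in drillbit_hosts}
    # Place larger blocks first so the balancing step has more room to work.
    for block in sorted(blocks, key=lambda b: b.length, reverse=True):
        local = [h for h in block.hosts if h in load]
        # Fall back to any drillbit (a remote read) if no replica host runs one.
        candidates = local if local else list(load)
        target = min(candidates, key=lambda h: (load[h], h))
        assignment[target].append(block)
        load[target] += block.length
    return assignment


# Example: one file stored as 128 MB blocks, 3 replicas each, 5 datanodes.
BLOCK = 128 * 1024 * 1024
blocks = [
    Block("/data/events.parquet", 0 * BLOCK, BLOCK, ("dn1", "dn2", "dn3")),
    Block("/data/events.parquet", 1 * BLOCK, BLOCK, ("dn2", "dn3", "dn4")),
    Block("/data/events.parquet", 2 * BLOCK, BLOCK, ("dn1", "dn4", "dn5")),
    Block("/data/events.parquet", 3 * BLOCK, BLOCK // 2, ("dn5", "dn1", "dn2")),
]
plan = assign_blocks(blocks, ["dn1", "dn2", "dn3", "dn4", "dn5"])
for host, assigned in plan.items():
    # prints one line per host: dn1 [0], dn2 [1], dn3 [], dn4 [2], dn5 [3]
    print(host, [b.offset // BLOCK for b in assigned])
```

In this toy run every block is read by a drillbit on a node that already holds a replica, so no block data has to cross the network. A real planner also weighs other factors (for example, how many parallel fragments each node may run), which this sketch ignores.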
