Posted to user@drill.apache.org by "Knapp, Michael" <Mi...@capitalone.com> on 2017/03/31 18:40:51 UTC

Struggling to run in distributed mode

Hi,

I am struggling to get Drill to run in distributed mode.  I stood up two Amazon EC2 m4.large instances running Amazon Linux and installed ZooKeeper on one of them.  Java 8 is installed on both.  I followed the instructions provided here:
https://drill.apache.org/docs/installing-drill-on-the-cluster/

My drill-override.conf file is the same on each node; it looks like this:

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "10.XXX.YYY.ZZZ:2181"
}

The ZooKeeper IP address is the same as that of one of the Drill nodes, because ZooKeeper runs on that machine.  Unfortunately, drillbit.sh fails.

On the machine that is co-located with ZooKeeper, I get this IOException in the drillbit.out file:
Failure to connect to the zookeeper cluster service within the allotted time of 10000 milliseconds
I have confirmed that ZK is running and that I can open a shell to it.

On the machine that is NOT co-located with ZooKeeper, I get a DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode.  The drillbit.log file shows:
host.name=ip-10-XXX-YYY-ZZZ.****.com.

After reviewing the source code, I realized that Drill lets you override the host name from drill-env.sh.  By setting “DRILL_HOST_NAME” to the IP address, I was able to start the Drill instance that is NOT co-located with ZooKeeper.  However, the same solution did not work for the instance that is co-located with ZooKeeper; it still fails with a timeout while trying to connect to ZooKeeper.
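
One way to check the host-name theory is sketched below in Python. It assumes, without confirmation from this thread, that the startup error appears when the machine's host name resolves to a loopback address (for example through an /etc/hosts entry), and that an `export` line is an acceptable way to set DRILL_HOST_NAME in drill-env.sh. The function names and the address 10.1.2.3 are hypothetical, and Python's name resolution may not match what the JVM sees.

```python
import ipaddress
import socket


def resolves_to_loopback(host_name: str) -> bool:
    """Return True if host_name resolves to a loopback IPv4 address."""
    # gethostbyname uses the system resolver, which normally consults
    # /etc/hosts before DNS, so a hosts entry that maps the machine's
    # name to 127.0.0.1 shows up here.
    address = socket.gethostbyname(host_name)
    return ipaddress.ip_address(address).is_loopback


def drill_env_line(ip: str) -> str:
    """Build a drill-env.sh line that pins DRILL_HOST_NAME to ip.

    The export form is an assumption; the thread only says that
    DRILL_HOST_NAME can be set from drill-env.sh.
    """
    if ipaddress.ip_address(ip).is_loopback:
        raise ValueError(f"{ip} is a loopback address")
    return f'export DRILL_HOST_NAME="{ip}"'


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        print(f"{name} resolves to loopback: {resolves_to_loopback(name)}")
    except socket.gaierror as err:
        print(f"{name} does not resolve: {err}")
    # 10.1.2.3 is a placeholder for the node's private IP address.
    print(drill_env_line("10.1.2.3"))
```

If the first line reports a loopback address on a node, that would be consistent with the need to set DRILL_HOST_NAME there, though it does not prove it.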

I do not understand why this is not working.  Would somebody please explain what is happening?
Also, would somebody please update the installation instructions to explain when “DRILL_HOST_NAME” needs to be set?
Last and most importantly, why is my Drill instance unable to connect to ZooKeeper on its own machine?

Michael Knapp



________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Struggling to run in distributed mode

Posted by Kunal Khatua <kk...@mapr.com>.
This is the reference I came across for why DRILL_HOST_NAME was introduced:

https://issues.apache.org/jira/browse/DRILL-4935

It was meant to handle issues in Docker containers. The comment thread might help you figure out why the EC2 instances had the same problem.

Re: Struggling to run in distributed mode

Posted by Kunal Khatua <kk...@mapr.com>.
Interesting.


Could you share the final configs for the 2 Drillbits? If we can't repro it on a physical cluster, it might be something specific to EC2, for which we should make additions to the instructions.



Re: Struggling to run in distributed mode

Posted by "Knapp, Michael" <Mi...@capitalone.com>.
I was not using localhost.  In fact, I don’t recall any part of the configuration that required me to set my own host.  

From my first email: I looked in the drillbit.log file, it has:
        host.name=ip-10-XXX-YYY-ZZZ.****.com.

meaning it was not localhost.


RE: Struggling to run in distributed mode

Posted by Kunal Khatua <kk...@mapr.com>.
"DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode" 

I believe this is because you are using localhost / 127.0.0.1 

The DRILL_HOST_NAME might have been a hack, which would explain why the instructions missed it. You should not have to specify it to begin with. (I know I have never used it). 

As for telnet failing, the EC2 setup itself might be blocking access, possibly as a way to prevent DoS attacks. Either way, if telnet fails, so will a Drillbit that is starting up.


Re: Struggling to run in distributed mode

Posted by "Knapp, Michael" <Mi...@capitalone.com>.
I was able to get this working.  On the machine that is co-located with Zookeeper, I modified the drill-override.conf file to use “localhost” as the Zookeeper host string.  Now the drill override files are different between the two drill hosts, but apparently this is not a problem.
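
For readers following along, the per-node difference described above can be sketched as a small shell helper. This is only an illustration: render_override is a hypothetical function name, and the cluster-id and port are copied from the configuration quoted elsewhere in this thread.

```shell
# Hypothetical helper: print a drill-override.conf body for a given
# ZooKeeper host. The cluster-id and port 2181 come from the thread.
render_override() {
  cat <<EOF
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "$1:2181"
}
EOF
}
```

On the ZooKeeper machine this would be used as `render_override localhost > conf/drill-override.conf`, and on the other node with the ZooKeeper machine's private IP in place of localhost. The cluster-id stays the same on both nodes.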

Yes, I did try setting DRILL_HOST_NAME in the drill-env.sh file for that instance.  I just tried unsetting it and got the exception “DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode” again.  It seems that I needed to set it in drill-env.sh on both of my instances.
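
One possible, unconfirmed explanation for the loopback exception is that the machine's own hostname resolves to a loopback address (or does not resolve at all). A rough way to check this from the shell is sketched below. The function names is_loopback and check_hostname are made up for this example, 10.0.0.12 is a placeholder address, and the OS resolver queried by getent may not match exactly what the JVM sees.

```shell
# is_loopback ADDR: succeed if ADDR looks like a loopback address.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# check_hostname: report how this machine's hostname resolves.
check_hostname() {
  name=$(hostname)
  # First address the OS resolver returns for the hostname, if any.
  ip=$(getent hosts "$name" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "$name does not resolve; consider setting DRILL_HOST_NAME"
  elif is_loopback "$ip"; then
    echo "$name resolves to loopback ($ip); consider setting DRILL_HOST_NAME"
  else
    echo "$name resolves to $ip"
  fi
}

# In conf/drill-env.sh (10.0.0.12 is a placeholder for the node's own IP):
# export DRILL_HOST_NAME="10.0.0.12"
```

Running check_hostname on each node before starting drillbit.sh would show whether the override is likely to be needed there.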

These machines are on the same subnet.  I don’t understand why one machine can connect to the other, yet a machine cannot connect to itself.  The machine could ping its own IP, but the telnet command reported “connection refused”.

I think this may have something to do with the Amazon Linux OS I am running, but I am not sure.

I’m still interested in knowing why I needed to set DRILL_HOST_NAME when the instructions did not mention it.  Also, why can’t my machine telnet to itself?

Michael Knapp

On 3/31/17, 3:05 PM, "Kunal Khatua" <kk...@mapr.com> wrote:

    What is the error for the collocated Drillbit instance? Did you apply the "DRILL_HOST_NAME" to the collocated Drillbit's drill-env.sh file as well?
    
    One quick test is to check whether the ZK port is accessible locally without using the loopback IP (i.e. 127.0.0.1). Telnet is a good way to verify that you're able to connect.
    
     
    
    -----Original Message-----
    From: Knapp, Michael [mailto:Michael.Knapp@capitalone.com] 
    Sent: Friday, March 31, 2017 11:41 AM
    To: user@drill.apache.org
    Subject: Struggling to run in distributed mode
    
    Hi,
    
    I am struggling to get drill to run in distributed mode.  I stood up two Amazon EC2 instances using amazon’s linux for the OS, on m4.large instances.  I decided to install zookeeper on one of them.  I also have java 8 installed on each of them.  I followed the instructions provided here:
    https://drill.apache.org/docs/installing-drill-on-the-cluster/
    
    My drill-override.conf file is the same on each node, it looks like this:
    
    drill.exec: {
      cluster-id: "drillbits1",
      zk.connect: "10.XXX.YYY.ZZZ:2181"
    }
    
    The IP address of the zookeeper is the same as one of the drill nodes, because it is running on the same machine.  Unfortunately for me, drillbit.sh fails.
    
    On the machine that is co-located with zookeeper, I get this IOException in the drillbit.out file:
    Failure to connect to the zookeeper cluster service within the allotted time of 10000 milliseconds
    I have confirmed that ZK is running and I can open a shell to it.
    
    On the machine that is NOT co-located with zookeeper, I get a DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode.  I looked in the drillbit.log file, it has:
    host.name=ip-10-XXX-YYY-ZZZ.****.com.
    
    After reviewing the source code, I realized that Drill lets you override the host name from drill-env.sh.  By setting “DRILL_HOST_NAME” to the IP address, I was able to start the Drill instance that is NOT co-located with Zookeeper.  However, the same solution did not work with the instance that is co-located with zookeeper, it is still failing with a timeout while trying to connect with Zookeeper.
    
    I do not understand why this is not working, would somebody please explain what is happening?
    Also, would somebody please update the installation instructions to explain when I need to set the “DRILL_HOST_NAME”?
    Last and most importantly, why is my Drill instance unable to connect with Zookeeper on its own machine?
    
    Michael Knapp
    
    
    

RE: Struggling to run in distributed mode

Posted by Kunal Khatua <kk...@mapr.com>.
What is the error for the collocated Drillbit instance? Did you apply the "DRILL_HOST_NAME" to the collocated Drillbit's drill-env.sh file as well?

One quick test is to check whether the ZK port is accessible locally without using the loopback IP (i.e. 127.0.0.1). Telnet is a good way to verify that you're able to connect.
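
Where telnet is not installed, the same check can be approximated with bash's /dev/tcp redirection. The sketch below is an assumption-laden illustration, not an official Drill or ZooKeeper tool: zk_host, zk_port, port_open and check_zk are hypothetical names, it handles only a single host:port string (not a comma-separated list), and it relies on bash and the coreutils timeout command being available.

```shell
# Split a single "host:port" connect string (as in zk.connect) into parts.
zk_host() { printf '%s\n' "${1%%:*}"; }
zk_port() { printf '%s\n' "${1##*:}"; }

# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_zk HOST:PORT: print whether the given endpoint accepts connections.
check_zk() {
  host=$(zk_host "$1")
  port=$(zk_port "$1")
  if port_open "$host" "$port"; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
  fi
}
```

On the ZooKeeper machine, comparing `check_zk localhost:2181` with the same call using the machine's own non-loopback IP would show whether the port is reachable only through the loopback interface.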

 

-----Original Message-----
From: Knapp, Michael [mailto:Michael.Knapp@capitalone.com] 
Sent: Friday, March 31, 2017 11:41 AM
To: user@drill.apache.org
Subject: Struggling to run in distributed mode

Hi,

I am struggling to get drill to run in distributed mode.  I stood up two Amazon EC2 instances using amazon’s linux for the OS, on m4.large instances.  I decided to install zookeeper on one of them.  I also have java 8 installed on each of them.  I followed the instructions provided here:
https://drill.apache.org/docs/installing-drill-on-the-cluster/

My drill-override.conf file is the same on each node, it looks like this:

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "10.XXX.YYY.ZZZ:2181"
}

The IP address of the zookeeper is the same as one of the drill nodes, because it is running on the same machine.  Unfortunately for me, drillbit.sh fails.

On the machine that is co-located with zookeeper, I get this IOException in the drillbit.out file:
Failure to connect to the zookeeper cluster service within the allotted time of 10000 milliseconds
I have confirmed that ZK is running and I can open a shell to it.

On the machine that is NOT co-located with zookeeper, I get a DrillbitStartupException: Drillbit is disallowed to bind to loopback address in distributed mode.  I looked in the drillbit.log file, it has:
host.name=ip-10-XXX-YYY-ZZZ.****.com.

After reviewing the source code, I realized that Drill lets you override the host name from drill-env.sh.  By setting “DRILL_HOST_NAME” to the IP address, I was able to start the Drill instance that is NOT co-located with Zookeeper.  However, the same solution did not work with the instance that is co-located with zookeeper, it is still failing with a timeout while trying to connect with Zookeeper.

I do not understand why this is not working, would somebody please explain what is happening?
Also, would somebody please update the installation instructions to explain when I need to set the “DRILL_HOST_NAME”?
Last and most importantly, why is my Drill instance unable to connect with Zookeeper on its own machine?

Michael Knapp


