Posted to dev@drill.apache.org by harrisonmebane <gi...@git.apache.org> on 2016/11/06 20:30:47 UTC

[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

GitHub user harrisonmebane opened a pull request:

    https://github.com/apache/drill/pull/647

    DRILL-4935 Allow Drill to advertise a specific hostname to Zookeeper

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harrisonmebane/drill DRILL-4935

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/647.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #647
    
----
commit 01dd583ff0f7dc0092371a85823b4737ca26b02d
Author: Harrison Mebane <ha...@svds.com>
Date:   2016-10-07T20:10:36Z

    allow configuration of advertised drillbit IP address

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86916748
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -49,6 +49,7 @@
       String USER_RPC_TIMEOUT = "drill.exec.rpc.user.timeout";
       String METRICS_CONTEXT_NAME = "drill.exec.metrics.context";
       String USE_IP_ADDRESS = "drill.exec.rpc.use.ip";
    +  String BIT_ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";
    --- End diff --
    
    drill-env.sh is a script that holds the "external" customizations for each installation. "External" customizations are those that must be made before Drill starts. Setting the memory limit is a typical example. Here, drill-env.sh lets you set a value per-node by using a Linux command to set the host name:
    
        export DRILL_HOST_NAME=`hostname`
    
    The idea is that someone who needs your feature would add a line to drill-env.sh to set the proper "public" name of the host using a command appropriate to their setup. (Maybe using hostname, or, on EC2, using the appropriate Amazon-provided command.)
    
    The environment variable would pass the info into Drill where it would override the default.
    
    Because the host name must be different per node, and Drill is supposed to be distributed, it probably makes a bit less sense to set the host name in the Drill config file. (Would be cool if the config file supported scripts - but, alas, it does not.)
    
    Similar examples are the DRILL_CONFIG_DIR, DRILL_LOG_DIR, DRILL_JAVA_LIB_PATH, etc.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on the issue:

    https://github.com/apache/drill/pull/647
  
    I have made the changes suggested by @paul-rogers .  Are there any unit tests we should add?



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/647
  
    Please update the JIRA ticket to explain the solution. What does an admin need to know to use the feature? How can the admin verify that it works? This will allow the documentation team to add the needed information for folks to use this feature.
    
    Also, assign the JIRA ticket to yourself, since you're working on it.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on the issue:

    https://github.com/apache/drill/pull/647
  
    I'm aware that the tests will not pass yet; putting what I have up here to make sure I'm on the right track.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on the issue:

    https://github.com/apache/drill/pull/647
  
    I have tested this fix manually, in the following way:
    
    * Deploy Drill in Docker containers on an existing cluster on AWS, with the line 
    ```export DRILL_HOST_NAME=`curl http://169.254.169.254/latest/meta-data/local-ipv4` ``` in `drill-env.sh`, to ensure that the variable is populated with the host machine's IP.
    * Start a Drill session in the Docker container and run `select * from sys.drillbits;`.  The result was:
    ```
    +----------------+------------+---------------+------------+----------+
    |    hostname    | user_port  | control_port  | data_port  | current  |
    +----------------+------------+---------------+------------+----------+
    | 172.31.21.207  | 31010      | 31011         | 31012      | false    |
    | 172.31.29.130  | 31010      | 31011         | 31012      | true     |
    | 172.31.22.200  | 31010      | 31011         | 31012      | false    |
    +----------------+------------+---------------+------------+----------+
    ```
    * On one of the drillbits, comment out the `DRILL_HOST_NAME` line in `drill-env.sh`, unset the variable, and restart the drillbit.
    * Log into the Drill shell again and run `select * from sys.drillbits;`.  The result is now:
    ```
    +----------------+------------+---------------+------------+----------+
    |    hostname    | user_port  | control_port  | data_port  | current  |
    +----------------+------------+---------------+------------+----------+
    | 172.31.22.200  | 31010      | 31011         | 31012      | false    |
    | a53b37888f62   | 31010      | 31011         | 31012      | true     |
    | 172.31.29.130  | 31010      | 31011         | 31012      | false    |
    +----------------+------------+---------------+------------+----------+
    ```
    * Running queries that require the other nodes results in an `Error: SYSTEM ERROR: UnresolvedAddressException`, indicating that the new address registered on Zookeeper no longer works.
    




[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/647
  
    Two ways to test.
    
    First, try the fix manually. Start Drill once without the env var set to ensure Drill uses the default. Then, set the env var and ensure that Drill uses the correct value.
    
    The second way would be a unit test. I'm not familiar with any existing ZK unit tests. Do any of the other committers know whether we have such tests, something that validates Drillbit registration, etc.? If so, that test can be modified for this use case.
    
    But there is one gotcha: in Java, env vars are read-only, so there is no way to set them in a test (none that I've ever found, anyway).  Does anyone have a work-around?
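
One possible work-around, sketched here under assumptions and not taken from the pull request, is to keep `System.getenv` out of the logic under test and pass the lookup in as a function. The class `HostNameResolver` and its methods are hypothetical names; only `DRILL_HOST_NAME` comes from this thread.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the code from the pull request.
final class HostNameResolver {
  // Environment variable name taken from the discussion above.
  static final String ENV_VAR = "DRILL_HOST_NAME";

  private HostNameResolver() {}

  // envLookup stands in for System::getenv so a test can supply its own values.
  static String resolve(Function<String, String> envLookup, Supplier<String> defaultHost) {
    String customHost = envLookup.apply(ENV_VAR);
    return customHost != null ? customHost : defaultHost.get();
  }

  // Default used when no override is set: local IP address or canonical host name.
  static String localHost(boolean useIP) {
    try {
      InetAddress local = InetAddress.getLocalHost();
      return useIP ? local.getHostAddress() : local.getCanonicalHostName();
    } catch (UnknownHostException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Production code would call `HostNameResolver.resolve(System::getenv, () -> HostNameResolver.localHost(useIP))`, while a test can pass a map lookup such as `fakeEnv::get` without touching the real environment.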



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrison-svds <gi...@git.apache.org>.
Github user harrison-svds commented on the issue:

    https://github.com/apache/drill/pull/647
  
    @paul-rogers I added a comment detailing the basic solution.  I don't know that I have the permissions to assign the story to myself.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on the issue:

    https://github.com/apache/drill/pull/647
  
    Do I need to do anything more to push this forward?



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/647
  
    The committers are in the middle of doing a release. Once that is done, you need one of them to do a quick review and merge your pull request.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by zfong <gi...@git.apache.org>.
Github user zfong commented on the issue:

    https://github.com/apache/drill/pull/647
  
    @harrisonmebane - for pull requests submitted by non-committers, each week, one of the committers will do a batch commit of changes that have gone through review.  This is normally done towards the end of the week.  There may be an exception this week because as @paul-rogers has noted, we're in the middle of trying to push out a release candidate build for the 1.9 release.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86922289
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -49,6 +49,7 @@
       String USER_RPC_TIMEOUT = "drill.exec.rpc.user.timeout";
       String METRICS_CONTEXT_NAME = "drill.exec.metrics.context";
       String USE_IP_ADDRESS = "drill.exec.rpc.use.ip";
    +  String BIT_ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";
    --- End diff --
    
    That all seems reasonable to me.  I can't find any examples of environment variables being accessed from within the Java code.  I assume we don't want the name of the environment variable hard-coded in the hostname resolution code, but I'm not sure of the best place to define it, i.e. ```String DRILL_HOST_NAME = "DRILL_HOST_NAME"```
    Thoughts?  I could put it in `ExecConstants` but it would be the only env variable in there.  I could also just make it a static variable in the `ServiceEngine` class.
    
    Do we still want to provide the option to override through a system property, or just rely on the environment variable?



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on the issue:

    https://github.com/apache/drill/pull/647
  
    I have implemented @xhochy 's suggestion.  I will need some guidance on the best way to unit test this new configuration.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86867904
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/service/ServiceEngine.java ---
    @@ -142,7 +142,12 @@ private static BufferAllocator newAllocator(
     
       public DrillbitEndpoint start() throws DrillbitStartupException, UnknownHostException{
         int userPort = userServer.bind(config.getInt(ExecConstants.INITIAL_USER_PORT), allowPortHunting);
    -    String address = useIP ?  InetAddress.getLocalHost().getHostAddress() : InetAddress.getLocalHost().getCanonicalHostName();
    +    String address = null;
    --- End diff --
    
    Factor out into a getHostName() method.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86867777
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -49,6 +49,7 @@
       String USER_RPC_TIMEOUT = "drill.exec.rpc.user.timeout";
       String METRICS_CONTEXT_NAME = "drill.exec.metrics.context";
       String USE_IP_ADDRESS = "drill.exec.rpc.use.ip";
    +  String BIT_ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";
    --- End diff --
    
    advertised.host -> advertised-host (or advertised_host). Here, "advertised" does not seem to form a group, so no need for dot.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r87103376
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/service/ServiceEngine.java ---
    @@ -140,9 +140,19 @@ private static BufferAllocator newAllocator(
             name, context.getConfig().getLong(initReservation), context.getConfig().getLong(maxAllocation));
       }
     
    +  private String getHostName() throws UnknownHostException{
    +    // DRILL_HOST_NAME sets custom host name.  See drill-env.sh for details.
    +    String customHost = System.getenv("DRILL_HOST_NAME");
    +    if (customHost == null) {
    +      return useIP ? InetAddress.getLocalHost().getHostAddress() : InetAddress.getLocalHost().getCanonicalHostName();
    +    } else {
    +      return customHost;
    +    }
    --- End diff --
    
    Minor code flow suggestion:
    
        if (customHost != null) {
          return customHost;
        }
        return useIP ? InetAddress.getLocalHost().getHostAddress() : InetAddress.getLocalHost().getCanonicalHostName();
     



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86856157
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/service/ServiceEngine.java ---
    @@ -142,7 +142,13 @@ private static BufferAllocator newAllocator(
     
       public DrillbitEndpoint start() throws DrillbitStartupException, UnknownHostException{
         int userPort = userServer.bind(config.getInt(ExecConstants.INITIAL_USER_PORT), allowPortHunting);
    -    String address = useIP ?  InetAddress.getLocalHost().getHostAddress() : InetAddress.getLocalHost().getCanonicalHostName();
    +    String configIP = config.getString(ExecConstants.BIT_ADVERTISED_HOST);
    +    String address = null;
    +    if (configIP == "") {
    --- End diff --
    
    Good suggestion!  Much cleaner.



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/647
  
    +1



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/647



[GitHub] drill issue #647: DRILL-4935 Allow Drill to advertise a specific hostname to...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/647
  
    Great! LGTM.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86868307
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -49,6 +49,7 @@
       String USER_RPC_TIMEOUT = "drill.exec.rpc.user.timeout";
       String METRICS_CONTEXT_NAME = "drill.exec.metrics.context";
       String USE_IP_ADDRESS = "drill.exec.rpc.use.ip";
    +  String BIT_ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";
    --- End diff --
    
    Is this really what we want to do? After this change, each host will have its own Drill config file. That means that, on every config change, the admin must:
    
    1. Modify a master file.
    2. Use a script to regenerate the per-host files.
    3. Push the per-host files to the remote systems.
    4. Restart the cluster.
    
    If Drill is deployed on a system such as YARN, it is not possible to have per-host config files.



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by xhochy <gi...@git.apache.org>.
Github user xhochy commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86720949
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/service/ServiceEngine.java ---
    @@ -142,7 +142,13 @@ private static BufferAllocator newAllocator(
     
       public DrillbitEndpoint start() throws DrillbitStartupException, UnknownHostException{
         int userPort = userServer.bind(config.getInt(ExecConstants.INITIAL_USER_PORT), allowPortHunting);
    -    String address = useIP ?  InetAddress.getLocalHost().getHostAddress() : InetAddress.getLocalHost().getCanonicalHostName();
    +    String configIP = config.getString(ExecConstants.BIT_ADVERTISED_HOST);
    +    String address = null;
    +    if (configIP == "") {
    --- End diff --
    
    You could use `config.hasPath` instead of relying on matching on an empty string.
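
A note on why the presence check is the safer form: in Java, `configIP == ""` compares object references rather than string contents, so it can be false even when the configured value is empty. The sketch below illustrates the presence-check pattern with a plain `Map` as a hypothetical stand-in for Drill's config object, so `containsKey` plays the role that `config.hasPath` would play in the real code. `AdvertisedHostLookup` and `advertisedHostOr` are made-up names.

```java
import java.util.Map;

// Hypothetical illustration; a Map stands in for the real config object.
final class AdvertisedHostLookup {
  // Key as it appears in the diff under review.
  static final String KEY = "drill.exec.rpc.bit.advertised.host";

  private AdvertisedHostLookup() {}

  // Returns the configured host if the key is present, otherwise the fallback.
  // No empty-string sentinel and no reference comparison is involved.
  static String advertisedHostOr(Map<String, String> config, String fallback) {
    return config.containsKey(KEY) ? config.get(KEY) : fallback;
  }
}
```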



[GitHub] drill pull request #647: DRILL-4935 Allow Drill to advertise a specific host...

Posted by harrisonmebane <gi...@git.apache.org>.
Github user harrisonmebane commented on a diff in the pull request:

    https://github.com/apache/drill/pull/647#discussion_r86883806
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
    @@ -49,6 +49,7 @@
       String USER_RPC_TIMEOUT = "drill.exec.rpc.user.timeout";
       String METRICS_CONTEXT_NAME = "drill.exec.metrics.context";
       String USE_IP_ADDRESS = "drill.exec.rpc.use.ip";
    +  String BIT_ADVERTISED_HOST = "drill.exec.rpc.bit.advertised.host";
    --- End diff --
    
    @paul-rogers Thank you for your helpful suggestions.  I understand what you are saying, and indeed, I was using exactly the flow you outlined above for my own use case.
    
    Let me make sure I understand what you are proposing.  We can either:
    
    1. Access the host name as an environment variable, in which case we could read it via `System.getenv(...)`
    2. Pass the host name in as a system property (`-Ddrill.exec.rpc.bit.advertised-host=myhost`), which could just override the parameter I've already specified in `ExecConstants.java`
    
    In either case, I guess I don't understand why I'd need to add anything to `drill-env.sh`, unless you mean just to add a commented-out line for documentation purposes.  In the second case, is there any change I would need to make at all?
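
For comparison, the two options could also be combined. The sketch below is an assumption for illustration, not what the pull request settled on: it checks a JVM system property first and then the environment variable. The property name follows the `-D` example in the comment above, and the class and method names are hypothetical.

```java
// Hypothetical illustration of a property-then-environment lookup.
final class AdvertisedHostSource {
  // Property name as written in the -D example above (assumed spelling).
  static final String PROPERTY = "drill.exec.rpc.bit.advertised-host";
  // Environment variable name proposed in this thread.
  static final String ENV_VAR = "DRILL_HOST_NAME";

  private AdvertisedHostSource() {}

  // Returns the explicitly configured host, or null when neither source is set,
  // in which case the caller would fall back to the local host name or IP.
  static String configuredHost() {
    String fromProperty = System.getProperty(PROPERTY);
    if (fromProperty != null) {
      return fromProperty;
    }
    return System.getenv(ENV_VAR);
  }
}
```

Unlike environment variables, system properties can be changed at run time with `System.setProperty`, which makes the property branch straightforward to exercise in a test.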

