You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by sohami <gi...@git.apache.org> on 2016/12/06 23:22:55 UTC

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

GitHub user sohami opened a pull request:

    https://github.com/apache/drill/pull/679

    DRILL-5098: Improving fault tolerance for connection between client a\u2026

    \u2026nd foreman node.
    
             Note: Adding tries config option in connection string.
                   Improving fault tolerance in Drill client when trying to make first connection with foreman.
                   The client will try to connect to min(tries, num_drillbits) unique drillbits unless a successfull
                   connection is established.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sohami/drill DRILL-5098

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #679
    
----
commit d13c6cc91b72b91b481bd1d428ca77775490a8cf
Author: Sorabh Hamirwasia <sh...@maprtech.com>
Date:   2016-12-01T22:58:00Z

    DRILL-5098: Improving fault tolerance for connection between client and foreman node.
             Note: Adding tries config option in connection string.
                   Improving fault tolerance in Drill client when trying to make first connection with foreman.
                   The client will try to connect to min(tries, num_drillbits) unique drillbits unless a successfull
                   connection is established.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91820952
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -357,10 +357,54 @@ protected void afterExecute(final Runnable r, final Throwable t) {
             super.afterExecute(r, t);
           }
         };
    -    client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    -    logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    -    connect(endpoint);
    -    connected = true;
    +
    +    // "tries" is max number of unique drillbit to try connecting until successfully connected to one of them
    +    final String connectTriesConf = (props != null) ? props.getProperty("tries", "5") : "5";
    +
    +    int connectTriesVal;
    +    try {
    +      connectTriesVal = Math.min(endpoints.size(), Integer.parseInt(connectTriesConf));
    +    } catch (NumberFormatException e) {
    +      throw new InvalidConnectionInfoException("Invalid tries value: " + connectTriesConf + " specified in " +
    +                                               "connection string");
    +    }
    +
    +    // If the value provided in the connection string is <=0 then override with 1 since we want to try connecting
    +    // at least once
    +    connectTriesVal = Math.max(1, connectTriesVal);
    +
    +    int triedEndpointIndex = 0;
    +    DrillbitEndpoint endpoint;
    +
    +    while (triedEndpointIndex < connectTriesVal) {
    +      client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    +      endpoint = endpoints.get(triedEndpointIndex);
    +      logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +
    +      try {
    +        connect(endpoint);
    +        connected = true;
    +        logger.info("Successfully connected to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +        break;
    +      } catch (InvalidConnectionInfoException ex) {
    +        logger.error("Connection to {}:{} failed with error {}. Not retrying anymore", endpoint.getAddress(),
    +                     endpoint.getUserPort(), ex.getMessage());
    +        throw ex;
    +      } catch (RpcException ex) {
    +        ++triedEndpointIndex;
    +        logger.error("Attempt {}: Failed to connect to server {}:{}", triedEndpointIndex, endpoint.getAddress(),
    +                     endpoint.getUserPort());
    +
    +        // Close the connection
    +        if (client.isActive()) {
    --- End diff --
    
    Here client is userClient and "isActive" check for all the below conditions:
    
    return connection != null
            && connection.getChannel() != null
            && connection.getChannel().isActive();
    
    and close --> closes the corresponding channel.
    
    Since we are retrying so other resources will be reused for next connection attempt. After all retries are exhausted those will be closed by the DrillClient.close method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91822093
  
    --- Diff: exec/jdbc/src/test/java/org/apache/drill/jdbc/test/JdbcConnectTriesTestEmbeddedBits.java ---
    @@ -0,0 +1,162 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + * <p>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.jdbc.test;
    +
    +import org.apache.drill.exec.rpc.InvalidConnectionInfoException;
    +import org.apache.drill.exec.rpc.RpcException;
    +import org.apache.drill.jdbc.Driver;
    +import org.apache.drill.jdbc.JdbcTestBase;
    +
    +import org.junit.Test;
    +
    +import java.sql.SQLException;
    +import java.sql.Connection;
    +
    +import java.util.concurrent.ExecutionException;
    +
    +import static junit.framework.Assert.assertNotNull;
    +import static junit.framework.TestCase.fail;
    +import static org.junit.Assert.assertNull;
    +import static org.junit.Assert.assertTrue;
    +
    +public class JdbcConnectTriesTestEmbeddedBits extends JdbcTestBase {
    +
    +  @Test
    +  public void testDirectConnectionConnectTriesEqualsDrillbitCount() throws SQLException {
    +    Connection connection = null;
    +    try {
    +      connection = new Driver().connect("jdbc:drill:drillbit=127.0.0.1:5000,127.0.0.1:5001;" + "tries=2",
    --- End diff --
    
    Fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #679: DRILL-5098: Improving fault tolerance for connection betwe...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on the issue:

    https://github.com/apache/drill/pull/679
  
    +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r92296665
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -357,10 +357,53 @@ protected void afterExecute(final Runnable r, final Throwable t) {
             super.afterExecute(r, t);
           }
         };
    -    client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    -    logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    -    connect(endpoint);
    -    connected = true;
    +
    +    // "tries" is max number of unique drillbit to try connecting until successfully connected to one of them
    +    final String connectTriesConf = (props != null) ? props.getProperty("tries", "5") : "5";
    +
    +    int connectTriesVal;
    +    try {
    +      connectTriesVal = Math.min(endpoints.size(), Integer.parseInt(connectTriesConf));
    +    } catch (NumberFormatException e) {
    +      throw new InvalidConnectionInfoException("Invalid tries value: " + connectTriesConf + " specified in " +
    +                                               "connection string");
    +    }
    +
    +    // If the value provided in the connection string is <=0 then override with 1 since we want to try connecting
    +    // at least once
    +    connectTriesVal = Math.max(1, connectTriesVal);
    +
    +    int triedEndpointIndex = 0;
    +    DrillbitEndpoint endpoint;
    +
    +    while (triedEndpointIndex < connectTriesVal) {
    +      client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    +      endpoint = endpoints.get(triedEndpointIndex);
    +      logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +
    +      try {
    +        connect(endpoint);
    +        connected = true;
    +        logger.info("Successfully connected to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +        break;
    +      } catch (InvalidConnectionInfoException ex) {
    +        logger.error("Connection to {}:{} failed with error {}. Not retrying anymore", endpoint.getAddress(),
    +                     endpoint.getUserPort(), ex.getMessage());
    +        throw ex;
    +      } catch (RpcException ex) {
    +        ++triedEndpointIndex;
    +        logger.error("Attempt {}: Failed to connect to server {}:{}", triedEndpointIndex, endpoint.getAddress(),
    +                     endpoint.getUserPort());
    +
    +        // Close the connection
    +        client.close();
    --- End diff --
    
    I didn't found anywhere in documentation specifying the cause of calling close multiple times. It just says that internally close is being called for each handler in the pipeline. https://netty.io/4.0/api/io/netty/channel/Channel.html#close()
    
    Though I do have unit tests which will result in covering that code path where close will be called twice. Didn't found any issue running the test. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91823124
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -357,10 +357,54 @@ protected void afterExecute(final Runnable r, final Throwable t) {
             super.afterExecute(r, t);
           }
         };
    -    client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    -    logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    -    connect(endpoint);
    -    connected = true;
    +
    +    // "tries" is max number of unique drillbit to try connecting until successfully connected to one of them
    +    final String connectTriesConf = (props != null) ? props.getProperty("tries", "5") : "5";
    +
    +    int connectTriesVal;
    +    try {
    +      connectTriesVal = Math.min(endpoints.size(), Integer.parseInt(connectTriesConf));
    +    } catch (NumberFormatException e) {
    +      throw new InvalidConnectionInfoException("Invalid tries value: " + connectTriesConf + " specified in " +
    +                                               "connection string");
    +    }
    +
    +    // If the value provided in the connection string is <=0 then override with 1 since we want to try connecting
    +    // at least once
    +    connectTriesVal = Math.max(1, connectTriesVal);
    +
    +    int triedEndpointIndex = 0;
    +    DrillbitEndpoint endpoint;
    +
    +    while (triedEndpointIndex < connectTriesVal) {
    +      client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    +      endpoint = endpoints.get(triedEndpointIndex);
    +      logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +
    +      try {
    +        connect(endpoint);
    +        connected = true;
    +        logger.info("Successfully connected to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +        break;
    +      } catch (InvalidConnectionInfoException ex) {
    +        logger.error("Connection to {}:{} failed with error {}. Not retrying anymore", endpoint.getAddress(),
    +                     endpoint.getUserPort(), ex.getMessage());
    +        throw ex;
    +      } catch (RpcException ex) {
    +        ++triedEndpointIndex;
    +        logger.error("Attempt {}: Failed to connect to server {}:{}", triedEndpointIndex, endpoint.getAddress(),
    +                     endpoint.getUserPort());
    +
    +        // Close the connection
    +        if (client.isActive()) {
    --- End diff --
    
    The loop recreates a new client anyway. So this should be closed, regardless of being active.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91795818
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/ConnectTriesPropertyTestClusterBits.java ---
    @@ -0,0 +1,244 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + * <p>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.client;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Properties;
    +import java.util.concurrent.ExecutionException;
    +
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.exec.ZookeeperHelper;
    +import org.apache.drill.exec.coord.ClusterCoordinator;
    +import org.apache.drill.exec.exception.DrillbitStartupException;
    +import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
    +import org.apache.drill.exec.rpc.InvalidConnectionInfoException;
    +import org.apache.drill.exec.rpc.RpcException;
    +import org.apache.drill.exec.server.Drillbit;
    +
    +import org.apache.drill.exec.server.RemoteServiceSet;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import static junit.framework.TestCase.assertTrue;
    +import static junit.framework.TestCase.fail;
    +
    +public class ConnectTriesPropertyTestClusterBits {
    +
    +  public static StringBuilder bitInfo;
    +  public static final String fakeBitsInfo = "127.0.0.1:5000,127.0.0.1:5001";
    +  public static List<Drillbit> drillbits;
    +  public static final int drillBitCount = 1;
    +  public static ZookeeperHelper zkHelper;
    +  public static RemoteServiceSet remoteServiceSet;
    +  public static DrillConfig drillConfig;
    +
    +  @BeforeClass
    +  public static void testSetUp() throws Exception {
    +    remoteServiceSet = RemoteServiceSet.getLocalServiceSet();
    +    zkHelper = new ZookeeperHelper();
    +    zkHelper.startZookeeper(1);
    +
    +    // Creating Drillbits
    +    drillConfig = zkHelper.getConfig();
    +    try {
    +      int drillBitStarted = 0;
    +      drillbits = new ArrayList<>();
    +      while(drillBitStarted < drillBitCount){
    +        drillbits.add(Drillbit.start(drillConfig, remoteServiceSet));
    +        ++drillBitStarted;
    +      }
    +    } catch (DrillbitStartupException e) {
    +      throw new RuntimeException("Failed to start drillbits.", e);
    +    }
    +    bitInfo = new StringBuilder();
    +
    +    for (int i = 0; i < drillBitCount; ++i) {
    +      final DrillbitEndpoint currentEndPoint = drillbits.get(i).getContext().getEndpoint();
    +      final String currentBitIp = currentEndPoint.getAddress();
    +      final int currentBitPort = currentEndPoint.getUserPort();
    +      bitInfo.append(",");
    +      bitInfo.append(currentBitIp);
    +      bitInfo.append(":");
    +      bitInfo.append(currentBitPort);
    +    }
    +  }
    +
    +  @AfterClass
    +  public static void testCleanUp(){
    +    for(int i=0; i < drillBitCount; ++i){
    +      drillbits.get(i).close();
    --- End diff --
    
    + spacing in the signature
    + Use `AutoCloseables.close(drillbits);`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sohami <gi...@git.apache.org>.
Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91821466
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/client/ConnectTriesPropertyTestClusterBits.java ---
    @@ -0,0 +1,244 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + * <p>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.client;
    +
    +import java.util.ArrayList;
    +import java.util.List;
    +import java.util.Properties;
    +import java.util.concurrent.ExecutionException;
    +
    +import org.apache.drill.common.config.DrillConfig;
    +import org.apache.drill.exec.ZookeeperHelper;
    +import org.apache.drill.exec.coord.ClusterCoordinator;
    +import org.apache.drill.exec.exception.DrillbitStartupException;
    +import org.apache.drill.exec.proto.CoordinationProtos.DrillbitEndpoint;
    +import org.apache.drill.exec.rpc.InvalidConnectionInfoException;
    +import org.apache.drill.exec.rpc.RpcException;
    +import org.apache.drill.exec.server.Drillbit;
    +
    +import org.apache.drill.exec.server.RemoteServiceSet;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import static junit.framework.TestCase.assertTrue;
    +import static junit.framework.TestCase.fail;
    +
    +public class ConnectTriesPropertyTestClusterBits {
    +
    +  public static StringBuilder bitInfo;
    +  public static final String fakeBitsInfo = "127.0.0.1:5000,127.0.0.1:5001";
    +  public static List<Drillbit> drillbits;
    +  public static final int drillBitCount = 1;
    +  public static ZookeeperHelper zkHelper;
    +  public static RemoteServiceSet remoteServiceSet;
    +  public static DrillConfig drillConfig;
    +
    +  @BeforeClass
    +  public static void testSetUp() throws Exception {
    +    remoteServiceSet = RemoteServiceSet.getLocalServiceSet();
    +    zkHelper = new ZookeeperHelper();
    +    zkHelper.startZookeeper(1);
    +
    +    // Creating Drillbits
    +    drillConfig = zkHelper.getConfig();
    +    try {
    +      int drillBitStarted = 0;
    +      drillbits = new ArrayList<>();
    +      while(drillBitStarted < drillBitCount){
    +        drillbits.add(Drillbit.start(drillConfig, remoteServiceSet));
    +        ++drillBitStarted;
    +      }
    +    } catch (DrillbitStartupException e) {
    +      throw new RuntimeException("Failed to start drillbits.", e);
    +    }
    +    bitInfo = new StringBuilder();
    +
    +    for (int i = 0; i < drillBitCount; ++i) {
    +      final DrillbitEndpoint currentEndPoint = drillbits.get(i).getContext().getEndpoint();
    +      final String currentBitIp = currentEndPoint.getAddress();
    +      final int currentBitPort = currentEndPoint.getUserPort();
    +      bitInfo.append(",");
    +      bitInfo.append(currentBitIp);
    +      bitInfo.append(":");
    +      bitInfo.append(currentBitPort);
    +    }
    +  }
    +
    +  @AfterClass
    +  public static void testCleanUp(){
    +    for(int i=0; i < drillBitCount; ++i){
    +      drillbits.get(i).close();
    --- End diff --
    
    Fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91768005
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -357,10 +357,54 @@ protected void afterExecute(final Runnable r, final Throwable t) {
             super.afterExecute(r, t);
           }
         };
    -    client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    -    logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    -    connect(endpoint);
    -    connected = true;
    +
    +    // "tries" is max number of unique drillbit to try connecting until successfully connected to one of them
    +    final String connectTriesConf = (props != null) ? props.getProperty("tries", "5") : "5";
    +
    +    int connectTriesVal;
    +    try {
    +      connectTriesVal = Math.min(endpoints.size(), Integer.parseInt(connectTriesConf));
    +    } catch (NumberFormatException e) {
    +      throw new InvalidConnectionInfoException("Invalid tries value: " + connectTriesConf + " specified in " +
    +                                               "connection string");
    +    }
    +
    +    // If the value provided in the connection string is <=0 then override with 1 since we want to try connecting
    +    // at least once
    +    connectTriesVal = Math.max(1, connectTriesVal);
    +
    +    int triedEndpointIndex = 0;
    +    DrillbitEndpoint endpoint;
    +
    +    while (triedEndpointIndex < connectTriesVal) {
    +      client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    +      endpoint = endpoints.get(triedEndpointIndex);
    +      logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +
    +      try {
    +        connect(endpoint);
    +        connected = true;
    +        logger.info("Successfully connected to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +        break;
    +      } catch (InvalidConnectionInfoException ex) {
    +        logger.error("Connection to {}:{} failed with error {}. Not retrying anymore", endpoint.getAddress(),
    +                     endpoint.getUserPort(), ex.getMessage());
    +        throw ex;
    +      } catch (RpcException ex) {
    +        ++triedEndpointIndex;
    +        logger.error("Attempt {}: Failed to connect to server {}:{}", triedEndpointIndex, endpoint.getAddress(),
    +                     endpoint.getUserPort());
    +
    +        // Close the connection
    +        if (client.isActive()) {
    --- End diff --
    
    Shouldn't the client be closed regardless of being active? There maybe other resources to close.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r92291348
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/client/DrillClient.java ---
    @@ -357,10 +357,53 @@ protected void afterExecute(final Runnable r, final Throwable t) {
             super.afterExecute(r, t);
           }
         };
    -    client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    -    logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    -    connect(endpoint);
    -    connected = true;
    +
    +    // "tries" is max number of unique drillbit to try connecting until successfully connected to one of them
    +    final String connectTriesConf = (props != null) ? props.getProperty("tries", "5") : "5";
    +
    +    int connectTriesVal;
    +    try {
    +      connectTriesVal = Math.min(endpoints.size(), Integer.parseInt(connectTriesConf));
    +    } catch (NumberFormatException e) {
    +      throw new InvalidConnectionInfoException("Invalid tries value: " + connectTriesConf + " specified in " +
    +                                               "connection string");
    +    }
    +
    +    // If the value provided in the connection string is <=0 then override with 1 since we want to try connecting
    +    // at least once
    +    connectTriesVal = Math.max(1, connectTriesVal);
    +
    +    int triedEndpointIndex = 0;
    +    DrillbitEndpoint endpoint;
    +
    +    while (triedEndpointIndex < connectTriesVal) {
    +      client = new UserClient(clientName, config, supportComplexTypes, allocator, eventLoopGroup, executor);
    +      endpoint = endpoints.get(triedEndpointIndex);
    +      logger.debug("Connecting to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +
    +      try {
    +        connect(endpoint);
    +        connected = true;
    +        logger.info("Successfully connected to server {}:{}", endpoint.getAddress(), endpoint.getUserPort());
    +        break;
    +      } catch (InvalidConnectionInfoException ex) {
    +        logger.error("Connection to {}:{} failed with error {}. Not retrying anymore", endpoint.getAddress(),
    +                     endpoint.getUserPort(), ex.getMessage());
    +        throw ex;
    +      } catch (RpcException ex) {
    +        ++triedEndpointIndex;
    +        logger.error("Attempt {}: Failed to connect to server {}:{}", triedEndpointIndex, endpoint.getAddress(),
    +                     endpoint.getUserPort());
    +
    +        // Close the connection
    +        client.close();
    --- End diff --
    
    Is `UserClient#close` idempotent?
    
    Looks like `if (triedEndpointIndex == connectTriesVal)`, then `UserClient#close` is called twice, here and in `DrillClient#close`. The is unlike the previous catch clause i.e. `catch (InvalidConnectionInfoException ex)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/679


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #679: DRILL-5098: Improving fault tolerance for connectio...

Posted by sudheeshkatkam <gi...@git.apache.org>.
Github user sudheeshkatkam commented on a diff in the pull request:

    https://github.com/apache/drill/pull/679#discussion_r91796042
  
    --- Diff: exec/jdbc/src/test/java/org/apache/drill/jdbc/test/JdbcConnectTriesTestEmbeddedBits.java ---
    @@ -0,0 +1,162 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + * <p>
    + * http://www.apache.org/licenses/LICENSE-2.0
    + * <p>
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.drill.jdbc.test;
    +
    +import org.apache.drill.exec.rpc.InvalidConnectionInfoException;
    +import org.apache.drill.exec.rpc.RpcException;
    +import org.apache.drill.jdbc.Driver;
    +import org.apache.drill.jdbc.JdbcTestBase;
    +
    +import org.junit.Test;
    +
    +import java.sql.SQLException;
    +import java.sql.Connection;
    +
    +import java.util.concurrent.ExecutionException;
    +
    +import static junit.framework.Assert.assertNotNull;
    +import static junit.framework.TestCase.fail;
    +import static org.junit.Assert.assertNull;
    +import static org.junit.Assert.assertTrue;
    +
    +public class JdbcConnectTriesTestEmbeddedBits extends JdbcTestBase {
    +
    +  @Test
    +  public void testDirectConnectionConnectTriesEqualsDrillbitCount() throws SQLException {
    +    Connection connection = null;
    +    try {
    +      connection = new Driver().connect("jdbc:drill:drillbit=127.0.0.1:5000,127.0.0.1:5001;" + "tries=2",
    --- End diff --
    
    Create one driver and use across tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---