Posted to user@drill.apache.org by drill <ga...@gmail.com> on 2018/08/20 12:47:46 UTC

Apache Drill High Availability using HAproxy

Hi Team,

Good evening. I am Satish, and I work as a big data developer. I need your help with Drill high availability using an HAProxy load balancer.
Does Apache Drill support high availability? If yes, please let me know the process.


-Thanks,
Satish



Re: Apache Drill High Availability using HAproxy

Posted by John Omernik <jo...@omernik.com>.
This is a great topic, and one I have run into while running Drill on
Apache Mesos, because my Drillbits essentially sit behind a DNS load
balancer (one DNS name with multiple Drillbit IPs assigned to it). That
said, I've run into a few issues and have a few workarounds. Note that I am
talking about the REST API here, not the other interfaces; I am not sure
how those would work (perhaps the same way).

So the best way, if you are using HAProxy, is to use sticky connections.
Essentially, when a user connects to HAProxy, the connection to the
backend Drillbit stays sticky until a timeout period elapses or the
session is closed. This should allow you to ensure the best user
experience while keeping HA. I am not sure how HAProxy balances things;
however, with a decent Drill cluster size, it shouldn't be an issue.
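
To make the sticky behaviour above concrete, here is a minimal Python sketch of the idea. It is a toy model only and is not how HAProxy is implemented or configured; the class name, the round-robin choice for new clients, and the timeout value are all assumptions made for illustration.

```python
import time


class StickyBalancer:
    """Toy model of sticky routing: a client keeps one backend until it idles out."""

    def __init__(self, backends, idle_timeout=300.0, clock=time.monotonic):
        self.backends = list(backends)
        self.idle_timeout = idle_timeout  # seconds a mapping survives without traffic
        self.clock = clock                # injectable so the timeout can be tested
        self._next = 0                    # round-robin cursor for new clients
        self._sticky = {}                 # client id -> (backend, last_seen)

    def route(self, client_id):
        now = self.clock()
        entry = self._sticky.get(client_id)
        if entry is not None and now - entry[1] <= self.idle_timeout:
            backend = entry[0]            # still sticky: reuse the same Drillbit
        else:
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1               # new or expired client: pick the next one
        self._sticky[client_id] = (backend, now)
        return backend

    def close(self, client_id):
        """Forget the mapping when the session is closed."""
        self._sticky.pop(client_id, None)
```

As long as a client keeps sending requests within the timeout, every request lands on the same Drillbit, so session-level settings made on that Drillbit remain visible to later queries.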

I didn't have HAProxy set up, so what I did in my jupyter_drill module (
https://github.com/johnomernik/jupyter_drill) was handle it at the
application level: prior to connecting to Drill, I did a DNS lookup and
grabbed the first IP returned. Then I connected directly to that Drillbit
for the duration of the session. It's not perfect, and I have not tested
this at scale, but it has worked on a small scale. I even used some Python
requests module magic to use the host name in the SSL verification even
though I am connecting by IP.
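
The DNS approach described above might look roughly like the following Python sketch. This is not the actual jupyter_drill code; the function names, the host name used in the usage note, and the port (8047, commonly the Drill web port) are assumptions for illustration. The SSL host name handling mentioned above is not shown.

```python
import socket


def pick_drillbit(hostname, resolver=socket.gethostbyname_ex):
    """Resolve the shared DNS name and return the first IP address listed."""
    _name, _aliases, addresses = resolver(hostname)
    if not addresses:
        raise RuntimeError(f"no addresses returned for {hostname}")
    return addresses[0]


def rest_base_url(ip, port=8047, scheme="https"):
    """Build the base URL for the pinned Drillbit (the port is an assumption)."""
    return f"{scheme}://{ip}:{port}"
```

The idea is to call pick_drillbit("drill.example.internal") once at the start of a session (the host name here is hypothetical) and reuse the resulting base URL for every REST request in that session, so all requests reach the same Drillbit.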

So there are a few options. If you are already looking at HAProxy, check
into the sticky connections.


John


Re: Apache Drill High Availability using HAproxy

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Satish,

You did not say if you are using HAProxy for the RESTful API or the native Drill RPC (as used by the Drill client, JDBC and ODBC.)

To understand the use of proxies and load balancers, it is helpful to remember that Drill is a stateful SQL engine. Drill encourages the use of many stateful commands such as USE, CTTAS, and ALTER SESSION.

Session state is lost when connecting to a new Drillbit, or reconnecting to the same Drillbit. Thus, a query that runs fine before the reconnect can fail afterwards.

This issue is not unique to Drill; it is a common constraint of all old-school SQL engines.

If state were not an issue, then the Drill client itself could handle HA. The client is given a list of ZK nodes. The client, on encountering a disconnect, could ask ZK for a new node and reconnect. Since ZK is HA, the client can also recover from a ZK node failure by trying another.
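
The reconnect logic described above could be sketched as follows in Python. This is not the Drill client; list_drillbits (standing in for the ZK lookup) and connect are hypothetical callables supplied by the caller, and the retry count is an arbitrary assumption.

```python
def connect_with_failover(list_drillbits, connect, max_attempts=3):
    """Try each Drillbit reported by the discovery function until one accepts."""
    last_error = None
    for _ in range(max_attempts):
        # Ask discovery again on every pass, since the live set may have changed.
        for endpoint in list_drillbits():
            try:
                return connect(endpoint)
            except ConnectionError as exc:
                last_error = exc  # remember the failure and try the next node
    raise ConnectionError("no Drillbit reachable") from last_error
```

Note that whatever connect returns is a brand-new session, so this loop restores connectivity only, not session state.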

We discussed this client-based HA approach multiple times, but each time, the SQL state has been a show-stopper.

In short, the issue is not whether to use HAProxy to solve the problem; Drill can do it internally in the client. The issue is how to handle session state.

A possible solution would be to store user session state in ZK so that we could re-establish the same logical session after a physical reconnection. In particular, a unique session ID could be used to key connections to session state in ZK.
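
As a rough illustration of the session-ID idea above, here is a Python sketch that uses an in-memory dict as a stand-in for ZK. Nothing like this exists in Drill as far as this thread says; the class, its methods, and the choice to represent session state as a list of statements to replay are all assumptions.

```python
import uuid


class SessionStateStore:
    """Stand-in for a ZK-backed store: session ID -> ordered stateful statements."""

    def __init__(self):
        self._state = {}

    def new_session(self):
        """Create a unique session ID that survives physical reconnects."""
        session_id = uuid.uuid4().hex
        self._state[session_id] = []
        return session_id

    def record(self, session_id, statement):
        """Remember a stateful statement such as USE or ALTER SESSION."""
        self._state[session_id].append(statement)

    def replay(self, session_id, execute):
        """Re-run the recorded statements, in order, on a new connection."""
        for statement in self._state.get(session_id, []):
            execute(statement)
```

After a reconnect, the client would present the same session ID and call replay with the new connection's execute function, so statements such as USE dfs.tmp are reapplied before the next query runs.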

Making this change would be a good contributor project: it involves detailed knowledge of how the Drill session and ZK state work, but is pretty isolated to just those specific areas. 
Thanks,
- Paul

 
