Posted to dev@drill.apache.org by James Turton <dz...@apache.org> on 2021/10/19 04:42:09 UTC

[DISCUSS] Being less eager about outbound JDBC connections

Hi devs

I'd like to propose a change to the defaults for our outbound connection 
pool management, at least for JDBC but perhaps ultimately wherever we 
can manage it.  Currently we are eager about initiating outbound JDBC 
connections, bringing up 10 per storage config per drillbit.  For 
example, if a user creates 3 storage configs pointing to a single DBMS 
(the configs differing in their DB path and credentials, say) on a 
cluster of 5 drillbits then we'll bring up 10x3x5 = 150 connections as 
soon as we can and try to keep them up permanently.  The fixed pool size 
of 10 is a default we picked up from HikariCP which surely set it with 
application servers in mind.
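
The arithmetic above can be spelled out in a few lines of Java. This is only an illustrative sketch: the class and method names are hypothetical and do not exist in Drill; the numbers are the ones from the example.

```java
// Hypothetical helper, not part of Drill: total eagerly opened connections
// when every drillbit keeps a fixed-size pool for every storage config.
public class EagerConnectionCount {

    public static int totalConnections(int poolSize, int storageConfigs, int drillbits) {
        // multiplyExact guards against silent integer overflow on large clusters
        return Math.multiplyExact(Math.multiplyExact(poolSize, storageConfigs), drillbits);
    }

    public static void main(String[] args) {
        // 10 connections per pool, 3 storage configs, 5 drillbits
        System.out.println(totalConnections(10, 3, 5)); // prints 150
    }
}
```
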

We've had a report from the field of a MySQL server declining to provide
said 150 connections, leaving the Drill user unable to proceed.
Additionally, as you can imagine, almost all 150 connections will be
idle most of the time for typical Drill cluster workloads.  Furthermore,
while connection pools are ubiquitous in the OLTP world, they are rare
in the OLAP world, where the cost of creating and destroying a
connection is negligible compared to the cost of a single user query,
while the benefits of per-user access control, resource management and
session management that dedicated connections bring over shared pools
are valuable.  Bringing these latter benefits to Drill's outbound JDBC
connections is not in the scope of this email; the only point being made
is that "traditionally, OLAP environments have avoided connection pools
because the losses far outweigh the gains".

In light of the above I suggest that we transition from eager to lazy 
outbound JDBC connections, more like Apache Spark (I'm told). I propose 
initially that we only change our *default* HikariCP configuration to 
maintain small, finitely scalable pools (e.g. baseline 1, up to 10) 
instead of fixed pools.  The HikariCP configuration is already 
overridable today for users that prefer the current eager connection 
behaviour.
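
As a rough illustration of "baseline 1, up to 10", the sketch below builds a set of pool properties using the HikariCP property names minimumIdle and maximumPoolSize. The class and method names are hypothetical, and how such properties would be wired into a Drill storage config is not shown here; it is an assumption that they would be handed to HikariCP in some form (for example via HikariConfig's Properties constructor, together with a JDBC URL).

```java
import java.util.Properties;

// Hypothetical sketch: pool settings for a small, scalable pool instead of a
// fixed one. Only the two property keys are HikariCP names; the rest is made up.
public class LazyPoolDefaults {

    public static Properties poolProperties(int minimumIdle, int maximumPoolSize) {
        if (minimumIdle < 0 || maximumPoolSize < 1 || minimumIdle > maximumPoolSize) {
            throw new IllegalArgumentException("need 0 <= minimumIdle <= maximumPoolSize");
        }
        Properties props = new Properties();
        // Number of idle connections the pool tries to keep open
        props.setProperty("minimumIdle", Integer.toString(minimumIdle));
        // Upper bound on connections (idle plus in use) the pool may open
        props.setProperty("maximumPoolSize", Integer.toString(maximumPoolSize));
        return props;
    }

    public static void main(String[] args) {
        Properties props = poolProperties(1, 10);
        System.out.println("minimumIdle=" + props.getProperty("minimumIdle")
                + ", maximumPoolSize=" + props.getProperty("maximumPoolSize"));
    }
}
```

Setting minimumIdle equal to maximumPoolSize in the same helper would reproduce the current fixed-pool behaviour.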

James


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by luoc <lu...@apache.org>.
  James, thanks for the work you did.

> On Oct 20, 2021, at 17:23, James Turton <ja...@somecomputer.xyz.invalid> wrote:
> 
> I went and looked at the other storage plugins.  Good news is that we were already being lazy about connecting in 11 of the 13 plugins I tested, the exceptions being storage-jdbc and storage-splunk. Storage-splunk connects eagerly to fetch Splunk indexes and storage-jdbc connects eagerly x 10 because of HikariCP.  I've fixed both cases, meaning that now all plugins (that I found) can be loaded even if the data source is not ready at the time.


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by James Turton <ja...@somecomputer.xyz.INVALID>.
I went and looked at the other storage plugins.  Good news is that we 
were already being lazy about connecting in 11 of the 13 plugins I 
tested, the exceptions being storage-jdbc and storage-splunk. 
Storage-splunk connects eagerly to fetch Splunk indexes and storage-jdbc 
connects eagerly x 10 because of HikariCP.  I've fixed both cases, 
meaning that now all plugins (that I found) can be loaded even if the 
data source is not ready at the time.
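
The general pattern of connecting lazily can be sketched as below. This is not the code from the actual fix in storage-splunk or storage-jdbc; LazyResource and its members are hypothetical names, and the "connection" is just whatever the supplied function returns.

```java
import java.util.function.Supplier;

// Hypothetical sketch of lazy connecting: the connector runs on first use,
// not at construction time, so creating the wrapper never touches the network.
public class LazyResource<T> {
    private final Supplier<T> connector;
    private T resource;
    private boolean connected = false;

    public LazyResource(Supplier<T> connector) {
        this.connector = connector; // nothing is opened here
    }

    public synchronized T get() {
        if (!connected) {
            // If the connector throws, connected stays false and the next call retries
            resource = connector.get();
            connected = true;
        }
        return resource;
    }

    public synchronized boolean isConnected() {
        return connected;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        LazyResource<String> lazy = new LazyResource<>(() -> {
            attempts[0]++;
            return "connection";
        });
        System.out.println(attempts[0]); // 0: no attempt at construction
        lazy.get();
        lazy.get();
        System.out.println(attempts[0]); // 1: connected once, then reused
    }
}
```
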


On 2021/10/19 16:21, luoc wrote:
>    James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?
>
>> On Oct 19, 2021, at 20:37, James Turton <dz...@apache.org> wrote:
>>
>> HikariCP


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by James Turton <dz...@apache.org>.
The *idea*, that of lazy outbound connections, can in principle be
applied to any storage plugin: Mongo, Splunk, Druid, whatever.  And I
think it is a good idea to apply it as widely as possible because it's
efficient and scalable.

The *immediate proposal*, and the proof of concept I did, is just for 
the generic JDBC storage plugin (storage-jdbc) which is based on 
HikariCP (unless I've missed something?).

On 2021/10/19 16:21, luoc wrote:
>    James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?
>
>> On Oct 19, 2021, at 20:37, James Turton <dz...@apache.org> wrote:
>>
>> HikariCP


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by luoc <lu...@apache.org>.
  James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?

> On Oct 19, 2021, at 20:37, James Turton <dz...@apache.org> wrote:
> 
> HikariCP


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by Charles Givre <cg...@gmail.com>.
Nice!
-- C



> On Oct 19, 2021, at 8:36 AM, James Turton <dz...@apache.org> wrote:
> 
> Having kicked the tyres on this idea, I can report that it works nicely.  I went one step further and made the default idle pool size 0, rather than 1, which has a side benefit that Drill does not try to connect out when it starts up at all, only upon receiving the first query (and then HikariCP caches that connection for some amount of time).  The advantage here is that if Drill gets restarted in the middle of the night when some JDBC data source happens not to be available, that doesn't kick the storage config into the disabled state.
> 
> When I send in a rapid spate of queries, the HikariCP pool grows accordingly, up to the configured max.
> 


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by Charles Givre <cg...@gmail.com>.
Hey James, 
I really like this approach.  My sense is that this will be very helpful for improving the stability of Drill. 
Best,
— C

> On Oct 19, 2021, at 8:36 AM, James Turton <dz...@apache.org> wrote:
> 
> Having kicked the tyres on this idea, I can report that it works nicely.  I went one step further and made the default idle pool size 0, rather than 1, which has a side benefit that Drill does not try to connect out when it starts up at all, only upon receiving the first query (and then HikariCP caches that connection for some amount of time).  The advantage here is that if Drill gets restarted in the middle of the night when some JDBC data source happens not to be available, that doesn't kick the storage config into the disabled state.
> 
> When I send in a rapid spate of queries, the HikariCP pool grows accordingly, up to the configured max.
> 


Re: [DISCUSS] Being less eager about outbound JDBC connections

Posted by James Turton <dz...@apache.org>.
Having kicked the tyres on this idea, I can report that it works 
nicely.  I went one step further and made the default idle pool size 0, 
rather than 1, which has a side benefit that Drill does not try to 
connect out when it starts up at all, only upon receiving the first 
query (and then HikariCP caches that connection for some amount of 
time).  The advantage here is that if Drill gets restarted in the middle 
of the night when some JDBC data source happens not to be available, 
that doesn't kick the storage config into the disabled state.

When I send in a rapid spate of queries, the HikariCP pool grows 
accordingly, up to the configured max.
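
The behaviour described above (nothing opened at start-up, growth on demand, a hard upper bound) can be modelled with a toy pool. This is a simplified sketch under stated assumptions, not HikariCP's implementation: LazyPoolSketch is a hypothetical class, connections are plain integers, idle eviction after a timeout is not modelled, and an exhausted pool fails immediately where a real pool would typically wait for a timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of a lazy, bounded pool with an idle baseline of 0.
public class LazyPoolSketch {
    private final int maxPoolSize;
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int opened = 0; // connections currently open (idle plus in use)
    private int nextId = 0;

    public LazyPoolSketch(int maxPoolSize) {
        if (maxPoolSize < 1) {
            throw new IllegalArgumentException("maxPoolSize must be >= 1");
        }
        this.maxPoolSize = maxPoolSize;
        // Nothing is opened here, so start-up works even if the data source is down
    }

    // Reuse an idle connection if there is one, otherwise open a new one up to the max
    public synchronized int acquire() {
        if (!idle.isEmpty()) {
            return idle.pop();
        }
        if (opened >= maxPoolSize) {
            // Simplification: fail at once instead of waiting for a free connection
            throw new IllegalStateException("pool exhausted");
        }
        opened++;
        return nextId++;
    }

    public synchronized void release(int connectionId) {
        idle.push(connectionId); // kept open for reuse by the next query
    }

    public synchronized int openConnections() {
        return opened;
    }

    public static void main(String[] args) {
        LazyPoolSketch pool = new LazyPoolSketch(10);
        System.out.println(pool.openConnections()); // 0 before the first query
        for (int i = 0; i < 4; i++) {
            pool.acquire(); // a spate of 4 concurrent queries
        }
        System.out.println(pool.openConnections()); // 4: the pool grew on demand
    }
}
```
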
