Posted to dev@drill.apache.org by James Turton <dz...@apache.org> on 2021/10/19 04:42:09 UTC
[DISCUSS] Being less eager about outbound JDBC connections
Hi devs
I'd like to propose a change to the defaults for our outbound connection
pool management, at least for JDBC but perhaps ultimately wherever we
can manage it. Currently we are eager about initiating outbound JDBC
connections, bringing up 10 per storage config per drillbit. For
example, if a user creates 3 storage configs pointing to a single DBMS
(the configs differing in their DB path and credentials, say) on a
cluster of 5 drillbits then we'll bring up 10x3x5 = 150 connections as
soon as we can and try to keep them up permanently. The fixed pool size
of 10 is a default we picked up from HikariCP which surely set it with
application servers in mind.
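
The arithmetic above can be sketched in a few lines of Java. This is a
minimal illustration, not Drill code; the class and method names are
hypothetical, and only the figures (10, 3, 5) come from the example.

```java
/** Illustrative only: models the connection count from the example above. */
final class OutboundConnectionCount {

    /** Connections held open when every pool is eagerly filled to a fixed size. */
    static int eager(int poolSize, int storageConfigs, int drillbits) {
        return poolSize * storageConfigs * drillbits;
    }

    public static void main(String[] args) {
        // 10 connections per pool, 3 storage configs, 5 drillbits
        System.out.println(eager(10, 3, 5)); // prints 150
    }
}
```

The count is a product, so it grows with every storage config and every
drillbit added, regardless of how many queries actually run.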
We've had a report from the field of a MySQL server declining to provide
said 150 connections, leaving the Drill user unable to proceed.
Additionally, as you can imagine, almost all 150 connections will be
idle most of the time for typical Drill cluster workloads. Furthermore,
while connection pools are ubiquitous in the OLTP world they are rare
in the OLAP world, where the cost of creating and destroying a
connection is negligible compared to the cost of a single user query,
and where the benefits that unpooled connections bring over shared
pools (per-user access control, resource management and session
management) are valuable. Bringing these latter benefits to Drill's
outbound JDBC connections is not in the scope of this email; the point
is only that "traditionally, OLAP environments have avoided connection
pools because the losses far outweigh the gains".
In light of the above I suggest that we transition from eager to lazy
outbound JDBC connections, more like Apache Spark (I'm told). I propose
initially that we only change our *default* HikariCP configuration to
maintain small, finitely scalable pools (e.g. baseline 1, up to 10)
instead of fixed pools. The HikariCP configuration is already
overridable today for users that prefer the current eager connection
behaviour.
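
A "baseline 1, up to 10" pool might be expressed roughly as follows. This
is a sketch under assumptions: `minimumIdle` and `maximumPoolSize` are
HikariCP's property names for the idle floor and the pool ceiling, while
the class and method names are hypothetical. To stay self-contained it
only builds a `java.util.Properties` object; handing it to HikariCP (for
instance through the `HikariConfig(Properties)` constructor) is not shown.

```java
import java.util.Properties;

/** Illustrative only: the class name is hypothetical, the property keys are HikariCP's. */
final class LazyPoolDefaults {

    /** Builds settings for a small pool that can grow, rather than a fixed-size one. */
    static Properties smallScalablePool(int baseline, int max) {
        if (baseline < 0 || max < 1 || baseline > max) {
            throw new IllegalArgumentException("need 0 <= baseline <= max and max >= 1");
        }
        Properties props = new Properties();
        // Idle connections the pool tries to keep open.
        props.setProperty("minimumIdle", Integer.toString(baseline));
        // Upper bound on connections, idle and in use combined.
        props.setProperty("maximumPoolSize", Integer.toString(max));
        return props;
    }

    public static void main(String[] args) {
        Properties props = smallScalablePool(1, 10);
        System.out.println(props.getProperty("minimumIdle")
            + ".." + props.getProperty("maximumPoolSize")); // prints 1..10
    }
}
```

Setting both values to 10 would reproduce the fixed pool described
earlier, which is why only the default needs to change.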
James
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by luoc <lu...@apache.org>.
James, thanks for the work you did.
> On 20 Oct 2021, at 17:23, James Turton <ja...@somecomputer.xyz.invalid> wrote:
>
> I went and looked at the other storage plugins. Good news is that we were already being lazy about connecting in 11 of the 13 plugins I tested, the exceptions being storage-jdbc and storage-splunk. Storage-splunk connects eagerly to fetch Splunk indexes and storage-jdbc connects eagerly x 10 because of HikariCP. I've fixed both cases, meaning that now all plugins (that I found) can be loaded even if the data source is not ready at the time.
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by James Turton <ja...@somecomputer.xyz.INVALID>.
I went and looked at the other storage plugins. Good news is that we
were already being lazy about connecting in 11 of the 13 plugins I
tested, the exceptions being storage-jdbc and storage-splunk.
Storage-splunk connects eagerly to fetch Splunk indexes and storage-jdbc
connects eagerly x 10 because of HikariCP. I've fixed both cases,
meaning that now all plugins (that I found) can be loaded even if the
data source is not ready at the time.
On 2021/10/19 16:21, luoc wrote:
> James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?
>
>> On 19 Oct 2021, at 20:37, James Turton <dz...@apache.org> wrote:
>>
>> HikariCP
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by James Turton <dz...@apache.org>.
The *idea*, that of lazy outbound connections, can in principle be
applied to any storage plugin: Mongo, Splunk, Druid, whatever. And I
think it is a good idea to apply it as widely as possible because it's
efficient and scalable.
The *immediate proposal*, and the proof of concept I did, is just for
the generic JDBC storage plugin (storage-jdbc) which is based on
HikariCP (unless I've missed something?).
On 2021/10/19 16:21, luoc wrote:
> James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?
>
>> On 19 Oct 2021, at 20:37, James Turton <dz...@apache.org> wrote:
>>
>> HikariCP
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by luoc <lu...@apache.org>.
James, is your idea related to the HikariCP pools? What is the difference for JDBC connections of storage plugins that do not use HikariCP?
> On 19 Oct 2021, at 20:37, James Turton <dz...@apache.org> wrote:
>
> HikariCP
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by Charles Givre <cg...@gmail.com>.
Nice!
-- C
> On Oct 19, 2021, at 8:36 AM, James Turton <dz...@apache.org> wrote:
>
> Having kicked the tyres on this idea, I can report that it works nicely. I went one step further and made the default idle pool size 0, rather than 1, which has a side benefit that Drill does not try to connect out when it starts up at all, only upon receiving the first query (and then HikariCP caches that connection for some amount of time). The advantage here is that if Drill gets restarted in the middle of the night when some JDBC data source happens not to be available, that doesn't kick the storage config into the disabled state.
>
> When I send in a rapid spate of queries, the HikariCP pool grows accordingly, up to the configured max.
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by Charles Givre <cg...@gmail.com>.
Hey James,
I really like this approach. My sense is that this will be very helpful for improving the stability of Drill.
Best,
— C
> On Oct 19, 2021, at 8:36 AM, James Turton <dz...@apache.org> wrote:
>
> Having kicked the tyres on this idea, I can report that it works nicely. I went one step further and made the default idle pool size 0, rather than 1, which has a side benefit that Drill does not try to connect out when it starts up at all, only upon receiving the first query (and then HikariCP caches that connection for some amount of time). The advantage here is that if Drill gets restarted in the middle of the night when some JDBC data source happens not to be available, that doesn't kick the storage config into the disabled state.
>
> When I send in a rapid spate of queries, the HikariCP pool grows accordingly, up to the configured max.
Re: [DISCUSS] Being less eager about outbound JDBC connections
Posted by James Turton <dz...@apache.org>.
Having kicked the tyres on this idea, I can report that it works
nicely. I went one step further and made the default idle pool size 0,
rather than 1, which has a side benefit that Drill does not try to
connect out when it starts up at all, only upon receiving the first
query (and then HikariCP caches that connection for some amount of
time). The advantage here is that if Drill gets restarted in the middle
of the night when some JDBC data source happens not to be available,
that doesn't kick the storage config into the disabled state.
When I send in a rapid spate of queries, the HikariCP pool grows
accordingly, up to the configured max.
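
The behaviour described above (nothing opened at startup, growth on
demand, a hard ceiling) can be modelled with a toy pool. This is a
simplified sketch, not HikariCP's implementation: the class and its
methods are hypothetical, it throws where a real pool would typically
wait for a timeout, and it never evicts idle entries.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Toy model of a pool with an idle size of 0 that grows up to a maximum. */
final class LazyPool<T> {
    private final Supplier<T> factory;
    private final int maxSize;
    private final Deque<T> idle = new ArrayDeque<>();
    private int opened = 0;

    LazyPool(Supplier<T> factory, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.factory = factory;
        this.maxSize = maxSize;
        // Nothing is opened here: construction never touches the data source.
    }

    /** Reuses an idle resource if there is one, else opens a new one while under the max. */
    synchronized T borrow() {
        T pooled = idle.pollFirst();
        if (pooled != null) {
            return pooled;
        }
        if (opened >= maxSize) {
            throw new IllegalStateException("pool is at its maximum size");
        }
        T fresh = factory.get();
        opened++; // counted only after the resource was created successfully
        return fresh;
    }

    /** Returns a borrowed resource so that a later borrow() can reuse it. */
    synchronized void giveBack(T resource) {
        idle.addFirst(resource);
    }

    /** Number of resources opened so far. */
    synchronized int openedCount() {
        return opened;
    }
}
```

Because the factory is only called inside `borrow()`, an unavailable
data source cannot cause a failure until a query actually needs it.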