Posted to user@commons.apache.org by Hrafn Malmquist <hr...@gmail.com> on 2021/01/01 17:23:48 UTC

Re: [dbcp] Optimal defaults for DSpace

>> Hi Gary
>>
>> I have and they don't know. Therefore, we are kind of looking at this
>> afresh.
>>
>> For a web server like this, where there are usually lots of reads and not
>> many writes.
>>
>
>DBCP is agnostic to reading vs. writing, that all happens in SQL as I am
>sure you know ;-)

When I think about it, it's obvious that what happens during the
connection session doesn't matter to the pool.
That I offered that piece of useless information only shows how much
I am struggling to understand what should guide a decision on optimal
defaults.

>> Does having defaults:
>> maxWaitMillis = 5000,
>> maxIdle = 10,
>> maxTotal = 30
>>
>> Make more sense than the DBCP2 defaults?
>>
>
>Only if you think so, I'm sorry I can't offer any guidelines for your
>application.

I appreciate that you are hesitant to offer generic advice. Nonetheless,
you are clearly an authority in this field, being the main committer to
the DBCP2 codebase.

For Tomcat 8 it is explicitly recommended that maxWaitMillis not be set
lower than 10 seconds, preferably 10-15 seconds [1].

Consider Deep Blue, the DSpace institutional repository for the University
of Michigan [2]. Taken at face value, it is likely that this web site gets
high traffic, as it is a relatively popular institution with a lot of
content (> 130k items).

It is likely, of course, that the db administrator running it knows enough
about connection pooling to calibrate the settings to something more
sensible, but as I am sure you understand, it would be better if the
defaults that come with DSpace were as close to optimal as possible.

Correct me if I'm wrong, but my understanding is that since maxWaitMillis
causes exceptions to be raised on expiry, a codebase that uses a relatively
short setting would need to be defensively coded to handle exceptions very
well. Considering the fragmentary and decentralized way that DSpace has
been developed (the classic open source way) I think it is fair to say that
the codebase isn't very resilient. Therefore, not least in light of the
abovementioned recommendations for Tomcat settings, the optimal generic
setting for maxWaitMillis is at least 10000.
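
To make the concern about defensive coding concrete, here is a minimal
sketch of a caller coping with a borrow timeout. It deliberately does not
use DBCP itself: TinyPool, handleRequest and the "503"/"200" strings are
hypothetical stand-ins I made up, and the Semaphore only imitates the
maxTotal / maxWaitMillis behaviour as I understand it (a bounded wait that
ends in an exception, with a negative value meaning "wait indefinitely").

```java
import java.sql.SQLException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolWaitSketch {

    /** Toy pool (hypothetical, not DBCP): maxTotal permits, bounded borrow wait. */
    static class TinyPool {
        private final Semaphore permits;
        private final long maxWaitMillis;

        TinyPool(int maxTotal, long maxWaitMillis) {
            this.permits = new Semaphore(maxTotal, true);
            this.maxWaitMillis = maxWaitMillis;
        }

        /** Blocks up to maxWaitMillis; a negative value waits indefinitely. */
        void borrow() throws SQLException {
            try {
                if (maxWaitMillis < 0) {
                    permits.acquire();
                } else if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                    throw new SQLException(
                            "no connection free after " + maxWaitMillis + " ms");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new SQLException("interrupted while waiting for a connection", e);
            }
        }

        void giveBack() {
            permits.release();
        }
    }

    /** Defensive caller: turns a borrow timeout into a "busy" answer. */
    static String handleRequest(TinyPool pool) {
        try {
            pool.borrow();
        } catch (SQLException e) {
            return "503 busy: " + e.getMessage();
        }
        try {
            return "200 ok";
        } finally {
            pool.giveBack(); // always return the connection, even on failure
        }
    }

    public static void main(String[] args) throws SQLException {
        TinyPool pool = new TinyPool(1, 50);      // assumed values, for the demo only
        pool.borrow();                            // another request holds the only connection
        System.out.println(handleRequest(pool));  // times out after about 50 ms
        pool.giveBack();
        System.out.println(handleRequest(pool));  // connection is free again
    }
}
```

The point of the sketch is only that a short wait shifts the burden onto
every call site: each one has to catch the timeout and still release what
it borrowed. A longer wait makes that path rarer but does not remove it.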

[1] https://tomcat.apache.org/tomcat-8.0-doc/jndi-datasource-examples-howto.html#Intermittent_Database_Connection_Failures
[2] https://deepblue.lib.umich.edu/

On Thu, Dec 31, 2020 at 5:41 PM Gary Gregory <ga...@gmail.com> wrote:

> On Thu, Dec 31, 2020 at 11:55 AM Hrafn Malmquist <
> hrafn.malmquist@gmail.com>
> wrote:
>
> > Hi Gary
> >
> > I have and they don't know. Therefore, we are kind of looking at this
> > afresh.
> >
> > For a web server like this, where there are usually lots of reads and not
> > many writes.
> >
>
> DBCP is agnostic to reading vs. writing, that all happens in SQL as I am
> sure you know ;-)
>
>
> > Does having defaults:
> > maxWaitMillis = 5000,
> > maxIdle = 10,
> > maxTotal = 30
> >
> > Make more sense than the DBCP2 defaults?
> >
>
> Only if you think so, I'm sorry I can't offer any guidelines for your
> application.
>
>
> >
> > maxWaitMillis = indefinitely,
> > maxIdle = 8,
> > maxTotal = 8
> >
> > Perhaps having higher maxIdle and maxTotal can't hurt as these are
> maximum
> > bounds but the unusually (right?) low maxWaitMillis seems like it could
> > easily cause problems, right?
> >
>
> Maybe someone else here has generic advice for you but I do not, as the
> customers I've seen at work all have highly variable needs, configurations,
> and operating environments, everything from Linux, Windows, to IBM i/Series
> and z/Series.
>
>
> > Also, these are the only properties wrapped into the configurable DSpace
> > configuration. What other properties are those most commonly tweaked from
> > DBCP2 defaults?
> >
>
> Again, this is highly dependent on your use case. You'll have to experiment
> within your operating environment.
>
> Gary
>
>
> > Happy new year
> > Hrafn
> >
> > On Tue, Dec 29, 2020 at 2:31 PM Gary Gregory <ga...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I think you will have to ask the Dspace committers why they chose those
> > > specific values.
> > >
> > > Gary
> > >
> > > On Mon, Dec 21, 2020, 00:27 Hrafn Malmquist <hrafn.malmquist@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Gary
> > > >
> > > > Thanks for taking the time to respond.
> > > >
> > > > I hope you can bear with me as I am still learning about database
> > > > connection pooling.
> > > >
> > > > Perhaps I did not ask the question correctly. I am not asking about a
> > > site
> > > > specific setup but rather what defaults should be shipped with the
> > > > software. I am part of the minor version release team.
> > > >
> > > > Currently, the default setup is a DBCP2 v. 2.1.1 connection pool with
> > > > only maxWaitMillis,
> > > > maxIdle and maxTotal configurable in the DSpace configuration
> settings
> > > and
> > > > the default values for these settings set to 5000, 10 and 30
> > > respectively.
> > > > It's unclear why these defaults were chosen to begin with, git blame
> > > shows
> > > > they were chosen back in 2015. I don't think a lot of thought went
> into
> > > > choosing 1) which parameters should be configurable nor 2) what their
> > > > defaults should be (or why they should differ from DBCP2 defaults).
> > > >
> > > > DSpace repositories are run by higher education institutions and all
> > > sorts
> > > > of institutions and organisations involved in research, for instance
> > the
> > > > Smithsonian (https://repository.si.edu/). Therefore, although the
> vast
> > > > majority of instances are run by small institutions that get little
> > > > traffic, others are likely to receive relatively heavy traffic, from
> > > users
> > > > and crawlers.
> > > >
> > > > So the idea is to ask the experts what parameters should be
> > configurable
> > > > for the average repository admin, keeping in mind that the aim is for
> > > > installation and setup to be simple (in effect, what are the "main"
> > > > parameters likely to need tweaking) and what should the
> out-of-the-box
> > > > defaults be (if at all different from the DBCP2 defaults).
> > > >
> > > > I am particularly surprised at the low maxWaitMillis chosen. Is that
> > not
> > > > likely to cause problems for high traffic sites?
> > > >
> > > > Best regards, Hrafn
> > > >
> > > >
> > > > [1] :
> > > >
> > > >
> > >
> >
> https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/spring/api/core-hibernate.xml#L41-L48
> > > >
> > > > [2] :
> > > >
> > > >
> > >
> >
> https://github.com/DSpace/DSpace/blob/250c87dc1604c34e2a963b6804163c73278e9ff7/dspace/config/dspace.cfg#L77-L86
> > > >
> > > > On Sun, Dec 20, 2020 at 6:40 PM Gary Gregory <garydgregory@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Each new DBCP release brings fixes, additions,  and other updates,
> as
> > > you
> > > > > can read in the release notes.
> > > > >
> > > > > How to best configure DBCP for any given combination of JDBC
> driver,
> > > its
> > > > > database, and application will be quite variable, which is somewhat
> > out
> > > > of
> > > > > scope here IMO.
> > > > >
> > > > > Gary
> > > > >
> > > > > On Fri, Dec 18, 2020, 11:15 Hrafn Malmquist <
> > hrafn.malmquist@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Good day
> > > > > >
> > > > > > I'm wondering what are optimal defaults for DSpace, open source
> > > digital
> > > > > > repository software aimed especially at  academic, non-profit,
> and
> > > > > > commercial organizations (see https://duraspace.org/dspace/).
> > > > > >
> > > > > > DSpace supports both Postgres and Oracle and recommends Tomcat,
> > Jetty
> > > > or
> > > > > > Caucho Resin. I suspect 9/10 installations use Tomcat.
> > > > > >
> > > > > > > DSpace comes packaged with Apache Commons DBCP 2.1.1. DSpace only
> > > > > > configures three configurations for DBCP2 using non-default
> > settings.
> > > > > (see:
> > > > > > [1] and [2])
> > > > > >
> > > > > > These are
> > > > > > maxTotal = 30
> > > > > > maxIdle = 10
> > > > > > maxWaitMillis = 5000
> > > > > >
> > > > > > I am not sure what reasoning is behind the choice of these
> > > > configuration
> > > > > > settings. DSpace is used by all sorts of institutions, some
> > receiving
> > > > > very
> > > > > > high traffic. My guess is that using the DBCP2 defaults is
> > > recommended.
> > > > > My
> > > > > > question is, is this a good default configuration? Should there
> be
> > > more
> > > > > > configuration configurable by DSpace users in the DSpace config?
> > > There
> > > > > have
> > > > > > been reports of the database not being reachable because of too
> > many
> > > > idle
> > > > > > connections. According to one doc [3] maxWaitMillis should be at
> a
> > > > > > minimum of 10000 ms if I understand correctly.
> > > > > >
> > > > > > Also, I assume there are benefits to upgrading the DBCP2
> dependency
> > > to
> > > > > the
> > > > > > most recent version, 2.8.0. I'm not sure what the major benefits
> > are
> > > > > > though. I can see v. 2.5.0 only runs on Java 8.
> > > > > >
> > > > > > [1] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/local.cfg.EXAMPLE#L88-L99
> > > > > > [2] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/DSpace/DSpace/blob/755f0732aeea7dd1449830593caa54d77890e5bd/dspace/config/spring/api/core-hibernate.xml#L46-L48
> > > > > > [3] -
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://tomcat.apache.org/tomcat-8.0-doc/jndi-datasource-examples-howto.html#Intermittent_Database_Connection_Failures
> > > > > >
> > > > >
> > > >
> > >
> >
>