Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2011/09/02 00:36:27 UTC

Re: DataImportHandler using new connection on each query

: However, I tested this against a slower SQL Server and I saw 
: dramatically worse results. Instead of re-using their database, each of 
: the sub-entities is recreating a connection each time the query runs. 

are you seeing any specific errors logged before these new connections are 
created?

I don't *think* there's anything in the DIH JDBC/SQL code that causes it
to time out existing connections -- is it possible this is something
specific to the JDBC Driver you are using?

Or maybe you are using the DIH "threads" option along with a JNDI/JDBC 
based pool of connections that is configured to create new Connections on 
demand, and with the fast DB it can reuse them but on the slow DB it does 
enough stuff in parallel to keep asking for new connections to be created?


If it's DIH creating new connections over and over then i'm pretty sure 
you should see an INFO level log message like this for each connection...

        LOG.info("Creating a connection for entity "
                + context.getEntityAttribute(DataImporter.NAME) + " with URL: "
                + url);

...are those messages different against your fast DB and your slow DB?
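
One way to make that comparison concrete is to count those messages in the
log from each run. The following is a minimal sketch, not part of DIH: the
class name ConnectionLogCounter and the log file paths passed on the command
line are hypothetical, and it assumes the logged text matches the LOG.info
call quoted above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for comparing two runs; not part of Solr or DIH.
class ConnectionLogCounter {
    // Fixed prefix of the INFO message quoted above.
    static final String MARKER = "Creating a connection for entity";

    // Counts the lines that contain the connection-creation message.
    static long countCreations(Stream<String> lines) {
        return lines.filter(line -> line.contains(MARKER)).count();
    }

    public static void main(String[] args) throws IOException {
        // Each argument is assumed to be the path of one log file,
        // e.g. one from the fast-DB run and one from the slow-DB run.
        for (String name : args) {
            try (Stream<String> lines = Files.lines(Paths.get(name))) {
                System.out.println(name + ": " + countCreations(lines));
            }
        }
    }
}
```

A much higher count for the slow DB than for the fast DB, with the same
configuration, would point at DIH itself opening the extra connections.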

-Hoss

Re: DataImportHandler using new connection on each query

Posted by eks dev <ek...@googlemail.com>.
Take care: "running for 10 hours" != "idling for 10 seconds and then trying
again". Those are different cases.

It is not dropping *used* connections (good to know it works that
well, thanks for reporting!); it just does not reuse connections that
have been idle for more than 10 seconds.



On Fri, Sep 2, 2011 at 10:26 PM, Gora Mohanty <go...@mimirtech.com> wrote:
> On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey <so...@elyograg.org> wrote:
> [...]
>> I use DIH with MySQL.  When things are going well, a full rebuild will leave
>> connections open and active for over two hours.  This is the case with
>> 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the database
>> server, last night I had a rebuild going for more than 11 hours with no
>> problems, verified from the processlist on the server.
>
> Will second that. Have had DIH connections open to both
> mysql, and MS-SQL for 8-10h. Dropped connections could
> be traced to network issues, or some other exception.
>
> Regards,
> Gora
>

Re: DataImportHandler using new connection on each query

Posted by Gora Mohanty <go...@mimirtech.com>.
On Sat, Sep 3, 2011 at 1:38 AM, Shawn Heisey <so...@elyograg.org> wrote:
[...]
> I use DIH with MySQL.  When things are going well, a full rebuild will leave
> connections open and active for over two hours.  This is the case with
> 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the database
> server, last night I had a rebuild going for more than 11 hours with no
> problems, verified from the processlist on the server.

Will second that. Have had DIH connections open to both
mysql, and MS-SQL for 8-10h. Dropped connections could
be traced to network issues, or some other exception.

Regards,
Gora

Re: DataImportHandler using new connection on each query

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/2/2011 1:59 PM, Chris Hostetter wrote:
> : I am not sure if current version has this, but  DIH used to reload
> : connections after some idle time
> :
> : if (currTime - connLastUsed > CONN_TIME_OUT) {
> : 			synchronized (this) {
> : 				Connection tmpConn = factory.call();
> : 				closeConnection();
> : 				connLastUsed = System.currentTimeMillis();
> : 				return conn = tmpConn;
> : 			}
> :
> :
> : Where CONN_TIME_OUT = 10 seconds
>
> ...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
> evidently) that CONN was "connect" as in a timeout on creating a
> connection, not a timeout on how long DIH is willing to use a perfectly
> good connection.
>
> I honestly can't make heads or tails of why that code would exist.
>
> Noble? Shalin?  what's the point of throwing away a connection that's been
> in use for more than 10 seconds?

I use DIH with MySQL.  When things are going well, a full rebuild will 
leave connections open and active for over two hours.  This is the case 
with 1.4.0, 1.4.1, 3.1.0, and 3.2.0.  Due to some kind of problem on the 
database server, last night I had a rebuild going for more than 11 hours 
with no problems, verified from the processlist on the server.

Thanks,
Shawn


Re: DataImportHandler using new connection on each query

Posted by Chris Hostetter <ho...@fucit.org>.
: > Noble? Shalin?  what's the point of throwing away a connection that's been
: > in use for more than 10 seconds?

: Hoss, as others have noted, DIH throws away connections which have been idle
: for more than the timeout value (10 seconds). The jdbc standard way of
: checking for a valid connection is not implemented or incorrectly
: implemented by many drivers. So, either you can execute a query and get an
: exception and try to determine if the exception was a case of an invalid
: connection (which again is sometimes different from driver to driver) or
: take the easy way out and throw away connections idle for more than 10
: seconds, which is what we went for.

Hmmm...

a) at a minimum this seems like it should be a config option -- why punish 
people using "good" jdbc drivers?

b) you keep referring to this timeout in relation to connections being
*idle* longer than 10 seconds, but unless i'm missing something that's not
what it's doing at all.

The only time connLastUsed is assigned to is when getConnection() is
called - so even if a connection has only been idle for 1 picosecond, it
will still be closed/reopened if the total amount of time it was used
before being idle was more than 10 seconds -- that was the scenario
described in the first message of this thread...

second 000: app starts
second 006: ResultSetIterator constructed on queryA
second 007: getConnection() called, conn initialized, connLastUsed = 007
   ... conn in use for a while while iterating over results...
second 099: done iterating over ResultSetIterator
second 100: ResultSetIterator constructed on queryB
second 101: getConnection() called again...

...at second #101, that connection has really only been idle for 2 
seconds, but connLastUsed hasn't been updated for 94 seconds, so it 
forces a new connection for no reason.

If the goal is to track how long the connection has been idle, shouldn't 
every method in ResultSetIterator update connLastUsed ?
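
A minimal sketch of that idea, assuming a hypothetical IdleTracker helper
(not a DIH class) whose touch() would be called from every method that uses
the connection. The clock is injected and counts whole seconds so that the
timeline above can be replayed; the 10 second limit is the CONN_TIME_OUT
value mentioned earlier in the thread.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: tracks real idle time instead of time since creation.
class IdleTracker {
    private final LongSupplier clockSeconds; // injected so the demo can fake time
    private final long timeoutSeconds;
    private long lastUsed;

    IdleTracker(LongSupplier clockSeconds, long timeoutSeconds) {
        this.clockSeconds = clockSeconds;
        this.timeoutSeconds = timeoutSeconds;
        this.lastUsed = clockSeconds.getAsLong();
    }

    // Call on every use of the connection (e.g. each row fetched).
    void touch() {
        lastUsed = clockSeconds.getAsLong();
    }

    long idleSeconds() {
        return clockSeconds.getAsLong() - lastUsed;
    }

    boolean shouldReconnect() {
        return idleSeconds() > timeoutSeconds;
    }

    public static void main(String[] args) {
        long[] now = {7};                        // second 007: connection obtained
        IdleTracker tracker = new IdleTracker(() -> now[0], 10);
        for (long s = 8; s <= 99; s++) {         // seconds 008-099: iterating over results
            now[0] = s;
            tracker.touch();
        }
        now[0] = 101;                            // second 101: getConnection() called again
        System.out.println("idle seconds: " + tracker.idleSeconds());
        System.out.println("reconnect?    " + tracker.shouldReconnect());
    }
}
```

With touch() called during iteration the connection counts as idle for only
2 seconds at second 101, so it would be kept. Without those calls the same
check sees 94 seconds and forces a reconnect.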





-Hoss

Re: DataImportHandler using new connection on each query

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Sat, Sep 3, 2011 at 1:29 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : I am not sure if current version has this, but  DIH used to reload
> : connections after some idle time
> :
> : if (currTime - connLastUsed > CONN_TIME_OUT) {
> :                       synchronized (this) {
> :                               Connection tmpConn = factory.call();
> :                               closeConnection();
> :                               connLastUsed = System.currentTimeMillis();
> :                               return conn = tmpConn;
> :                       }
> :
> :
> : Where CONN_TIME_OUT = 10 seconds
>
> ...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
> evidently) that CONN was "connect" as in a timeout on creating a
> connection, not a timeout on how long DIH is willing to use a perfectly
> good connection.
>
> I honestly can't make heads or tails of why that code would exist.
>
> Noble? Shalin?  what's the point of throwing away a connection that's been
> in use for more than 10 seconds?
>
>
Hoss, as others have noted, DIH throws away connections which have been idle
for more than the timeout value (10 seconds). The jdbc standard way of
checking for a valid connection is not implemented or incorrectly
implemented by many drivers. So, either you can execute a query and get an
exception and try to determine if the exception was a case of an invalid
connection (which again is sometimes different from driver to driver) or
take the easy way out and throw away connections idle for more than 10
seconds, which is what we went for.
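
For drivers that do implement it, the standard check referred to here is
presumably Connection.isValid(int timeoutSeconds) from JDBC 4.0. A minimal
sketch of validating before reuse, with replacement as the fallback, might
look like the following. ValidatingHolder, its Supplier-based factory, and
the created counter are hypothetical names for illustration, not DIH code.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.function.Supplier;

// Hypothetical holder: reuses a connection only while the driver says it is valid.
class ValidatingHolder {
    private final Supplier<Connection> factory; // assumed to open a new connection
    private final int validationTimeoutSeconds;
    private Connection conn;
    int created = 0;                            // for illustration only

    ValidatingHolder(Supplier<Connection> factory, int validationTimeoutSeconds) {
        this.factory = factory;
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    // True only if the driver reports the connection as valid.
    static boolean isUsable(Connection c, int timeoutSeconds) {
        try {
            return c.isValid(timeoutSeconds);
        } catch (SQLException | AbstractMethodError e) {
            // Drivers that reject the call, or that predate JDBC 4.0 and lack
            // the method, end up here; treat the connection as unusable.
            return false;
        }
    }

    synchronized Connection getConnection() {
        if (conn == null || !isUsable(conn, validationTimeoutSeconds)) {
            closeQuietly(conn);
            conn = factory.get();
            created++;
        }
        return conn;
    }

    private static void closeQuietly(Connection c) {
        if (c == null) return;
        try {
            c.close();
        } catch (SQLException ignored) {
            // The connection is being discarded anyway.
        }
    }
}
```

This does not help with drivers whose isValid() answers incorrectly, which
is the case described above; for those a time-based fallback or a test
query would still be needed.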

-- 
Regards,
Shalin Shekhar Mangar.

Re: DataImportHandler using new connection on each query

Posted by Chris Hostetter <ho...@fucit.org>.
: I am not sure if current version has this, but  DIH used to reload
: connections after some idle time
: 
: if (currTime - connLastUsed > CONN_TIME_OUT) {
: 			synchronized (this) {
: 				Connection tmpConn = factory.call();
: 				closeConnection();
: 				connLastUsed = System.currentTimeMillis();
: 				return conn = tmpConn;
: 			}
: 
: 
: Where CONN_TIME_OUT = 10 seconds

...oh wow.  i saw the CONN_TIME_OUT constant but i thought (foolishly
evidently) that CONN was "connect" as in a timeout on creating a
connection, not a timeout on how long DIH is willing to use a perfectly
good connection.

I honestly can't make heads or tails of why that code would exist.

Noble? Shalin?  what's the point of throwing away a connection that's been
in use for more than 10 seconds?



-Hoss

Re: DataImportHandler using new connection on each query

Posted by eks dev <ek...@yahoo.co.uk>.
I am not sure if current version has this, but  DIH used to reload
connections after some idle time

if (currTime - connLastUsed > CONN_TIME_OUT) {
			synchronized (this) {
				Connection tmpConn = factory.call();
				closeConnection();
				connLastUsed = System.currentTimeMillis();
				return conn = tmpConn;
			}


Where CONN_TIME_OUT = 10 seconds
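
To see what that check does in isolation, here is a minimal, self-contained
model of it. It is a sketch under assumptions, not the DIH source: the class
name ReloadingHolder, the injected clock, and the created counter are made
up for illustration, and a Supplier stands in for factory.call() so the
example needs no checked exceptions. The excerpt above does not show the
other branch, so the sketch simply returns the existing connection there,
and it leaves out closeConnection() because the connection type is generic.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical model of the timeout check in the excerpt above.
class ReloadingHolder<C> {
    static final long CONN_TIME_OUT = 10_000;  // 10 seconds, in milliseconds

    private final Supplier<C> factory;         // stands in for factory.call()
    private final LongSupplier clockMillis;    // injected so time can be faked
    private C conn;
    private long connLastUsed;
    int created = 0;                           // for illustration only

    ReloadingHolder(Supplier<C> factory, LongSupplier clockMillis) {
        this.factory = factory;
        this.clockMillis = clockMillis;
    }

    synchronized C getConnection() {
        long currTime = clockMillis.getAsLong();
        if (conn == null || currTime - connLastUsed > CONN_TIME_OUT) {
            // Timestamp is older than the timeout: replace the connection.
            conn = factory.get();
            created++;
            connLastUsed = currTime;
        }
        return conn;
    }
}
```

With a fake clock, calls at 0 ms and 5,000 ms share one connection, while a
call at 20,000 ms creates a second one, whether or not the first connection
was busy in between.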



On Fri, Sep 2, 2011 at 12:36 AM, Chris Hostetter
<ho...@fucit.org> wrote:
>
> : However, I tested this against a slower SQL Server and I saw
> : dramatically worse results. Instead of re-using their database, each of
> : the sub-entities is recreating a connection each time the query runs.
>
> are you seeing any specific errors logged before these new connections are
> created?
>
> I don't *think* there's anything in the DIH JDBC/SQL code that causes it
> to time out existing connections -- is it possible this is something
> specific to the JDBC Driver you are using?
>
> Or maybe you are using the DIH "threads" option along with a JNDI/JDBC
> based pool of connections that is configured to create new Connections on
> demand, and with the fast DB it can reuse them but on the slow DB it does
> enough stuff in parallel to keep asking for new connections to be created?
>
>
> If it's DIH creating new connections over and over then i'm pretty sure
> you should see an INFO level log message like this for each connection...
>
>        LOG.info("Creating a connection for entity "
>                + context.getEntityAttribute(DataImporter.NAME) + " with URL: "
>                + url);
>
> ...are those messages different against your fast DB and your slow DB?
>
> -Hoss
>