You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Jasper Floor <ja...@m4n.nl> on 2012/06/14 15:21:18 UTC

DIH idle in transaction forever

Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:
        <dataSource name="df-stream-store-ds"
jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
autoCommit="false" batchSize="10000" />

We have a datasource defined in the jndi:
	<no-tx-datasource>
		<jndi-name>ext_solr_datafeeds_dba</jndi-name>
		<security-domain>ext_solr_datafeeds_dba_realm</security-domain>
		<connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
		<min-pool-size>0</min-pool-size>
		<max-pool-size>5</max-pool-size>
		<transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
		<driver-class>org.postgresql.Driver</driver-class>
		<blocking-timeout-millis>30000</blocking-timeout-millis>
		<idle-timeout-minutes>5</idle-timeout-minutes>
		<new-connection-sql>SELECT 1</new-connection-sql>
		<check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
	</no-tx-datasource>


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper

Re: DIH idle in transaction forever

Posted by Jasper Floor <ja...@m4n.nl>.
Btw, I removed the batchSize but performance is better with
batchSize=10000. I haven't done further testing to see what the best
setting is, but the difference between setting it at 10000 and not
setting it is almost double the indexing time (~20 minutes vs ~37
minutes)

On Thu, Jun 14, 2012 at 4:49 PM, Jasper Floor <ja...@m4n.nl> wrote:
> Actually, the readOnly=true makes things worse.
> What it does (among other things) is:
>            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
>
> which leads to:
> Caused by: org.postgresql.util.PSQLException: Cannot change
> transaction isolation level in the middle of a transaction.
>
> because the connection is idle in transaction.
>
> I found this issue:
> https://issues.apache.org/jira/browse/SOLR-2045
>
> Patching DIH with the code they suggest seems to work.
>
> mvg,
> Jasper
>
> On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James <Ja...@ingrambook.com> wrote:
>> Try readOnly="true" in the dataSource configuration.  This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal.  This might be better than 10000.
>>
>> There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects.  I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false.  Not sure abour PostgreSql.
>>
>> James Dyer
>> E-Commerce Systems
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Jasper Floor [mailto:jasper.floor@m4n.nl]
>> Sent: Thursday, June 14, 2012 8:21 AM
>> To: solr-user@lucene.apache.org
>> Subject: DIH idle in transaction forever
>>
>> Hi all,
>>
>> It seems that DIH always holds two connections open to the database.
>> One of them is almost always 'idle in transaction'. It may sometimes
>> seem to do a little work but then it goes idle again.
>>
>>
>> datasource definition:
>>        <dataSource name="df-stream-store-ds"
>> jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
>> autoCommit="false" batchSize="10000" />
>>
>> We have a datasource defined in the jndi:
>>        <no-tx-datasource>
>>                <jndi-name>ext_solr_datafeeds_dba</jndi-name>
>>                <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
>>                <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
>>                <min-pool-size>0</min-pool-size>
>>                <max-pool-size>5</max-pool-size>
>>                <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
>>                <driver-class>org.postgresql.Driver</driver-class>
>>                <blocking-timeout-millis>30000</blocking-timeout-millis>
>>                <idle-timeout-minutes>5</idle-timeout-minutes>
>>                <new-connection-sql>SELECT 1</new-connection-sql>
>>                <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
>>        </no-tx-datasource>
>>
>>
>> If we set autocommit to true then we get an OOM on indexing so that is
>> not an option.
>>
>> Does anyone have any idea why this happens? I would guess that DIH
>> doesn't close the connection, but reading the code I can't be sure of
>> this. The ResultSet object should close itself once it reaches the
>> end.
>>
>> mvg,
>> JAsper

Re: DIH idle in transaction forever

Posted by Jasper Floor <ja...@m4n.nl>.
Actually, the readOnly=true makes things worse.
What it does (among other things) is:
            c.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);

which leads to:
Caused by: org.postgresql.util.PSQLException: Cannot change
transaction isolation level in the middle of a transaction.

because the connection is idle in transaction.

I found this issue:
https://issues.apache.org/jira/browse/SOLR-2045

Patching DIH with the code they suggest seems to work.

mvg,
Jasper

On Thu, Jun 14, 2012 at 4:36 PM, Dyer, James <Ja...@ingrambook.com> wrote:
> Try readOnly="true" in the dataSource configuration.  This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal.  This might be better than 10000.
>
> There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects.  I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false.  Not sure abour PostgreSql.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: Jasper Floor [mailto:jasper.floor@m4n.nl]
> Sent: Thursday, June 14, 2012 8:21 AM
> To: solr-user@lucene.apache.org
> Subject: DIH idle in transaction forever
>
> Hi all,
>
> It seems that DIH always holds two connections open to the database.
> One of them is almost always 'idle in transaction'. It may sometimes
> seem to do a little work but then it goes idle again.
>
>
> datasource definition:
>        <dataSource name="df-stream-store-ds"
> jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
> autoCommit="false" batchSize="10000" />
>
> We have a datasource defined in the jndi:
>        <no-tx-datasource>
>                <jndi-name>ext_solr_datafeeds_dba</jndi-name>
>                <security-domain>ext_solr_datafeeds_dba_realm</security-domain>
>                <connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
>                <min-pool-size>0</min-pool-size>
>                <max-pool-size>5</max-pool-size>
>                <transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
>                <driver-class>org.postgresql.Driver</driver-class>
>                <blocking-timeout-millis>30000</blocking-timeout-millis>
>                <idle-timeout-minutes>5</idle-timeout-minutes>
>                <new-connection-sql>SELECT 1</new-connection-sql>
>                <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
>        </no-tx-datasource>
>
>
> If we set autocommit to true then we get an OOM on indexing so that is
> not an option.
>
> Does anyone have any idea why this happens? I would guess that DIH
> doesn't close the connection, but reading the code I can't be sure of
> this. The ResultSet object should close itself once it reaches the
> end.
>
> mvg,
> JAsper

RE: DIH idle in transaction forever

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Try readOnly="true" in the dataSource configuration.  This causes several defaults to get set in the JDBC connection, and often will solve problems like this. (see http://wiki.apache.org/solr/DataImportHandler#Configuring_JdbcDataSource)  Also, try a batch size of 0 to let your jdbc driver pick what it thinks is optimal.  This might be better than 10000.

There is also an issue in that it doesn't explicitly close the resultset but relies on closing the connection to implicily close the child objects.  I know when I tried using DIH with Derby a while back this had at the least caused some log warnings, and it wouldn't work at all without readOnly=false.  Not sure abour PostgreSql.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Jasper Floor [mailto:jasper.floor@m4n.nl] 
Sent: Thursday, June 14, 2012 8:21 AM
To: solr-user@lucene.apache.org
Subject: DIH idle in transaction forever

Hi all,

It seems that DIH always holds two connections open to the database.
One of them is almost always 'idle in transaction'. It may sometimes
seem to do a little work but then it goes idle again.


datasource definition:
        <dataSource name="df-stream-store-ds"
jndiName="java:ext_solr_datafeeds_dba" type="JdbcDataSource"
autoCommit="false" batchSize="10000" />

We have a datasource defined in the jndi:
	<no-tx-datasource>
		<jndi-name>ext_solr_datafeeds_dba</jndi-name>
		<security-domain>ext_solr_datafeeds_dba_realm</security-domain>
		<connection-url>jdbc:postgresql://db1.live.mbuyu.nl/datafeeds</connection-url>
		<min-pool-size>0</min-pool-size>
		<max-pool-size>5</max-pool-size>
		<transaction-isolation>TRANSACTION_READ_COMMITTED</transaction-isolation>
		<driver-class>org.postgresql.Driver</driver-class>
		<blocking-timeout-millis>30000</blocking-timeout-millis>
		<idle-timeout-minutes>5</idle-timeout-minutes>
		<new-connection-sql>SELECT 1</new-connection-sql>
		<check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
	</no-tx-datasource>


If we set autocommit to true then we get an OOM on indexing so that is
not an option.

Does anyone have any idea why this happens? I would guess that DIH
doesn't close the connection, but reading the code I can't be sure of
this. The ResultSet object should close itself once it reaches the
end.

mvg,
JAsper