You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by Shane Perry <th...@gmail.com> on 2011/01/07 20:40:51 UTC

DIH - Closing ResultSet in JdbcDataSource

Hi,

I am in the process of migrating our system from Postgres 8.4 to Solr
1.4.1.  Our system is fairly complex and as a result, I have had to define
19 base entities in the data-config.xml definition file.  Each of these
entities executes 5 queries.  When doing a full-import, as each entity
completes, the server hosting Postgres shows 5 "idle in transaction" for the
entity.

In digging through the code, I found that the JdbcDataSource wraps the
ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
open.  Walking through the code I can't find a close() call anywhere on the
ResultSet.  I believe this results in the "idle in transaction" processes.

Am I off base here?  I'm not sure what the overall implications are of the
"idle in transaction" processes, but is there a way I can get around the
issue without importing each entity manually?  Any feedback would be greatly
appreciated.

Thanks in advance,

Shane

Re: DIH - Closing ResultSet in JdbcDataSource

Posted by Adam Estrada <es...@gmail.com>.
This is my configuration which seems to work just fine.

<?xml version="1.0" encoding="utf-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
       name="DBImport"
       driver="net.sourceforge.jtds.jdbc.Driver"

url="jdbc:jtds:sqlserver://localhost;databaseName=50_DEV;responseBuffering=adaptive;"
       user="test"
       password="test"
       onError="skip"/>
  <document>

>From there it's just a matter of running the select statement and mapping it
against the correct fields in your index.

Adam

On Fri, Jan 7, 2011 at 2:40 PM, Shane Perry <th...@gmail.com> wrote:

> Hi,
>
> I am in the process of migrating our system from Postgres 8.4 to Solr
> 1.4.1.  Our system is fairly complex and as a result, I have had to define
> 19 base entities in the data-config.xml definition file.  Each of these
> entities executes 5 queries.  When doing a full-import, as each entity
> completes, the server hosting Postgres shows 5 "idle in transaction" for
> the
> entity.
>
> In digging through the code, I found that the JdbcDataSource wraps the
> ResultSet object in a custom ResultSetIterator object, leaving the
> ResultSet
> open.  Walking through the code I can't find a close() call anywhere on the
> ResultSet.  I believe this results in the "idle in transaction" processes.
>
> Am I off base here?  I'm not sure what the overall implications are of the
> "idle in transaction" processes, but is there a way I can get around the
> issue without importing each entity manually?  Any feedback would be
> greatly
> appreciated.
>
> Thanks in advance,
>
> Shane
>

Re: DIH - Closing ResultSet in JdbcDataSource

Posted by Shane Perry <th...@gmail.com>.
I have found where a root entity has completed processing and added the
logic to clear the entity's cache at that point (didn't change any of the
logic for clearing all entity caches once the import has completed).  I have
also created an enhancement request found at
https://issues.apache.org/jira/browse/SOLR-2313.

On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry <th...@gmail.com> wrote:

> By placing some strategic debug messages, I have found that the JDBC
> connections are not being closed until all <entity> elements have been
> processed (in the entire config file).  A simplified example would be:
>
> <dataConfig>
>   <dataSource name="ds1" driver="org.postgresql.Driver"
> url="jdbc:postgresql://localhost:5432/db1" user="..." password="..." />
>   <dataSource name="ds2" driver="org.postgresql.Driver"
> url="jdbc:postgresql://localhost:5432/db2" user="..." password="..." />
>
>   <document>
>     <entity name="entity1" datasource="ds1" ...>
>       ... field list ...
>       <entity name="entity1a" datasource="ds1" ...>
>         ... field list ...
>       </entity>
>    </entity>
>     <entity name="entity2" datasource="ds2" ...>
>       ... field list ...
>       <entity name="entity2a" datasource="ds2" ...>
>         ... field list ...
>       </entity>
>    </entity>
>   </document>
> </dataConfig>
>
> The behavior is:
>
> JDBC connection opened for entity1 and entity1a - Applicable queries run
> and ResultSet objects processed
> All open ResultSet and Statement objects closed for entity1 and entity1a
> JDBC connection opened for entity2 and entity2a - Applicable queries run
> and ResultSet objects processed
> All open ResultSet and Statement objects closed for entity2 and entity2a
> All JDBC connections (none are closed at this point) are closed.
>
> In my instance, I have some 95 unique <entity> elements (19 parents with 5
> children each), resulting in 95 open JDBC connections.  If I understand the
> process correctly, it should be safe to close the JDBC connection for a
> "root" <entity> (immediate children of <document>) and all descendant
> <entity> elements once the parent has been successfully completed.  I have
> been digging around the code, but due to my unfamiliarity with the code, I'm
> not sure where this would occur.
>
> Is this a valid solution?  It's looking like I should probably open a
> defect and I'm willing to do so along with submitting a patch, but need a
> little more direction on where the fix would best reside.
>
> Thanks,
>
> Shane
>
>
>
> On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry <th...@gmail.com> wrote:
>
>> Gora,
>>
>> Thanks for the response.  After taking another look, you are correct about
>> the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
>> didn't recognize the case difference in the two function calls, so missed
>> it.  I'll keep looking into the original issue and reply if I find a
>> cause/solution.
>>
>> Shane
>>
>>
>> On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty <go...@mimirtech.com> wrote:
>>
>>> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry <th...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I am in the process of migrating our system from Postgres 8.4 to Solr
>>> > 1.4.1.  Our system is fairly complex and as a result, I have had to
>>> define
>>> > 19 base entities in the data-config.xml definition file.  Each of these
>>> > entities executes 5 queries.  When doing a full-import, as each entity
>>> > completes, the server hosting Postgres shows 5 "idle in transaction"
>>> for the
>>> > entity.
>>> >
>>> > In digging through the code, I found that the JdbcDataSource wraps the
>>> > ResultSet object in a custom ResultSetIterator object, leaving the
>>> ResultSet
>>> > open.  Walking through the code I can't find a close() call anywhere on
>>> the
>>> > ResultSet.  I believe this results in the "idle in transaction"
>>> processes.
>>> [...]
>>>
>>> Have not examined the "idle in transaction" issue that you
>>> mention, but the ResultSet object in a ResultSetIterator is
>>> closed in the private hasnext() method, when there are no
>>> more results, or if there is an exception. hasnext() is called
>>> by the public hasNext() method that should be used in
>>> iterating over the results, so I see no issue there.
>>>
>>> Regards,
>>> Gora
>>>
>>> P.S. This is from Solr 1.4.0 code, but I would not think that
>>>        this part of the code would have changed.
>>>
>>
>>
>

Re: DIH - Closing ResultSet in JdbcDataSource

Posted by Shane Perry <th...@gmail.com>.
By placing some strategic debug messages, I have found that the JDBC
connections are not being closed until all <entity> elements have been
processed (in the entire config file).  A simplified example would be:

<dataConfig>
  <dataSource name="ds1" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/db1" user="..." password="..." />
  <dataSource name="ds2" driver="org.postgresql.Driver"
url="jdbc:postgresql://localhost:5432/db2" user="..." password="..." />

  <document>
    <entity name="entity1" datasource="ds1" ...>
      ... field list ...
      <entity name="entity1a" datasource="ds1" ...>
        ... field list ...
      </entity>
   </entity>
    <entity name="entity2" datasource="ds2" ...>
      ... field list ...
      <entity name="entity2a" datasource="ds2" ...>
        ... field list ...
      </entity>
   </entity>
  </document>
</dataConfig>

The behavior is:

JDBC connection opened for entity1 and entity1a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity1 and entity1a
JDBC connection opened for entity2 and entity2a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity2 and entity2a
All JDBC connections (none are closed at this point) are closed.

In my instance, I have some 95 unique <entity> elements (19 parents with 5
children each), resulting in 95 open JDBC connections.  If I understand the
process correctly, it should be safe to close the JDBC connection for a
"root" <entity> (immediate children of <document>) and all descendant
<entity> elements once the parent has been successfully completed.  I have
been digging around the code, but due to my unfamiliarity with the code, I'm
not sure where this would occur.

Is this a valid solution?  It's looking like I should probably open a defect
and I'm willing to do so along with submitting a patch, but need a little
more direction on where the fix would best reside.

Thanks,

Shane


On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry <th...@gmail.com> wrote:

> Gora,
>
> Thanks for the response.  After taking another look, you are correct about
> the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
> didn't recognize the case difference in the two function calls, so missed
> it.  I'll keep looking into the original issue and reply if I find a
> cause/solution.
>
> Shane
>
>
> On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty <go...@mimirtech.com> wrote:
>
>> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry <th...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am in the process of migrating our system from Postgres 8.4 to Solr
>> > 1.4.1.  Our system is fairly complex and as a result, I have had to
>> define
>> > 19 base entities in the data-config.xml definition file.  Each of these
>> > entities executes 5 queries.  When doing a full-import, as each entity
>> > completes, the server hosting Postgres shows 5 "idle in transaction" for
>> the
>> > entity.
>> >
>> > In digging through the code, I found that the JdbcDataSource wraps the
>> > ResultSet object in a custom ResultSetIterator object, leaving the
>> ResultSet
>> > open.  Walking through the code I can't find a close() call anywhere on
>> the
>> > ResultSet.  I believe this results in the "idle in transaction"
>> processes.
>> [...]
>>
>> Have not examined the "idle in transaction" issue that you
>> mention, but the ResultSet object in a ResultSetIterator is
>> closed in the private hasnext() method, when there are no
>> more results, or if there is an exception. hasnext() is called
>> by the public hasNext() method that should be used in
>> iterating over the results, so I see no issue there.
>>
>> Regards,
>> Gora
>>
>> P.S. This is from Solr 1.4.0 code, but I would not think that
>>        this part of the code would have changed.
>>
>
>

Re: DIH - Closing ResultSet in JdbcDataSource

Posted by Shane Perry <th...@gmail.com>.
Gora,

Thanks for the response.  After taking another look, you are correct about
the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
didn't recognize the case difference in the two function calls, so missed
it.  I'll keep looking into the original issue and reply if I find a
cause/solution.

Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty <go...@mimirtech.com> wrote:

> On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry <th...@gmail.com> wrote:
> > Hi,
> >
> > I am in the process of migrating our system from Postgres 8.4 to Solr
> > 1.4.1.  Our system is fairly complex and as a result, I have had to
> define
> > 19 base entities in the data-config.xml definition file.  Each of these
> > entities executes 5 queries.  When doing a full-import, as each entity
> > completes, the server hosting Postgres shows 5 "idle in transaction" for
> the
> > entity.
> >
> > In digging through the code, I found that the JdbcDataSource wraps the
> > ResultSet object in a custom ResultSetIterator object, leaving the
> ResultSet
> > open.  Walking through the code I can't find a close() call anywhere on
> the
> > ResultSet.  I believe this results in the "idle in transaction"
> processes.
> [...]
>
> Have not examined the "idle in transaction" issue that you
> mention, but the ResultSet object in a ResultSetIterator is
> closed in the private hasnext() method, when there are no
> more results, or if there is an exception. hasnext() is called
> by the public hasNext() method that should be used in
> iterating over the results, so I see no issue there.
>
> Regards,
> Gora
>
> P.S. This is from Solr 1.4.0 code, but I would not think that
>        this part of the code would have changed.
>

Re: DIH - Closing ResultSet in JdbcDataSource

Posted by Gora Mohanty <go...@mimirtech.com>.
On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry <th...@gmail.com> wrote:
> Hi,
>
> I am in the process of migrating our system from Postgres 8.4 to Solr
> 1.4.1.  Our system is fairly complex and as a result, I have had to define
> 19 base entities in the data-config.xml definition file.  Each of these
> entities executes 5 queries.  When doing a full-import, as each entity
> completes, the server hosting Postgres shows 5 "idle in transaction" for the
> entity.
>
> In digging through the code, I found that the JdbcDataSource wraps the
> ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
> open.  Walking through the code I can't find a close() call anywhere on the
> ResultSet.  I believe this results in the "idle in transaction" processes.
[...]

Have not examined the "idle in transaction" issue that you
mention, but the ResultSet object in a ResultSetIterator is
closed in the private hasnext() method, when there are no
more results, or if there is an exception. hasnext() is called
by the public hasNext() method that should be used in
iterating over the results, so I see no issue there.

Regards,
Gora

P.S. This is from Solr 1.4.0 code, but I would not think that
        this part of the code would have changed.