Posted to solr-user@lucene.apache.org by Mike O'Leary <tm...@uw.edu> on 2012/02/11 08:40:57 UTC

Recovering from database connection resets in DataimportHandler

I am trying to use Solr's DataImportHandler to index a large number of records in a SQL Server database that is owned and managed by a group we are collaborating with. Apart from the initial, very small test runs, every indexing job I have run so far has failed because of database connection resets. I have gotten indexing jobs to go further by using CachedSqlEntityProcessor and specifying responseBuffering=adaptive in the connection URL, but I think that in order to index all of the data I'm going to have to work out how to catch database connection reset exceptions and resubmit the queries that failed. Can anyone suggest a good way to approach this? Or have any of you encountered this problem and already worked out a solution to it?
Thanks,
Mike
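
One way to picture the "catch the reset and resubmit the query" idea outside of DIH is a paged import that remembers the last key it indexed. The following is only a minimal sketch under stated assumptions: the class ResumableImport, the Row type, and the PageFetcher interface are hypothetical names invented for illustration, and the table is assumed to have a positive, increasing numeric key. In real use, the fetcher would run a query of the form "WHERE id > ? ORDER BY id" over a fresh JDBC connection on each call.

```java
import java.sql.SQLException;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: resume a keyed import after a connection reset. */
final class ResumableImport {
    private ResumableImport() {}

    /** One imported row; id is assumed to be a positive, increasing key. */
    static final class Row {
        final long id;
        final String text;
        Row(long id, String text) { this.id = id; this.text = text; }
    }

    /** Returns up to limit rows with id > afterId, ordered by id (assumed contract). */
    interface PageFetcher {
        List<Row> fetch(long afterId, int limit) throws SQLException;
    }

    /**
     * Pulls pages until the fetcher returns an empty page. When a fetch fails,
     * waits and resubmits the same page, starting after the last row that was
     * handed to the indexer. Gives up after maxRetries consecutive failures.
     * Returns the id of the last row indexed.
     */
    static long run(PageFetcher fetcher, Consumer<Row> indexer, int pageSize,
                    int maxRetries, long backoffMillis)
            throws SQLException, InterruptedException {
        long lastId = 0;   // nothing indexed yet (ids are assumed to be > 0)
        int failures = 0;  // consecutive failed fetches
        while (true) {
            List<Row> page;
            try {
                page = fetcher.fetch(lastId, pageSize);
                failures = 0;
            } catch (SQLException e) {
                if (++failures > maxRetries) {
                    throw e; // persistent failure: surface it to the caller
                }
                Thread.sleep(backoffMillis * failures); // linear backoff
                continue; // resubmit the query from the same position
            }
            if (page.isEmpty()) {
                return lastId;
            }
            for (Row row : page) {
                indexer.accept(row);
                lastId = row.id; // progress is recorded per row
            }
        }
    }
}
```

Because progress is tracked by key rather than by result-set position, a reset costs at most one page of refetching instead of restarting the whole import.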

Re: Recovering from database connection resets in DataimportHandler

Posted by Erick Erickson <er...@gmail.com>.
It *just happens* that I wrote a blog post on this very topic; see:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

That code contains two rather different methods: one that indexes
rows from a SQL database and one that indexes arbitrary files
using client-side Tika.

Best
Erick
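
For readers who want a feel for the shape of a database-driven indexing client before reading the post, here is a rough sketch of one piece such a program typically needs: collecting rows into batches before sending them on. This is not the code from the blog post. BatchingIndexer and BatchSink are hypothetical names, and BatchSink merely stands in for whatever SolrJ call actually sends documents to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: buffer documents and hand them off in fixed-size batches. */
final class BatchingIndexer {

    /** Stand-in for the client call that sends documents to Solr (an assumption). */
    interface BatchSink {
        void send(List<Map<String, Object>> docs);
    }

    private final BatchSink sink;
    private final int batchSize;
    private final List<Map<String, Object>> pending = new ArrayList<>();

    BatchingIndexer(BatchSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    /** Buffers one document (field name to value) and flushes when the batch is full. */
    void add(Map<String, Object> doc) {
        pending.add(doc);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered documents; call once more after the last row. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.send(new ArrayList<>(pending)); // copy, so the sink owns its list
        pending.clear();
    }
}
```

Keeping the batch boundary in the client's own code is what makes it possible to decide, per batch, whether to retry, skip, or abort when something fails.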

On Wed, Feb 22, 2012 at 8:51 PM, Mike O'Leary <tm...@uw.edu> wrote:
> Could you point me to the most non-intimidating introduction to SolrJ that you know of? I have a passing familiarity with JavaScript and, with few exceptions, I haven't developed software that has a graphical user interface of any kind in about 25 years. I like the idea of having finer control over data imported from a database, though.
> Thanks,
> Mike

RE: Recovering from database connection resets in DataimportHandler

Posted by Mike O'Leary <tm...@uw.edu>.
Could you point me to the most non-intimidating introduction to SolrJ that you know of? I have a passing familiarity with JavaScript and, with few exceptions, I haven't developed software that has a graphical user interface of any kind in about 25 years. I like the idea of having finer control over data imported from a database, though.
Thanks,
Mike

Re: Recovering from database connection resets in DataimportHandler

Posted by Erick Erickson <er...@gmail.com>.
I'd seriously consider using SolrJ and your favorite JDBC driver
instead. It's actually quite easy to write an indexing program that
way, although as always it may be a bit intimidating to get started.
This gives you much finer control over error conditions than DIH
does, so it may be better suited to your needs.

Best
Erick
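
As an illustration of what "finer control over error conditions" can mean when talking to the database through JDBC directly, the sketch below separates failures that look connection-related (and so may be worth retrying) from those that do not. JdbcErrors and isRetryable are hypothetical names. The check relies on the standard java.sql exception subclasses and on SQLSTATE class "08" (connection exception); whether a particular driver reports connection resets this way is an assumption to verify against that driver.

```java
import java.sql.SQLException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTransientException;

/** Hypothetical sketch: decide whether a JDBC failure is worth retrying. */
final class JdbcErrors {
    private JdbcErrors() {}

    /**
     * Returns true when any exception in the chain is a transient or
     * recoverable SQL exception, or carries an SQLSTATE in class "08"
     * (connection exception).
     */
    static boolean isRetryable(SQLException e) {
        // JDBC can chain several SQLExceptions together; inspect each one.
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (cur instanceof SQLTransientException
                    || cur instanceof SQLRecoverableException) {
                return true;
            }
            String state = cur.getSQLState();
            if (state != null && state.startsWith("08")) {
                return true;
            }
        }
        return false; // e.g. syntax or constraint errors: retrying will not help
    }
}
```

A client written this way can reconnect and resubmit on retryable failures while still failing fast on errors such as a malformed query.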
