Posted to solr-user@lucene.apache.org by Sascha Szott <sz...@zib.de> on 2009/11/09 19:42:22 UTC

[DIH] blocking import operation

Hi all,

Currently, DIH's import operations only work asynchronously: after an
import request has been submitted, DIH returns immediately, while the
import itself (which can take a while if a large amount of data needs to
be indexed) continues in the background.

So, what is the recommended way to check whether the import process has
finished? Or, better still, is there a method or workaround that blocks
the caller of an import operation until that operation has finished?

In my application, DIH receives URL parameters that determine the
database name used within data-config.xml, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

Since only one DIH (/dataimport) is defined but several databases need
to be indexed, this command has to be issued several times, e.g.

http://localhost:8983/solr/dataimport?command=full-import&dbname=foo

... wait until /dataimport?command=status says "Indexing completed" (but 
without using a loop that checks it again and again) ...

http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false
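
For comparison, the polling workaround that this post would like to avoid
can be sketched in Python. This is a minimal sketch, not DIH's own API. It
assumes that command=status with wt=json returns a top-level "status"
field reading "busy" while an import runs and "idle" once it has finished,
and that DIH already reports "busy" by the time the full-import request
returns. The helper names, poll interval and timeout are hypothetical.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# The single DIH handler from the post.
DIH_URL = "http://localhost:8983/solr/dataimport"


def http_get_json(url, timeout=30.0):
    """GET a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dih_url(**params):
    """Build a DIH request URL; wt=json requests a JSON response."""
    return DIH_URL + "?" + urlencode(dict(params, wt="json"))


def wait_until_idle(get_json, poll_interval=2.0, timeout=3600.0,
                    sleep=time.sleep):
    """Poll command=status until DIH reports 'idle'.

    Assumption: the status response has a top-level "status" field that is
    "busy" during an import and "idle" afterwards.
    """
    waited = 0.0
    while get_json(dih_url(command="status")).get("status") != "idle":
        if waited >= timeout:
            raise TimeoutError("import still running after %.0f s" % waited)
        sleep(poll_interval)
        waited += poll_interval


def import_databases(dbnames, get_json=http_get_json, **wait_kwargs):
    """Run one full-import per database name, one after another."""
    for i, dbname in enumerate(dbnames):
        params = {"command": "full-import", "dbname": dbname}
        if i > 0:
            # Do not wipe documents indexed from the previous databases.
            params["clean"] = "false"
        get_json(dih_url(**params))
        # Assumption: DIH already reports "busy" when this request returns.
        wait_until_idle(get_json, **wait_kwargs)
```

Calling import_databases(["foo", "bar"]) would reproduce the foo-then-bar
sequence above; get_json and sleep are injectable so the loop can be
exercised without a running Solr instance.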


A suitable solution, at least IMHO, would be an additional DIH parameter
that determines whether the import call is blocking or non-blocking (the
default). As far as I can see, this should be feasible, since Solr can
execute more than one import operation at a time (it starts a new thread
for each). Perhaps my question is related to the discussion [1] on
ParallelDataImportHandler.
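
The proposed parameter can be illustrated with a toy handler in Python.
This is a sketch under stated assumptions, not Solr code: ToyImportHandler
and the "blocking" parameter are hypothetical, and run_import stands in
for the real indexing work. Each request starts its own worker thread, and
only when blocking=true does the handler join that thread before replying.

```python
import threading


class ToyImportHandler:
    """Hypothetical stand-in for a DIH-like request handler (not Solr code)."""

    def __init__(self, run_import):
        # run_import(params) performs the actual indexing for one request.
        self._run_import = run_import

    def handle(self, params):
        # Every import gets its own worker thread, as described in the post.
        worker = threading.Thread(target=self._run_import, args=(params,))
        worker.start()
        # Hypothetical "blocking" parameter; non-blocking remains the default.
        if params.get("blocking", "false") == "true":
            worker.join()  # hold the request open until the import is done
            return "Indexing completed"
        return "Import started"
```

The trade-off is that a blocking request keeps the HTTP connection open
for the entire import, which is exactly the concern Noble Paul raises in
this thread.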

Best,
Sascha

[1] http://www.lucidimagination.com/search/document/a9b26ade46466ee


Re: [DIH] blocking import operation

Posted by Sascha Szott <sz...@zib.de>.
Noble Paul wrote:
> Yes, open an issue. This is a trivial change.
I've opened JIRA issue SOLR-1554.

-Sascha



Re: [DIH] blocking import operation

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Yes, open an issue. This is a trivial change.

On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott <sz...@zib.de> wrote:
> Noble,
>
> Noble Paul wrote:
>> DIH imports are really long running. There is a good chance that the
>> connection times out or breaks in between.
> Yes, you're right, I missed that point (in my case imports take no longer
> than a minute).
>
>> how about a callback?
> Thanks for the hint. There was a discussion on adding a callback URL to
> DIH a month ago, but it seems that no issue was raised. So, up to now, it's
> only possible to implement an appropriate Solr EventListener. Should we
> open an issue for supporting callback URLs?



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: [DIH] blocking import operation

Posted by Sascha Szott <sz...@zib.de>.
Noble,

Noble Paul wrote:
> DIH imports are really long running. There is a good chance that the
> connection times out or breaks in between.
Yes, you're right, I missed that point (in my case imports take no longer
than a minute).

> how about a callback?
Thanks for the hint. There was a discussion on adding a callback URL to
DIH a month ago, but it seems that no issue was raised. So, up to now, it's
only possible to implement an appropriate Solr EventListener. Should we
open an issue for supporting callback URLs?
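
The callback idea can be sketched from the client's side in Python:
instead of polling, the client runs a tiny HTTP endpoint and waits on an
event that is set when that endpoint is hit. This assumes some server-side
hook (for instance a custom EventListener, or the callback-URL support
proposed here) requests the given URL when the import ends; that hook, the
/done path and the helper names are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Answers the (hypothetical) 'import finished' notification."""

    def do_GET(self):
        # Any GET on this server counts as "the import has ended".
        self.server.import_done.set()
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_callback_server(host="127.0.0.1", port=0):
    """Start the endpoint in a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), CallbackHandler)
    server.import_done = threading.Event()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def callback_url(server, path="/done"):
    """URL a server-side hook would be told to request on completion."""
    host, port = server.server_address[:2]
    return "http://%s:%d%s" % (host, port, path)
```

A client would pass callback_url(server) along with its import request and
then call server.import_done.wait(timeout=...) before issuing the next
full-import. Unlike a blocking request, no connection to Solr stays open
while the import runs.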

Best,
Sascha


Re: [DIH] blocking import operation

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
DIH imports are really long-running, so there is a good chance that the
connection times out or breaks before the import finishes.

How about a callback?



