Posted to solr-user@lucene.apache.org by Sascha Szott <sz...@zib.de> on 2009/11/09 19:42:22 UTC
[DIH] blocking import operation
Hi all,
currently, DIH's import operations work only asynchronously.
Therefore, after submitting an import request, DIH returns immediately,
while the import process (in case a large amount of data needs to be
indexed) continues asynchronously behind the scenes.
So, what is the recommended way to check whether the import process has
already finished? Or, better still, is there any method / workaround that
will block the import operation's caller until the operation has finished?
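For illustration, a minimal Python sketch of the polling workaround. It is an assumption that the status response is XML with a top-level <str name="status"> element that reads "idle" once no import is running; the helper names (fetch_status, is_idle, wait_until_idle) are hypothetical and not part of Solr.

```python
import time
import xml.etree.ElementTree as ET
from urllib.request import urlopen

STATUS_URL = "http://localhost:8983/solr/dataimport?command=status"


def fetch_status(url=STATUS_URL):
    """Return the raw body of the DIH status response."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def is_idle(status_xml):
    """True if <str name="status"> reads 'idle' (assumed response shape)."""
    root = ET.fromstring(status_xml)
    node = root.find("./str[@name='status']")
    return node is not None and (node.text or "").strip() == "idle"


def wait_until_idle(fetch=fetch_status, interval=2.0, timeout=600.0,
                    sleep=time.sleep):
    """Poll until the handler reports idle; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        if is_idle(fetch()):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("import still running after %.0f s" % timeout)
        sleep(interval)
```

Calling wait_until_idle() right after submitting an import blocks the caller on the client side, at the cost of repeated status requests.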
In my application, the DIH receives some URL parameters which are used
for determining the database name that is used within data-config.xml, e.g.
http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
Since only one DIH, /dataimport, is defined, but several databases need
to be indexed, this command has to be issued several times, e.g.
http://localhost:8983/solr/dataimport?command=full-import&dbname=foo
... wait until /dataimport?command=status says "Indexing completed" (but
without using a loop that checks it again and again) ...
http://localhost:8983/solr/dataimport?command=full-import&dbname=bar&clean=false
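The sequence above could be driven by a small client-side script. The following Python sketch only builds the request URLs and runs them one after another; submit and wait are hypothetical callables supplied by the caller (for example an HTTP GET and some way of waiting for completion), and the database names are the ones from the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/dataimport"


def import_urls(dbnames, base_url=BASE_URL):
    """Build one full-import URL per database, in the given order."""
    urls = []
    for i, dbname in enumerate(dbnames):
        params = {"command": "full-import", "dbname": dbname}
        if i > 0:
            # Later imports must not wipe what earlier ones indexed.
            params["clean"] = "false"
        urls.append(base_url + "?" + urlencode(params))
    return urls


def run_sequentially(dbnames, submit, wait):
    """Submit each import and wait for it before starting the next one."""
    for url in import_urls(dbnames):
        submit(url)
        wait()
```

For ["foo", "bar"] this yields the two URLs shown above, with clean=false only on the second.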
A suitable solution, at least IMHO, would be to have an additional DIH
parameter which determines whether the import call is blocking or
non-blocking (the default). As far as I can see, this could be accomplished
since Solr can execute more than one import operation at a time (it
starts a new thread for each). Perhaps my question is somehow related
to the discussion [1] on ParallelDataImportHandler.
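To make the proposal concrete, here is a toy Python model of such a parameter. This is not DIH code: start_import and run_import are hypothetical names, and the sketch only shows the intended semantics, i.e. the work always runs in its own thread and the blocking flag decides whether the caller waits for it.

```python
import threading


def start_import(run_import, blocking=False):
    """Run run_import in a new thread; wait for it only if blocking is set."""
    worker = threading.Thread(target=run_import)
    worker.start()
    if blocking:
        # Blocking mode: return only after the import thread has finished.
        worker.join()
    return worker
```

With blocking=False the call returns immediately, which corresponds to the current behaviour; with blocking=True the caller gets control back only after the import has finished.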
Best,
Sascha
[1] http://www.lucidimagination.com/search/document/a9b26ade46466ee
Re: [DIH] blocking import operation
Posted by Sascha Szott <sz...@zib.de>.
Noble Paul wrote:
> Yes, open an issue. This is a trivial change.
I've opened JIRA issue SOLR-1554.
-Sascha
Re: [DIH] blocking import operation
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
Yes, open an issue. This is a trivial change.
On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott <sz...@zib.de> wrote:
> Noble,
>
> Noble Paul wrote:
>> DIH imports are really long running. There is a good chance that the
>> connection times out or breaks in between.
> Yes, you're right, I missed that point (in my case imports take no longer
> than a minute).
>
>> how about a callback?
> Thanks for the hint. There was a discussion on adding a callback URL to
> DIH a month ago, but it seems that no issue was raised. So, up to now, it's
> only possible to implement an appropriate Solr EventListener. Should we
> open an issue for supporting callback URLs?
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: [DIH] blocking import operation
Posted by Sascha Szott <sz...@zib.de>.
Noble,
Noble Paul wrote:
> DIH imports are really long running. There is a good chance that the
> connection times out or breaks in between.
Yes, you're right, I missed that point (in my case imports take no longer
than a minute).
> how about a callback?
Thanks for the hint. There was a discussion on adding a callback URL to
DIH a month ago, but it seems that no issue was raised. So, up to now, it's
only possible to implement an appropriate Solr EventListener. Should we
open an issue for supporting callback URLs?
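To illustrate how a callback would help on the caller's side, here is a minimal Python sketch. The caller does not keep an HTTP connection open; it waits on a local signal that a completion callback sets. ImportWaiter and on_import_end are hypothetical names, and how the callback is actually triggered (an EventListener on the Solr side, or a request to a callback URL) is left open.

```python
import threading


class ImportWaiter:
    """Lets a caller wait locally until a completion callback fires."""

    def __init__(self):
        self._done = threading.Event()

    def on_import_end(self):
        """To be invoked by whatever receives the completion callback."""
        self._done.set()

    def wait(self, timeout=None):
        """Return True once the import has ended, False on timeout."""
        return self._done.wait(timeout)
```

The timeout in wait() keeps the caller from hanging forever if the callback never arrives.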
Best,
Sascha
Re: [DIH] blocking import operation
Posted by Noble Paul നോബിള് नोब्ळ् <no...@corp.aol.com>.
DIH imports are really long-running. There is a good chance that the
connection times out or breaks in between.
How about a callback?
--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com