Posted to solr-user@lucene.apache.org by Sascha Szott <sz...@zib.de> on 2009/11/12 02:05:13 UTC

[DIH] concurrent requests to DIH

Hi all,

I'm using the DIH in a parameterized way by passing request parameters
that are used inside of my data-config. All imports end up in the same
index.

1. Is it considered good practice to set up several DIH request
handlers, one for each possible parameter value?

2. In case the range of parameter values is broad, it's not convenient to
define separate request handlers for each value. But using a single
handler entails a limitation (as far as I see): It is not possible to fire
several requests to the same DIH handler (with different parameter values)
at the same time. However, if several request handlers were used (as in
1.), concurrent requests (to the different handlers) would be possible. So,
how can this limitation be overcome?

Best,
Sascha
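
To make the setup in question 1 concrete, here is a minimal sketch (not
taken from the thread) of firing one import per handler at the same time.
The handler names (/dataimport-a, /dataimport-b), the request parameter
name (source), and the base URL are assumptions for illustration only.
Inside data-config such a parameter would be referenced as
${dataimporter.request.source}. clean=false is passed because all imports
end up in the same index, and a full-import would otherwise clear it first.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr URL


def build_import_url(handler, params, base=SOLR_BASE):
    """Build a full-import URL for one DIH handler plus custom request parameters."""
    # clean=false keeps documents that other imports already wrote to the index.
    query = {"command": "full-import", "clean": "false"}
    query.update(params)
    return f"{base}/{handler.strip('/')}?{urlencode(query)}"


def trigger_import(url, timeout=30):
    """Send the import request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def trigger_all(handler_params):
    """Fire one request per handler concurrently (one thread per handler)."""
    urls = [build_import_url(h, p) for h, p in handler_params.items()]
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return list(pool.map(trigger_import, urls))


if __name__ == "__main__":
    # Hypothetical handlers, each registered separately in solrconfig.xml.
    print(trigger_all({
        "/dataimport-a": {"source": "a"},
        "/dataimport-b": {"source": "b"},
    }))
```

Each request goes to a different handler, so no single handler receives
two import requests at once.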

Re: [DIH] concurrent requests to DIH

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
I guess SOLR-1352 should solve all the performance problems. I am
currently working on a patch and hope to submit it soon.

On Thu, Nov 12, 2009 at 8:05 PM, Sascha Szott <sz...@zib.de> wrote:
> Hi Avlesh,
>
> Avlesh Singh wrote:
>>>
>>> 1. Is it considered good practice to set up several DIH request
>>> handlers, one for each possible parameter value?
>>>
>> Nothing wrong with this. My assumption is that you want to do this to
>> speed up indexing. Each DIH instance would block all others once a
>> Lucene commit for the former is performed.
> Thanks for this clarification.
>
>>> 2. In case the range of parameter values is broad, it's not convenient to
>>> define separate request handlers for each value. But this entails a
>>> limitation (as far as I see): It is not possible to fire several requests
>>> to the same DIH handler (with different parameter values) at the same
>>> time.
>>>
>> Nope.
>>
>> I had done a similar exercise in my quest to write a
>> ParallelDataImportHandler. This thread might be of interest to you -
>> http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler.
>> Though there is a ticket in JIRA, I haven't been able to contribute this
>> back. If you think this is what you need, lemme know.
> Actually, I've already read this thread. In my opinion, both support for
> batch processing and multi-threading are important extensions of DIH's
> current capabilities, though issue SOLR-1352 mainly targets the latter. Is
> your PDIH implementation able to deal with batch processing right now?
>
> Best,
> Sascha
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: [DIH] concurrent requests to DIH

Posted by Sascha Szott <sz...@zib.de>.
Hi Avlesh,

Avlesh Singh wrote:
>>
>> 1. Is it considered good practice to set up several DIH request
>> handlers, one for each possible parameter value?
>>
> Nothing wrong with this. My assumption is that you want to do this to
> speed up indexing. Each DIH instance would block all others once a
> Lucene commit for the former is performed.
Thanks for this clarification.

>> 2. In case the range of parameter values is broad, it's not convenient to
>> define separate request handlers for each value. But this entails a
>> limitation (as far as I see): It is not possible to fire several requests
>> to the same DIH handler (with different parameter values) at the same
>> time.
>>
> Nope.
>
> I had done a similar exercise in my quest to write a
> ParallelDataImportHandler. This thread might be of interest to you -
> http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler.
> Though there is a ticket in JIRA, I haven't been able to contribute this
> back. If you think this is what you need, lemme know.
Actually, I've already read this thread. In my opinion, both support for
batch processing and multi-threading are important extensions of DIH's
current capabilities, though issue SOLR-1352 mainly targets the latter. Is
your PDIH implementation able to deal with batch processing right now?

Best,
Sascha
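
One possible middle ground between a handler per value and a single
handler, sketched here under assumptions and not drawn from any reply in
this thread: register a small, fixed pool of handlers and spread the
parameter values over them, so that each handler works through its own
queue one import at a time. The handler names and the start_import and
wait_until_idle callables are hypothetical placeholders; in practice they
would send the import request and poll the handler until it is idle again.

```python
from itertools import cycle


def assign_round_robin(values, handlers):
    """Distribute parameter values over a fixed pool of handlers, round-robin."""
    queues = {handler: [] for handler in handlers}
    for value, handler in zip(values, cycle(handlers)):
        queues[handler].append(value)
    return queues


def run_queue(handler, values, start_import, wait_until_idle):
    """Process one handler's queue sequentially.

    A handler never receives a second import before the previous one finished.
    """
    for value in values:
        start_import(handler, value)
        wait_until_idle(handler)


if __name__ == "__main__":
    # Hypothetical pool of two handlers serving five parameter values.
    queues = assign_round_robin(["a", "b", "c", "d", "e"], ["/dih-1", "/dih-2"])
    for handler, values in queues.items():
        run_queue(
            handler,
            values,
            start_import=lambda h, v: print(f"start {h} with value {v}"),
            wait_until_idle=lambda h: print(f"{h} is idle again"),
        )
```

Running each queue in its own thread would give concurrency across
handlers while keeping requests to any single handler strictly sequential.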



Re: [DIH] concurrent requests to DIH

Posted by Avlesh Singh <av...@gmail.com>.
>
> 1. Is it considered good practice to set up several DIH request
> handlers, one for each possible parameter value?
>
Nothing wrong with this. My assumption is that you want to do this to speed
up indexing. Each DIH instance would block all others once a Lucene commit
for the former is performed.

> 2. In case the range of parameter values is broad, it's not convenient to
> define separate request handlers for each value. But this entails a
> limitation (as far as I see): It is not possible to fire several requests
> to the same DIH handler (with different parameter values) at the same
> time.
>
Nope.

I had done a similar exercise in my quest to write a
ParallelDataImportHandler. This thread might be of interest to you -
http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler.
Though there is a ticket in JIRA, I haven't been able to contribute this
back. If you think this is what you need, lemme know.

Cheers
Avlesh
