Posted to solr-user@lucene.apache.org by markwaddle <ma...@markwaddle.com> on 2010/10/26 21:24:35 UTC

How does DIH multithreading work?

I understand that the thread count is specified on root entities only. Does
it spawn multiple threads per root entity? Or multiple threads per
descendant entity? Can someone give an example of how you would make a
database query in an entity with 4 threads that would select 1 row per
thread?

Thanks,
Mark
-- 
View this message in context: http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How does DIH multithreading work?

Posted by markwaddle <ma...@markwaddle.com>.
Anyone know how it works?

Re: How does DIH multithreading work?

Posted by Lance Norskog <go...@gmail.com>.
It is useful for parsing PDFs on a multi-processor machine. It also helps if a
sub-entity makes an outbound I/O call to a database, a file, or another
SOLR (SOLR-1499).

More generally, it helps anywhere the pipeline time outweighs disk I/O time.

Threading happens at the per-document level: there is no concurrent
access inside a single document's pipeline.
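
To make the per-document model concrete, here is a minimal Java sketch of it. This is not DIH's code: SharedRowSource, nextRow() and importAll() are hypothetical stand-ins, and it assumes rows can be processed independently. A fixed pool of workers shares one row source; each worker claims the next row through a synchronized nextRow() and then runs the whole pipeline (here, a single transformer function) for that document by itself.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

public class PerDocumentThreadingSketch {

    /** Hypothetical stand-in for a root entity's rows; synchronized so no two workers get the same row. */
    static final class SharedRowSource {
        private final Iterator<String> rows;

        SharedRowSource(List<String> rows) {
            this.rows = rows.iterator();
        }

        /** Returns the next unclaimed row, or null when the source is exhausted. */
        synchronized String nextRow() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Each worker claims one row at a time and runs the whole (single-threaded) pipeline for it. */
    static Map<String, String> importAll(List<String> rows, int threads, UnaryOperator<String> transformer) {
        SharedRowSource source = new SharedRowSource(rows);
        Map<String, String> indexed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                String row;
                while ((row = source.nextRow()) != null) {
                    // No other thread touches this document while it is being transformed.
                    indexed.put(row, transformer.apply(row));
                }
            });
        }
        pool.shutdown();
        try {
            // Wait (up to an hour) for every worker to drain the shared source.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, String> out = importAll(List.of("doc1", "doc2", "doc3", "doc4"), 4, String::toUpperCase);
        System.out.println(new TreeMap<>(out)); // {doc1=DOC1, doc2=DOC2, doc3=DOC3, doc4=DOC4}
    }
}
```

In this model, 4 rows with 4 threads means every row is processed exactly once, but nothing guarantees one row per thread: a worker that finishes early simply claims the next row.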

There is a bug that causes EntityProcessors that look up attributes to
throw an exception. This makes Tika unusable inside a thread. Two other
EPs also won't work, but I have not tested them.

https://issues.apache.org/jira/browse/SOLR-2186




-- 
Lance Norskog
goksron@gmail.com

RE: How does DIH multithreading work?

Posted by "Dyer, James" <Ja...@ingrambook.com> on Mon, Nov 1, 2010 at 10:43 AM.
Mark,

I have the same question, so I did a little research on this.  It's not a complete answer, but here is what I've found:

- "threads" was added with SOLR-1352 (https://issues.apache.org/jira/browse/SOLR-1352).

- Also see http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler for background info.

- Only available in 3.x and trunk.  Committed on 1/12/2010 by Noble Paul (who surely can tell you more accurate info than I can).

- It seems that, when threads are used, each thread calls "nextRow" on your root entity's data source in parallel.

- I'm not sure this will help with child entities (i.e., I had hoped I could get it to build child caches in parallel, but I don't think this is the case).

- A doc comment on ThreadedEntityProcessorWrapper indicates this will help speed up running transformers because they'd run in parallel.  This would make sense if your database can only return rows so fast but you have an intensive transformer.  Adding threads might then make your processing no slower than the db.
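
The trade-off in that last point (and Lance's "pipeline time outweighs disk i/o time") can be sketched with a deliberately simple two-stage throughput model. This is an assumption, not a measurement of DIH: it ignores lock contention, Solr's own indexing cost and per-thread overhead, and the parameter names are made up for the example. Rows arrive at a fixed rate, the transformer runs on N threads, and the slower stage caps the result.

```java
public class DihThroughputModel {

    /**
     * Simplified steady-state estimate in documents per second. Rows arrive from the
     * database at fetchRowsPerSec; each document then spends transformMillis in the
     * transformer, and `threads` workers transform in parallel. The slower stage wins.
     */
    static double estimatedDocsPerSecond(double fetchRowsPerSec, double transformMillis, int threads) {
        if (fetchRowsPerSec <= 0 || transformMillis <= 0 || threads < 1) {
            throw new IllegalArgumentException("rate, cost and thread count must be positive");
        }
        double transformDocsPerSec = threads * 1000.0 / transformMillis;
        return Math.min(fetchRowsPerSec, transformDocsPerSec);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the database delivers 200 rows/sec, the transformer costs 20 ms/doc.
        for (int threads = 1; threads <= 8; threads *= 2) {
            System.out.printf("threads=%d -> %.0f docs/sec%n", threads,
                    estimatedDocsPerSecond(200, 20, threads));
        }
    }
}
```

With the sample numbers this prints 50, 100, 200 and 200 docs/sec for 1, 2, 4 and 8 threads: past four threads the database, not the transformer, is the limit, which is the "no slower than the db" case.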

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

