Posted to java-user@lucene.apache.org by John Wang <jo...@gmail.com> on 2005/01/05 19:54:20 UTC

multi-threaded thru-put in lucene

Hi folks:

    We are trying to measure Lucene's search throughput in a multi-threaded environment.

    This is what we found:

     1 thread:  search takes 20 ms.
     2 threads: search takes 40 ms.
     5 threads: search takes 100 ms.


     It seems that in a multi-threaded scenario throughput isn't good:
performance is no better than with 1 thread.
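The latencies above correspond to a flat aggregate throughput. A minimal arithmetic sketch, assuming each thread issues searches back to back so that throughput is the thread count divided by the per-search latency (the class and method names are illustrative only):

```java
public class ThroughputCheck {

    /** Aggregate searches per second when each of `threads` workers
     *  completes one search every `latencyMs` milliseconds. */
    static double searchesPerSecond(int threads, double latencyMs) {
        return threads * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 5};
        double[] latencyMs = {20, 40, 100}; // per-search times reported above
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%d thread(s): %.1f searches/s%n",
                    threads[i], searchesPerSecond(threads[i], latencyMs[i]));
        }
    }
}
```

Every configuration works out to 50 searches per second: adding threads stretched each search's latency without increasing how many searches completed per second.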

     I tried sharing one IndexSearcher among all threads as well as
having an IndexSearcher per thread. Both yield the same numbers.

     Is this consistent with what you'd expect?

Thanks

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: multi-threaded thru-put in lucene

Posted by John Wang <jo...@gmail.com>.
Thanks Doug! You are right: adding a Thread.sleep() helped greatly.

Mysteries of Java...

Another Java threading question.
With 1 thread running 100 search iterations, it took about 850 ms.
After adding a Thread.sleep(10) in the loop, it takes about 2200 ms.

The 100 sleeps of 10 ms should add 1000 ms, for an expected total of
850 + 1000 = 1850 ms. So there are 2200 - 1850 = 350 ms unaccounted for.
Is that due to thread scheduling/context switching?
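The arithmetic can be checked, and the real cost of Thread.sleep(10) measured, with a small sketch that does not involve Lucene at all. The Javadoc for Thread.sleep says the delay is subject to the precision and accuracy of system timers and schedulers, so each call may overshoot 10 ms; multiplied by 100 iterations, that overshoot is one candidate for the missing time. The 850 ms and 2200 ms figures are John's; the measured overshoot depends entirely on the OS and JVM, and the names below are illustrative.

```java
public class SleepOverhead {

    /** Expected wall time: the search-only run plus the total requested sleep time. */
    static long expectedMs(long searchOnlyMs, int iterations, long sleepMs) {
        return searchOnlyMs + iterations * sleepMs;
    }

    /** Measures how long `iterations` calls to Thread.sleep(sleepMs) actually take. */
    static long measureSleepsMs(int iterations, long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Thread.sleep(sleepMs);
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) throws InterruptedException {
        long expected = expectedMs(850, 100, 10); // 850 + 100 * 10 = 1850 ms
        System.out.println("expected: " + expected + " ms, unaccounted: "
                + (2200 - expected) + " ms");

        long slept = measureSleepsMs(100, 10);    // 1000 ms requested in total
        System.out.println("100 x sleep(10) took " + slept + " ms, overshoot per call ~"
                + (slept - 1000) / 100.0 + " ms");
    }
}
```

If the measured overshoot per call is a few milliseconds, timer granularity alone could account for a few hundred milliseconds over 100 iterations.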

Thanks

-John


On Thu, 6 Jan 2005 10:36:12 -0800, John Wang <jo...@gmail.com> wrote:
> Is the operation IndexSearcher.search I/O or CPU bound if I am doing
> 100's of searches on the same query?
> 
> Thanks
> 
> -John
> 



Re: multi-threaded thru-put in lucene

Posted by Doug Cutting <cu...@apache.org>.
John Wang wrote:
> Is the operation IndexSearcher.search I/O or CPU bound if I am doing
> 100's of searches on the same query?

CPU bound.

Doug
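A rough way to check a CPU-bound claim on one's own hardware is to compare the CPU time a thread consumes with the wall-clock time that passes while it runs. The sketch below uses java.lang.management.ThreadMXBean; the busy loop and the sleep are placeholders (assumptions, not Lucene code) standing in for a CPU-bound search loop and an I/O wait. A ratio near 1.0 suggests CPU-bound work; a ratio near 0 suggests the thread was mostly waiting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuBoundCheck {

    /** Runs the task on the current thread and returns CPU time divided by wall time,
     *  or NaN if this JVM cannot report per-thread CPU time. */
    static double cpuToWallRatio(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return Double.NaN;
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long cpu = mx.getCurrentThreadCpuTime() - cpuStart;
        long wall = System.nanoTime() - wallStart;
        return wall == 0 ? Double.NaN : (double) cpu / wall;
    }

    public static void main(String[] args) {
        // Placeholder for a search loop: pure computation, no I/O.
        double busy = cpuToWallRatio(() -> {
            long x = 0;
            for (long i = 0; i < 200000000L; i++) {
                x += i ^ (x >>> 3);
            }
            if (x == 42) System.out.println(x); // keep the loop's result observable
        });
        // Placeholder for an I/O wait: the thread sleeps instead of computing.
        double waiting = cpuToWallRatio(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("busy loop ratio: %.2f, sleeping ratio: %.2f%n", busy, waiting);
    }
}
```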



Re: multi-threaded thru-put in lucene

Posted by John Wang <jo...@gmail.com>.
Is the operation IndexSearcher.search I/O or CPU bound if I am doing
100's of searches on the same query?

Thanks

-John


On Thu, 06 Jan 2005 10:31:49 -0800, Doug Cutting <cu...@apache.org> wrote:
> John Wang wrote:
> > 1 thread: 445 ms.
> > 2 threads: 870 ms.
> > 5 threads: 2200 ms.
> >
> > Pretty much the same numbers you'd get if you are running them sequentially.
> >
> > Any ideas? Am I doing something wrong?
> 
> If you're performing compute-bound work on a single-processor machine
> then threading should give you no better performance than sequential,
> perhaps a bit worse.  If you're performing io-bound work on a
> single-disk machine then threading should again provide no improvement.
>   If the task is evenly compute and i/o bound then you could achieve at
> best a 2x speedup on a single CPU system with a single disk.
> 
> If you're compute-bound on an N-CPU system then threading should
> optimally be able to provide a factor of N speedup.
> 
> Java's scheduling of compute-bound threads when no threads call
> Thread.sleep() can also be very unfair.
> 
> Doug


Re: multi-threaded thru-put in lucene

Posted by Doug Cutting <cu...@apache.org>.
John Wang wrote:
> 1 thread: 445 ms.
> 2 threads: 870 ms.
> 5 threads: 2200 ms.
> 
> Pretty much the same numbers you'd get if you are running them sequentially.
> 
> Any ideas? Am I doing something wrong?

If you're performing compute-bound work on a single-processor machine
then threading should give you no better performance than sequential,
perhaps a bit worse.  If you're performing I/O-bound work on a
single-disk machine then threading should again provide no improvement.
If the task is evenly split between compute and I/O then you could achieve
at best a 2x speedup on a single CPU system with a single disk.

If you're compute-bound on an N-CPU system then threading should 
optimally be able to provide a factor of N speedup.
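These bounds follow from a simple bottleneck model. A sketch, assuming an idealized workload where each search spends a fraction c of its time computing and 1 - c doing I/O, with enough threads to keep every CPU and disk busy and no scheduling or locking overhead; the model and its names are illustrative, not taken from Lucene. The busiest resource sets the limit, so the speedup over one sequential thread is 1 / max(c / cpus, (1 - c) / disks).

```java
public class SpeedupModel {

    /**
     * Best-case speedup over one sequential thread when CPU and I/O work overlap freely.
     *
     * @param computeFraction share of each task spent on the CPU (0..1); the rest is I/O
     */
    static double idealSpeedup(double computeFraction, int cpus, int disks) {
        double cpuTime = computeFraction / cpus;          // per-task CPU time, spread over CPUs
        double ioTime = (1.0 - computeFraction) / disks;  // per-task I/O time, spread over disks
        return 1.0 / Math.max(cpuTime, ioTime);
    }

    public static void main(String[] args) {
        System.out.println(idealSpeedup(1.0, 1, 1)); // compute-bound, 1 CPU:     1.0
        System.out.println(idealSpeedup(0.0, 1, 1)); // I/O-bound, 1 disk:        1.0
        System.out.println(idealSpeedup(0.5, 1, 1)); // evenly split, 1 CPU/disk: 2.0
        System.out.println(idealSpeedup(1.0, 4, 1)); // compute-bound, 4 CPUs:    4.0
    }
}
```

Real searches add synchronization and scheduling costs on top of this, so measured speedups would be expected to fall below these ceilings.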

Java's scheduling of compute-bound threads when no threads call
Thread.sleep() can also be very unfair.

Doug
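One way to look for the scheduling unfairness mentioned above is to record when each worker thread finishes. A sketch, assuming a CPU-bound placeholder (the hypothetical cpuWork method) in place of IndexSearcher.search and an optional Thread.sleep per iteration like the one John later added. With fair interleaving the finish times cluster together; when they are spread out, the threads effectively ran one after another. Results depend heavily on the JVM, OS, and CPU count; on a modern JVM with preemptive OS threads the two runs may look alike.

```java
import java.util.Arrays;

public class FairnessProbe {

    /** Hypothetical placeholder for one search: burns CPU and returns a value. */
    static long cpuWork(int units) {
        long x = 17;
        for (int i = 0; i < units; i++) {
            x = x * 31 + i;
        }
        return x;
    }

    /** Runs threadCount CPU-bound workers, optionally sleeping sleepMs after each
     *  iteration, and returns each worker's finish time in ms from a common start. */
    static long[] finishTimesMs(int threadCount, int iterations, int units, long sleepMs) {
        long[] finish = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        long start = System.nanoTime();
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long sink = 0;
                for (int i = 0; i < iterations; i++) {
                    sink += cpuWork(units);
                    if (sleepMs > 0) {
                        try {
                            Thread.sleep(sleepMs);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            break;
                        }
                    }
                }
                finish[id] = (System.nanoTime() - start) / 1000000L;
                if (sink == 42) System.out.println(sink); // keep the work observable
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the worker's finish[] write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return finish;
    }

    public static void main(String[] args) {
        // Without sleeps: widely spread finish times suggest sequential execution.
        System.out.println("no sleep:   " + Arrays.toString(finishTimesMs(5, 100, 2000000, 0)));
        // With a 1 ms sleep per iteration, each thread regularly gives up the CPU.
        System.out.println("with sleep: " + Arrays.toString(finishTimesMs(5, 100, 2000000, 1)));
    }
}
```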



Re: multi-threaded thru-put in lucene

Posted by John Wang <jo...@gmail.com>.
I actually ran a few tests, but I'm seeing similar behavior.

After removing all possible sources of variation, this is what I used:

1 index, document count 15,000.
Using FSDirectory, i.e. new IndexSearcher(String path), which I
think uses FSDirectory by default.

Each thread runs 100 search iterations:

for (int i = 0; i < 100; ++i) {
    idxSearcher.search(q); // q is the same Query for every thread and iteration
}

For each thread and each iteration, I am using the same query.

I am timing them the following way:

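long start = System.currentTimeMillis();

// start all worker threads; each runs the 100-iteration search loop above
for (int i = 0; i < threadCount; ++i) {
    thread[i].start();
}

// wait for every worker to finish before stopping the clock
for (int i = 0; i < threadCount; ++i) {
    thread[i].join();
}

long duration = System.currentTimeMillis() - start;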

The durations I am getting are:

1 thread: 445 ms.
2 threads: 870 ms.
5 threads: 2200 ms.

Pretty much the same numbers you'd get if you are running them sequentially.
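For comparison, here is a self-contained version of the timing harness above that also reports throughput. It is a sketch: the hypothetical searchOnce() placeholder does fixed CPU work instead of calling idxSearcher.search(q), so it shows the measurement pattern rather than Lucene's numbers. On a single CPU the elapsed time would be expected to grow roughly in proportion to the thread count, as in the results above; with several CPUs it may stay closer to flat up to the number of CPUs, in line with Doug's reply.

```java
public class ThroughputHarness {

    /** Hypothetical stand-in for idxSearcher.search(q): a fixed amount of CPU work. */
    static long searchOnce() {
        long x = 1;
        for (int i = 0; i < 200000; i++) {
            x = x * 6364136223846793005L + i;
        }
        return x;
    }

    /** Starts threadCount threads, each doing `iterations` searches; returns elapsed ms. */
    static long runMs(int threadCount, int iterations) {
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                long sink = 0;
                for (int i = 0; i < iterations; i++) {
                    sink += searchOnce();
                }
                if (sink == 42) System.out.println(sink); // keep the result observable
            });
        }
        long start = System.nanoTime(); // monotonic clock, suited to measuring intervals
        for (Thread th : threads) th.start();
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        System.out.println("CPUs: " + Runtime.getRuntime().availableProcessors());
        int iterations = 100;
        for (int threadCount : new int[] {1, 2, 5}) {
            long ms = Math.max(1, runMs(threadCount, iterations));
            double perSecond = threadCount * iterations * 1000.0 / ms;
            System.out.println(threadCount + " thread(s): " + ms + " ms, "
                    + Math.round(perSecond) + " searches/s");
        }
    }
}
```

Creating the Thread objects before reading the clock keeps thread construction out of the measured interval; only start, run, and join are timed.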

Any ideas? Am I doing something wrong?

Thanks in advance for all your help.

-John

On Thu, 6 Jan 2005 00:06:09 -0800 (PST), Chris Hostetter
<ho...@fucit.org> wrote:
> 
> :     This is what we found:
> :
> :      1 thread, search takes 20 ms.
> :
> :       2 threads, search takes 40 ms.
> :
> :       5 threads, search takes 100 ms.
> 
> how big is your index?  What are the term frequencies like in your index?
> how many different queries did you try? what was the structure of your
> query objects like?  were you using a RAMDirectory or an FSDirectory? what
> hardware were you running on?
> 
> Is your test application small enough that you can post it to the list?
> 
> I haven't done a lot of PMA testing of Lucene, but from what limited
> testing I have done I'm a little surprised at those numbers; you'd get
> results just as good if you ran the queries sequentially.
> 
> -Hoss


Re: multi-threaded thru-put in lucene

Posted by Chris Hostetter <ho...@fucit.org>.
:     This is what we found:
:
:      1 thread, search takes 20 ms.
:
:       2 threads, search takes 40 ms.
:
:       5 threads, search takes 100 ms.

how big is your index?  What are the term frequencies like in your index?
how many different queries did you try? what was the structure of your
query objects like?  were you using a RAMDirectory or an FSDirectory? what
hardware were you running on?

Is your test application small enough that you can post it to the list?

I haven't done a lot of PMA testing of Lucene, but from what limited
testing I have done I'm a little surprised at those numbers; you'd get
results just as good if you ran the queries sequentially.


-Hoss




Re: multi-threaded thru-put in lucene

Posted by Mariella Di Giacomo <ma...@lanl.gov>.
Hi,

I have a question.
How big (in size and number of documents) is your index?
How many indexes do you search?

Thanks,

Mariella
