
Posted to java-user@lucene.apache.org by "Peter W." <pe...@marketingbrokers.com> on 2007/05/21 23:38:48 UTC

In memory MultiSearcher

Hello,

I have been using a large, in-memory MultiSearcher that
is reaching the limits of my machine's RAM with this code:

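       import org.apache.lucene.search.IndexSearcher;
       import org.apache.lucene.search.MultiSearcher;
       import org.apache.lucene.store.RAMDirectory;

       try
          {
          // each RAMDirectory copies a whole on-disk index into the Java heap
          IndexSearcher[] searcher_a=
             {
             new IndexSearcher(new RAMDirectory(index_one_path)),
             new IndexSearcher(new RAMDirectory(index_two_path)),
             new IndexSearcher(new RAMDirectory(index_three_path)),
             new IndexSearcher(new RAMDirectory(index_four_path)),
             new IndexSearcher(new RAMDirectory(index_n_path))
             };

          // one searcher that fans each query out to all of the above
          MultiSearcher searcher_ms=new MultiSearcher(searcher_a);
          ...
          }
       catch(Exception e)
          {
          System.out.println(e);
          }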

For example, one of several indexes is 768MB. Is there possibly a  
better way to do this?
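A quick way to see the limit before opening anything: RAMDirectory keeps a full copy of each index in the Java heap, so the heap has to hold roughly the summed on-disk size of all indexes, plus whatever the searchers need. A minimal sketch, assuming Java 8+ and only java.nio.file; IndexHeapEstimate and its methods are hypothetical helpers, not part of Lucene:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper (not part of Lucene): sums index sizes on disk so they
// can be compared with the JVM heap limit before loading RAMDirectory copies.
class IndexHeapEstimate {

    /** Sum of the sizes of all regular files below one index directory. */
    static long indexBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    /** Combined on-disk size of several index directories. */
    static long totalBytes(Path... indexDirs) throws IOException {
        long total = 0;
        for (Path dir : indexDirs) {
            total += indexBytes(dir);
        }
        return total;
    }

    /** True if the indexes alone would exceed the JVM's maximum heap (-Xmx). */
    static boolean exceedsMaxHeap(long bytes) {
        return bytes > Runtime.getRuntime().maxMemory();
    }
}
```

Passing the index paths to totalBytes and checking the result with exceedsMaxHeap gives a rough go/no-go answer; a total that is merely close to the limit still leaves little room for the rest of the application.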

Regards,

Peter W.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: In memory MultiSearcher

Posted by Erick Erickson <er...@gmail.com>.

You're right, I am suggesting that you use the Lucene
caching and see if it is adequate.

Mind you, I have no clue whether your application will be well
served by this or not. I've just seen too many examples of folks
(including me) jumping into a solution to a problem that doesn't
exist to be able to refrain from asking "Do you *know* you need
to get fancy?" <G>...

FWIW, a couple of things to watch out for. First, the searchers need to
process a few queries before the caches are built up, so if you
are doing load testing, be aware that the first few queries aren't
representative.
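To keep those first queries out of the numbers, a timing loop can run a query a few times untimed before measuring. A minimal sketch; WarmedTimer is hypothetical, and the Runnable stands in for whatever code actually executes one search (for example a call on the shared searcher):

```java
// Hypothetical timing helper: the first executions of a query fill caches
// (and give the JIT a chance to compile), so they run but are not averaged.
class WarmedTimer {

    /**
     * Runs query warmupRuns times untimed, then measuredRuns times timed,
     * and returns the mean latency of the timed runs in nanoseconds.
     */
    static double meanNanos(Runnable query, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run(); // warm-up only: not timed
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.run();
            totalNanos += System.nanoTime() - start;
        }
        return (double) totalNanos / measuredRuns;
    }
}
```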

Second, you'll probably need to build at least a simple load tester
(actually, I'm pretty sure some exist off the shelf, but we've
been doing things in-house so far) to fire off a bunch of threads
that make requests and measure response. I'd hate for my off-the-cuff
advice to cause your app to tip over the first time you opened it up
to the public <G>....
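A bare-bones version of such a load tester, assuming Java 8+ and only java.util.concurrent. MiniLoadTester is a hypothetical name, and the Runnable stands in for one search request (for instance an HTTP call to the servlet):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical minimal load tester: several threads issue requests at the
// same time and every request's latency is recorded.
class MiniLoadTester {

    /** Runs the requests concurrently; returns latencies in microseconds, sorted. */
    static long[] run(Runnable request, int threads, int requestsPerThread)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long[] micros = new long[requestsPerThread];
                    for (int i = 0; i < requestsPerThread; i++) {
                        long start = System.nanoTime();
                        request.run();
                        micros[i] = (System.nanoTime() - start) / 1_000;
                    }
                    return micros;
                }));
            }
            // gather each worker's latencies into one array
            long[] all = new long[threads * requestsPerThread];
            int pos = 0;
            for (Future<long[]> f : futures) {
                long[] part = f.get();
                System.arraycopy(part, 0, all, pos, part.length);
                pos += part.length;
            }
            Arrays.sort(all);
            return all;
        } finally {
            pool.shutdown();
        }
    }

    /** Nearest-rank percentile (0 < p <= 100) of an ascending array. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, Math.min(rank, sorted.length) - 1)];
    }
}
```

For example, percentile(run(request, 10, 100), 95) reports the 95th-percentile latency over 1,000 requests from 10 threads; running a warm-up pass first keeps the cold-cache queries from skewing it.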

Best
Erick


Re: In memory MultiSearcher

Posted by "Peter W." <pe...@marketingbrokers.com>.

Erick,

Thanks for the reply, this is a web application.

If you want to serve image files in a scalable fashion
on the Internet you make Apache serve them from
memory, not the filesystem.

For databases, some sites use a distributed object
memory caching system such as memcached.

I was hoping the idea would translate to Lucene, and
was trying to get past reading multiple indexes
from attached disks by loading them into main memory,
first on one machine, then across a load-balanced farm.

I think what you are saying is: use FSDirectories and
rely on Lucene's regular built-in caching instead.

Let's give that a try in the servlet init() method!
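Lucene's documentation describes IndexSearcher as thread-safe and recommends sharing one instance across all searches, which fits opening it once in init() and keeping it in a field. The same open-once idea, sketched without the servlet API for the case where the indexes are opened lazily on first use; SharedOnce is hypothetical, and the Supplier stands in for the code that opens the indexes:

```java
import java.util.function.Supplier;

// Hypothetical open-once holder: the expensive resource (for example a
// searcher over several indexes) is created on first use and then shared.
final class SharedOnce<T> {
    private final Supplier<T> opener;
    private volatile T value; // volatile publishes the instance safely to all threads

    SharedOnce(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, calling the opener at most once. */
    T get() {
        T v = value;
        if (v == null) {                 // fast path skips locking once opened
            synchronized (this) {
                v = value;
                if (v == null) {         // re-check: another thread may have opened it
                    v = opener.get();
                    value = v;
                }
            }
        }
        return v;
    }
}
```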

Regards,

Peter W.

On May 21, 2007, at 2:46 PM, Erick Erickson wrote:

> Why are you doing this in the first place? Do you actually have
> evidence that the default Lucene behavior (caching, etc) is inadequate
> for your needs?
>
> I'd *strongly* recommend, if you haven't, just using the regular
> FSDirectories rather than RAMDirectories and only getting
> complex if that's too slow...
>
> I ask because I am searching FS-based indexes that are 4G with
> no problem. The index *was* 8G and had only a 10% performance hit.


Re: In memory MultiSearcher

Posted by "Peter W." <pe...@marketingbrokers.com>.

Hoss,

My Lucene scaling strategy involves creating
numerous indexes, so I was looking for a way
to load them all together for speed.

For those interested, your suggestion of using a
single IndexSearcher on a MultiReader works well
by itself.

Or you can still load the indexes into memory like this:

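import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;

// one reader per index, each copied into RAM
IndexReader[] indexr_a=
   {
   IndexReader.open(new RAMDirectory(index_one_path)),
   IndexReader.open(new RAMDirectory(index_two_path)),
   IndexReader.open(new RAMDirectory(index_three_path)),
   IndexReader.open(new RAMDirectory(index_n_path))
   };

// combine them into one reader and search it with a single IndexSearcher
MultiReader mr=new MultiReader(indexr_a);
IndexSearcher is=new IndexSearcher(mr);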
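Combining readers this way presents all the indexes as one sequence of document numbers. A rough, standalone illustration of the bookkeeping that requires, assuming each sub-index's document count is known; DocIdSpace is hypothetical and is not Lucene's MultiReader source:

```java
// Hypothetical sketch: sub-index i owns global document numbers
// starts[i] .. starts[i + 1] - 1 in one combined numbering.
final class DocIdSpace {
    private final int[] starts; // starts[i] = first global number of sub-index i

    DocIdSpace(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Total number of documents across all sub-indexes. */
    int maxDoc() {
        return starts[starts.length - 1];
    }

    /** Which sub-index holds global document n (binary search over starts). */
    int subIndex(int n) {
        if (n < 0 || n >= maxDoc()) {
            throw new IndexOutOfBoundsException("doc " + n);
        }
        int lo = 0, hi = starts.length - 2;
        while (lo < hi) {                // find the last i with starts[i] <= n
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Number of global document n inside its own sub-index. */
    int localDoc(int n) {
        return n - starts[subIndex(n)];
    }

    /** Global number of local document doc in sub-index i. */
    int globalDoc(int i, int doc) {
        return starts[i] + doc;
    }
}
```

Searching for the last offset not greater than n (rather than an exact match) keeps the lookup correct when some sub-index is empty and two offsets are equal.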

Regards,

Peter W.



On May 22, 2007, at 1:10 AM, Chris Hostetter wrote:

> ...and if you are "Multi Searching" over a bunch of local directories
> anyway, then use a single IndexSearcher on a MultiReader instead ...
> that should be much faster than your MultiSearcher.


Re: In memory MultiSearcher

Posted by Chris Hostetter <ho...@fucit.org>.

: I'd *strongly* recommend, if you haven't, just using the regular
: FSDirectories rather than RAMDirectories and only getting
: complex if that's too slow...

...and if you are "Multi Searching" over a bunch of local directories
anyway, then use a single IndexSearcher on a MultiReader instead ...
that should be much faster than your MultiSearcher.
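One difference between the two approaches: in Lucene 2.x, MultiSearcher runs the query against each sub-searcher separately and then merges the per-index top hits, shifting document numbers by each index's offset, whereas one IndexSearcher over a MultiReader collects a single ranked list directly. The merge step alone, sketched with a hypothetical TopHitsMerge class and plain java.util types (not Lucene's own HitQueue code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a multi-index search's merge step: each sub-index
// returns its own top hits (local doc numbers), which are combined here.
final class TopHitsMerge {

    static final class Hit {
        final int doc;     // document number
        final float score; // relevance score
        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * perIndex.get(i) holds the hits of sub-index i; starts[i] is that
     * sub-index's offset in the global numbering. Returns the k best hits
     * overall with global doc numbers, highest score first.
     */
    static List<Hit> mergeTopK(List<List<Hit>> perIndex, int[] starts, int k) {
        // min-heap on score: the head is the weakest hit kept so far
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
        for (int i = 0; i < perIndex.size(); i++) {
            for (Hit h : perIndex.get(i)) {
                heap.offer(new Hit(starts[i] + h.doc, h.score));
                if (heap.size() > k) {
                    heap.poll(); // evict the current lowest score
                }
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}
```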





-Hoss



Re: In memory MultiSearcher

Posted by Erick Erickson <er...@gmail.com>.

Why are you doing this in the first place? Do you actually have
evidence that the default Lucene behavior (caching, etc) is inadequate
for your needs?

I'd *strongly* recommend, if you haven't, just using the regular
FSDirectories rather than RAMDirectories and only getting
complex if that's too slow...

I ask because I am searching FS-based indexes that are 4G with
no problem. The index *was* 8G and had only a 10% performance hit.

Best
Erick
