Posted to java-user@lucene.apache.org by Greg Conway <gc...@textwise.com> on 2004/04/28 21:44:53 UTC

lucene applicability and performance

Hello.  Apologies if this has come up before, I'm new to the list and
didn't see anything in the archives that exactly matched my situation.

I am considering using Lucene to index and search a large collection of
small documents in a specialized domain -- probably only a few
thousand unique terms spanning anywhere from one million to ten
million small source documents.  I hope to be able to get ranked search
results back in less than 400 msec.

I suspect one issue I may face is index density owing to the large
numbers of documents and relatively small vocabulary.  That, in turn,
may be a drag on query processing.  I am working on strategies to
ameliorate that somewhat but it may be difficult.

In the meantime, I'm looking for some gut reactions from the experts
before I take this to the next stage.  Can Lucene scale well to this
kind of situation?  Can I realistically hope to get anywhere near my
performance targets?  Will I have to distribute pieces of the index
across several machines, parallelize my retrievals, and merge the
results to do so?  If so, does Lucene already support that or will I
have to develop that logic in house?  (Seems like I saw a reference
somewhere that such a feature was coming soon, but I'm not sure when or
how it will be implemented.)

Any help, tips, references, or advice would be welcome and appreciated.
Thank you!

Regards,

Greg 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: lucene applicability and performance

Posted by Ype Kingma <yk...@xs4all.nl>.
Greg,

> Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search.
> (See the javadoc on the website)

I meant ParallelMultiSearcher.

Good night,
Ype


Re: lucene applicability and performance

Posted by Ype Kingma <yk...@xs4all.nl>.
Greg,

On Wednesday 28 April 2004 21:44, Greg Conway wrote:
> Hello.  Apologies if this has come up before, I'm new to the list and
> didn't see anything in the archives that exactly matched my situation.

It has, but each situation is different. Try this:
http://jakarta.apache.org/lucene/docs/benchmarks.html

> I am considering using Lucene to index and search a large collection of
> small documents in a specialized domain -- probably only a few
> thousand unique terms spanning anywhere from one million to ten
> million small source documents.  I hope to be able to get ranked search
> results back in less than 400 msec.
>
> I suspect one issue I may face is index density owing to the large
> numbers of documents and relatively small vocabulary.  That, in turn,
> may be a drag on query processing.  I am working on strategies to
> ameliorate that somewhat but it may be difficult.

A text search engine is your best bet in this situation.

> In the meantime, I'm looking for some gut reactions from the experts
> before I take this to the next stage.  Can Lucene scale well to this
> kind of situation?  Can I realistically hope to get anywhere near my

Yes.

> performance targets?  Will I have to distribute pieces of the index

Yes.

> across several machines,  parallelize my retrievals, and merge the

That's more difficult to say. You'll need to try.

> results to do so?  If so, does Lucene already support that or will I

Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search.
(See the javadoc on the website)
But first make sure that the Analyzer you use for indexing fits your needs.

> have to develop that logic in house?  (Seems like I saw a reference

No.

> somewhere that such a feature was coming soon, but I'm not sure when or
> how it will be implemented.)

Have fun,
Ype
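
For readers who want a picture of what "parallelize my retrievals, and
merge the results" amounts to, here is a rough sketch: each piece of the
index is searched on its own thread, and the per-piece hits are then
merged by score. This is plain JDK code, not Lucene's API. Shard, Hit,
and searchAll are hypothetical names made up for illustration; in Lucene
the classes named in this thread (MultiSearcher, ParallelMultiSearcher,
RemoteSearchable) play this role. The sketch also assumes that scores
from different pieces are directly comparable, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration only; none of these types are Lucene classes. */
final class ShardedSearchSketch {

    /** One scored hit: the index piece it came from, its id there, its score. */
    static final class Hit {
        final int shard;
        final int doc;
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Stand-in for a searcher over one piece of the index (local or remote). */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Queries every shard on its own thread, then keeps the topN best hits overall. */
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Fan out: one task per shard, all running concurrently.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                Callable<List<Hit>> task = () -> shard.search(query, topN);
                pending.add(pool.submit(task));
            }

            // Gather: get() blocks until that shard has answered.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get());
            }

            // Merge: highest score first; ties broken by shard, then doc id,
            // so the ordering is deterministic.
            merged.sort((a, b) -> {
                int c = Float.compare(b.score, a.score);
                if (c != 0) return c;
                c = Integer.compare(a.shard, b.shard);
                return c != 0 ? c : Integer.compare(a.doc, b.doc);
            });
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a shard", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each shard only needs to return its own topN hits, since no hit outside a
shard's local topN can appear in the overall topN. The total response
time is then roughly that of the slowest shard plus the merge.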

