You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Greg Conway <gc...@textwise.com> on 2004/04/28 21:44:53 UTC
lucene applicability and performance
Hello. Apologies if this has come up before, I'm new to the list and
didn't see anything in the archives that exactly matched my situation.
I am considering using Lucene to index and search a large collection of
small documents in a specialized domain -- probably only a few
thousands unique terms spanning across anywhere from one million to ten
million small source documents. I hope to be able to get ranked search
results back in less than 400 msec.
I suspect one issue I may face is index density owing to the large
numbers of documents and relatively small vocabulary. That, in turn,
may be a drag on query processing. I am working on strategies to
ameliorate that somewhat but it may be difficult.
In the meantime, I'm looking for some gut reactions from the experts
before I take this to the next stage. Can Lucene scale well to this
kind of situation? Can I realistically hope to get anywhere near my
performance targets? Will I have to distribute pieces of the index
across several machines, parallelize my retrievals, and merge the
results to do so? If so, does Lucene already support that or will I
have to develop that logic in house? (Seems like I saw a reference
somewhere that such a feature was coming soon, but I'm not sure when or
how it will be implemented.)
Any help, tips, references, or advice would be welcome and appreciated.
Thank you!
Regards,
Greg
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: lucene applicability and performance
Posted by Ype Kingma <yk...@xs4all.nl>.
Greg,
> Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search.
> (See the javadoc on the website)
I meant ParallelMultiSearcher.
Good night,
Ype
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: lucene applicability and performance
Posted by Ype Kingma <yk...@xs4all.nl>.
Greg,
On Wednesday 28 April 2004 21:44, Greg Conway wrote:
> Hello. Apologies if this has come up before, I'm new to the list and
> didn't see anything in the archives that exactly matched my situation.
It has, but each situation is different. Try this:
http://jakarta.apache.org/lucene/docs/benchmarks.html
> I am considering using Lucene to index and search a large collection of
> small documents in a specialized domain -- probably only a few
>
> thousands unique terms spanning across anywhere from one million to ten
> million small source documents. I hope to be able to get ranked search
> results back in less than 400 msec.
>
> I suspect one issue I may face is index density owing to the large
> numbers of documents and relatively small vocabulary. That, in turn,
> may be a drag on query processing. I am working on strategies to
> ameliorate that somewhat but it may be difficult.
A text search engine is your best bet in this situation.
> In the meantime, I'm looking for some gut reactions from the experts
> before I take this to the next stage. Can Lucene scale well to this
> kind of situation? Can I realistically hope to get anywhere near my
Yes.
> performance targets? Will I have to distribute pieces of the index
Yes.
> across several machines, parallelize my retrievals, and merge the
That's more difficult to say. You'll need to try.
> results to do so? If so, does Lucene already support that or will I
Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search.
(See the javadoc on the website)
But first make sure that the Analyzer you use for indexing fits your needs.
> have to develop that logic in house? (Seems like I saw a reference
No.
> somewhere that such a feature was coming soon, but I'm not sure when or
> how it will be implemented.)
Have fun,
Ype
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org