You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Tea Yu <te...@netvigator.com> on 2004/07/26 15:54:01 UTC

RE: Anyone use MultiSearcher class

Mark,

I'm also planning a distributed index system.  After reading some code, I
think it's more efficient to get rid of Hits and work directly with TopDocs
returned by ParallelMultiSearcher.search(), I dun need the cache anyway as I
dun need stateful netvigation.

Another question is - does each Hits.doc(i) lead to an obj
serialization/traffic/deserialization?  Do we need a ValueListHolder to
optimize that?

I also wonder why many search() methods don't throw RemoteExceptions, any
idea?

Thanks
Tea

> Don, I think I finally understand your problem -- and mine -- with
> MultiSearcher. I had tested an implementation of my system using
> ParallelMultiSearcher to split a huge index over many computers.
> I was very impressed by the results on my test data, but alarmed
> after a trial with live data :)
>
> Consider MultiSearcher.search(Query Q). Suppose that Q aggregated
> over ALL the Searchables in the MultiSearcher would return 1000
> documents. But, the Hits object created by search() will only cache
> the first 100 documents. When Hits.doc(101) is called, Hits will
> cache 200 documents -- then 400, 800, 1600 and so on. How does Hits
> get these extra documents? By calling the MultiSearcher again.
>
> Now consider a MultiSearcher as described above with 2 Searchables.
> With respect to Q, Searchable S has 1000 documents, Searchable T
> has zero. So to fetch the 101st document, not only is S searched,
> but T is too, even though the result of Q applied to T is still zero
> and will always be zero. The same thing will happen when fetching
> the 201st, 401st and 801st document.
>
> This accounts for my slow performance, and I think yours too. That
> your observed degradation is a power of 2 is a clue.
>
> My performance is especially vulnerable because "slave" Searchables
> in the MultiSearcher are Remote -- accessed via RMI.
>
> I guess I have to code smarter around MultiSearcher. One problem
> you highlight is that Hits is final -- so it is not possible even to
> modify the "100/200/400" cache size logic.
>
> Any ideas from anyone would be much appreciated.
>
> Mark Florence
> CTO, AIRS
> 800-897-7714 x 1703
> [EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org