Posted to dev@lucene.apache.org by Robert Newson <rn...@connected.com> on 2005/06/23 22:38:59 UTC

Hits not serializable.

Can Hits be made serializable?

I'm finding that almost all of the time for a remote search is spent 
lazily retrieving document objects.

I'd like to create a remote interface with a method like:

Hits search(Query query, Filter filter, int prefetch)

The remote end would call Hits.doc() for the first $prefetch entries.

This will make a huge difference to remote searching performance:

total   fetch   server1   server2   server3
862     699     86        69        96

For now, I'll use Document[] as the return value, but Hits feels more 
natural.
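
As a rough illustration of the prefetch idea above, here is a minimal sketch of a serializable result object that carries the first few documents back in a single response. PrefetchedDoc and PrefetchedHits are hypothetical names, not Lucene classes, and a plain field map stands in for a Lucene Document. It assumes a Java version newer than the one this thread was written against.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lucene Document: stored field name -> value.
final class PrefetchedDoc implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final float score;
    final HashMap<String, String> fields;

    PrefetchedDoc(int id, float score, Map<String, String> fields) {
        this.id = id;
        this.score = score;
        this.fields = new HashMap<>(fields);
    }
}

// Hypothetical serializable result: the total match count plus the first
// `prefetch` documents, already loaded on the remote end.
final class PrefetchedHits implements Serializable {
    private static final long serialVersionUID = 1L;
    final int totalHits;
    final ArrayList<PrefetchedDoc> docs;

    PrefetchedHits(int totalHits, List<PrefetchedDoc> docs) {
        this.totalHits = totalHits;
        this.docs = new ArrayList<>(docs);
    }

    // Serializes and deserializes the value in memory, which is roughly
    // what RMI does when a method returns it to a remote caller.
    static PrefetchedHits roundTrip(PrefetchedHits hits) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(hits);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PrefetchedHits) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }
}
```

Because the documents travel inside the return value, the client would need one remote call per search rather than one call per document it displays.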

B.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Hits not serializable. (bulk document retrieval)

Posted by Robert Newson <rn...@connected.com>.
Thanks for the suggestion. I have solved this problem locally; I'm 
wondering whether this should be in Lucene core.

I have seven machines in a rack, each with Lucene indexes of about 30 
million messages each. I'm trying to search across them with 
RemoteSearcher and ParallelMultiSearcher.

Search times are impressive: only hundreds of milliseconds (for 
multiple-term queries).

Unfortunately, in order for the search to be useful, I need to pull back 
a page's worth of hits. In my case this is the first 25 results.

With the current out-of-the-box API this causes 50 sequential RMI calls, 
which seriously degrades the total time that the client must wait for a 
response.

ParallelMultiSearcher itself is pretty reasonable, though I have my own 
re-implementation using the java.util.concurrent framework. However, the 
Lucene API is simply not optimised for retrieving Documents in bulk.

Obviously we can all work around it in different ways, but I feel that 
it should be core functionality.

Searchable could have a bulk retrieval method, and ParallelMultiSearcher 
should be able to execute it *in parallel* against each underlying searcher.
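
To make the proposal concrete, here is a minimal sketch of a parallel bulk fetch built on java.util.concurrent. BulkSearchable and ParallelBulkFetch are hypothetical names, not existing Lucene API and not my local implementation; documents are represented as plain strings to keep the example self-contained, and the lambda syntax assumes a newer Java than the one current Lucene targets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface (not part of Lucene): one call returns many documents.
interface BulkSearchable {
    List<String> docs(int[] ids);
}

final class ParallelBulkFetch {
    // Issues one bulk request per searcher concurrently and returns the
    // results in the same order as the searchers list.
    static List<List<String>> fetchAll(List<BulkSearchable> searchers,
                                       List<int[]> idsPerSearcher) {
        if (searchers.size() != idsPerSearcher.size()) {
            throw new IllegalArgumentException("one id array per searcher is required");
        }
        List<List<String>> results = new ArrayList<>();
        if (searchers.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(searchers.size());
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int i = 0; i < searchers.size(); i++) {
                final BulkSearchable searcher = searchers.get(i);
                final int[] ids = idsPerSearcher.get(i);
                tasks.add(() -> searcher.docs(ids));
            }
            // invokeAll blocks until every task finishes and returns the
            // futures in the same order as the submitted tasks.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("bulk fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape, the client would wait for roughly the slowest single server per page of results, rather than for the sum of every per-document round trip.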

I've implemented it locally. If anyone feels that this addresses a 
genuine problem, let me know.

In short, should Lucene provide an efficient document paging facility, 
or is it not considered core?

B.

P.S. I'm using a CVS snapshot of Lucene 1.9.





Re: Hits not serializable.

Posted by Nrupal Akolkar <nr...@gmail.com>.
Hi,
Try the following:
1. Write a class that extends the class containing the search(...) 
method you listed, and declare the new class Serializable.
2. Have it override the search method with the same content as in the 
superclass.
3. Rebuild your Lucene 1.** file with ant, and try it out the way you 
want.
I think this solves your problem.
Nrupal

