
Posted to solr-user@lucene.apache.org by Tim Patton <tp...@dealcatcher.com> on 2007/02/27 22:53:22 UTC

Federated Search

I just downloaded Solr to try out, and it seems like it will replace a ton
of code I've written.  I saw a few posts about FederatedSearch and
skimmed the ideas at http://wiki.apache.org/solr/FederatedSearch.  The
project I am working on has several Lucene indexes 20-40GB in size
spread among a few machines.  I've also run into problems figuring out
how to work with Lucene in a distributed fashion, though all of my
difficulties were in indexing; searching with MultiSearcher and a few
custom classes on top of the hits was not that difficult.

Indexing involved using a SQL database as a master db so you could find
documents by their unique ID, and a JMS server to distribute additions,
deletions and updates to each of the indexing servers.  I eventually
replaced the JMS server with something custom I wrote that is much more
lightweight and less prone to bogging down.
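For anyone wondering what such a lightweight replacement for the JMS server might look like, here is a minimal Java sketch (Java 16+ for records). It is not Tim's actual code. It assumes, which the message above does not say, that documents are partitioned across the indexing servers by a hash of their unique ID, and it uses in-memory queues in place of a real network transport. `OpType`, `IndexOp` and `UpdateDispatcher` are hypothetical names, not Solr or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of a change read from the master SQL database.
enum OpType { ADD, UPDATE, DELETE }

// One index change, keyed by the document's unique ID (payload is null for deletes).
record IndexOp(OpType type, String docId, String payload) {}

// Hypothetical lightweight stand-in for the JMS server: one queue per indexing server.
class UpdateDispatcher {
    private final List<BlockingQueue<IndexOp>> serverQueues = new ArrayList<>();

    UpdateDispatcher(int numServers) {
        for (int i = 0; i < numServers; i++) {
            serverQueues.add(new LinkedBlockingQueue<>());
        }
    }

    // Assumed partitioning: the same unique ID always maps to the same server,
    // so an update or delete lands on the server that indexed the original add.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), serverQueues.size());
    }

    void submit(IndexOp op) {
        // Unbounded LinkedBlockingQueue, so add() never fails for lack of capacity.
        serverQueues.get(shardFor(op.docId())).add(op);
    }

    // A worker on each indexing server would call take() on its queue and apply ops in order.
    BlockingQueue<IndexOp> queueFor(int server) {
        return serverQueues.get(server);
    }
}
```

Keeping one FIFO queue per server preserves the order of operations for any given ID, which is the property that matters when an add is quickly followed by an update or delete.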

I'd be curious whether Yonik is still on the list and whether he or anyone
else has any new ideas for Federated Searching.

Tim P.

Re: Federated Search

Posted by Mike Klaas <mi...@gmail.com>.

On 2/27/07, Ken Krugler <kk...@transpac.com> wrote:

> I'm also interested in this. For me, I don't need sorted output,
> faceted browsing, or alternative output formats - so something along
> the lines of the "Merge XML responses w/o Schema" proposal would be
> just fine.
>
> Open issues:

3.  Highlighting as a separate step.

Currently a bit of work needs to be done to do this efficiently with
Solr.  The way I set it up is roughly:
 - turn on lazy field loading.  For best effect, compress the main text field.
 - create a new request handler that is similar to dismax, but uses
the query for highlighting only.  A separate parameter specifies the
document keys to highlight.
 - highlighting requires the internal Lucene document id, not the
document key, and it can be slow to execute queries to get the ids.  I
created a custom cache that maps doc keys -> doc ids, populated it
during the main query, and grabbed ids from the cache during the
highlighting step.
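The key-to-id cache in the last step could look roughly like the hypothetical `DocIdCache` below. This is a Java sketch, not Mike's actual code and not a Solr API. One point worth calling out: internal Lucene doc ids are only valid for the index reader that produced them, so the sketch ties its entries to a caller-supplied searcher version (an assumption about how the caller tracks searchers) and drops everything when that version changes, for example after a commit opens a new searcher.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical doc-key -> internal-doc-id cache for a two-step "search, then highlight" flow.
class DocIdCache {
    private final Map<String, Integer> ids;
    // Version of the searcher the cached ids belong to; -1 means nothing cached yet.
    private long searcherVersion = -1;

    DocIdCache(int maxEntries) {
        // Access-ordered LinkedHashMap: evicts the least recently used key once over capacity.
        this.ids = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Called during the main query for each hit that may be highlighted later.
    synchronized void put(long version, String docKey, int docId) {
        resetIfStale(version);
        ids.put(docKey, docId);
    }

    // Called during the highlighting step; empty means fall back to a (slower) key lookup query.
    synchronized OptionalInt get(long version, String docKey) {
        resetIfStale(version);
        Integer id = ids.get(docKey);
        return id == null ? OptionalInt.empty() : OptionalInt.of(id);
    }

    // Internal Lucene doc ids are only valid for one index reader, so a new searcher invalidates all entries.
    private void resetIfStale(long version) {
        if (version != searcherVersion) {
            ids.clear();
            searcherVersion = version;
        }
    }
}
```

The LRU bound keeps memory predictable; since highlighting normally follows the main query within seconds, a small cache sized to a few pages of results should usually be enough.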

regards,
-Mike

Re: Federated Search

Posted by Ken Krugler <kk...@transpac.com>.

>I just downloaded Solr to try out, and it seems like it will replace a
>ton of code I've written.  I saw a few posts about
>FederatedSearch and skimmed the ideas at
>http://wiki.apache.org/solr/FederatedSearch.  The project I am
>working on has several Lucene indexes 20-40GB in size spread among a
>few machines.  I've also run into problems figuring out how to work
>with Lucene in a distributed fashion, though all of my difficulties
>were in indexing; searching with MultiSearcher and a few custom
>classes on top of the hits was not that difficult.
>
>Indexing involved using a SQL database as a master db so you could
>find documents by their unique ID, and a JMS server to distribute
>additions, deletions and updates to each of the indexing servers.  I
>eventually replaced the JMS server with something custom I wrote that
>is much more lightweight and less prone to bogging down.
>
>I'd be curious whether Yonik is still on the list and whether he or anyone
>else has any new ideas for Federated Searching.

I'm also interested in this. For me, I don't need sorted output, 
faceted browsing, or alternative output formats - so something along 
the lines of the "Merge XML responses w/o Schema" proposal would be 
just fine.
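Without sorted output or faceting, merging the sub-searchers' responses can come down to merging their hits by score. A rough Java sketch (Java 16+), assuming each sub-searcher's XML response has already been parsed into a list of the hypothetical `Hit` record, sorted by descending score; `Hit` and `ResultMerger` are illustrative names, not part of Solr. Note that raw Lucene scores from separate indexes are not strictly comparable, because term statistics such as idf are computed per index; this sketch simply ignores that.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// One hit as parsed from a sub-searcher's XML response (hypothetical minimal shape).
record Hit(String shard, String docKey, float score) {}

class ResultMerger {
    // k-way merge of per-shard hit lists, each already sorted by descending score,
    // into a single global top-n list.
    static List<Hit> mergeTopN(List<List<Hit>> shardResults, int n) {
        // Heap entries are {shard index, position within that shard's list}; highest score first.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparingDouble((int[] e) -> shardResults.get(e[0]).get(e[1]).score()).reversed());
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>(n);
        while (merged.size() < n && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Hit> list = shardResults.get(top[0]);
            merged.add(list.get(top[1]));
            // Advance within the shard that just contributed a hit.
            if (top[1] + 1 < list.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

Because each shard's list is already sorted, the heap only ever holds one entry per shard, so producing the top n costs O(n log k) for k sub-searchers.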

Open issues:

1. How much better (if at all) would it be to use Hadoop RPC (versus
HTTP) to call the sub-searchers? I'm assuming it has better 
performance, and there might be fewer connectivity issues, but then 
you aren't leveraging the work being done on embedded Jetty, for 
example. Anybody have data points on relative performance?

2. Is there one master schema on the "main" search server that could 
get distributed to the remote searchers, or would that be part of a 
snappuller-ish update mechanism?
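On point 1, one way to get comparable data points is to keep the fan-out logic independent of the transport, so an HTTP fetch and an RPC-based fetch can be swapped behind the same `Function<String, String>` and timed against each other. A Java sketch under that assumption (JDK 11+): `SubSearcherFanOut` is a hypothetical class, `httpFetch` uses the JDK's `java.net.http.HttpClient`, and a sub-searcher that errors or exceeds the timeout simply contributes `null`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class SubSearcherFanOut {
    // Query every sub-searcher concurrently; a shard that fails or times out yields null.
    static List<String> queryAll(List<String> shardUrls, Function<String, String> fetch, long timeoutMs) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String url : shardUrls) {
            futures.add(CompletableFuture.supplyAsync(() -> fetch.apply(url))
                .completeOnTimeout(null, timeoutMs, TimeUnit.MILLISECONDS)
                .exceptionally(err -> null));
        }
        // Collect responses in the same order as shardUrls.
        List<String> responses = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            responses.add(f.join());
        }
        return responses;
    }

    // HTTP transport; the URL layout of each sub-searcher's request handler is an assumption.
    static String httpFetch(HttpClient client, String url) {
        try {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying " + url, e);
        }
    }
}
```

With a shared `HttpClient.newHttpClient()`, a call would look like `queryAll(urls, u -> SubSearcherFanOut.httpFetch(client, u), 5000)`; an RPC client wrapped in its own fetch function could be passed instead to compare the two on the same queries.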

Thanks,

-- Ken
-- 
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"