Posted to solr-user@lucene.apache.org by Shinichiro Abe <sh...@gmail.com> on 2013/03/07 09:46:02 UTC

Distributed Search Question

Hi,
Does distributed search work when the Solr servers each have a different schema.xml?
Can it work as long as I search only on common fields?
I have two Solr servers. One has id, title, body and filename fields
 (indexing a file server's data) and the other has id, title, body and url fields
 (indexing a web server's data) in schema.xml.
Does distributed search across these servers work
as long as I search on the title and body fields?
I assume it is important to have the same schema.xml on all servers
when using distributed search, but is there a problem when the Solr servers
each have a different schema.xml?

Thanks in advance,
Shinichiro Abe

Re: Distributed Search Question

Posted by Upayavira <uv...@odoko.co.uk>.
Firstly, you could combine your two schemas into one, and have id,
title, body, filename and url. I'd also add a 'source' field. Then all
questions of different schemas go away :-)

But, to answer your original question - so long as the fields that are
queried on exist on both sides, you should be okay. However, you will
want to make sure you are using a similar analysis chain/field type for
your text fields, so that scoring will be similar on both sides.
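
As a rough illustration of a query that touches only the shared fields,
here is a minimal Python sketch that builds a distributed request URL
using Solr's `shards` parameter. The host names, the helper name
`build_distributed_query` and the choice of returned fields are
assumptions made for the example, not details from this thread.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; replace with your own hosts and cores.
SHARDS = ["fileserver:8983/solr", "webserver:8983/solr"]


def build_distributed_query(text, shards=SHARDS):
    """Build a /select URL that searches only fields both schemas share."""
    params = {
        # Query only title and body, which exist on both servers.
        "q": f"title:({text}) OR body:({text})",
        # Return only fields that are defined on both sides.
        "fl": "id,title,score",
        # Comma-separated list of shards to fan the query out to.
        "shards": ",".join(shards),
        "wt": "json",
    }
    # The request is sent to one server, which merges the shard results.
    return f"http://{shards[0]}/select?" + urlencode(params)


url = build_distributed_query("solr")
print(url)
```

Either server can receive the request; the one that does acts as the
aggregator and merges the results from every shard listed.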

You should also be aware of issues around the lack of distributed IDF
and whether that'll cause you scoring issues.

To put that issue simply - the number of documents in an index that
contain a term forms part of the score. If the distribution of terms
across the two indexes is not particularly even, you can get differing
IDF values for the same term, meaning a document will get a different
score depending on which index it is in. This will spoil the accuracy of
your search results.
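
The effect can be seen with a small Python sketch. It assumes the
classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)), and
the document counts are made-up numbers chosen only to show the skew.

```python
import math


def classic_idf(doc_freq, num_docs):
    """IDF in the style of Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Made-up per-index statistics for one term, e.g. "report".
file_index = {"num_docs": 100_000, "doc_freq": 20_000}  # term is common here
web_index = {"num_docs": 100_000, "doc_freq": 50}       # term is rare here

# Each server computes IDF from its own index only.
idf_file = classic_idf(file_index["doc_freq"], file_index["num_docs"])
idf_web = classic_idf(web_index["doc_freq"], web_index["num_docs"])

# What a single combined index would have used for every document.
idf_global = classic_idf(
    file_index["doc_freq"] + web_index["doc_freq"],
    file_index["num_docs"] + web_index["num_docs"],
)

print(f"file index IDF: {idf_file:.2f}")
print(f"web index IDF:  {idf_web:.2f}")
print(f"combined IDF:   {idf_global:.2f}")
```

With these numbers a match from the web index is weighted far more
heavily for the same term than a match from the file index, so the
merged ranking leans towards web documents.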

The general recommendation when 'sharding' your index is to have your
documents evenly split across shards, but that may not be an option for
you.
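
For reference, one common way to get an even split is to assign each
document to a shard by hashing its id. The sketch below is a generic
illustration, not Solr's own routing code; the function name `shard_for`
and the id format are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(doc_id, num_shards):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Distribute 10,000 hypothetical ids over two shards and count them.
counts = Counter(shard_for(f"doc-{i}", 2) for i in range(10_000))
print(counts)
```

Because the hash ignores where a document came from, file and web
documents end up mixed on both shards, which keeps term statistics
similar on each side.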

Upayavira
