Posted to solr-user@lucene.apache.org by Paul Rosen <pa...@performantsoftware.com> on 2009/09/14 23:09:41 UTC

multicore shards and relevancy score

Hi,

I've done a few experiments searching two cores that share the same 
schema, using the shards syntax (on Solr 1.3).

My use case is that I want to have multiple cores because a few 
different people will be managing the indexing, and that will happen at 
different times. The data, however, is homogeneous.

I've noticed in my tests that the results are not interwoven, but it 
might just be my test data. In other words, all the results from one 
core appear, then all the results from the other core.

In thinking about it, it would make sense if the relevancy scores for 
each core were completely independent of each other. And that would mean 
that there is no way to compare the relevancy scores between the cores.

In other words, I'd like the following results:

- really relevant hit from core0
- pretty relevant hit from core1
- kind of relevant hit from core0
- not so relevant hit from core1

but I get:

- really relevant hit from core0
- kind of relevant hit from core0
- pretty relevant hit from core1
- not so relevant hit from core1

So, are the results supposed to be interwoven, and I need to study my 
data more, or is this just not something that is possible?

Also, if this is insurmountable, I've discovered two show stoppers that 
will prevent using multicore in my project (counting the lack of support 
for faceting in multicore). Are these issues addressed in Solr 1.4?

Thanks,
Paul

Re: multicore shards and relevancy score

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Sep 15, 2009 at 8:11 PM, Paul Rosen <pa...@performantsoftware.com> wrote:

>
> The second issue was detailed in an email last week "shards and facet
> count". The facet information is lost when doing a search over two shards,
> so if I use multicore, I can no longer have facets.
>

If both cores have the same schema and a uniqueKey is specified, then you
can do a distributed search across the two cores. Facets work fine with
distributed search. There may be something wrong with your setup.
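
For reference, a minimal sketch of what such a request could look like when
built by hand. The shards, facet and facet.field parameters are standard
Solr request parameters; the host, port, core names, query and the "genre"
field are assumptions made up for illustration, not details from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_distributed_query(base_url, shards, query, facet_fields, rows=10):
    """Build a select URL that searches several cores and asks for facets."""
    params = [
        ("q", query),
        # Comma-separated shard list, each entry given as host:port/path.
        ("shards", ",".join(shards)),
        # Return the score so the merged ordering can be inspected.
        ("fl", "*,score"),
        ("rows", str(rows)),
        ("facet", "true"),
    ]
    # One facet.field parameter per field to facet on.
    params.extend(("facet.field", field) for field in facet_fields)
    return base_url + "?" + urlencode(params)

# Hypothetical host, cores, query and facet field.
url = build_distributed_query(
    "http://localhost:8983/solr/core0/select",
    ["localhost:8983/solr/core0", "localhost:8983/solr/core1"],
    "title:whitman",
    ["genre"],
)
print(url)
```

Pasting the printed URL into a browser is a quick way to check whether the
facet section comes back from Solr itself, independent of any client
library.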

-- 
Regards,
Shalin Shekhar Mangar.

Re: multicore shards and relevancy score

Posted by Erik Hatcher <er...@gmail.com>.

On Sep 17, 2009, at 7:11 PM, Lance Norskog wrote:

>  This looks like a Ruby client bug.

Maybe, but I doubt it in this case.

But let's have some details of the Ruby code used to make the request,  
and what gets logged on the first Solr for the request.

	Erik



> If you do the same query with the HTTP url, it should work.

Re: multicore shards and relevancy score

Posted by Lance Norskog <go...@gmail.com>.

(I responded in the other thread.) This looks like a Ruby client bug.
If you do the same query with the HTTP url, it should work.


-- 
Lance Norskog
goksron@gmail.com

Re: multicore shards and relevancy score

Posted by Jason Rutherglen <ja...@gmail.com>.

You can query multiple cores using MultiEmbeddedSearchHandler in
SOLR-1431.  Then the facet counts will be merged just like the current
distributed requests.
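
As a rough illustration of what merging facet counts means here: the
per-term counts reported by each core are summed. This is only a sketch of
the idea, not the code from SOLR-1431 or from Solr's distributed search; the
function name and the sample counts are made up.

```python
from collections import Counter

def merge_facet_counts(per_shard_counts):
    """Sum facet counts term by term across shards, largest count first."""
    merged = Counter()
    for counts in per_shard_counts:
        # Counter.update adds the counts instead of replacing them.
        merged.update(counts)
    return merged.most_common()

# Hypothetical facet counts for one field, as returned by each core.
core0_counts = {"poetry": 12, "fiction": 5}
core1_counts = {"poetry": 3, "drama": 7}

merged = merge_facet_counts([core0_counts, core1_counts])
print(merged)
```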


Re: multicore shards and relevancy score

Posted by Paul Rosen <pa...@performantsoftware.com>.

Shalin Shekhar Mangar wrote:
> On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen <pa...@performantsoftware.com> wrote:
> 
>> I've done a few experiments with searching two cores with the same schema
>> using the shard syntax. (using solr 1.3)
>>
>> My use case is that I want to have multiple cores because a few different
>> people will be managing the indexing, and that will happen at different
>> times. The data, however, is homogeneous.
>>
>>
> Multiple cores were not built for distributed search. It is inefficient as
> compared to a single index. But if you want to use them that way, that's
> your choice.

Well, I'm experimenting with them because it will simplify index 
maintenance greatly. I am beginning to think that it won't work in my 
case, though.

> 
>> I've noticed in my tests that the results are not interwoven, but it might
>> just be my test data. In other words, all the results from one core appear,
>> then all the results from the other core.
>>
>> In thinking about it, it would make sense if the relevancy scores for each
>> core were completely independent of each other. And that would mean that
>> there is no way to compare the relevancy scores between the cores.
>>
>> In other words, I'd like the following results:
>>
>> - really relevant hit from core0
>> - pretty relevant hit from core1
>> - kind of relevant hit from core0
>> - not so relevant hit from core1
>>
>> but I get:
>>
>> - really relevant hit from core0
>> - kind of relevant hit from core0
>> - pretty relevant hit from core1
>> - not so relevant hit from core1
>>
>> So, are the results supposed to be interwoven, and I need to study my data
>> more, or is this just not something that is possible?
>>
>>
> The only difference wrt relevancy between a distributed search and a
> single-node search is that there is no distributed IDF and therefore a
> distributed search assumes a random distribution of terms among shards. I'm
> not sure if that is what you are seeing.
> 
> 
>> Also, if this is insurmountable, I've discovered two show stoppers that
>> will prevent using multicore in my project (counting the lack of support for
>> faceting in multicore). Are these issues addressed in solr 1.4?
>>
>>
> Can you give more details on what these two issues are?
> 

The first issue is detailed above, where the results from a search over 
two shards don't appear to be returned in relevancy order.

The second issue was detailed in an email last week "shards and facet 
count". The facet information is lost when doing a search over two 
shards, so if I use multicore, I can no longer have facets.

Re: multicore shards and relevancy score

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen <pa...@performantsoftware.com> wrote:

>
> I've done a few experiments with searching two cores with the same schema
> using the shard syntax. (using solr 1.3)
>
> My use case is that I want to have multiple cores because a few different
> people will be managing the indexing, and that will happen at different
> times. The data, however, is homogeneous.
>
>
Multiple cores were not built for distributed search. Searching across them
is less efficient than searching a single index. But if you want to use them
that way, that's your choice.


> I've noticed in my tests that the results are not interwoven, but it might
> just be my test data. In other words, all the results from one core appear,
> then all the results from the other core.
>
> In thinking about it, it would make sense if the relevancy scores for each
> core were completely independent of each other. And that would mean that
> there is no way to compare the relevancy scores between the cores.
>
> In other words, I'd like the following results:
>
> - really relevant hit from core0
> - pretty relevant hit from core1
> - kind of relevant hit from core0
> - not so relevant hit from core1
>
> but I get:
>
> - really relevant hit from core0
> - kind of relevant hit from core0
> - pretty relevant hit from core1
> - not so relevant hit from core1
>
> So, are the results supposed to be interwoven, and I need to study my data
> more, or is this just not something that is possible?
>
>
The only difference with respect to relevancy between a distributed search
and a single-node search is that there is no distributed IDF, and therefore
a distributed search assumes a random distribution of terms among shards.
I'm not sure if that is what you are seeing.
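
To make the IDF point concrete, here is a small sketch under simplifying
assumptions: the score is reduced to a per-document weight multiplied by
IDF, with IDF computed as 1 + ln(numDocs / (docFreq + 1)) in the style of
Lucene's classic similarity. The document counts and weights are invented,
and real Lucene scoring has more factors. It only shows that a term which is
rare in one core and common in the other can push all of one core's hits
ahead of the other's.

```python
import math

def idf(num_docs, doc_freq):
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def rank(hits, idf_by_core):
    """Score each hit as weight * idf and return the cores in ranked order."""
    scored = [(weight * idf_by_core[core], core) for core, weight in hits]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [core for _, core in scored]

# Hypothetical hits: (core, per-document weight for the query term).
hits = [("core0", 1.0), ("core0", 0.6), ("core1", 0.8), ("core1", 0.4)]

# Each core computes IDF from its own index: the term is rare in core0
# (9 of 1000 docs) and common in core1 (499 of 1000 docs).
local_idf = {"core0": idf(1000, 9), "core1": idf(1000, 499)}

# What a single combined index would use: one IDF over all 2000 docs.
shared = idf(2000, 508)
global_idf = {"core0": shared, "core1": shared}

local_order = rank(hits, local_idf)
global_order = rank(hits, global_idf)
print(local_order)
print(global_order)
```

With per-core IDF the core0 hits all sort first; with a shared IDF the hits
from the two cores interleave. Whether this is what is happening with the
test data in question would need to be checked against the actual scores
(for example by requesting fl=*,score).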


> Also, if this is insurmountable, I've discovered two show stoppers that
> will prevent using multicore in my project (counting the lack of support for
> faceting in multicore). Are these issues addressed in solr 1.4?
>
>
Can you give more details on what these two issues are?

-- 
Regards,
Shalin Shekhar Mangar.