Posted to solr-user@lucene.apache.org by Walter Underwood <wu...@netflix.com> on 2009/04/10 19:56:05 UTC

Help with relevance failure in Solr 1.3

We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
I would appreciate any ideas.

Occasionally, a server will start returning results with really poor
relevance. Single term queries work fine, but multi-term queries are
scored based on the most common term (lowest IDF).
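For readers unfamiliar with the term, here is a minimal sketch of why the most common term has the lowest IDF. It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)). The class name and all document counts are made up for illustration and are not taken from the index discussed in this thread.

```java
/** Hypothetical illustration; the document counts below are invented. */
class IdfSketch {
    /** Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 100000;     // assumed index size
        int dfThe = 60000;        // assumed: "the" occurs in most documents
        int dfChangeling = 3;     // assumed: "changeling" is rare
        System.out.println("idf(the)        = " + idf(dfThe, numDocs));
        System.out.println("idf(changeling) = " + idf(dfChangeling, numDocs));
    }
}
```

In a healthy ranking the rare term should dominate the score. The failure described here looks as if only the common term contributes.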

I don't see anything in the logs when this happens. We have a monitor
doing a search for the 100 most popular movies once per minute to
catch this, so we know when it was first detected.
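A check of this kind could be sketched roughly as below. This is not the monitor used at Netflix. The class name, the document ids, and the "top 3" threshold are all assumptions, and the HTTP call that would fetch the result ids from Solr is left out so the sketch stays self-contained.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical relevance probe; names and thresholds are illustrative only. */
class RelevanceProbe {
    /** Returns the 1-based rank of expectedId in the result list, or -1 if absent. */
    static int rankOf(List<String> resultIds, String expectedId) {
        int idx = resultIds.indexOf(expectedId);
        return idx < 0 ? -1 : idx + 1;
    }

    /** A probe passes when the expected document appears within the top maxRank results. */
    static boolean passes(List<String> resultIds, String expectedId, int maxRank) {
        int rank = rankOf(resultIds, expectedId);
        return rank > 0 && rank <= maxRank;
    }

    public static void main(String[] args) {
        // Assumed result lists for the same title query on a healthy and a broken server.
        List<String> healthy = Arrays.asList("m100", "m7", "m42");
        List<String> broken = Arrays.asList("m7", "m42", "m9", "m13", "m100");
        System.out.println("healthy server passes: " + passes(healthy, "m100", 3));
        System.out.println("broken server passes:  " + passes(broken, "m100", 3));
    }
}
```

Running one such probe per popular title once a minute gives a timestamp for the first failure, which is what makes it possible to correlate the problem with other events.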

I'm attaching two explain outputs, one for the query "changeling" and
one for "the changeling".

We are running Solr 1.3 with Lucene 2.4.0, and have added a fuzzy query
using JaroWinkler matching.

I'd appreciate ideas about where to look, what debug output to try, etc.

wunder

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

Sorry for not responding for a week; it got busy here.

Here are the URL params for one request:

qt=simple
facet=true
facet.limit=-1
facet.mincount=3
q=type:group AND qt_all:IsCriticallyAcclaimed AND qt_all:InGenreComedy
facet.field=qt_toddscoregenres
facet.field=qt_genres
facet.field=qt_moods
[and so on for quite a few more fields]
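To reproduce this load in a test, the parameters above can be assembled into a query string. The sketch below only shows the encoding. The class name is hypothetical, the host and path in main() are an assumed default rather than the real server, and only the facet fields actually listed above are used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that rebuilds the faceted request shown above. */
class FacetRequestSketch {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Fixed parameters come from the post; q and the facet fields are passed in. */
    static String buildQueryString(String q, List<String> facetFields) {
        List<String> parts = new ArrayList<String>();
        parts.add("qt=simple");
        parts.add("facet=true");
        parts.add("facet.limit=-1");
        parts.add("facet.mincount=3");
        parts.add("q=" + encode(q));
        for (String field : facetFields) {
            parts.add("facet.field=" + encode(field)); // the parameter repeats once per field
        }
        return String.join("&", parts);
    }

    public static void main(String[] args) {
        String qs = buildQueryString(
            "type:group AND qt_all:IsCriticallyAcclaimed AND qt_all:InGenreComedy",
            Arrays.asList("qt_toddscoregenres", "qt_genres", "qt_moods"));
        // Assumed host and path, for illustration only.
        System.out.println("http://localhost:8983/solr/select?" + qs);
    }
}
```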

I don't see any facet-specific stuff in solrconfig.xml.

wunder

On 4/14/09 7:57 PM, "Grant Ingersoll" <gs...@apache.org> wrote:

> OK, I guess details on the new faceting stuff would be in order.
> Which faceting are you using?  Are you sure that it never occurred before
> (i.e. it slipped under the radar)?
> 
> Obviously, the key is reproducibility here, but this has all the
> earmarks of some weird threading issue, it seems, at least IMO.
> 
> 
> On Apr 14, 2009, at 5:32 PM, Walter Underwood wrote:
> 
>> I already ruled out cosmic rays. It has happened on different
>> hardware and at different times of day, including low load.
>> 
>> The only thing associated with it is load from a new faceted
>> browse thing we turned on.
>> 
>> wunder
>> 
>> On 4/14/09 2:23 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>> 
>>> Is bad memory a possibility?  i.e. is it the same machine all the
>>> time?  Is there any recognizable pattern for when it happens?
>>> 
>>> -Grant (grasping at straws)
>>> 
>>> 
>>> On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
>>> 
>>>> Nope. This is a slave, so no indexing happens, just a sync. The
>>>> sync happens once per day. It went bad at a different time.
>>>> 
>>>> wunder
>>>> 
>>>> On 4/14/09 11:42 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
>>>> 
>>>>> Are there changes occurring when it goes bad that maybe aren't
>>>>> committed?
>>>>> 
>>>>> On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
>>>>> 
>>>>>> But why would it work for a few days, then go bad and stay bad?
>>>>>> 
>>>>>> It fails for every multi-term query, even those not in cache.
>>>>>> I ran a test with more queries than the cache size.
>>>>>> 
>>>>>> We do use autowarming.
>>>>>> 
>>>>>> wunder
>>>>>> 
>>>>>> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>>>>>>> <wu...@netflix.com> wrote:
>>>>>>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>>>>>> 
>>>>>>>> Query cache sounds possible, but those are cleared on a commit,
>>>>>>>> right?
>>>>>>> 
>>>>>>> Yes, but if you use autowarming, those items are regenerated and if
>>>>>>> there is a problem with equals() then it could re-appear (the cache
>>>>>>> items are correct, it's just the lookup that returns the wrong one).
>>>>>>> 
>>>>>>> -Yonik
>>>>>>> http://www.lucidimagination.com
>>>>>> 
>>>>> 
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>> 
>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>>> using Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
>>>>> 
>>>> 
>>> 
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>> 
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>> 
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

Re: Help with relevance failure in Solr 1.3

Posted by Grant Ingersoll <gs...@apache.org>.

OK, I guess details on the new faceting stuff would be in order.   
Which faceting are you using?  Are you sure that it never occurred before
(i.e. it slipped under the radar)?

Obviously, the key is reproducibility here, but this has all the  
earmarks of some weird threading issue, it seems, at least IMO.


On Apr 14, 2009, at 5:32 PM, Walter Underwood wrote:

> I already ruled out cosmic rays. It has happened on different
> hardware and at different times of day, including low load.
>
> The only thing associated with it is load from a new faceted
> browse thing we turned on.
>
> wunder
>
> On 4/14/09 2:23 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>
>> Is bad memory a possibility?  i.e. is it the same machine all the
>> time?  Is there any recognizable pattern for when it happens?
>>
>> -Grant (grasping at straws)
>>
>>
>> On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
>>
>>> Nope. This is a slave, so no indexing happens, just a sync. The
>>> sync happens once per day. It went bad at a different time.
>>>
>>> wunder
>>>
>>> On 4/14/09 11:42 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
>>>
>>>> Are there changes occurring when it goes bad that maybe aren't
>>>> committed?
>>>>
>>>> On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
>>>>
>>>>> But why would it work for a few days, then go bad and stay bad?
>>>>>
>>>>> It fails for every multi-term query, even those not in cache.
>>>>> I ran a test with more queries than the cache size.
>>>>>
>>>>> We do use autowarming.
>>>>>
>>>>> wunder
>>>>>
>>>>> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>>>>>> <wu...@netflix.com> wrote:
>>>>>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>>>>>
>>>>>>> Query cache sounds possible, but those are cleared on a commit,
>>>>>>> right?
>>>>>>
>>>>>> Yes, but if you use autowarming, those items are regenerated and if
>>>>>> there is a problem with equals() then it could re-appear (the cache
>>>>>> items are correct, it's just the lookup that returns the wrong one).
>>>>>>
>>>>>> -Yonik
>>>>>> http://www.lucidimagination.com
>>>>>
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>>> using Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

I already ruled out cosmic rays. It has happened on different
hardware and at different times of day, including low load.

The only thing associated with it is load from a new faceted
browse thing we turned on.

wunder

On 4/14/09 2:23 PM, "Grant Ingersoll" <gs...@apache.org> wrote:

> Is bad memory a possibility?  i.e. is it the same machine all the
> time?  Is there any recognizable pattern for when it happens?
> 
> -Grant (grasping at straws)
> 
> 
> On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:
> 
>> Nope. This is a slave, so no indexing happens, just a sync. The
>> sync happens once per day. It went bad at a different time.
>> 
>> wunder
>> 
>> On 4/14/09 11:42 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
>> 
>>> Are there changes occurring when it goes bad that maybe aren't
>>> committed?
>>> 
>>> On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
>>> 
>>>> But why would it work for a few days, then go bad and stay bad?
>>>> 
>>>> It fails for every multi-term query, even those not in cache.
>>>> I ran a test with more queries than the cache size.
>>>> 
>>>> We do use autowarming.
>>>> 
>>>> wunder
>>>> 
>>>> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>
>>>> wrote:
>>>> 
>>>>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>>>>> <wu...@netflix.com> wrote:
>>>>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>>>> 
>>>>>> Query cache sounds possible, but those are cleared on a commit,
>>>>>> right?
>>>>> 
>>>>> Yes, but if you use autowarming, those items are regenerated and if
>>>>> there is a problem with equals() then it could re-appear (the cache
>>>>> items are correct, it's just the lookup that returns the wrong
>>>>> one).
>>>>> 
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>> 
>>> 
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>> 
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>> 
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

Re: Help with relevance failure in Solr 1.3

Posted by Grant Ingersoll <gs...@apache.org>.

Is bad memory a possibility?  i.e. is it the same machine all the  
time?  Is there any recognizable pattern for when it happens?

-Grant (grasping at straws)


On Apr 14, 2009, at 2:51 PM, Walter Underwood wrote:

> Nope. This is a slave, so no indexing happens, just a sync. The
> sync happens once per day. It went bad at a different time.
>
> wunder
>
> On 4/14/09 11:42 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
>
>> Are there changes occurring when it goes bad that maybe aren't
>> committed?
>>
>> On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
>>
>>> But why would it work for a few days, then go bad and stay bad?
>>>
>>> It fails for every multi-term query, even those not in cache.
>>> I ran a test with more queries than the cache size.
>>>
>>> We do use autowarming.
>>>
>>> wunder
>>>
>>> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>
>>> wrote:
>>>
>>>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>>>> <wu...@netflix.com> wrote:
>>>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>>>
>>>>> Query cache sounds possible, but those are cleared on a commit,
>>>>> right?
>>>>
>>>> Yes, but if you use autowarming, those items are regenerated and if
>>>> there is a problem with equals() then it could re-appear (the cache
>>>> items are correct, it's just the lookup that returns the wrong one).
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

Nope. This is a slave, so no indexing happens, just a sync. The
sync happens once per day. It went bad at a different time.

wunder

On 4/14/09 11:42 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

> Are there changes occurring when it goes bad that maybe aren't committed?
> 
> On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:
> 
>> But why would it work for a few days, then go bad and stay bad?
>> 
>> It fails for every multi-term query, even those not in cache.
>> I ran a test with more queries than the cache size.
>> 
>> We do use autowarming.
>> 
>> wunder
>> 
>> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>
>> wrote:
>> 
>>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>>> <wu...@netflix.com> wrote:
>>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>> 
>>>> Query cache sounds possible, but those are cleared on a commit,
>>>> right?
>>> 
>>> Yes, but if you use autowarming, those items are regenerated and if
>>> there is a problem with equals() then it could re-appear (the cache
>>> items are correct, it's just the lookup that returns the wrong one).
>>> 
>>> -Yonik
>>> http://www.lucidimagination.com
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

Re: Help with relevance failure in Solr 1.3

Posted by Grant Ingersoll <gs...@apache.org>.

Are there changes occurring when it goes bad that maybe aren't committed?

On Apr 14, 2009, at 1:59 PM, Walter Underwood wrote:

> But why would it work for a few days, then go bad and stay bad?
>
> It fails for every multi-term query, even those not in cache.
> I ran a test with more queries than the cache size.
>
> We do use autowarming.
>
> wunder
>
> On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com>  
> wrote:
>
>> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
>> <wu...@netflix.com> wrote:
>>> The JaroWinkler equals was broken, but I fixed that a month ago.
>>>
>>> Query cache sounds possible, but those are cleared on a commit,
>>> right?
>>
>> Yes, but if you use autowarming, those items are regenerated and if
>> there is a problem with equals() then it could re-appear (the cache
>> items are correct, it's just the lookup that returns the wrong one).
>>
>> -Yonik
>> http://www.lucidimagination.com
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

But why would it work for a few days, then go bad and stay bad?

It fails for every multi-term query, even those not in cache.
I ran a test with more queries than the cache size.

We do use autowarming.

wunder

On 4/14/09 10:55 AM, "Yonik Seeley" <yo...@lucidimagination.com> wrote:

> On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
> <wu...@netflix.com> wrote:
>> The JaroWinkler equals was broken, but I fixed that a month ago.
>> 
>> Query cache sounds possible, but those are cleared on a commit,
>> right?
> 
> Yes, but if you use autowarming, those items are regenerated and if
> there is a problem with equals() then it could re-appear (the cache
> items are correct, it's just the lookup that returns the wrong one).
> 
> -Yonik
> http://www.lucidimagination.com

Re: Help with relevance failure in Solr 1.3

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Tue, Apr 14, 2009 at 12:19 PM, Walter Underwood
<wu...@netflix.com> wrote:
> The JaroWinkler equals was broken, but I fixed that a month ago.
>
> Query cache sounds possible, but those are cleared on a commit,
> right?

Yes, but if you use autowarming, those items are regenerated and if
there is a problem with equals() then it could re-appear (the cache
items are correct, it's just the lookup that returns the wrong one).

-Yonik
http://www.lucidimagination.com

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

The JaroWinkler equals was broken, but I fixed that a month ago.

Query cache sounds possible, but those are cleared on a commit,
right?

I could run with a cache size of 0, since our middle tier HTTP
cache is leaving almost nothing for the caches to do.

I'll try that explain. The stored fields for the "correct" doc
are fine, because I can see them when I use a single-term query.
The indexed fields seem OK, because that query works.

wunder

On 4/14/09 9:11 AM, "Yonik Seeley" <yo...@lucidimagination.com> wrote:

> It just occurred to me that a query cache issue could potentially
> cause this... if it's caching it would most likely be a query.equals()
> implementation incorrectly returning true.
> Perhaps check the JaroWinkler.equals() first?
> 
> Also, when one server starts to return bad results, have you tried
> using explainOther=id:id_of_other_doc_that_should_score_higher?
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood
> <wu...@netflix.com> wrote:
>> Dang, had another server do this.
>> 
>> Syncing and committing a new index does not fix it. The two servers
>> show the same bad results.
>> 
>> wunder
>> 
>> On 4/11/09 9:12 AM, "Walter Underwood" <wu...@netflix.com> wrote:
>> 
>>> Restarting Solr fixes it. If I remember correctly, a sync and commit
>>> does not fix it. I have disabled snappuller this time, so I can study
>>> the broken instance.
>>> 
>>> wunder

Re: Help with relevance failure in Solr 1.3

Posted by Yonik Seeley <yo...@lucidimagination.com>.

It just occurred to me that a query cache issue could potentially
cause this... if it's caching it would most likely be a query.equals()
implementation incorrectly returning true.
Perhaps check the JaroWinkler.equals() first?
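As a hedged illustration of that failure mode (not the actual JaroWinkler query class, whose source is not shown in this thread), the sketch below uses a hypothetical query key whose equals() and hashCode() ignore the term. A plain HashMap stands in for the query cache, and all class and field names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class QueryCacheSketch {
    /** Hypothetical query key whose equals()/hashCode() wrongly ignore the term. */
    static final class BrokenQuery {
        final String field;
        final String term;
        BrokenQuery(String field, String term) { this.field = field; this.term = term; }
        @Override public boolean equals(Object o) {
            // Bug: only the field is compared, so different terms look identical.
            return o instanceof BrokenQuery && field.equals(((BrokenQuery) o).field);
        }
        @Override public int hashCode() { return field.hashCode(); }
    }

    /** Same key, but every value that affects the results takes part in equality. */
    static final class FixedQuery {
        final String field;
        final String term;
        FixedQuery(String field, String term) { this.field = field; this.term = term; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof FixedQuery)) return false;
            FixedQuery other = (FixedQuery) o;
            return field.equals(other.field) && term.equals(other.term);
        }
        @Override public int hashCode() { return Objects.hash(field, term); }
    }

    public static void main(String[] args) {
        Map<Object, String> cache = new HashMap<Object, String>();
        cache.put(new BrokenQuery("exact", "the"), "cached results for 'the'");
        // Returns the entry stored for 'the': the stored value is fine, the lookup is wrong.
        System.out.println(cache.get(new BrokenQuery("exact", "changeling")));

        cache.clear();
        cache.put(new FixedQuery("exact", "the"), "cached results for 'the'");
        // Returns null: a distinct query misses the cache and would be executed normally.
        System.out.println(cache.get(new FixedQuery("exact", "changeling")));
    }
}
```

Any custom query class used as a cache key needs equals() and hashCode() to cover every field that changes the result set, including similarity thresholds and boosts.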

Also, when one server starts to return bad results, have you tried
using explainOther=id:id_of_other_doc_that_should_score_higher?
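A request using explainOther might be assembled as sketched below. The explainOther parameter is only reported when debug output is enabled, so debugQuery is set as well. The document id 12345, the class name, and the host and path in main() are placeholders. The handler name is the one from the dismax config quoted elsewhere in this thread, selected here with qt on the assumption that the request goes through the standard select URL.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for a debug request that also explains one specific document. */
class ExplainOtherSketch {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** otherDocQuery is a query matching the document that should have scored higher. */
    static String debugParams(String handler, String userQuery, String otherDocQuery) {
        return "qt=" + encode(handler)
            + "&q=" + encode(userQuery)
            + "&debugQuery=on"
            + "&explainOther=" + encode(otherDocQuery);
    }

    public static void main(String[] args) {
        // Assumed host, path, and document id, for illustration only.
        System.out.println("http://localhost:8983/solr/select?"
            + debugParams("groups_people", "the changeling", "id:12345"));
    }
}
```

The explain for the "other" document shows how the main query scores it, which makes it easier to see whether the rare term is missing from the parsed query or merely scored low.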

-Yonik
http://www.lucidimagination.com

On Tue, Apr 14, 2009 at 11:43 AM, Walter Underwood
<wu...@netflix.com> wrote:
> Dang, had another server do this.
>
> Syncing and committing a new index does not fix it. The two servers
> show the same bad results.
>
> wunder
>
> On 4/11/09 9:12 AM, "Walter Underwood" <wu...@netflix.com> wrote:
>
>> Restarting Solr fixes it. If I remember correctly, a sync and commit
>> does not fix it. I have disabled snappuller this time, so I can study
>> the broken instance.
>>
>> wunder

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

Dang, had another server do this.

Syncing and committing a new index does not fix it. The two servers
show the same bad results.

wunder

On 4/11/09 9:12 AM, "Walter Underwood" <wu...@netflix.com> wrote:

> Restarting Solr fixes it. If I remember correctly, a sync and commit
> does not fix it. I have disabled snappuller this time, so I can study
> the broken instance.
> 
> wunder
> 
> On 4/11/09 5:03 AM, "Grant Ingersoll" <gs...@apache.org> wrote:
> 
>> 
>> On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote:
>> 
>>> Normally, both "changeling" and "the changeling" work fine. This one
>>> server is misbehaving like this for all multi-term queries.
>>> 
>>> Yes, it is VERY weird that the term "changeling" does not show up in
>>> the explain.
>>> 
>>> A server will occasionally "go bad" and stay in that state. In one case,
>>> two servers went bad and both gave the same wrong results.
>>> 
>> 
>> What's the solution for when they go bad?  Do you have to restart Solr
>> or reboot or what?
>> 
>> 
>>> Here is the dismax config. "groups" means "movies". The title* fields
>>> are stemmed and stopped, the "exact*" fields are not.
>>> 
>>>  <!-- groups and people  -->
>>> 
>>>  <requestHandler name="groups_people" class="solr.SearchHandler">
>>>    <lst name="defaults">
>>>     <str name="defType">dismax</str>
>>>     <str name="echoParams">none</str>
>>>     <float name="tie">0.01</float>
>>>     <str name="qf">
>>>        exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
>>> title^3.0 title_alt^3.0 title_base^4.0
>>>     </str>
>>> 
>>>     <str name="pf">
>>>        exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0
>>> title^3.0
>>> title_alt^4.0 title_base^6.0
>>>     </str>
>>>     <str name="bf">
>>>        search_popularity^100.0
>>>     </str>
>>>     <str name="mm">1</str>
>>>     <int name="ps">100</int>
>>>     <str name="fl">id,type,movieid,personid,genreid</str>
>>> 
>>>    </lst>
>>>    <lst name="appends">
>>>      <str name="fq">type:group OR type:person</str>
>>>    </lst>
>>>  </requestHandler>
>>> 
>>> 
>>> wunder
>>> 
>>> On 4/10/09 12:51 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>>> 
>>>> 
>>>> On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
>>>> 
>>>>> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
>>>>> I would appreciate any ideas.
>>>>> 
>>>>> Occasionally, a server will start returning results with really poor
>>>>> relevance. Single term queries work fine, but multi-term queries are
>>>>> scored based on the most common term (lowest IDF).
>>>>> 
>>>>> I don't see anything in the logs when this happens. We have a monitor
>>>>> doing a search for the 100 most popular movies once per minute to
>>>>> catch this, so we know when it was first detected.
>>>>> 
>>>>> I'm attaching two explain outputs, one for the query "changeling" and
>>>>> one for "the changeling".
>>>> 
>>>> 
>>>> I'm not sure what exactly  you are asking, so bear with me...
>>>> 
>>>> Are you saying that "the changeling" normally returns results just
>>>> fine and then periodically it will "go bad" or are you saying you
>>>> don't understand why "the changeling" scores differently from
>>>> "changeling"?  In looking at the explains, it is weird that in the
>>>> "the changeling" case, the term changeling doesn't even show up as a
>>>> term.
>>>> 
>>>> Can you share your dismax configuration?  That will be easier to parse
>>>> than trying to make sense of the debug query parsing.
>>>> 
>>>> -Grant
>>> 
>> 
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
>

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

Restarting Solr fixes it. If I remember correctly, a sync and commit
does not fix it. I have disabled snappuller this time, so I can study
the broken instance.

wunder

On 4/11/09 5:03 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

> 
> On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote:
> 
>> Normally, both "changeling" and "the changeling" work fine. This one
>> server is misbehaving like this for all multi-term queries.
>> 
>> Yes, it is VERY weird that the term "changeling" does not show up in
>> the explain.
>> 
>> A server will occasionally "go bad" and stay in that state. In one case,
>> two servers went bad and both gave the same wrong results.
>> 
> 
> What's the solution for when they go bad?  Do you have to restart Solr
> or reboot or what?
> 
> 
>> Here is the dismax config. "groups" means "movies". The title* fields
>> are stemmed and stopped, the "exact*" fields are not.
>> 
>>  <!-- groups and people  -->
>> 
>>  <requestHandler name="groups_people" class="solr.SearchHandler">
>>    <lst name="defaults">
>>     <str name="defType">dismax</str>
>>     <str name="echoParams">none</str>
>>     <float name="tie">0.01</float>
>>     <str name="qf">
>>        exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
>> title^3.0 title_alt^3.0 title_base^4.0
>>     </str>
>> 
>>     <str name="pf">
>>        exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0
>> title^3.0
>> title_alt^4.0 title_base^6.0
>>     </str>
>>     <str name="bf">
>>        search_popularity^100.0
>>     </str>
>>     <str name="mm">1</str>
>>     <int name="ps">100</int>
>>     <str name="fl">id,type,movieid,personid,genreid</str>
>> 
>>    </lst>
>>    <lst name="appends">
>>      <str name="fq">type:group OR type:person</str>
>>    </lst>
>>  </requestHandler>
>> 
>> 
>> wunder
>> 
>> On 4/10/09 12:51 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>> 
>>> 
>>> On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
>>> 
>>>> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
>>>> I would appreciate any ideas.
>>>> 
>>>> Occasionally, a server will start returning results with really poor
>>>> relevance. Single term queries work fine, but multi-term queries are
>>>> scored based on the most common term (lowest IDF).
>>>> 
>>>> I don't see anything in the logs when this happens. We have a monitor
>>>> doing a search for the 100 most popular movies once per minute to
>>>> catch this, so we know when it was first detected.
>>>> 
>>>> I'm attaching two explain outputs, one for the query "changeling" and
>>>> one for "the changeling".
>>> 
>>> 
>>> I'm not sure what exactly  you are asking, so bear with me...
>>> 
>>> Are you saying that "the changeling" normally returns results just
>>> fine and then periodically it will "go bad" or are you saying you
>>> don't understand why "the changeling" scores differently from
>>> "changeling"?  In looking at the explains, it is weird that in the
>>> "the changeling" case, the term changeling doesn't even show up as a
>>> term.
>>> 
>>> Can you share your dismax configuration?  That will be easier to parse
>>> than trying to make sense of the debug query parsing.
>>> 
>>> -Grant
>> 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>

Re: Help with relevance failure in Solr 1.3

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 10, 2009, at 5:50 PM, Walter Underwood wrote:

> Normally, both "changeling" and "the changeling" work fine. This one
> server is misbehaving like this for all multi-term queries.
>
> Yes, it is VERY weird that the term "changeling" does not show up in
> the explain.
>
> A server will occasionally "go bad" and stay in that state. In one case,
> two servers went bad and both gave the same wrong results.
>

What's the solution for when they go bad?  Do you have to restart Solr  
or reboot or what?


> Here is the dismax config. "groups" means "movies". The title* fields
> are stemmed and stopped, the "exact*" fields are not.
>
>  <!-- groups and people  -->
>
>  <requestHandler name="groups_people" class="solr.SearchHandler">
>    <lst name="defaults">
>     <str name="defType">dismax</str>
>     <str name="echoParams">none</str>
>     <float name="tie">0.01</float>
>     <str name="qf">
>        exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
> title^3.0 title_alt^3.0 title_base^4.0
>     </str>
>
>     <str name="pf">
>        exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0  
> title^3.0
> title_alt^4.0 title_base^6.0
>     </str>
>     <str name="bf">
>        search_popularity^100.0
>     </str>
>     <str name="mm">1</str>
>     <int name="ps">100</int>
>     <str name="fl">id,type,movieid,personid,genreid</str>
>
>    </lst>
>    <lst name="appends">
>      <str name="fq">type:group OR type:person</str>
>    </lst>
>  </requestHandler>
>
>
> wunder
>
> On 4/10/09 12:51 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>
>>
>> On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
>>
>>> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
>>> I would appreciate any ideas.
>>>
>>> Occasionally, a server will start returning results with really poor
>>> relevance. Single term queries work fine, but multi-term queries are
>>> scored based on the most common term (lowest IDF).
>>>
>>> I don't see anything in the logs when this happens. We have a monitor
>>> doing a search for the 100 most popular movies once per minute to
>>> catch this, so we know when it was first detected.
>>>
>>> I'm attaching two explain outputs, one for the query "changeling" and
>>> one for "the changeling".
>>
>>
>> I'm not sure what exactly  you are asking, so bear with me...
>>
>> Are you saying that "the changeling" normally returns results just
>> fine and then periodically it will "go bad" or are you saying you
>> don't understand why "the changeling" scores differently from
>> "changeling"?  In looking at the explains, it is weird that in the
>> "the changeling" case, the term changeling doesn't even show up as a
>> term.
>>
>> Can you share your dismax configuration?  That will be easier to parse
>> than trying to make sense of the debug query parsing.
>>
>> -Grant
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

Normally, both "changeling" and "the changeling" work fine. This one
server is misbehaving like this for all multi-term queries.

Yes, it is VERY weird that the term "changeling" does not show up in
the explain.

A server will occasionally "go bad" and stay in that state. In one case,
two servers went bad and both gave the same wrong results.

Here is the dismax config. "groups" means "movies". The title* fields
are stemmed and stopped, the "exact*" fields are not.

  <!-- groups and people  -->

  <requestHandler name="groups_people" class="solr.SearchHandler">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">none</str>
     <float name="tie">0.01</float>
     <str name="qf">
        exact^6.0 exact_alt^6.0 exact_base~jw_0.7_1^8.0 exact_alias^8.0
        title^3.0 title_alt^3.0 title_base^4.0
     </str>

     <str name="pf">
        exact^9.0 exact_alt^9.0 exact_base^12.0 exact_alias^12.0 title^3.0
        title_alt^4.0 title_base^6.0
     </str>
     <str name="bf">
        search_popularity^100.0
     </str>
     <str name="mm">1</str>
     <int name="ps">100</int>
     <str name="fl">id,type,movieid,personid,genreid</str>

    </lst>
    <lst name="appends">
      <str name="fq">type:group OR type:person</str>
    </lst>
  </requestHandler>


wunder
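For readers following the config above, here is a rough sketch of how a dismax clause combines per-field scores for one query term: the best field wins, and the other matching fields contribute only the tie fraction (0.01 in this handler). The per-field scores in main() are invented, the class name is hypothetical, and scores are assumed to be non-negative.

```java
/** Hypothetical illustration of disjunction-max scoring with a tie breaker. */
class DisMaxScoreSketch {
    /** score = max + tie * (sum of the other matching fields' scores). */
    static double disMaxScore(double[] fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Assumed scores for one term in, say, exact_alias, title_base and title.
        double[] scores = {8.0, 3.0, 1.0};
        System.out.println(disMaxScore(scores, 0.01));
    }
}
```

With a tie this small, the final ranking depends almost entirely on the single best field per term, so a term that drops out of the parsed query removes its whole contribution rather than just lowering it.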

On 4/10/09 12:51 PM, "Grant Ingersoll" <gs...@apache.org> wrote:

> 
> On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:
> 
>> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
>> I would appreciate any ideas.
>> 
>> Occasionally, a server will start returning results with really poor
>> relevance. Single term queries work fine, but multi-term queries are
>> scored based on the most common term (lowest IDF).
>> 
>> I don't see anything in the logs when this happens. We have a monitor
>> doing a search for the 100 most popular movies once per minute to
>> catch this, so we know when it was first detected.
>> 
>> I'm attaching two explain outputs, one for the query "changeling" and
>> one for "the changeling".
> 
> 
> I'm not sure what exactly  you are asking, so bear with me...
> 
> Are you saying that "the changeling" normally returns results just
> fine and then periodically it will "go bad" or are you saying you
> don't understand why "the changeling" scores differently from
> "changeling"?  In looking at the explains, it is weird that in the
> "the changeling" case, the term changeling doesn't even show up as a
> term.
> 
> Can you share your dismax configuration?  That will be easier to parse
> than trying to make sense of the debug query parsing.
> 
> -Grant

Re: Help with relevance failure in Solr 1.3

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 10, 2009, at 1:56 PM, Walter Underwood wrote:

> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
> I would appreciate any ideas.
>
> Occasionally, a server will start returning results with really poor
> relevance. Single term queries work fine, but multi-term queries are
> scored based on the most common term (lowest IDF).
>
> I don't see anything in the logs when this happens. We have a monitor
> doing a search for the 100 most popular movies once per minute to
> catch this, so we know when it was first detected.
>
> I'm attaching two explain outputs, one for the query "changeling" and
> one for "the changeling".

I'm not sure exactly what you are asking, so bear with me...

Are you saying that "the changeling" normally returns results just  
fine and then periodically it will "go bad" or are you saying you  
don't understand why "the changeling" scores differently from  
"changeling"?  In looking at the explains, it is weird that in the  
"the changeling" case, the term changeling doesn't even show up as a  
term.

Can you share your dismax configuration?  That will be easier to parse  
than trying to make sense of the debug query parsing.

-Grant

Re: Help with relevance failure in Solr 1.3

Posted by Walter Underwood <wu...@netflix.com>.

If you don't see the attachments, you can get them here:

http://wunderwood.org/solr/

wunder

On 4/10/09 10:56 AM, "Walter Underwood" <wu...@netflix.com> wrote:

> We have a rare, hard-to-reproduce problem with our Solr 1.3 servers, and
> I would appreciate any ideas.
> 
> Occasionally, a server will start returning results with really poor
> relevance. Single term queries work fine, but multi-term queries are
> scored based on the most common term (lowest IDF).
> 
> I don't see anything in the logs when this happens. We have a monitor
> doing a search for the 100 most popular movies once per minute to
> catch this, so we know when it was first detected.
> 
> I'm attaching two explain outputs, one for the query "changeling" and
> one for "the changeling".
> 
> We are running Solr 1.3 with Lucene 2.4.0, and have added a fuzzy query
> using JaroWinkler matching.
> 
> I'd appreciate ideas about where to look, what debug output to try, etc.
> 
> wunder