Posted to solr-user@lucene.apache.org by Peter Wolanin <pe...@acquia.com> on 2010/03/27 16:01:46 UTC

Solr 1.4 bug? search fails but analyzer indicates a match

Ran into an odd situation today when searching for a string like a
domain name containing a '.': the Solr 1.4 analyzer tells me that I will
get a match, but when I enter the search either in the client or
directly in Solr, the search fails.  Our default handler is dismax, but
this also fails with the standard handler.  Is this a known issue, or am
I missing something subtle in the analysis chain?  Solr is 1.4.0, which
I built myself.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca

I would expect all of the failing queries to match.  Looking at the
schema browser, the index contains the expected terms: identica,
identi, ca
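To see where those three index terms come from, here is a minimal Python sketch of what a word delimiter filter configured to generate word parts and catenate words (followed by lowercasing) does to "Identi.ca" and its variants. It is a simplified model for illustration, not Solr's WordDelimiterFilter: the function name split_word_parts is hypothetical, it does not split on letter/digit transitions, and it ignores the filter's other options and token positions.

```python
import re


def split_word_parts(token: str, catenate_words: bool = True) -> list[str]:
    # Hypothetical, simplified model of word-delimiter splitting plus
    # lowercasing; NOT Solr's implementation.
    # 1. Split on delimiter characters such as '.', '-' and '_'.
    chunks = [c for c in re.split(r"[^A-Za-z0-9]+", token) if c]
    parts = []
    for chunk in chunks:
        # 2. Split on lower->upper case changes: "IdentiCa" -> "Identi", "Ca".
        parts.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", chunk))
    terms = [p.lower() for p in parts]
    # 3. Catenation: also emit the parts joined into one term.
    if catenate_words and len(terms) > 1:
        terms.append("".join(terms))
    return terms


for q in ["Identi.ca", "Identi-ca", "IdentiCa"]:
    print(q, split_word_parts(q))
```

Under this model all three spellings from the failing-query list produce the same terms the schema browser shows (identi, ca, identica), which is consistent with the analysis page reporting a match.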

schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png

dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png

dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png

standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia, Inc.
peter.wolanin@acquia.com

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
I think it is clearly a bug - see comments on the issue by Robert
Muir.   https://issues.apache.org/jira/browse/SOLR-1852

The patch is a backport by Mark Miller of Robert's fixes for other
WordDelimiterFilter problems in Solr trunk.  Those fixes also
fix this bug as a side effect.

-Peter

On Sun, Mar 28, 2010 at 4:09 AM, MitchK <mi...@web.de> wrote:
>
> Peter,
>
> Following your discussion, I was a bit confused: is this still a bug, or is
> the behaviour correct (since enablePositionIncrements is set to true)? And
> what changes did you make in the patch?
>
> Does the patch meet all your needs (matches on "identi ca", "identica",
> "identi-ca", and "identi.ca")?
>
> - Mitch

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by MitchK <mi...@web.de>.
Peter,

Following your discussion, I was a bit confused: is this still a bug, or is
the behaviour correct (since enablePositionIncrements is set to true)? And
what changes did you make in the patch?

Does the patch meet all your needs (matches on "identi ca", "identica",
"identi-ca", and "identi.ca")?

- Mitch
-- 
View this message in context: http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p681185.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
Created a new issue:  https://issues.apache.org/jira/browse/SOLR-1852

Further discussion there.

-Peter

On Sat, Mar 27, 2010 at 5:51 PM, Peter Wolanin <pe...@acquia.com> wrote:
> Discussing this with Mark Miller in IRC - we are homing in on the problem.
>
> Looks as though Identi.ca is treated as a phrase query, as if I had
> quoted it like "Identi ca".  That phrase search also fails.  I had
> expected that Identi.ca would be the same as Identi ca (i.e. 2
> separate tokens, not a phrase).
>
> -Peter

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
Discussing this with Mark Miller in IRC - we are homing in on the problem.

Looks as though Identi.ca is treated as a phrase query, as if I had
quoted it like "Identi ca".  That phrase search also fails.  I had
expected that Identi.ca would be the same as Identi ca (i.e. 2
separate tokens, not a phrase).
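The phrase behaviour described above is consistent with how the Lucene 2.9/3.0 QueryParser (which the standard and dismax parsers in Solr 1.4 build on) handles a single query term that analysis splits into several tokens: one token gives a term query; several tokens at one position are ORed together; tokens spread over several positions become a phrase query (a MultiPhraseQuery if some position holds more than one token). The Python sketch below models only that decision. The function choose_query_type and its (term, position) input format are hypothetical, not a Lucene API, and the token lists in the example are illustrative assumptions about what the query analyzer emits.

```python
from collections import Counter


def choose_query_type(tokens: list[tuple[str, int]]) -> str:
    # Simplified model of the query-type decision for ONE query term.
    # tokens: analyzer output as (term, position) pairs.
    if not tokens:
        return "none"
    if len(tokens) == 1:
        return "TermQuery"
    per_position = Counter(pos for _, pos in tokens)
    if any(n > 1 for n in per_position.values()):
        # Some position holds more than one token (e.g. synonyms).
        if len(per_position) == 1:
            return "BooleanQuery"      # everything at one position: OR them
        return "MultiPhraseQuery"
    # Distinct positions: the terms must appear adjacent, in order.
    return "PhraseQuery"


# "Identi.ca" is one query term that the analyzer splits in two:
print(choose_query_type([("identi", 0), ("ca", 1)]))
# "Identi ca" is two query terms, each analyzed on its own:
print(choose_query_type([("identi", 0)]), choose_query_type([("ca", 0)]))
```

Under this model, Identi.ca, Identi-ca and IdentiCa each reach the parser as one term and turn into phrase queries, while Identi ca stays two independent terms, which fits the pattern of failing and succeeding queries in the first message.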

-Peter

On Sat, Mar 27, 2010 at 4:32 PM, Peter Wolanin <pe...@acquia.com> wrote:
> The stopwords stanza looks like:
>
>        <filter class="solr.StopFilterFactory"
>                ignoreCase="true"
>                words="stopwords.txt"
>                enablePositionIncrements="true"
>                />
>
> This is the same as in the example schema:
> http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/example/solr/conf/schema.xml
>
> Changing this to enablePositionIncrements="false" seems to make
> searching work as expected.  Is it incorrect to have that directive
> here, or is this a bug?
>
> -Peter

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
The stopwords stanza looks like:

        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />

This is the same as in the example schema:
http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/example/solr/conf/schema.xml

Changing this to enablePositionIncrements="false" seems to make
searching work as expected.  Is it incorrect to have that directive
here, or is this a bug?
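What the enablePositionIncrements flag changes can be sketched in a few lines of Python. This is a simplified, hypothetical model of a stop filter with ignoreCase="true", not Solr code: stop_filter and the one-word stopword set stand in for solr.StopFilterFactory and stopwords.txt. With the flag on, a removed stopword leaves a gap in the positions; with it off, the following tokens are renumbered to close the gap.

```python
def stop_filter(
    tokens: list[tuple[str, int]],
    stopwords: set[str],
    enable_position_increments: bool = True,
) -> list[tuple[str, int]]:
    # Hypothetical model of a stop filter with ignoreCase="true".
    # tokens: (term, position) pairs with consecutive positions from 0.
    out = []
    next_pos = 0
    for term, pos in tokens:
        if term.lower() in stopwords:
            continue                      # stopword removed
        if enable_position_increments:
            out.append((term, pos))       # keep original position: gap remains
        else:
            out.append((term, next_pos))  # renumber densely: gap closed
        next_pos += 1
    return out


tokens = [("support", 0), ("for", 1), ("Identi.ca", 2)]
print(stop_filter(tokens, {"for"}))         # gap before Identi.ca
print(stop_filter(tokens, {"for"}, False))  # no gap
```

With the flag on, the token after the removed "for" reaches the next filter with a gap in front of it; with the flag off there is no gap, which is consistent with the observation that the problem disappears.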

-Peter


On Sat, Mar 27, 2010 at 4:25 PM, Peter Wolanin <pe...@acquia.com> wrote:
> The output on the analysis screen does look correct. Here are 2 screen shots:
>
> empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png
>
> standard stopwords:
> http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png
>
> -Peter

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
The output on the analysis screen does look correct. Here are 2 screen shots:

empty stopwords: http://img.skitch.com/20100327-rcsjdih4bn3y8ahajqa5wjwybd.png

standard stopwords:
http://img.skitch.com/20100327-1w5ct1wr25jkir4sji8kumefn1.png

-Peter

On Sat, Mar 27, 2010 at 4:13 PM, MitchK <mi...@web.de> wrote:
>
> Peter,
>
> If you are right, please comment out the stopword filter to confirm that
> the problem really lies in how the stopword filter removes
> stopwords.
>
> Is the output correct if you enter "would be great to have support for
> Identi.ca on the follow block" in the query field of analysis.jsp? Can
> you post a screenshot for this sentence?
>
> - Mitch

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by MitchK <mi...@web.de>.
Peter,

If you are right, please comment out the stopword filter to confirm that
the problem really lies in how the stopword filter removes
stopwords.

Is the output correct if you enter "would be great to have support for
Identi.ca on the follow block" in the query field of analysis.jsp? Can
you post a screenshot for this sentence?

- Mitch
-- 
View this message in context: http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680530.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
If I empty the stopword file and re-index, all the expected matches
occur.  So that may narrow down where the problem is.  This certainly
feels like a Solr bug (or a Lucene bug?).

-Peter

On Sat, Mar 27, 2010 at 3:05 PM, Peter Wolanin <pe...@acquia.com> wrote:
> Hi Mitch,
>
> I am also seeing this locally with the exact same solr.war,
> solrconfig.xml, and schema.xml running under Jetty, as well as on 2
> different production servers with the same content indexed.
>
> So this is really weird - this seems to be influenced by the surrounding text:
>
> "would be great to have support for Identi.ca on the follow block"
>
> fails to match "Identi.ca", but putting the content on its own or in
> another sentence:
>
> "Support Identi.ca"
>
> the search matches.  More testing suggests the word "for" is the
> problem.  I don't see an exception or error. Could this be a problem with
> how stopwords are removed?
>
> -Peter

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by Peter Wolanin <pe...@acquia.com>.
Hi Mitch,

I am also seeing this locally with the exact same solr.war,
solrconfig.xml, and schema.xml running under Jetty, as well as on 2
different production servers with the same content indexed.

So this is really weird - this seems to be influenced by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in
another sentence:

"Support Identi.ca"

the search matches.  More testing suggests the word "for" is the
problem.  I don't see an exception or error. Could this be a problem with
how stopwords are removed?
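A quick way to see why a match is expected even with "for" removed: with correct position handling, the gap left by "for" falls before "identi", not between "identi" and "ca", so a phrase query for "identi ca" should still find the two terms at adjacent positions. The Python sketch below chains simplified, hypothetical stand-ins for a whitespace tokenizer, a stop filter that keeps position gaps, and word splitting that gives each part the next position (no catenation), then checks phrase adjacency. It models the intended behaviour, not the Solr 1.4 code path, and STOPWORDS is a small assumed subset of an English stopword list.

```python
import re

# Assumed subset of an English stopword list (stand-in for stopwords.txt).
STOPWORDS = {"and", "be", "for", "on", "the", "to"}


def analyze(text: str) -> list[tuple[str, int]]:
    # Whitespace-tokenize, drop stopwords while keeping their position gaps,
    # split each remaining token on non-alphanumerics, and lowercase.
    out = []
    pos = -1
    for word in text.split():
        pos += 1                     # every whitespace token advances position
        if word.lower() in STOPWORDS:
            continue                 # gap preserved: pos already advanced
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", word) if p]
        for i, part in enumerate(parts):
            if i > 0:
                pos += 1             # each extra word part takes the next position
            out.append((part.lower(), pos))
    return out


def phrase_matches(doc: list[tuple[str, int]], phrase: list[str]) -> bool:
    # True if the phrase terms occur at consecutive positions in doc.
    if not phrase:
        return False
    positions: dict[str, set[int]] = {}
    for term, pos in doc:
        positions.setdefault(term, set()).add(pos)
    for start in positions.get(phrase[0], set()):
        if all(start + k in positions.get(t, set()) for k, t in enumerate(phrase)):
            return True
    return False


long_doc = analyze("would be great to have support for Identi.ca on the follow block")
print(phrase_matches(long_doc, ["identi", "ca"]))
print(phrase_matches(analyze("Support Identi.ca"), ["identi", "ca"]))
```

Under this model both sentences match, so a failure in Solr points at the positions recorded in the index (or in the query) rather than at missing terms.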

-Peter


On Sat, Mar 27, 2010 at 1:19 PM, MitchK <mi...@web.de> wrote:
>
> Hi Peter,
>
> Have you tried reindexing your data, and did you do a commit?
> If you changed anything, have you restarted your Solr server?
>
> I can't understand why this problem occurs, since the example seems to work
> in analysis.jsp.
>
> Kind regards
> - Mitch

Re: Solr 1.4 bug? search fails but analyzer indicates a match

Posted by MitchK <mi...@web.de>.
Hi Peter,

Have you tried reindexing your data, and did you do a commit?
If you changed anything, have you restarted your Solr server?

I can't understand why this problem occurs, since the example seems to work
in analysis.jsp.

Kind regards
- Mitch
-- 
View this message in context: http://n3.nabble.com/Solr-1-4-bug-search-fails-but-analyzer-indicates-a-match-tp680066p680313.html
Sent from the Solr - User mailing list archive at Nabble.com.
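For readers following Mitch's checklist: in Solr 1.4 a commit can be issued by POSTing an XML <commit/> message to the update handler. Below is a minimal Python sketch that only builds the request. The base URL http://localhost:8983/solr (the example Jetty setup) and the helper name build_commit_request are assumptions; the commented-out line would actually send it.

```python
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: example Jetty URL


def build_commit_request(base_url: str = SOLR_BASE) -> urllib.request.Request:
    # Build (but do not send) an XML commit request for the /update handler.
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_commit_request()
# urllib.request.urlopen(req)  # uncomment to send to a running Solr
print(req.get_method(), req.full_url)
```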