Posted to solr-user@lucene.apache.org by Pa...@rtl.de on 2009/10/27 14:44:15 UTC

Wildcard on first char, Possible Bug 1.4?

Hey,

I found that when I use the new solr.ReversedWildcardFilterFactory feature, the following happens:

I query for thomas.

<str name="rawquerystring">nickname:thomas</str>
<str name="querystring">nickname:thomas</str>
<str name="parsedquery">nickname:thoma</str>
<str name="parsedquery_toString">nickname:thoma</str>

We see that the parsed string is thoma.

I query for *thomas

<str name="rawquerystring">nickname:*thomas</str>
<str name="querystring">nickname:*thomas</str>
<str name="parsedquery">nickname:#1;samoht*</str>
<str name="parsedquery_toString">nickname:#1;samoht*</str>

The parsed string is samoht*, which is thomas* reversed. The trailing s is not trimmed. But when I query *thoma, everything is OK. Any suggestions?

Kind regards,
Patric


This is a confidential communication intended only for the named addressees. If you received this communication in error, please notify us and return and delete it without reading it. This e-mail may not be disclosed, copied or distributed in any form without the obtained permission in writing of the sender. In any case it may not be altered or otherwise changed. Whilst the sender believes that the information is correct at the date of the e-mail, no warranty and representation is given to this effect and no responsibility can be accepted by the sender.


Re: Wildcard on first char, Possible Bug 1.4?

Posted by Chris Hostetter <ho...@fucit.org>.
: I found that when I use the new solr.ReversedWildcardFilterFactory feature, the following happens:

Hmmm.... in spite of ReversedWildcardFilterFactory being something designed
to be used as part of the analysis chain for a field *so* you can do
leading wildcard queries on it, it doesn't change the fact that wildcard
queries are not analyzed at query time...

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F

...so any stemmers you have configured won't be used.

(It seems like we need some major caveat documentation about this in the 
example schema and on the wiki)
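
To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is not Solr code: `toy_stem` is a hypothetical stand-in for whatever stemmer is configured on the field, and the marker character is assumed to be what shows up as "#1;" in the debug output. It only illustrates that the indexed term is stemmed before it is reversed, while the wildcard term is reversed without being stemmed.

```python
import fnmatch

MARKER = "\u0001"  # assumption: the character printed as "#1;" in the debug output


def toy_stem(token):
    """Hypothetical stand-in for a stemmer: drops one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


def index_terms(token):
    """Index time: stem, then emit the stemmed term and its marked reversal."""
    stemmed = toy_stem(token)
    return {stemmed, MARKER + stemmed[::-1]}


def rewrite_leading_wildcard(pattern):
    """Query time: the pattern is reversed but never analyzed (no stemming)."""
    return MARKER + pattern[::-1]


def matches(pattern, terms):
    """True if any indexed term matches the wildcard pattern."""
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms)


terms = index_terms("thomas")  # contains "thoma" and the marked reversal "amoht"

print(matches(rewrite_leading_wildcard("*thomas"), terms))  # False
print(matches(rewrite_leading_wildcard("*thoma"), terms))   # True
```

In this sketch `*thomas` becomes `samoht*`, which cannot match the indexed `amoht`, whereas `*thoma` becomes `amoht*` and matches, the same pattern as in the report.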




-Hoss


Re: Wildcard on first char, Possible Bug 1.4?

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Tue, Oct 27, 2009 at 4:14 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I'm pretty sure wildcard queries don't go through analysis, hence they are
> probably not stemmed.

Right - same thing would happen w/o the reverse filter.
Also, wildcarding mixes poorly with stemming - trying to analyze won't
fix the problem.
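
A small Python sketch of why analyzing the wildcard term would not help either. The stemmer `toy_stem` and the word list are hypothetical and exist only for illustration: once the index holds stems, a pattern such as `*s` cannot be answered correctly whether or not the fragment is stemmed.

```python
import fnmatch


def toy_stem(token):
    """Hypothetical stand-in for a stemmer: drops one trailing 's'."""
    return token[:-1] if token.endswith("s") else token


words = ["thomas", "glass", "anna"]
# Terms as a stemming analyzer would index them.
indexed = sorted(toy_stem(w) for w in words)


def search(pattern):
    """Return the indexed terms that match the wildcard pattern."""
    return [t for t in indexed if fnmatch.fnmatchcase(t, pattern)]


# Unanalyzed fragment: misses "thomas", because its stem no longer ends in "s".
raw = search("*s")
# Stemmed fragment: "s" stems to "", so the pattern becomes "*" and matches everything.
analyzed = search("*" + toy_stem("s"))

print(raw)       # ['glas']
print(analyzed)  # ['anna', 'glas', 'thoma']
```

Under these assumptions the unanalyzed pattern under-matches and the analyzed one over-matches, because the stemmer treats a word fragment as if it were a whole word.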

-Yonik
http://www.lucidimagination.com

Re: Wildcard on first char, Possible Bug 1.4?

Posted by Grant Ingersoll <gs...@apache.org>.
On Oct 27, 2009, at 9:44 AM, <Pa...@rtl.de> wrote:

> Hey,
>
> I found that when I use the new solr.ReversedWildcardFilterFactory
> feature, the following happens:
>
> I query for thomas.
>
> <str name="rawquerystring">nickname:thomas</str>
> <str name="querystring">nickname:thomas</str>
> <str name="parsedquery">nickname:thoma</str>
> <str name="parsedquery_toString">nickname:thoma</str>
>
> We see that the parsed string is thoma.
>
> I query for *thomas
>
> <str name="rawquerystring">nickname:*thomas</str>
> <str name="querystring">nickname:*thomas</str>
> <str name="parsedquery">nickname:#1;samoht*</str>
> <str name="parsedquery_toString">nickname:#1;samoht*</str>
>
> The parsed string is samoht*, which is thomas* reversed. The trailing
> s is not trimmed. But when I query *thoma, everything is OK. Any
> suggestions?
>

I'm pretty sure wildcard queries don't go through analysis, hence they  
are probably not stemmed.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
