Posted to solr-user@lucene.apache.org by Alexander Herzog <he...@ait.co.at> on 2009/08/21 07:19:57 UTC
Re: Is wildcard search not correctly analyzed at query? [solved]

Hi

Thanks for the info!

best,
Alexander

Avlesh Singh wrote:
> Wildcard queries are not analyzed by Lucene, hence the behavior. A
> similar thread earlier -
> http://www.lucidimagination.com/search/document/a6b9144ecab9d0ff/search_phrase_wildcard
> 
> Cheers
> Avlesh
> 
> On Thu, Aug 20, 2009 at 7:03 PM, Alexander Herzog <he...@ait.co.at> wrote:
> 
>> It seems like the analyzer/filter isn't affected at all, since the query
>>
>> http://localhost:8983/solr/select/?q=PhysicalDescription:nü*&debugQuery=true
>>
>> does not return a
>> <str name="parsedquery">PhysicalDescription:nu*</str>
>> as I would expect.
>>
>> So can I take that as a "you're right, wildcard searches are passed to
>> Lucene directly without any analysis"?
>>
>> If it is like this, I'm happy with that as well.
>>
>> best,
>> Alexander
>>
>>
>> Alexander Herzog wrote:
>>> Hi all
>>>
>>> sorry for the long post
>>>
>>> We are switching from Index Data's Zebra to Solr for a new book
>>> archival/preservation project with multiple languages, so expect more
>>> questions soon (sorry for that).
>>> The features of Solr are pretty cool and more or less overwhelming!
>>>
>>> But there is one thing I found after a little test with wildcards.
>>>
>>> I'm using the latest svn build and didn't change anything except the
>>> schema.xml
>>> Solr Specification Version: 1.3.0.2009.08.20.07.53.52
>>> Solr Implementation Version: 1.4-dev 806060 - ait015 - 2009-08-20 07:53:52
>>> Lucene Specification Version: 2.9-dev
>>> Lucene Implementation Version: 2.9-dev 804692 - 2009-08-16 09:33:41
>>>
>>> I have a text_ws field with this schema config:
>>>
>>> <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
>>>    <analyzer>
>>>       <charFilter class="solr.MappingCharFilterFactory"
>>> mapping="mapping-ISOLatin1Accent.txt"/>
>>>       <filter class="solr.LowerCaseFilterFactory"/>
>>>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>    </analyzer>
>>> </fieldType>
>>> ...
>>> and I added a dynamic field for everything since I'm not sure what field
>>> we will use...
>>>
>>> <dynamicField name="*"  type="text_ws"    indexed="true"  stored="true"
>>> multiValued="true"/>
>>> ...
>>>
>>>
>>> So I <add>ed this content:
>>> ...
>>> <field name="PhysicalDescription">
>>>    X, 143, XIV S.:
>>>    124 feine Farbendrucktafeln mit über 600 Abbildungen;
>>>    24,5 cm.
>>> </field>
>>> ...
>>>
>>> Since it's German, and I couldn't find a tokenizer for German compound
>>> words (any help appreciated), I wanted to search for 'Farb*'.
>>>
>>> The final row of the query analyzer in the admin section told me:
>>> farb*
>>> for the content:
>>> x,    143,    xiv     s.:     124     feine   farbendrucktafeln       mit
>>> uber    600     abbildungen;    24,5    cm.
>>>
>>> so everything seems to be ok, everything in lower case
>>>
>>> Now, for the REST service:
>>>
>>> http://localhost:8983/solr/select/?q=PhysicalDescription:Farb*&debugQuery=true
>>> <str name="rawquerystring">PhysicalDescription:Farb*</str>
>>> <str name="querystring">PhysicalDescription:Farb*</str>
>>> <str name="parsedquery">PhysicalDescription:Farb*</str>
>>> <str name="parsedquery_toString">PhysicalDescription:Farb*</str>
>>>
>>> Since Farb* has a capital letter, nothing is found.
>>> When using farb* as query, I get the result.
>>>
>>> Where can I add/change a query analyzer that lowercases wildcard
>>> searches?
>>>
>>> thanks, best wishes,
>>> Alexander
>>>
>
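
Since the thread concludes that wildcard terms reach Lucene without passing through the field's analyzer, one possible workaround is to normalize the term on the client before building the query URL. The following is only a minimal sketch under stated assumptions, not something proposed in the thread: the function names fold_term and wildcard_query_url are hypothetical, and stripping combining marks after Unicode NFKD decomposition only approximates what mapping-ISOLatin1Accent.txt does (for example, it leaves "ß" unchanged). The field name and base URL are taken from the messages above.

```python
import unicodedata
from urllib.parse import urlencode


def fold_term(term: str) -> str:
    """Lowercase a wildcard term and drop combining accents (e.g. "ü" -> "u").

    This mimics, approximately, the lowercasing and accent mapping that the
    index-time analyzer applies. The wildcard characters * and ? are kept.
    """
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_query_url(field: str, term: str,
                       base: str = "http://localhost:8983/solr/select/") -> str:
    """Build a select URL whose wildcard term is already normalized.

    Hypothetical helper: `base` is the local URL used in this thread.
    """
    query = f"{field}:{fold_term(term)}"
    # urlencode percent-encodes the query so non-ASCII input is safe to send.
    return base + "?" + urlencode({"q": query, "debugQuery": "true"})


if __name__ == "__main__":
    print(fold_term("Farb*"))
    print(fold_term("nü*"))
    print(wildcard_query_url("PhysicalDescription", "Farb*"))
```

With this in place, a user-entered Farb* would be sent as farb*, which is the form that matched the indexed tokens in the test described above. Whether the folding matches the index exactly depends on the charFilter mapping file, so it is worth comparing against the analysis page in the admin section.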