Posted to solr-user@lucene.apache.org by "Wagner,Harry" <wa...@oclc.org> on 2007/11/07 16:26:38 UTC

Analysis / Query problem

I have the following custom field defined for author names. After indexing the 2 documents below, the admin analysis tool looks right for field-name=au and field-value=Schröder, Jürgen. The highlight matching also seems right. However, if I search for au:Schröder, Jürgen using the admin tool, I do not get any hits (see below). This appears to be the case whenever there are 2 non-ASCII characters in the author name. Searching for au:Schröder, Jurgen finds both of these records. Any idea what is causing this?

Thanks!

harry

<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="au" type="textNoStem" indexed="true" stored="true"/>  <!-- author -->

<doc>
  <field name="no">008053223</field> 
  <field name="au">Schröder, Jürgen,</field>
  <field name="ti">Gottfried Benn : Poesie u. Sozialisation /</field>
  <field name="f500a">Includes index.</field> 
  <field name="pn">Benn, Gottfried,.</field>
  <field name="tt">Authors, German--20th century--Biography.</field>
  <field name="dd">831.912</field> 
  <field name="s1">8</field>
  <field name="s2">83</field>
  <field name="s3">831</field>
  <field name="aud">Special</field> 
  <field name="hold">137</field>
  <field name="swid">008053223</field>
  <field name="swkey">schroder, jurgen$1935/gottfried benn poesie u sozialisation</field> 
  <field name="edt">1</field>
  <field name="fmt1">01</field>
  <field name="fmt2">Book</field>
  <field name="ln">ger</field> 
  <field name="yr">1978-12-31T23:59:59Z</field>
  <field name="nb">317004446X</field>
  <field name="nb">9783170044463</field>
</doc>
<doc>
  <field name="no">000000001</field>
  <field name="au">Schröder, Jurgen,</field>
  <field name="ti">Gottfried Benn : Poesie u. Sozialisation /</field> 
  <field name="f500a">Includes index.</field>
  <field name="pn">Benn, Gottfried,.</field>
  <field name="tt">Authors, German--20th century--Biography.</field> 
  <field name="dd">831.912</field>
  <field name="s1">8</field>
  <field name="s2">83</field>
  <field name="s3">831</field> 
  <field name="aud">Special</field>
  <field name="hold">137</field>
  <field name="swid">008053223</field>
  <field name="swkey">schroder, jurgen$1935/gottfried benn poesie u sozialisation</field> 
  <field name="edt">1</field>
  <field name="fmt1">01</field>
  <field name="fmt2">Book</field>
  <field name="ln">ger</field> 
  <field name="yr">1978-12-31T23:59:59Z</field>
  <field name="nb">317004446X</field>
  <field name="nb">9783170044463</field>
</doc>

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">au:Schröder, Jürgen</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>


RE: Analysis / Query problem

Posted by "Wagner,Harry" <wa...@oclc.org>.
Thanks Erik.  That helps.

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: Wednesday, November 07, 2007 11:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Analysis / Query problem


On Nov 7, 2007, at 10:26 AM, Wagner,Harry wrote:
> I have the following custom field defined for author names.   After  
> indexing the 2 documents below the admin analysis tool looks right  
> for field-name=au and field-value=Schröder, Jürgen   The highlight  
> matching also seems right.  However, if I search for au:Schröder,  
> Jürgen using the admin tool I do not get any hits (see below).   
> This appears to be the case whenever there are 2 non-ascii  
> characters in the author name.  Searching for au:Schröder, Jurgen  
> finds both of these records.  Any idea what is causing this?
>

> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">0</int>
> <lst name="params">
> <str name="indent">on</str>
> <str name="start">0</str>
> <str name="q">au:Schröder, Jürgen</str>

One thing to note is that query "au:Schröder, Jürgen" is being  
translated (try &debugQuery=true to see) to:

	au:schröder  <AND/OR> <defaultField>:jürgen

AND/OR depends on how you have things configured, as well as the  
default field.
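In a Solr 1.x schema.xml, the default field and default operator mentioned above are set by the `<defaultSearchField>` and `<solrQueryParser defaultOperator="..."/>` elements. The sketch below only illustrates where those settings live; the schema name and the values chosen are hypothetical, not taken from the original poster's schema. Pointing the default field at `au` would route the bare `jürgen` term to the author field, but it also changes every other unqualified query, so grouping the terms in the query itself, e.g. `au:(Schröder, Jürgen)` (Lucene field grouping), is usually the less invasive fix.

```xml
<!-- schema.xml (Solr 1.x). Hypothetical values for illustration only. -->
<schema name="example" version="1.1">
  <!-- types and fields omitted -->

  <!-- Field searched for query terms that carry no explicit "field:" prefix.
       With this value, the unqualified term jürgen would be searched in au. -->
  <defaultSearchField>au</defaultSearchField>

  <!-- Operator placed between clauses that have no explicit AND/OR.
       OR is the Lucene default when this element is absent. -->
  <solrQueryParser defaultOperator="OR"/>
</schema>
```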

You probably want to use the ISOLatin1AccentFilterFactory to have the  
diacritics "flattened" to the ASCII character they look like.
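A minimal sketch of that suggestion applied to the textNoStem type from the original post, assuming a Solr release that ships solr.ISOLatin1AccentFilterFactory (later releases deprecate it in favour of solr.ASCIIFoldingFilterFactory). Because the type declares a single `<analyzer>`, the same chain runs at index and query time, so "Jürgen" and "Jurgen" both reduce to "jurgen"; documents that are already indexed have to be re-indexed before the change has any effect. Note that this only addresses accent matching: the query still needs the au: prefix to apply to both terms.

```xml
<fieldtype name="textNoStem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Added: folds Latin-1 accented characters to their ASCII look-alikes
         (ö to o, ü to u), so schröder and schroder become the same token. -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>
```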

	Erik


