Posted to solr-user@lucene.apache.org by Denis WSRosa <de...@gmail.com> on 2011/08/17 22:18:04 UTC

Solr Accent Insensitive and sensitive search

Hi all!

I have configured my schema to use the solr.ASCIIFoldingFilterFactory
filter. This way I can search for a word like "ferias" and get "férias",
but when I search for the exact word "férias" I get no results.

Is there a way to configure both cases in the search?
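
One common way to support both behaviours is to index the same text twice: once through an analyzer that folds accents and once through an analyzer that keeps them. The fragment below is only a minimal sketch under that assumption. The names text_folded, text_exact, title_folded and title_exact are hypothetical and do not come from the schema discussed in this thread.

```xml
<!-- Sketch only: all type and field names here are hypothetical. -->
<schema name="sketch" version="1.4">
  <types>
    <!-- Accent-insensitive: "férias" and "ferias" both become "ferias". -->
    <fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
    <!-- Accent-sensitive: no folding filter, so "férias" stays "férias". -->
    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="title_folded" type="text_folded" indexed="true" stored="true" />
    <field name="title_exact" type="text_exact" indexed="true" stored="false" />
  </fields>
  <!-- copyField passes the original, unanalyzed value to the second field. -->
  <copyField source="title_folded" dest="title_exact" />
</schema>
```

With a layout like this, a query on title_folded:ferias should match documents containing either spelling, while title_exact:férias should match only the accented form. Searching both fields and boosting the exact one (for example title_exact:férias^2 OR title_folded:férias) is one way to rank accent-exact matches first.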

Best Regards!

-- 
Denis Wilson Souza Rosa
----------------------------------------------------
Systems Architect
mobile: +55 11 8112 8284
email: deniswsrosa@gmail.com / deniswsrosa@hotmail.com

Re: Solr Accent Insensitive and sensitive search

Posted by Erick Erickson <er...@gmail.com>.

Well, we can't tell, because you haven't identified the field you are
working with.

So, we need two additional pieces of information:
- the query you use that works
- the query you use that doesn't work

Please append &debugQuery=on to both of them and post the results back.

But looking at the admin/analysis page with the fields and input in
question may help you get an idea of what's going on. Also, the "full
interface" on the admin page will put the debug information in a pretty
format (make sure to check the "debug" checkbox).
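
For illustration, one kind of problem the analysis page can expose is a field type whose index and query analyzers disagree. The fragment below is a hypothetical example (the name text_mismatch is made up); it is not taken from the posted schema, which applies the folding filter in both analyzers, so it shows something to rule out rather than a diagnosis.

```xml
<!-- Hypothetical field type, not taken from the posted schema. -->
<fieldType name="text_mismatch" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: "férias" is folded and indexed as the term "ferias". -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <!-- Query time: no folding, so the query term stays "férias"
       and does not match the indexed term "ferias". -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

A field of such a type would return hits for "ferias" but none for "férias", which is why comparing the index and query columns on the analysis page is a useful first check.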

How are you trying to get an "exact match"?

Best
Erick


On Thu, Aug 18, 2011 at 8:26 AM, Denis WSRosa <de...@gmail.com> wrote:
> Hi! Thank you for your response!
>
> What am I doing wrong?
>
> On Wed, Aug 17, 2011 at 5:37 PM, Michael Ryan <mr...@moreover.com> wrote:
>
>> Are you using the same analyzer for both type="query" and type="index"? Can
>> you show us the fieldType from your schema?
>>
>> -Michael
>>

Re: Solr Accent Insensitive and sensitive search

Posted by Denis WSRosa <de...@gmail.com>.

Hi! Thank you for your response!

here is my full schema:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. -->

<!-- This is the Solr schema file. This file should be named "schema.xml"
    and should be in the conf directory under the solr home (i.e.
    ./solr/conf/schema.xml by default) or located where the classloader
    for the Solr webapp can find it. This example schema is the recommended
    starting point for users. It should be kept correct and concise, usable
    out-of-the-box. For more information on how to customize this file,
    please see http://wiki.apache.org/solr/SchemaXml

    PERFORMANCE NOTE: this schema includes many optional features and
    should not be used for benchmarking. To improve performance one could
    - set stored="false" for all fields possible (esp large fields) when
      you only need to search on the field but don't need to return the
      original value.
    - set indexed="false" if you don't need to search on the field, but
      only return the field as a result of searching on other indexed
      fields.
    - remove all unneeded copyField statements
    - for best index size and searching performance, set "index" to false
      for all general text fields, use copyField to copy them to the
      catchall "text" field, and use that for searching.
    - For maximum indexing performance, use the StreamingUpdateSolrServer
      java client.
    - Remember to run the JVM in server mode, and use a higher logging
      level that avoids logging every request -->

<schema name="example" version="1.4">

    <types>

        <fieldType name="uuid" class="solr.StrField" multiValued="false" />
        <!-- Not analyzed field -->
        <fieldType name="string" class="solr.StrField" multiValued="false"
            omitNorms="true" />

        <!-- boolean type: "true" or "false" -->
        <fieldType name="boolean" class="solr.BoolField"
            sortMissingLast="true" omitNorms="true" />
        <!-- Binary data type. The data should be sent/retrieved as Base64
            encoded Strings -->
        <fieldType name="binary" class="solr.BinaryField" />
        <!-- Default numeric field types. For faster range queries, consider
            the tint/tfloat/tlong/tdouble types. -->
        <fieldType name="int" class="solr.TrieIntField"
            precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="float" class="solr.TrieFloatField"
            precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="long" class="solr.TrieLongField"
            precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="double" class="solr.TrieDoubleField"
            precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="date" class="solr.DateField"
            sortMissingLast="true" omitNorms="true" />

        <!-- Numeric field types that index each value at various levels of
            precision to accelerate range queries when the number of values
            between the range endpoints is large. See the javadoc for
            NumericRangeQuery for internal implementation details. Smaller
            precisionStep values (specified in bits) will lead to more tokens
            indexed per value, slightly larger index size, and faster range
            queries. A precisionStep of 0 disables indexing at different
            precision levels. -->
        <fieldType name="tint" class="solr.TrieIntField"
            precisionStep="8" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="tfloat" class="solr.TrieFloatField"
            precisionStep="8" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="tlong" class="solr.TrieLongField"
            precisionStep="8" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="tdouble" class="solr.TrieDoubleField"
            precisionStep="8" omitNorms="true" positionIncrementGap="0" />
        <!-- A Trie based date field for faster date range queries and date
            faceting. -->
        <fieldType name="tdate" class="solr.TrieDateField"
            omitNorms="true" precisionStep="6" positionIncrementGap="0" />

        <!-- Key type fields, no filters -->
        <fieldType name="keytype" class="solr.TextField"
            multiValued="false" omitNorms="true">
            <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
            </analyzer>
        </fieldType>

        <!-- A general text field that has reasonable, generic cross-language
            defaults: it tokenizes with StandardTokenizer, folds accented
            characters to ASCII, and down cases. The stop word and synonym
            filters are commented out. -->
        <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" /> -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" / -->
                <!--filter class="solr.SynonymFilterFactory"
                    synonyms="synonyms.txt" ignoreCase="true" expand="true"/ -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

        <!-- Splits the field value on commas, then folds accented characters
            to ASCII and lowercases each token. -->
        <fieldType name="tags" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.PatternTokenizerFactory" pattern="," />
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

        <!-- Splits the field value on whitespace and trims each token. -->
        <fieldType name="number" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.TrimFilterFactory" />
            </analyzer>
        </fieldType>

        <!-- A general content field used for search. Should be used for
            content strings. This sort of field will have the html tags
            removed -->
        <fieldType name="content" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <charFilter class="solr.HTMLStripCharFilterFactory" />
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" /> -->
                <!-- <filter class="solr.SynonymFilterFactory"
                    synonyms="index_synonyms.txt" ignoreCase="true"
                    expand="false"/> -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

        <!-- Just like text except it reverses the characters of each token,
            to enable more efficient leading wildcard queries. -->
        <fieldType name="text_rev" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" /> -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.ReversedWildcardFilterFactory"
                    withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2"
                    maxFractionAsterisk="0.33" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory" />
                <!-- <filter class="solr.StopFilterFactory" ignoreCase="true"
                    words="stopwords.txt" enablePositionIncrements="true" /> -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

        <!-- Just like tags except it reverses the characters of each token,
            to enable more efficient leading wildcard queries. -->
        <fieldType name="tags_rev" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.PatternTokenizerFactory" pattern="," />
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.ReversedWildcardFilterFactory"
                    withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2"
                    maxFractionAsterisk="0.33" />
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.PatternTokenizerFactory" pattern="," />
                <filter class="solr.ASCIIFoldingFilterFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>

    </types>


    <fields>

        <!-- Basic fields -->
        <field name="UUID" type="uuid" indexed="true" stored="false"
            multiValued="false" required="true" />
        <field name="DocumentType" type="keytype" indexed="true"
            stored="false" required="true" multiValued="false" />
        <field name="DocumentLocale" type="string" indexed="true"
            stored="true" required="false" />
        <field name="DocumentId" type="string" indexed="false" stored="true"
            required="true" />
        <field name="DocumentName" type="text" indexed="true" stored="true"
            required="false" />
        <field name="DocumentDisplayName" type="text" indexed="true"
            stored="true" required="true" />
        <field name="DocumentCreateDate" type="text" indexed="false"
            stored="true" required="false" />
        <field name="DocumentLastUpdateDate" type="text" indexed="false"
            stored="true" required="false" />
        <field name="DocumentContent" type="content" indexed="true"
            stored="false" required="false" />
        <field name="DocumentMIME" type="text" indexed="true" stored="true"
            required="false" />
        <field name="DocumentTAGS" type="tags" indexed="true" stored="true"
            required="false" />
        <field name="URL" type="string" indexed="false" stored="true"
            required="false" />
        <field name="DocumentUSER" type="long" indexed="false" stored="true"
            required="false" />
        <field name="DocumentAuthor" type="string" indexed="false"
            stored="true" required="false" />
        <field name="DocumentSpace" type="long" indexed="false"
            stored="true" required="false" />
        <field name="DocumentTenant" type="long" indexed="false"
            stored="true" required="false" />
        <field name="DocumentDescription" type="text" indexed="false"
            stored="true" required="false" />
        <field name="META.Content-Type" type="string" indexed="false"
            stored="false" required="false" />
        <field name="DELETED" type="string" indexed="true" stored="false"
            required="false" />

        <!-- Extra Fields -->

        <!-- Indexed general text field -->
        <dynamicField name="*_text_i" type="text" indexed="true"
            stored="false" />
        <!-- Stored general text field -->
        <dynamicField name="*_text_s" type="text" indexed="false"
            stored="true" />
        <!-- Indexed and stored general text field -->
        <dynamicField name="*_text_is" type="text" indexed="true"
            stored="true" />
        <!-- Indexed general number field -->
        <dynamicField name="*_long_i" type="long" indexed="true"
            stored="false" />
        <!-- Stored general number field -->
        <dynamicField name="*_long_s" type="long" indexed="false"
            stored="true" />
        <!-- Indexed and stored general number field -->
        <dynamicField name="*_long_is" type="long" indexed="true"
            stored="true" />
        <!-- Indexed general date field -->
        <dynamicField name="*_date_i" type="long" indexed="true"
            stored="false" />
        <!-- Stored general date field -->
        <dynamicField name="*_date_s" type="long" indexed="false"
            stored="true" />
        <!-- Indexed and stored general date field -->
        <dynamicField name="*_date_is" type="long" indexed="true"
            stored="true" />
        <!-- Indexed general boolean field -->
        <dynamicField name="*_boolean_i" type="boolean" indexed="true"
            stored="false" />
        <!-- Stored general boolean field -->
        <dynamicField name="*_boolean_s" type="boolean" indexed="false"
            stored="true" />
        <!-- Indexed and stored general boolean field -->
        <dynamicField name="*_boolean_is" type="boolean" indexed="true"
            stored="true" />
        <!-- Indexed general multi-valued number fields -->
        <dynamicField name="*_number_i" type="number" indexed="true"
            stored="false" />
        <!-- Stored general multi-valued number fields -->
        <dynamicField name="*_number_s" type="number" indexed="false"
            stored="true" />
        <!-- Indexed and stored general multi-valued number fields -->
        <dynamicField name="*_number_is" type="number" indexed="true"
            stored="true" />

        <!-- catchall text field that indexes tokens both normally and in
            reverse for efficient leading wildcard queries. -->
        <field name="DocumentDisplayName_rev" type="text_rev" indexed="true"
            stored="false" multiValued="false" />
        <field name="DocumentDescription_rev" type="text_rev" indexed="true"
            stored="false" multiValued="false" />
        <field name="DocumentTAGS_rev" type="tags_rev" indexed="true"
            stored="false" multiValued="true" />
        <field name="DocumentContent_rev" type="text_rev" indexed="true"
            stored="false" multiValued="true" />

        <!-- All other fields -->
        <dynamicField name="*" type="string" indexed="true"
            stored="false" />

    </fields>

    <!-- Field to use to determine and enforce document uniqueness. Unless
        this field is marked with required="false", it will be a required
        field -->
    <uniqueKey>UUID</uniqueKey>

    <!-- field for the QueryParser to use when an explicit fieldname is
        absent -->
    <defaultSearchField>DocumentDisplayName</defaultSearchField>

    <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
    <solrQueryParser defaultOperator="AND" />

    <!-- copyField commands copy one field to another at the time a document
        is added to the index. It's used either to index the same field
        differently, or to add multiple fields to the same field for
        easier/faster searching. -->
    <copyField source="DocumentDescription" dest="DocumentDescription_rev" />
    <copyField source="DocumentDisplayName" dest="DocumentDisplayName_rev" />
    <copyField source="DocumentTAGS" dest="DocumentTAGS_rev" />
    <copyField source="DocumentContent" dest="DocumentContent_rev" />

</schema>


What am I doing wrong?




On Wed, Aug 17, 2011 at 5:37 PM, Michael Ryan <mr...@moreover.com> wrote:

> Are you using the same analyzer for both type="query" and type="index"? Can
> you show us the fieldType from your schema?
>
> -Michael
>




RE: Solr Accent Insensitive and sensitive search

Posted by Michael Ryan <mr...@moreover.com>.

Are you using the same analyzer for both type="query" and type="index"? Can you show us the fieldType from your schema?

-Michael
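
To illustrate the question above, here is a minimal sketch of one common way to get both behaviours. It assumes that accents are folded at both index and query time for the insensitive case, and that a second, unfolded copy of the field handles the accent-sensitive case. The names text_folded, text_exact and DocumentDisplayName_exact are hypothetical and do not come from the posted schema; the tokenizer and filters are standard Solr factories, but the real "text" type in the schema above may be configured differently.

```xml
<!-- Sketch only: the type and field names here are hypothetical. -->

<!-- Goes inside <types>. Accent-insensitive: the same folding is applied
     at index time and at query time, so "ferias" and "férias" both
     reduce to the token "ferias" and match each other. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
    </analyzer>
</fieldType>

<!-- Goes inside <types>. Accent-sensitive: no folding filter, so
     "férias" only matches documents that contain the accented form. -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

<!-- Goes inside <fields>. Hypothetical sibling field holding the unfolded copy. -->
<field name="DocumentDisplayName_exact" type="text_exact" indexed="true"
    stored="false" multiValued="false" />

<!-- Goes next to the other copyField commands. -->
<copyField source="DocumentDisplayName" dest="DocumentDisplayName_exact" />
```

With something like this in place, a query against the folded field would be accent-insensitive, while DocumentDisplayName_exact:férias would only match the accented form. A full reindex is needed after changing analyzers, since documents already in the index keep the tokens produced by the old configuration.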