Posted to solr-user@lucene.apache.org by Zheng Lin Edwin Yeo <ed...@gmail.com> on 2018/10/02 02:39:22 UTC
Re: Solr edismax multi-word match issue

Sorry, I couldn't quite follow your issue. Are you searching for "viet nam"
and expecting it to match "Vietnam" in your index, but not finding it?
Also, which version of Solr are you using?
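
One thing that might be worth checking in the meantime: as far as I know,
char filters run on the raw input before the tokenizer and the token filters,
so the PatternReplaceCharFilterFactory pattern "([^a-z0-9])" would see
"Vietnam" before LowerCaseFilterFactory does and could strip the capital "V".
Below is a rough Python sketch of that chain. It is only a simulation, not
Solr itself: the function name analyze_name_c is made up for illustration,
and the ASCII folding step is an approximation.

```python
import re
import unicodedata


def analyze_name_c(text: str) -> str:
    """Rough stand-in for the text_exact_concat analyzer chain (assumption)."""
    # charFilter: applied to the raw input, before tokenizing and lowercasing
    text = re.sub(r"[^a-z0-9]", "", text)
    # KeywordTokenizerFactory: the whole input stays one token (no-op here)
    # PatternReplaceFilterFactory: remove any remaining whitespace
    text = re.sub(r"\s+", "", text)
    # ASCIIFoldingFilterFactory: approximated by stripping combining marks
    text = "".join(
        c for c in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(c)
    )
    # LowerCaseFilterFactory
    return text.lower()


print(analyze_name_c("Vietnam"))   # ietnam
print(analyze_name_c("viet nam"))  # vietnam
```

If the real analyzer behaves like this sketch, the indexed term would be
"ietnam" while the query term is "vietnam", which would explain the missing
match. The Analysis screen in the Admin UI should confirm or rule this out.
Widening the char filter pattern to "([^a-zA-Z0-9])" and reindexing might be
one way to address it, but I have not tested that against your setup.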

Regards,
Edwin

On Thu, 20 Sep 2018 at 15:09, Simon Bloch <si...@gmail.com> wrote:

> Hi,
>
> I'm having issues getting an edismax query to match a certain document via
> a particular field ("name_c"). I believe this issue is related to
> whitespace removal and field/edismax configuration.
>
> *Search term:* "viet nam"
> *Document name:* "Vietnam"
>
> *Field Type: *
>   <!-- Exact match, whitespace ignored (e.g. "$Fish %Sticks"=="fishsticks") -->
>   <fieldType class="solr.TextField" name="text_exact_concat" omitNorms="true"
>              positionIncrementGap="0" omitTermFreqAndPositions="true">
>     <analyzer>
>       <charFilter class="solr.PatternReplaceCharFilterFactory"
>                   pattern="([^a-z0-9])" replacement=""/>
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>       <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)"
>               replacement="" replace="all"/>
>       <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>
> *Field: *
> <field name="name_c" type="text_exact_concat" multiValued="false"
> indexed="true" required="false" stored="false"/>
>
> *Raw Query (from Solr Admin Console):*
> q=viet nam&
> defType=edismax&
> sow=false&
> qf=name^1.0 name_c^10.0 ancestor_name^1.25&
> sort=score desc, name_c asc&
> wt=json&indent=true
>
> *Issue Explanation:*
> When I execute the query in my local admin console (with debugQuery
> enabled) I don't see a match or score for "Vietnam" for the field "name_c".
>
>    - I have this field boosted extra high so any match will take precedence.
>    - I'm confident that this isn't being caused by any other fields. I have
>    more fields that are not listed here; I removed them for clarity.
>    - I believe this is caused by whitespace interpretation
>    - Interestingly, the space is removed for the "name_c" field in the
>    parsedquery:
>
> ########################################################################
> "parsedquery":"(+DisjunctionMaxQuery(((name_c:vietnam)^10.0 |
>                                       (ancestor_name:viet nam)^1.25 |
>                                       (name:viet name_ps:nam)^1.0)"
>
> "parsedquery_toString":"+((name_c:vietnam)^10.0 |
>                           (ancestor_name:viet nam)^1.25 |
>                           (name:viet nam)^1.0)
> ########################################################################
>
> I would really appreciate any support or debugging advice in this matter!
> -Simon Bloch
>