Posted to solr-user@lucene.apache.org by Simon Bloch <si...@gmail.com> on 2018/09/19 22:21:53 UTC

Solr edismax multi-word match issue

Hi,

I'm having issues getting an edismax query to match a certain document via
a particular field ("name_c"). I believe this issue is related to
whitespace removal and field/edismax configuration.

*Search term:* "viet nam"
*Document name:* "Vietnam"

*Field Type: *
  <!-- Exact match, whitespace ignored (e.g. "$Fish %Sticks"=="fishsticks") -->
  <fieldType class="solr.TextField" name="text_exact_concat" omitNorms="true"
             positionIncrementGap="0" omitTermFreqAndPositions="true">
    <analyzer>
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="([^a-z0-9])" replacement=""/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="(\s+)"
              replacement="" replace="all"/>
      <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

*Field: *
<field name="name_c" type="text_exact_concat" multiValued="false"
       indexed="true" required="false" stored="false"/>

*Raw Query (from Solr Admin Console):*
q=viet nam&
defType=edismax&
sow=false&
qf=name^1.0 name_c^10.0 ancestor_name^1.25&
sort=score desc, name_c asc&
wt=json&indent=true
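
For anyone reproducing this outside the admin console, the same parameters can be assembled into a request URL. This is only a sketch: the host, port, and core name ("mycore") are placeholders that are not from my setup, and debugQuery=true is added because that is what I enable in the console.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; replace with your own Solr endpoint.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

# Same parameters as the raw query above, plus debugQuery to get
# the "parsedquery" section in the response.
params = {
    "q": "viet nam",
    "defType": "edismax",
    "sow": "false",
    "qf": "name^1.0 name_c^10.0 ancestor_name^1.25",
    "sort": "score desc, name_c asc",
    "wt": "json",
    "indent": "true",
    "debugQuery": "true",
}

# urlencode percent-encodes reserved characters and turns spaces into "+".
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```

The resulting URL can be pasted into a browser or fetched with any HTTP client.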

*Issue Explanation:*
When I execute the query in my local admin console (with debugQuery
enabled) I don't see a match or score for "Vietnam" for the field "name_c".

   - I have this field boosted extra high so any match will take precedence.
   - I'm confident this isn't caused by any of my other fields (I have more
   fields than are listed here, but I removed them for clarity)
   - I believe this is caused by whitespace interpretation
   - Interestingly, the space is removed for the "name_c" field in the
   parsedquery:

########################################################################
"parsedquery":"(+DisjunctionMaxQuery(((name_c:vietnam)^10.0 |
                                      (ancestor_name:viet nam)^1.25 |
                                      (name:viet name_ps:nam)^1.0)"

"parsedquery_toString":"+((name_c:vietnam)^10.0 |
                          (ancestor_name:viet nam)^1.25 |
                          (name:viet nam)^1.0)
########################################################################

I would really appreciate any support or debugging advice in this matter!
-Simon Bloch
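
A rough way to test the whitespace theory is to trace both strings through the "name_c" analyzer chain outside Solr. The sketch below is only an approximation in Python, not Solr code: the function name analyze_name_c is made up for illustration, Python's re module stands in for Java regex, and ASCII folding is approximated with Unicode decomposition. The point it illustrates is that the char filter sees the raw input before any lowercasing happens.

```python
import re
import unicodedata


def analyze_name_c(text: str) -> str:
    """Approximate the text_exact_concat analyzer chain (illustrative only)."""
    # 1. PatternReplaceCharFilterFactory: runs on the raw characters, before
    #    the tokenizer and before any filter. The pattern is case-sensitive,
    #    so uppercase letters are removed along with spaces and punctuation.
    text = re.sub(r"[^a-z0-9]", "", text)
    # 2. KeywordTokenizerFactory: the whole input becomes a single token.
    token = text
    # 3. PatternReplaceFilterFactory: strip any remaining whitespace.
    token = re.sub(r"\s+", "", token)
    # 4. ASCIIFoldingFilterFactory (approximation): drop diacritics.
    token = (
        unicodedata.normalize("NFKD", token)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # 5. LowerCaseFilterFactory: runs last, after uppercase is already gone.
    return token.lower()


print(analyze_name_c("Vietnam"))   # ietnam
print(analyze_name_c("viet nam"))  # vietnam
```

If this emulation is faithful, the indexed term for "Vietnam" would be "ietnam" while the query term is "vietnam", which would explain the missing match on "name_c". The Analysis screen in the Solr Admin UI can confirm what the real chain produces at index and query time.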

Re: Solr edismax multi-word match issue

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Sorry, I couldn't quite follow your issue. Are you searching for "viet nam"
and expecting a match on "Vietnam" in your index, but not finding it?
Also, which version of Solr are you using?

Regards,
Edwin
