Posted to dev@lucene.apache.org by "Tiffany Goguen (JIRA)" <ji...@apache.org> on 2016/03/28 22:00:26 UTC

[jira] [Created] (SOLR-8915) Issue with CJK and mm being ignored when searching with white space

Tiffany Goguen created SOLR-8915:
------------------------------------

             Summary: Issue with CJK and mm being ignored when searching with white space
                 Key: SOLR-8915
                 URL: https://issues.apache.org/jira/browse/SOLR-8915
             Project: Solr
          Issue Type: Bug
    Affects Versions: 5.5
            Reporter: Tiffany Goguen
            Priority: Minor


I am using edismax and I have set mm=100.

I have the following in the request handler:

       <str name="defType">edismax</str>
       <str name="mm">100</str>

I am not using q.op or <solrQueryParser defaultOperator="AND"/>.
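
To reproduce this independently of the handler defaults, the same parameters can be passed on the request itself. The following is a minimal sketch: the host, port, and core name ("mycore") are placeholders that are not taken from this report, while defType, mm, and debugQuery are the standard Solr request parameters.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical host and core name; replace with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_query_url(q, mm="100"):
    """Build a select URL that sets edismax and mm explicitly and requests debug output."""
    params = {
        "q": q,
        "defType": "edismax",
        "mm": mm,
        "debugQuery": "true",
    }
    # urlencode percent-encodes the Japanese text as UTF-8 and the space as '+'.
    return SOLR_SELECT + "?" + urlencode(params)


# The two queries compared in this report.
no_space_url = build_query_url("クイックリファレンス")
with_space_url = build_query_url("クイック リファレンス")
```

Comparing the parsed query in the debug output of the two responses shows how the parser treated each input.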

My search string is クイックリファレンス, which consists of two terms:
Term 1 - クイック
Term 2 - リファレンス

If I search for クイックリファレンス (no spaces) I get no results.  This is expected.

If I search for クイック リファレンス (space between ク リ) I get 1 result.  This
is bad.  I am expecting mm=100 to still apply.

If I search for クイック OR リファレンス I get 1 result.  This is expected.  The OR
is overriding the mm=100.

If I search for クイック AND リファレンス I get 1 result.  This is bad.  I am expecting
mm=100 to still apply.

In CJK searches spaces should not matter.  In the Analysis tool I can see the correct tokens
being generated.  The parser is doing different things based on space or no space in the query.

With space (not expected result):

When the query is space delimited to two terms, I see each term analyzed separately, per the
following debugQuery output:
クイック is treated in one section:

title_ja:クイック^1.2 | primary_header_ja:クイック^1.2 | file_name:クイック^1.2
| meta_description_ja:クイック^0.5 | secondary_header_ja:クイック^0.5 | body_ja:クイック^0.5
| inlink_text_ja:クイック^1.2)~0.17

リファレンス is treated in one section:

title_ja:リファレンス^1.2 | primary_header_ja:リファレンス^1.2 | file_name:リファレンス^1.2
| meta_description_ja:リファレンス^0.5 | secondary_header_ja:リファレンス^0.5
| body_ja:リファレンス^0.5 | inlink_text_ja:リファレンス^1.2)~0.17

Without space (expected result):

When the query is one term, I see that Solr analyzes it once and the Japanese tokenizer splits
it into two terms:
(title_ja:クイック title_ja:リファレンス)

Given that クイック and リファレンス do not appear together in any of the fields
listed in the query fields (qf),
body_en^0.5 title_en^1.2 url_path^1.2 file_name^1.2 primary_header_en^1.2 secondary_header_en^0.5
meta_description_en^0.5 inlink_text_en^1.2 body_ja^0.5 title_ja^1.2 primary_header_ja^1.2
secondary_header_ja^0.5 meta_description_ja^0.5 inlink_text_ja^1.2

and I have specified mm=100, nothing will be matched, i.e. (title_ja:クイック title_ja:リファレンス)
finds no documents.
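
The difference between the two cases can be illustrated with a toy model. This is only a sketch of the behavior described above, not Solr's implementation; the document contents and the two-field list are made up for the example.

```python
# Toy document (hypothetical contents): each term occurs, but in different fields.
doc = {
    "title_ja": {"クイック"},
    "body_ja": {"リファレンス"},
}
FIELDS = ["title_ja", "body_ja"]


def term_centric_match(doc, terms, fields):
    """With space: one clause per term, and each term may match in any field."""
    return all(any(t in doc.get(f, set()) for f in fields) for t in terms)


def field_centric_match(doc, terms, fields):
    """Without space: all terms must match within the same field."""
    return any(all(t in doc.get(f, set()) for t in terms) for f in fields)


terms = ["クイック", "リファレンス"]
print(term_centric_match(doc, terms, FIELDS))   # True: both clauses satisfied
print(field_centric_match(doc, terms, FIELDS))  # False: no single field has both
```

In this model, requiring every clause to match still returns the document in the space-separated case, because each term finds a match in a different field.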



