Posted to dev@lucene.apache.org by "Tiffany Goguen (JIRA)" <ji...@apache.org> on 2016/03/28 22:00:26 UTC
[jira] [Created] (SOLR-8915) Issue with CJK and mm being ignored when searching with white space
Tiffany Goguen created SOLR-8915:
------------------------------------
Summary: Issue with CJK and mm being ignored when searching with white space
Key: SOLR-8915
URL: https://issues.apache.org/jira/browse/SOLR-8915
Project: Solr
Issue Type: Bug
Affects Versions: 5.5
Reporter: Tiffany Goguen
Priority: Minor
I am using edismax and I have set mm=100.
I have the following in the request handler:
<str name="defType">edismax</str>
<str name="mm">100</str>
I am not using q.op or <solrQueryParser defaultOperator="AND"/>.
My search string is クイックリファレンス, which consists of two terms:
Term 1 - クイック
Term 2 - リファレンス
If I search for クイックリファレンス (no spaces) I get no results. This is expected.
If I search for クイック リファレンス (space between ク and リ) I get 1 result. This
is bad. I am expecting mm=100 to still apply.
If I search for クイック OR リファレンス I get 1 result. This is expected. The OR
is overriding the mm=100.
If I search for クイック AND リファレンス I get 1 result. This is bad. I am expecting
mm=100 to still apply.
In CJK searches spaces should not matter. In the Analysis tool I can see the correct tokens
being generated. However, the parser behaves differently depending on whether the query
contains a space.
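To reproduce the four cases above and compare their parsed queries, something like the following sketch could be used to build the request URLs. The host, port, and core name (`mycore`) are assumptions and not taken from my setup; `defType` and `mm` are passed explicitly here even though I have them configured in the request handler.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are placeholders.
BASE_URL = "http://localhost:8983/solr/mycore/select"

# The four query strings described in this report.
QUERIES = [
    "クイックリファレンス",        # no space
    "クイック リファレンス",       # space between the two terms
    "クイック OR リファレンス",    # explicit OR
    "クイック AND リファレンス",   # explicit AND
]


def build_url(q):
    """Return a select URL for q with edismax, mm=100 and debug output enabled."""
    params = {
        "q": q,
        "defType": "edismax",
        "mm": "100",
        "debugQuery": "true",
        "wt": "json",
    }
    # urlencode percent-encodes the query as UTF-8 and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for q in QUERIES:
        print(build_url(q))
```

Fetching each URL and comparing the parsed query in the debug section of the response should show the difference between the spaced and unspaced forms.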
With space (not expected result):
When the query is space-delimited into two terms, I see each term analyzed separately, per the
following debugQuery output:
クイック is treated in one section:
title_ja:クイック^1.2 | primary_header_ja:クイック^1.2 | file_name:クイック^1.2
| meta_description_ja:クイック^0.5 | secondary_header_ja:クイック^0.5 | body_ja:クイック^0.5
| inlink_text_ja:クイック^1.2)~0.17
リファレンス is treated in one section:
title_ja:リファレンス^1.2 | primary_header_ja:リファレンス^1.2 | file_name:リファレンス^1.2
| meta_description_ja:リファレンス^0.5 | secondary_header_ja:リファレンス^0.5
| body_ja:リファレンス^0.5 | inlink_text_ja:リファレンス^1.2)~0.17
Without space (expected result):
When the query is one term, I see that Solr analyzes it once and the Japanese tokenizer splits
it into two terms:
(title_ja:クイック title_ja:リファレンス)
Given that クイック and リファレンス do not appear together in any of the fields
listed in the query fields (qf),
body_en^0.5 title_en^1.2 url_path^1.2 file_name^1.2 primary_header_en^1.2 secondary_header_en^0.5
meta_description_en^0.5 inlink_text_en^1.2 body_ja^0.5 title_ja^1.2 primary_header_ja^1.2
secondary_header_ja^0.5 meta_description_ja^0.5 inlink_text_ja^1.2
and that I have specified mm=100, nothing is matched when the query has the form
(title_ja:クイック title_ja:リファレンス).
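The difference between the two query shapes can be illustrated with a toy model. This is only a sketch of my reading of the debugQuery output, not Solr code: the document, the reduced field list, and the function names are all made up for illustration. It assumes a document where the two terms occur in different fields, which is consistent with the terms not appearing together in any one field.

```python
# Toy model only: this does not call Solr or reproduce its scoring.
# A document is a mapping from field name to the set of indexed terms.

# Hypothetical document: each term appears in a different field.
DOC = {
    "title_ja": {"クイック"},
    "body_ja": {"リファレンス"},
}

FIELDS = ["title_ja", "body_ja"]
TERMS = ["クイック", "リファレンス"]


def matches_per_term(doc, fields, terms):
    """Shape seen WITH a space: one clause per term, and each clause
    may be satisfied by any field. All clauses are required."""
    return all(
        any(term in doc.get(field, set()) for field in fields)
        for term in terms
    )


def matches_per_field(doc, fields, terms):
    """Shape seen WITHOUT a space: one clause per field, and a field
    only matches if it contains every token of the analyzed string."""
    return any(
        all(term in doc.get(field, set()) for term in terms)
        for field in fields
    )


if __name__ == "__main__":
    print("with space:   ", matches_per_term(DOC, FIELDS, TERMS))
    print("without space:", matches_per_field(DOC, FIELDS, TERMS))
```

Under these assumptions the per-term shape matches the document while the per-field shape does not, which is the inconsistency I am reporting.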