Posted to solr-user@lucene.apache.org by 李威 <li...@antvision.cn> on 2013/03/12 02:15:50 UTC
It seems a issue of deal with chinese synonym for solr
In org.apache.solr.parser.SolrQueryParserBase, there is a method: "protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError"
The code below does not handle Chinese correctly.
" BooleanClause.Occur occur = positionCount > 1 && operator == AND_OPERATOR ?
BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD;
"
For example, "北京市" and "北京" are synonyms. If I search for "北京市动物园", the expected parsed query is "+(北京市 北京) +动物园", but it is actually parsed as "+北京市 +北京 +动物园".
The code works for English because English words are separated by spaces, and each word occupies only one position.
To handle Chinese, I think the decision should be based on the position increment rather than the position count.
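To illustrate the difference between the two measures, here is a minimal sketch that does not use Lucene at all. Tok and PositionSketch are hypothetical names, and the token order and increments (北京市 with increment 1, 北京 with increment 0, 动物园 with increment 1) are an assumption about what the analyzer emits for this query. The position count only says that the query spans more than one position; the increment is what says which tokens share a position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one analyzed token: the term text plus its
// position increment (0 = same position as the previous token).
final class Tok {
    final String term;
    final int posInc;

    Tok(String term, int posInc) {
        this.term = term;
        this.posInc = posInc;
    }
}

final class PositionSketch {
    // Assumed analyzer output for "北京市动物园" with 北京市/北京 as synonyms.
    static List<Tok> example() {
        return Arrays.asList(new Tok("北京市", 1), new Tok("北京", 0), new Tok("动物园", 1));
    }

    // Position count: the sum of the increments, so stacked tokens add nothing.
    static int positionCount(List<Tok> toks) {
        int count = 0;
        for (Tok t : toks) {
            count += t.posInc;
        }
        return count;
    }

    // True when the token at index i shares its position with the one before it.
    static boolean isStacked(List<Tok> toks, int i) {
        return i > 0 && toks.get(i).posInc == 0;
    }

    public static void main(String[] args) {
        List<Tok> toks = example();
        System.out.println("positionCount = " + positionCount(toks));
        for (int i = 0; i < toks.size(); i++) {
            System.out.println(toks.get(i).term + " stacked=" + isStacked(toks, i));
        }
    }
}
```

Under these assumptions there are three tokens but only two positions, and only the per-token increment identifies 北京 as an alternative to 北京市 rather than an additional required term.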
Could you help take a look?
Thanks,
Wei Li
Re: It seems a issue of deal with chinese synonym for solr
Posted by Kuro Kurosaka <ku...@sonic.net>.
On 3/11/13 6:15 PM, 李威 wrote:
> In org.apache.solr.parser.SolrQueryParserBase, there is a method: "protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError"
>
> The code below does not handle Chinese correctly.
>
> " BooleanClause.Occur occur = positionCount > 1 && operator == AND_OPERATOR ?
> BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD;
>
> "
>
> For example, "北京市" and "北京" are synonyms. If I search for "北京市动物园", the expected parsed query is "+(北京市 北京) +动物园", but it is actually parsed as "+北京市 +北京 +动物园".
>
> The code works for English because English words are separated by spaces, and each word occupies only one position.
An interesting feature of this example is that the difference between the
two synonyms is the omission of one token, "市" (city). Doesn't the same
problem happen if we define "London City" and "London" as synonyms, and
execute a query like "London City Zoo"?
Must a Chinese analyzer be used to reproduce this problem?
I tried to reproduce this but couldn't. The result of query string expansion
using Solr 4.2's query interface with debug output shows:
<str name="parsedquery">MultiPhraseQuery(text:"(london london) city zoo")</str>
I see no plus (+). What query parser did you use?
--
Kuro Kurosaka
Re: It seems a issue of deal with chinese synonym for solr
Posted by Robert Muir <rc...@gmail.com>.
I agree. Actually, that top-level logic is fine. It's the loop that
follows that's wrong: it needs to look at the position increment and do
the right thing.
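As a rough sketch of what "look at position increment" could mean for that loop, here is a self-contained example that builds query strings instead of Lucene Query objects. Everything in it (GroupedQuerySketch, AnalyzedToken, flat, grouped) is a hypothetical name for illustration, not the parser's actual code, and the token increments are assumed. flat applies one occur to every token; grouped first collects tokens that share a position into a single optional group, then applies the occur to each group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GroupedQuerySketch {
    // Hypothetical token: term text plus position increment (0 = stacked on the previous token).
    static final class AnalyzedToken {
        final String term;
        final int posInc;

        AnalyzedToken(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Assumed analyzer output for "北京市动物园" with 北京市/北京 as synonyms.
    static List<AnalyzedToken> example() {
        return Arrays.asList(
            new AnalyzedToken("北京市", 1),
            new AnalyzedToken("北京", 0),
            new AnalyzedToken("动物园", 1));
    }

    // One clause per token, all with the same occur: the behavior reported in this thread.
    static String flat(List<AnalyzedToken> toks, boolean andOperator) {
        int positionCount = 0;
        for (AnalyzedToken t : toks) {
            positionCount += t.posInc;
        }
        String prefix = positionCount > 1 && andOperator ? "+" : "";
        List<String> clauses = new ArrayList<>();
        for (AnalyzedToken t : toks) {
            clauses.add(prefix + t.term);
        }
        return String.join(" ", clauses);
    }

    // One clause per position: tokens with increment 0 join the previous group as alternatives.
    static String grouped(List<AnalyzedToken> toks, boolean andOperator) {
        List<List<String>> groups = new ArrayList<>();
        for (AnalyzedToken t : toks) {
            if (t.posInc > 0 || groups.isEmpty()) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(t.term);
        }
        // Here the number of groups stands in for the position count.
        String prefix = groups.size() > 1 && andOperator ? "+" : "";
        List<String> clauses = new ArrayList<>();
        for (List<String> g : groups) {
            String body = g.size() == 1 ? g.get(0) : "(" + String.join(" ", g) + ")";
            clauses.add(prefix + body);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(flat(example(), true));    // +北京市 +北京 +动物园
        System.out.println(grouped(example(), true)); // +(北京市 北京) +动物园
    }
}
```

The top-level choice of occur is the same in both methods; only the unit it is applied to changes, from a single token to a group of tokens at one position.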
Want to open a JIRA issue?