Posted to java-user@lucene.apache.org by 陈志祥 <zh...@alibaba-inc.com> on 2020/01/30 00:02:26 UTC

Re: ComplexPhraseQueryParser class question

The standard PhraseQuery cannot do this, but you can pre-filter the invalid term (abcd) out of the query by using the MultiTerms API.
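
A minimal sketch of the pre-filtering idea in plain Java. The `knownTerms` set is a hypothetical stand-in for a lookup against the terms of the indexed field, and the class and method names are illustrative, not part of Lucene. A trailing `*` is treated as a prefix wildcard and kept when at least one known term starts with that prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative helper, not a Lucene class.
class PhrasePrefilter {

    // Returns the phrase terms that can possibly match, preserving their order.
    // knownTerms stands in for the set of terms present in the indexed field.
    static List<String> dropUnknownTerms(List<String> phraseTerms, Set<String> knownTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : phraseTerms) {
            if (term.endsWith("*")) {
                // Prefix wildcard: keep it if any known term starts with the prefix.
                String prefix = term.substring(0, term.length() - 1);
                boolean anyMatch = knownTerms.stream().anyMatch(t -> t.startsWith(prefix));
                if (anyMatch) {
                    kept.add(term);
                }
            } else if (knownTerms.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    // Rebuilds the quoted phrase with its slop, in the form "t1 t2 t3*"~slop.
    static String toPhraseQuery(List<String> terms, int slop) {
        return "\"" + String.join(" ", terms) + "\"~" + slop;
    }
}
```

With the terms from the question (term1, term2, abcd, term3*) and an index that lacks abcd, the rebuilt string is "term1 term2 term3*"~2, which can then be handed to the parser as usual.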

Also, I have found that the phrase query "a b c"~2 does not actually match "a x x b x x c", given how it is implemented.
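
One way to read this observation is that slop bounds how far the phrase is stretched as a whole, not the gap between each adjacent pair of terms. The sketch below is a simplified model under that assumption (each term occurs once, in phrase order); it is not Lucene code, and `SlopModel` is a hypothetical name.

```java
// Illustrative model of sloppy phrase matching, not Lucene code.
class SlopModel {

    // positions[i] is the document position of the i-th phrase term.
    // Assumes a non-empty array, one occurrence per term, terms in phrase order.
    static int requiredSlop(int[] positions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < positions.length; i++) {
            // Shift each position back by the term's offset in the phrase;
            // an exact phrase match makes all shifted values equal.
            int shifted = positions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // True when the occurrence fits within the given slop under this model.
    static boolean matches(int[] positions, int slop) {
        return requiredSlop(positions) <= slop;
    }
}
```

For "a x x b x x c" the terms sit at positions 0, 3 and 6, so `SlopModel.requiredSlop(new int[] {0, 3, 6})` returns 4. Under this model a slop of 2 is not enough, while a slop of 4 would be.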

陈志祥
Alibaba, Map Engine Core Algorithm Engineer
------------------------------------------------------------------
From: <ba...@oracle.com>
Date: 2020-01-30 05:02:50
To: java-user@lucene.apache.org<ja...@lucene.apache.org>
Cc: baris.kazar<ba...@oracle.com>
Subject: ComplexPhraseQueryParser class question

Hi,

I hope everyone is doing great.

I have a question regarding the ComplexPhraseQueryParser class.

This class handles the following queryText very well:

"term1 term2 abcd term3*"~2

(the last term, term3, ends with * and the whole phrase has a slop value of 2)

The terms term1, term2 and term3 are all in the Lucene index, but abcd is not.

In other words, there is no "term1 term2 abcd term3" in the Lucene index, but I would still like to find the following in my results:

"term1 term2 term3", despite the abcd term being in the query.

How can I achieve this?

I call setInOrder(true) and setPhraseSlop(2) on the ComplexPhraseQueryParser.


Best regards

baris



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: ComplexPhraseQueryParser class question

Posted by ba...@oracle.com.

Thanks, Zhixiang. Yes, it cannot find a match when there is an unrelated term in the middle that is not indexed.

Similar to what you suggested, I can run the queryText through the ComplexPhraseQueryParser while excluding one term at a time and see which variants give the best matches.
But I would rather have this built into a Lucene API.
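
The exclude-one-term-at-a-time workaround could be sketched as follows. This only builds the candidate query strings; parsing and scoring each one is left to the caller. `LeaveOneOut` and `variants` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not a Lucene class.
class LeaveOneOut {

    // Returns one phrase query string per omitted term, each keeping the same slop.
    static List<String> variants(List<String> terms, int slop) {
        List<String> queries = new ArrayList<>();
        for (int skip = 0; skip < terms.size(); skip++) {
            List<String> remaining = new ArrayList<>(terms);
            remaining.remove(skip); // removes the element at index 'skip'
            queries.add("\"" + String.join(" ", remaining) + "\"~" + slop);
        }
        return queries;
    }
}
```

For the four terms in the original question this yields four candidates, one of which is "term1 term2 term3*"~2. The cost grows with the number of terms, and it only covers the case of a single unmatched term.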

My question is also whether ComplexPhraseQueryParser has a way to support partial phrase matching.

Elasticsearch has this capability, controlled by a percentage.
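
As a rough illustration of what a percentage-controlled partial match means, the sketch below accepts a document when at least a given percentage of the distinct query terms occur in it. It ignores term order and slop, and it is not how Elasticsearch or Lucene implements this; `PartialMatch` and its parameters are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; ignores term order and slop.
class PartialMatch {

    // True when at least minPercent of the distinct query terms occur in the document.
    static boolean matches(List<String> queryTerms, List<String> docTerms, int minPercent) {
        Set<String> query = new HashSet<>(queryTerms);
        Set<String> doc = new HashSet<>(docTerms);
        int found = 0;
        for (String term : query) {
            if (doc.contains(term)) {
                found++;
            }
        }
        // Integer comparison avoids floating-point rounding.
        return found * 100 >= minPercent * query.size();
    }
}
```

With query terms term1, term2, abcd, term3 and a document containing term1, term2, term3, three of four terms match, so a 75 percent threshold accepts the document and an 80 percent threshold rejects it.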

I am surprised Lucene Core does not have this; I hope I am wrong.

Best regards

