Posted to java-user@lucene.apache.org by Jeroen Lauwers <Je...@CTLO.NET> on 2020/01/09 15:17:37 UTC

How to query for 'any word' in a phrase

Dear all,

Is there a way to construct (spans?) a phrase search like the following:
the quick brown * jumps over the * *
where * = any word but exactly 1 word

I introduced these *’s at specific positions, so a PhraseQuery with a slop of 2 is just not good enough,
and the two *’s at the end must be matched as well.

Is there such a thing as a Term or BytesRef that always matches everything?

Thanks,
Jeroen
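
The requested semantics can be stated precisely with a small reference model before mapping them onto Lucene queries. The sketch below is plain Java, not Lucene: it assumes the text is already split into tokens on whitespace, and the class name WildcardPhraseModel is hypothetical. A pattern matches a window of consecutive tokens when every literal token is equal and every * lines up with exactly one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference model (not Lucene): each "*" in the pattern must line up
// with exactly one document token; every other pattern token must match exactly.
final class WildcardPhraseModel {

  // Returns every token index in `doc` where a contiguous window matches `pattern`.
  static List<Integer> matchStarts(String pattern, String doc) {
    String[] p = pattern.trim().split("\\s+");
    String[] d = doc.trim().split("\\s+");
    List<Integer> starts = new ArrayList<>();
    for (int start = 0; start + p.length <= d.length; start++) {
      boolean ok = true;
      for (int i = 0; i < p.length && ok; i++) {
        // "*" consumes one token whatever it is; anything else must be equal.
        ok = p[i].equals("*") || p[i].equals(d[start + i]);
      }
      if (ok) {
        starts.add(start);
      }
    }
    return starts;
  }
}
```

Because the window must be exactly as long as the pattern, the trailing * * require two more tokens after the second "the", which no slop setting can express.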

Re: Re: How to query for 'any word' in a phrase

Posted by 陈志祥 <zh...@alibaba-inc.com>.
I guess that when you use * to mask a word, that adds 1 to the slop; consecutive words mean a slop of 0. PhraseQuery can only set one slop, which is the maximum number of words that may be skipped between terms, so it is a static setting, not “dynamically set”.
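
The “slop + 1” intuition can be made concrete with a simplified model. Lucene's PhraseQuery documentation describes slop as an edit distance between the term positions in the query and in the document; for terms that appear in query order, that cost is the total number of extra positions between them, wherever they occur. The plain-Java sketch below computes only that in-order case (reordered matches are charged differently by Lucene's sloppy matcher and are not modeled), and the class name SlopBudget is hypothetical.

```java
// Simplified model (assumption): for terms that occur in query order, the slop a
// sloppy phrase match consumes is the total number of extra positions between them.
final class SlopBudget {

  // queryOffsets[i]: position of term i inside the query phrase (0, 1, 2, ...).
  // docPositions[i]: position where term i was found in the document, in query order.
  static int inOrderSlop(int[] queryOffsets, int[] docPositions) {
    int extra = 0;
    for (int i = 1; i < queryOffsets.length; i++) {
      int docStep = docPositions[i] - docPositions[i - 1];
      int queryStep = queryOffsets[i] - queryOffsets[i - 1];
      if (docStep < queryStep) {
        throw new IllegalArgumentException("terms overlap or are out of order; not modeled here");
      }
      extra += docStep - queryStep; // words skipped between term i-1 and term i
    }
    return extra;
  }
}
```

For the query "the quick brown jumps", both "the quick brown fox jumps" and "the quick red brown jumps" spend the same budget of 1, so a single slop value cannot say where the skipped word has to be.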

陈志祥
Alibaba, Map Engine Core Algorithm Engineer
Tel: 057128223456-81124100
Email: zhixiang.czx@alibaba-inc.com
Address: Shentong Information Plaza, Changning, Shanghai
Information Security Notice: The information contained in this mail is solely the property of the sender's organization.
This mail communication is confidential. Recipients named above are obligated to maintain secrecy and are not permitted to disclose the contents of this communication to others.


Re: How to query for 'any word' in a phrase

Posted by Tomoko Uchida <to...@gmail.com>.
Hi,
did you try or consider SpanNearQuery?
You might need to insert some kind of special token (e.g., <EOS>) at the
end of the text field to match the "end of the sentence" anyway.
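
Tomoko's suggestion can be sketched with span queries: each literal word becomes a SpanTermQuery, each * becomes a SpanMultiTermQueryWrapper around a WildcardQuery for "*" (any single term), and the clauses go into a SpanNearQuery with slop 0 and inOrder = true, so every * occupies exactly one position, including the two at the end. This assumes Lucene 8.x (where the span classes live in org.apache.lucene.search.spans; Lucene 9 moved them to the lucene-queries module) and a StandardAnalyzer with no stop words; the field name "body" and the class name SpanWildcardPhrase are hypothetical. A "*" wildcard expands to every term in the field, so it may be slow on a large index. The <EOS> sentinel would be one more SpanTermQuery clause for a token appended at index time and is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Sketch for Lucene 8.x; "body" is a hypothetical field name.
final class SpanWildcardPhrase {
  static final String FIELD = "body";

  // Each literal token -> SpanTermQuery; each "*" -> a span over any single term.
  static SpanQuery build(String pattern) {
    String[] tokens = pattern.trim().split("\\s+");
    SpanQuery[] clauses = new SpanQuery[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      if (tokens[i].equals("*")) {
        clauses[i] = new SpanMultiTermQueryWrapper<>(new WildcardQuery(new Term(FIELD, "*")));
      } else {
        clauses[i] = new SpanTermQuery(new Term(FIELD, tokens[i]));
      }
    }
    // slop 0 + inOrder: the clauses must occupy consecutive positions.
    return new SpanNearQuery(clauses, 0, true);
  }

  // Indexes `docs` into an in-memory directory and counts documents matching `pattern`.
  static int countHits(String pattern, String... docs) {
    try (Directory dir = new ByteBuffersDirectory()) {
      // Empty stop set so "the" is indexed like any other word.
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer(CharArraySet.EMPTY_SET));
      try (IndexWriter writer = new IndexWriter(dir, config)) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField(FIELD, text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).count(build(pattern));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

For interior *'s only, SpanNearQuery.Builder#addGap(int) is another way to reserve a fixed-width gap, but a gap does not require that any word exists after the last clause, which is why the trailing *'s use a match-any clause here.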

On Fri, Jan 10, 2020 at 1:30, 陈志祥 <zh...@alibaba-inc.com> wrote:

> To be more clear, I think you need to build a custom PhraseQuery class that
> can set a separate slop value between each pair of sub-terms; you also need
> a special WildcardTerm that matches any term and is only used in this custom
> PhraseQuery context...
>
> Or just use a grep tool or regex automata to scan?

Re: Re: How to query for 'any word' in a phrase

Posted by 陈志祥 <zh...@alibaba-inc.com>.
To be more clear, I think you need to build a custom PhraseQuery class that can set a separate slop value between each pair of sub-terms; you also need a special WildcardTerm that matches any term and is only used in this custom PhraseQuery context...

Or just use a grep tool or regex automata to scan?
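
The regex fallback can be sketched in plain Java with java.util.regex: each literal word is quoted, each * becomes one run of non-whitespace characters, and the words are joined by whitespace. Assumptions: a "word" is a whitespace-delimited run of characters, matching is case-sensitive, and no analysis (lowercasing, punctuation stripping) is applied, so "dog." counts as one word; the class name RegexPhraseScan is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: scans raw text instead of an index. A "word" is assumed to be
// a run of non-whitespace characters; matching is case-sensitive and unanalyzed.
final class RegexPhraseScan {

  // Turns "the quick brown * jumps" into a regex where each "*" is exactly one word.
  static Pattern compile(String pattern) {
    String[] tokens = pattern.trim().split("\\s+");
    StringBuilder regex = new StringBuilder("(?<!\\S)"); // start at a word boundary
    for (int i = 0; i < tokens.length; i++) {
      if (i > 0) {
        regex.append("\\s+"); // words are separated by whitespace
      }
      regex.append(tokens[i].equals("*") ? "\\S+" : Pattern.quote(tokens[i]));
    }
    regex.append("(?!\\S)"); // end at a word boundary
    return Pattern.compile(regex.toString());
  }

  static boolean matches(String pattern, String text) {
    return compile(pattern).matcher(text).find();
  }
}
```

Since this reads each document's raw text, it would likely fit best as a post-filter on a small candidate set (for example, the hits of a looser phrase query) rather than as a replacement for an index lookup.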

RE: Re: How to query for 'any word' in a phrase

Posted by Jeroen Lauwers <Je...@CTLO.NET>.
I don’t understand your question:

In general: can it be set? Yes, with the constructor PhraseQuery(int slop, String field, BytesRef... terms)
<https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/search/PhraseQuery.html#PhraseQuery-int-java.lang.String-org.apache.lucene.util.BytesRef...->
In my specific case: also yes. I’m parsing the query myself in a custom parser, so yes, I can do it.

As far as I understand, the slop is not specific to a position.
Please explain how this could help.
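
Since the query is built in a custom parser anyway, an interior * can be expressed without slop at all: PhraseQuery.Builder#add(Term, int) takes an explicit position, and a position left unused becomes a one-word hole under the default slop of 0. A sketch, assuming Lucene 8.x and a StandardAnalyzer with no stop words; the field name "body" and the class name PositionalGapPhrase are hypothetical. Only interior *'s are enforced this way, because a hole constrains just the words on both sides of it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Sketch for Lucene 8.x; "body" is a hypothetical field name.
final class PositionalGapPhrase {
  static final String FIELD = "body";

  // Adds each literal token at its own position and skips the position of every "*".
  // With the default slop of 0, an interior skipped position must hold exactly one word.
  static PhraseQuery build(String pattern) {
    String[] tokens = pattern.trim().split("\\s+");
    PhraseQuery.Builder builder = new PhraseQuery.Builder();
    for (int pos = 0; pos < tokens.length; pos++) {
      if (!tokens[pos].equals("*")) {
        builder.add(new Term(FIELD, tokens[pos]), pos);
      }
    }
    return builder.build();
  }

  // Indexes `docs` into an in-memory directory and counts documents matching `pattern`.
  static int countHits(String pattern, String... docs) {
    try (Directory dir = new ByteBuffersDirectory()) {
      // Empty stop set so "the" is indexed like any other word.
      IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer(CharArraySet.EMPTY_SET));
      try (IndexWriter writer = new IndexWriter(dir, config)) {
        for (String text : docs) {
          Document doc = new Document();
          doc.add(new TextField(FIELD, text, Field.Store.NO));
          writer.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        return new IndexSearcher(reader).count(build(pattern));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With "the quick brown * jumps over the * *", the two trailing stars add no constraint, so a document ending in "...jumps over the" still matches; enforcing them needs a clause that matches any term, as in the SpanNearQuery approach.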

Jeroen

From: 陈志祥 <zh...@alibaba-inc.com>
Sent: Thursday, 9 January 2020 16:31
To: java-user@lucene.apache.org
Subject: Re: How to query for 'any word' in a phrase

Could the slop parameter in PhraseQuery be set dynamically?

------------------------------------------------------------------
发件人:Jeroen Lauwers<Je...@CTLO.NET>>
日 期:2020年01月09日 23:17:37
收件人:java-user@lucene.apache.org<ja...@lucene.apache.org>>
主 题:How to query for 'any word' in a phrase

Dear all,

Is there a way to construct (spans?) a phrase search like the following:
the quick brown * jumps over the * *
where * = any word but exactly 1 word

I introduced these *’s at a specific position, so a PhraseQuery with slop of 2 is just not good enough
and the two *’s at the end must be matched as well.

Is there such a thing as a Term or BytesRef that always matches everything?

Thanks,
Jeroen

Re: How to query for 'any word' in a phrase

Posted by 陈志祥 <zh...@alibaba-inc.com>.
Could the slop parameter in PhraseQuery be set dynamically?