Posted to user@lucenenet.apache.org by "Kohlhepp, Justin W ()" <Ju...@thehartford.com> on 2012/07/12 15:15:14 UTC

Expected behavior of phrase search with wildcard

I have an index of about 30M records.  One of the fields contains
company names.  I am using an out-of-the-box QueryParser to create
queries.  My users want to do a search that will match both STATE
INDUSTRY and STATE INDUSTRIES as a company name.  So they are trying the
search "state industr*" or "state industr"*.  The first search returns
nothing.  The second returns every document in the index, even
though most of them don't contain STATE or INDUSTRY anywhere.  This
seems to be true regardless of which phrase is used.

 

Two questions:

* What is the proper syntax for a QueryParser to do a wildcard query
  combined with a phrase?

* Why does the syntax "my phrase"* return every document?

 

Thanks,


~ Justin

************************************************************
This communication, including attachments, is for the exclusive use of addressee and may contain proprietary, confidential and/or privileged information.  If you are not the intended recipient, any use, copying, disclosure, dissemination or distribution is strictly prohibited.  If you are not the intended recipient, please notify the sender immediately by return e-mail, delete this communication and destroy all copies.
************************************************************

RE: Expected behavior of phrase search with wildcard

Posted by Franklin Simmons <fs...@sccmediaserver.com>.
Using out of the box Lucene, yes.  
 

RE: Expected behavior of phrase search with wildcard

Posted by "Kohlhepp, Justin W ()" <Ju...@thehartford.com>.
Thanks for the information, Franklin.  Just to be sure I understood
correctly: there is no syntax I can send to a QueryParser that will
match a phrase with a wildcard at the end?



RE: Expected behavior of phrase search with wildcard

Posted by Franklin Simmons <fs...@sccmediaserver.com>.
As is plainly stated in the plethora of "Lucene Query Syntax" pages on the web, Lucene does not support wildcard terms in phrase queries.
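
A common way around this limitation is to build the query in code rather than through the query syntax: expand the prefix into the indexed terms that start with it, then accept any of those terms at the position after the exact words. Lucene's MultiPhraseQuery class is designed for this kind of query. The sketch below only illustrates the idea on a toy in-memory positional index in Python. It is not Lucene code, and `tokenize`, `build_index`, `phrase_prefix_search` and the sample documents are hypothetical names invented for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Positional index: term -> {doc_id -> [positions]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_prefix_search(index, exact_terms, prefix):
    """Match exact_terms as a phrase, followed by any term starting with prefix."""
    # Step 1: expand the prefix into every indexed term that starts with it.
    expansions = [t for t in index if t.startswith(prefix)]
    hits = set()
    # Step 2: for each occurrence of the first exact term, check positions.
    for doc_id, starts in index.get(exact_terms[0], {}).items():
        for start in starts:
            exact_ok = all(
                start + i in index.get(term, {}).get(doc_id, [])
                for i, term in enumerate(exact_terms)
            )
            end = start + len(exact_terms)
            # Any one of the expanded terms may fill the final position.
            if exact_ok and any(end in index[t].get(doc_id, []) for t in expansions):
                hits.add(doc_id)
    return hits


docs = {
    1: "State Industry",
    2: "State Industries Inc",
    3: "Industrial State Bank",
    4: "Acme Holdings",
}
index = build_index(docs)
print(sorted(phrase_prefix_search(index, ["state"], "industr")))  # [1, 2]
```

Document 3 contains both a term starting with "industr" and the term "state", but not in that order, so it is not a hit. On a real index the expansion step can produce many terms for a short prefix, which is worth keeping in mind with 30M records.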

As to "my phrase"* returning every document, I can only guess you've set QueryParser.SetAllowLeadingWildcard(true) to cause "my phrase"* to parse to default_field:"my phrase" default_field:*.
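
Assuming the parse described above, the query consists of two optional clauses joined by the parser's default OR operator, so a document matches if either clause matches. A toy illustration in Python, using plain sets of invented document ids rather than any Lucene API:

```python
# Toy model: each clause is represented by the set of document ids it matches.
all_docs = {1, 2, 3, 4, 5}

phrase_hits = {2}               # documents containing the exact phrase
wildcard_hits = set(all_docs)   # a bare "*" clause matches every document with a value in the field

# With OR between the clauses, the result is the union of the two sets.
result = phrase_hits | wildcard_hits
print(result == all_docs)  # True
```

The phrase clause no longer restricts the result at all, which would explain why the outcome is the same regardless of which phrase is used.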



 
RE: Expected behavior of phrase search with wildcard

Posted by "Kohlhepp, Justin W ()" <Ju...@thehartford.com>.
Sorry.  I probably should have included that information.  The field is
analyzed using a StandardAnalyzer, and the QueryParser uses a
StandardAnalyzer as well.
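
That detail probably explains the empty result for "state industr*": text inside quotes is passed through the analyzer, which discards the `*` as punctuation, leaving an exact phrase whose second term is `industr`. The Python sketch below uses a simplified stand-in for StandardAnalyzer (lowercase, keep alphanumeric runs). `analyze` and `phrase_match` are hypothetical helpers for illustration, not Lucene APIs.

```python
import re


def analyze(text):
    # Simplified stand-in for StandardAnalyzer: lowercase and keep only
    # alphanumeric runs, so punctuation such as "*" is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(doc_tokens, phrase_tokens):
    # True if phrase_tokens occur as consecutive tokens in doc_tokens.
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


query_tokens = analyze("state industr*")
print(query_tokens)                                             # ['state', 'industr']
print(phrase_match(analyze("STATE INDUSTRIES"), query_tokens))  # False
print(phrase_match(analyze("STATE INDUSTRY"), query_tokens))    # False
```

No document contains the literal token `industr`, so the phrase finds nothing.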

~ Justin


RE: Expected behavior of phrase search with wildcard

Posted by "Allan, Brad (Wokingham)" <Br...@Fiserv.com>.
I'm a little new to Lucene so forgive me if I'm talking nonsense.

Is the Index field option NOT_ANALYZED?

Then using the KeywordAnalyzer on the parser, "state industr*" might just work. Of course, the number of spaces would have to match...

Maybe something like that?
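
The idea can be pictured as follows: if each company name is indexed as one untokenized term, a prefix match over that term finds every name that begins with the typed text. A minimal Python sketch with invented sample names (not Lucene code), which also shows the caveat about spacing and, because nothing is lowercased, about case:

```python
# Each value is stored as ONE term, exactly as written
# (no tokenizing, no lowercasing).
names = [
    "STATE INDUSTRY",
    "STATE INDUSTRIES",
    "STATE  INDUSTRIES",   # two spaces between the words
    "state industries",    # different case
]


def prefix_search(terms, prefix):
    # Return every stored term that begins with the given prefix.
    return [t for t in terms if t.startswith(prefix)]


print(prefix_search(names, "STATE INDUSTR"))
# ['STATE INDUSTRY', 'STATE INDUSTRIES']
```

The last two names are missed because the stored term has to begin with exactly the characters typed. The approach also cannot find the phrase in the middle of a longer name.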



CheckFree Solutions Limited (trading as Fiserv)
Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
Registered in England: No. 2694333

RE: Expected behavior of phrase search with wildcard

Posted by "Kohlhepp, Justin W ()" <Ju...@thehartford.com>.
Yes.  If you read my original email, I had already tried that.  It
returns zero records.


Re: Expected behavior of phrase search with wildcard

Posted by Simon Svensson <si...@devhost.se>.
Hi,

Have you tried using "state industr*", i.e. having the wildcard within 
the quotes?

// Simon
