Posted to java-user@lucene.apache.org by ra...@barclays.com on 2013/10/10 17:54:07 UTC

Multiple Keywords - Regular and Any Order Search

Hi,

I have implemented a Lucene search for a single keyword across multiple fields, and it works great. I did this by concatenating all the fields into a "contents" field and searching against that field.

When I search for multiple keywords against this setup, Lucene by default does an OR search, which returns loads of unwanted hits. I understand this is expected behaviour.


1. Hence, the first thing I am trying to achieve is a search on multiple keywords together. The most popular suggestion is to use PhraseQuery. I will try this out, but please let me know if you can provide an example or any suggestions.



2. Once multiple-keyword search is implemented, I need to give users another option: a "Search in any order" checkbox. When it is checked, a record should still match if the keywords of the phrase are all present "in a particular field" BUT in a different order. I don't know how to implement this without forming all permutations of the phrase and then combining those phrase searches, which could be very expensive in terms of performance. Please let me know if Lucene provides a way to do this.



Examples for Item 2:

Field1: "RAINING HEAVILY TODAY"
Field2: "BEAUTIFUL MORNING"
Field3: "ABC CORPORATION LIMITED"

Search1: "RAINING HEAVILY TODAY" - Should Match

Search2: "RAINING TODAY HEAVILY" - Should Match

Search3: "RAIN TODAY HEAVILY" - Should NOT Match

Search4: "ABC CORPORATION LIMITED" - Should Match

Search5: "ABC CORP LIMITED" - Should NOT Match

Search6: "ABC LIMITED CORPORATION" - Should Match
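To make the difference between the two requested behaviours concrete, here is a toy, stdlib-only Java model run against the Field1/Field2/Field3 examples above. It is not Lucene code: the class and method names are hypothetical, and tokens() only approximates an analyzer by splitting on whitespace and lowercasing. phraseMatches mimics a PhraseQuery with its default slop of 0 (terms adjacent and in order); anyOrderMatches only requires one field to contain every keyword, which avoids generating permutations entirely.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

/**
 * Toy model (not Lucene code) of an exact phrase match versus an
 * "any order" match, each evaluated one field at a time.
 */
final class KeywordMatchSketch {

    /** Crude stand-in for an analyzer: split on whitespace and lowercase. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Phrase semantics (slop 0): query terms appear consecutively and in order. */
    static boolean phraseMatches(String fieldText, String query) {
        List<String> field = tokens(fieldText);
        List<String> terms = tokens(query);
        for (int start = 0; start + terms.size() <= field.size(); start++) {
            if (field.subList(start, start + terms.size()).equals(terms)) {
                return true;
            }
        }
        return false;
    }

    /** Any-order semantics: the field contains every query term; no permutations needed. */
    static boolean anyOrderMatches(String fieldText, String query) {
        return new HashSet<>(tokens(fieldText)).containsAll(tokens(query));
    }

    /** A record matches if at least one of its fields matches on its own. */
    static boolean recordMatches(List<String> fields, String query, boolean anyOrder) {
        for (String field : fields) {
            boolean hit = anyOrder ? anyOrderMatches(field, query) : phraseMatches(field, query);
            if (hit) {
                return true;
            }
        }
        return false;
    }
}
```

With these definitions, "RAINING TODAY HEAVILY" fails the phrase check but passes the any-order check, while "RAIN TODAY HEAVILY", "ABC CORP LIMITED" and a cross-field string such as "RAINING BEAUTIFUL ABC" fail both. The model ignores repeated keywords and the distance between terms; in Lucene the any-order case would roughly correspond to a BooleanQuery with one MUST clause per term on a single field, or to a SpanNearQuery with inOrder set to false.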



I am also not sure whether the "contents" field approach will work in this case. Do I need to index the fields separately and search them with MultiFieldQueryParser to achieve this?


Sorry for the lengthy question. I would greatly appreciate any suggestions or inputs.

Regards,
Raghu


_______________________________________________

This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer.

For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com.

_______________________________________________

RE: Multiple Keywords - Regular and Any Order Search

Posted by ra...@barclays.com.
Hi,

I found a solution to my earlier question: if I provide the search strings in lowercase, it works. For example, if I submit N(abc,corp) instead of N(ABC,CORP) as my search string, I get the expected results, even though the strings I index are all in upper case.
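This behaviour is consistent with the following explanation, assuming the index was built with a lowercasing analyzer such as StandardAnalyzer (the thread does not say which analyzer is used): as far as I know, the surround parser does not run an analyzer over the query text, so ABC is looked up verbatim and cannot match the indexed term abc. The toy Java model below (hypothetical names, not Lucene code) shows the mismatch and the application-side fix of normalizing each query term the same way the analyzer did.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Toy model (hypothetical names, not Lucene code) of a case mismatch between
 * index and query: the index side lowercases tokens, as a lowercasing
 * analyzer would (an assumption), while the query side looks terms up as typed.
 */
final class CaseNormalizationSketch {

    /** Terms as they would be stored after a lowercasing analyzer. */
    static Set<String> indexTerms(String sourceText) {
        Set<String> terms = new HashSet<>();
        for (String token : sourceText.trim().split("\\s+")) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    /** Exact lookup of an unanalyzed query term. */
    static boolean termExists(Set<String> index, String rawQueryTerm) {
        return index.contains(rawQueryTerm);
    }

    /** Application-side fix: normalize each keyword the way the analyzer did. */
    static String normalizeQueryTerm(String rawQueryTerm) {
        return rawQueryTerm.toLowerCase(Locale.ROOT);
    }
}
```

In practice that means lowercasing the keywords themselves before inserting them into a string such as N(abc,corp).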

Got the hint from this link:
http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html

I have some more questions about the surround QueryParser. Could any of you please advise?

1. How to search multiple fields at once?

As shown below, this syntax searches against only one field. But how do I submit a query like "FIELD1:N(abc,corp) FIELD2:N(xyz,corp)"? Is something like this possible with the surround QueryParser?

SrndQuery srndQuery = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(strTxtSearchString);
Query query = srndQuery.makeLuceneQueryField(<FIELD1>, new BasicQueryFactory());

2. How do I escape special characters, the way we do with QueryParser.escape(<string>) in the regular QueryParser?

3. How do I escape words like "and", "or", "W", "N", etc.? The search string itself might contain such words. In that case, my query would look something like "N(abc,and,sons)" or "W(abc,n,company)".

I get an org.apache.lucene.queryparser.surround.parser.ParseException when I submit such a query.

4. How do I put a wildcard at the beginning of a word?

The regular QueryParser lets us call parser.setAllowLeadingWildcard(true). Is there a way to do this with the surround QueryParser?
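For questions 2 and 3, I don't know of a surround equivalent of QueryParser.escape, so the following is only an application-side workaround, sketched under assumptions: before building the query string, detect keywords that collide with operator words and handle them separately, for example by dropping them as a stop filter would. The reserved set below (and, or, not, w, n, compared case-insensitively) is an assumption taken from the question itself; the surround grammar is the authority and may include more forms, such as distance-prefixed operators like 3W. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical pre-check (not part of Lucene) run on user keywords before a
 * surround query string is assembled. The reserved set is an ASSUMPTION
 * taken from the words in the question; the surround grammar is authoritative.
 */
final class ReservedWordSketch {

    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("and", "or", "not", "w", "n"));

    /** Keywords that look like operator words, compared case-insensitively. */
    static List<String> findReserved(List<String> keywords) {
        List<String> hits = new ArrayList<>();
        for (String keyword : keywords) {
            if (RESERVED.contains(keyword.toLowerCase(Locale.ROOT))) {
                hits.add(keyword);
            }
        }
        return hits;
    }

    /** One possible policy: drop operator-like words, much as a stop filter would. */
    static List<String> withoutReserved(List<String> keywords) {
        List<String> kept = new ArrayList<>(keywords);
        kept.removeAll(findReserved(keywords));
        return kept;
    }
}
```

Whether dropping such words is acceptable depends on the analyzer: if the index uses StandardAnalyzer with its default English stop words, "and", "or" and "not" were probably never indexed in the first place, but single letters such as "n" and "w" would have been.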

Any inputs will be very helpful. Thanks!

Regards,
Raghu

-----Original Message-----
From: Rao, Raghavendra: IT (NYK) 
Sent: Saturday, October 12, 2013 6:16 PM
To: java-user@lucene.apache.org
Subject: RE: Multiple Keywords - Regular and Any Order Search

My previous copy-paste of the sysout output got corrupted for some reason. Please see below.

+org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, ABC, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0)) +org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, CORP, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0))


Regards,
Raghu


-----Original Message-----
From: Rao, Raghavendra: IT (NYK) 
Sent: Saturday, October 12, 2013 6:14 PM
To: java-user@lucene.apache.org
Subject: RE: Multiple Keywords - Regular and Any Order Search

I made some progress and wrote the code below, but I don't get any results when I search for "ABC CORP", even though there are matching records.

SrndQuery srndQuery = org.apache.lucene.queryparser.surround.parser.QueryParser.parse("(ABC AND CORP)");
Query query = srndQuery.makeLuceneQueryField("LUC_FLD_ACTVY_DTLS", new BasicQueryFactory());

System.out.println("query.toString: " + query.toString());

topDocs = isearcher.search(query, filterBookDate, Integer.MAX_VALUE, sortByBookDate);

The above sysout output is shown below.

+org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, ABC, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0)) +org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, CORP, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0))

Please let me know what I am doing wrong. I don't know how to use the BasicQueryFactory() object. Is that the reason?

Regards,
Raghu


-----Original Message-----
From: Rao, Raghavendra: IT (NYK)
Sent: Saturday, October 12, 2013 5:32 PM
To: java-user@lucene.apache.org
Subject: RE: Multiple Keywords - Regular and Any Order Search

Ian,

Thank you very much for your valuable inputs.

The surround parser sounds very powerful, and it may well be the single answer to everything I am looking for. I have been trying hard to find an example of its use but haven't been able to find one online. Could you please help?

IndexSearcher isearcher = new IndexSearcher(ireader);
SrndQuery srndQuery = QueryParser.parse("<my query>");

topFieldDocs = isearcher.search(srndQuery, filterBookDate, Integer.MAX_VALUE, sortByBookDate);

This is where I have the problem: I don't know which IndexSearcher.search method accepts a SrndQuery. If this isn't the way to use SrndQuery, please advise.

Regards,
Raghu


-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com]
Sent: Friday, October 11, 2013 7:05 AM
To: java-user@lucene.apache.org
Subject: Re: Multiple Keywords - Regular and Any Order Search

Looks like you can achieve most of what you want by using AND rather than OR.  I think that all the should/should not examples you give will work if you use AND on your content field.
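A toy Java model of this point (hypothetical names, not Lucene code), using the concatenated "contents" field from the original question: with OR a single shared term is enough, while with AND every query term must be present, in any order. In the classic QueryParser, AND can be made the default with setDefaultOperator(QueryParser.AND_OPERATOR).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Toy model (not Lucene code) of default OR versus AND over the single
 * concatenated "contents" field described in the original question.
 */
final class AndVersusOrSketch {

    /** Crude stand-in for an analyzer: whitespace split plus lowercasing. */
    static Set<String> terms(String text) {
        List<String> tokens = Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
        return new HashSet<>(tokens);
    }

    /** OR: one shared term is enough, which is why the result list balloons. */
    static boolean matchesOr(String contents, String query) {
        Set<String> doc = terms(contents);
        for (String t : terms(query)) {
            if (doc.contains(t)) {
                return true;
            }
        }
        return false;
    }

    /** AND: every query term must be present, in any order. */
    static boolean matchesAnd(String contents, String query) {
        return terms(contents).containsAll(terms(query));
    }
}
```

Note that "RAINING BEAUTIFUL ABC" also satisfies AND here, because all three words occur somewhere in contents; that is the cross-field case Ian mentions.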

For ordering, I suggest you look at SpanNearQuery.  That can consider order and slop, the distance between the search terms.
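As a rough model of what slop and order mean for a SpanNearQuery built from single-term clauses, here is a stdlib-only Java sketch (hypothetical names, not Lucene's implementation): a match needs one position per query term such that the covered span contains at most slop non-query positions and, when inOrder is true, the terms appear in query order. It assumes distinct query terms and uses brute force, which is only reasonable for short fields. In Lucene 4.x the real query would be built from SpanTermQuery clauses via new SpanNearQuery(clauses, slop, inOrder).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model (not Lucene code) of span-near matching with slop and an order flag. */
final class SpanNearSketch {

    /** Crude stand-in for an analyzer: whitespace split plus lowercasing. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    static boolean spanNear(List<String> fieldTokens, List<String> queryTerms, int slop, boolean inOrder) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        return search(fieldTokens, queryTerms, 0, new int[queryTerms.size()], slop, inOrder);
    }

    /** Tries every position for term idx, then recurses to the next term. */
    private static boolean search(List<String> tokens, List<String> terms, int idx,
                                  int[] chosen, int slop, boolean inOrder) {
        if (idx == terms.size()) {
            return fits(chosen, slop, inOrder);
        }
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals(terms.get(idx)) && !alreadyUsed(chosen, idx, pos)) {
                chosen[idx] = pos;
                if (search(tokens, terms, idx + 1, chosen, slop, inOrder)) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean alreadyUsed(int[] chosen, int count, int pos) {
        for (int i = 0; i < count; i++) {
            if (chosen[i] == pos) {
                return true;
            }
        }
        return false;
    }

    /** Checks order (if required) and that the span has at most slop gap positions. */
    private static boolean fits(int[] positions, int slop, boolean inOrder) {
        int min = positions[0];
        int max = positions[0];
        for (int i = 1; i < positions.length; i++) {
            if (inOrder && positions[i] <= positions[i - 1]) {
                return false;
            }
            min = Math.min(min, positions[i]);
            max = Math.max(max, positions[i]);
        }
        // Positions inside the covered span that are not occupied by query terms.
        return (max - min + 1) - positions.length <= slop;
    }
}
```

Against the tokens of "RAINING HEAVILY TODAY", the query raining today heavily fails with slop 0 and inOrder true but matches with inOrder false, which is the "Search in any order" behaviour without any permutations.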

You may also want to consider separate fields if you care whether "raining beautiful abc" should match or not.  You could use MultiFieldQueryParser or build up a BooleanQuery in code, or build a complicated string to pass to the standard query parser.  There are other query parsers as well that might work for you e.g.
org.apache.lucene.queryparser.surround.parser.QueryParser
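Building up a BooleanQuery in code could also cover the surround multi-field question: one possible approach, which I have not verified against the surround parser, is to produce one Lucene Query per field (for example by calling makeLuceneQueryField once per field) and add each to a BooleanQuery with BooleanClause.Occur.SHOULD. The toy Java model below (hypothetical names, not Lucene code) captures only that combination logic: each clause sees its own field alone, and the record matches if any clause does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Toy model (not Lucene code) of per-field clauses combined with SHOULD
 * semantics, e.g. FIELD1:(abc corp) OR FIELD2:(xyz corp).
 */
final class MultiFieldSketch {

    static final class FieldClause {
        final String field;
        final List<String> terms = new ArrayList<>();

        FieldClause(String field, String... terms) {
            this.field = field;
            for (String t : terms) {
                this.terms.add(t.toLowerCase(Locale.ROOT));
            }
        }

        /** True if this clause's own field contains all of its terms, in any order. */
        boolean matches(Map<String, String> doc) {
            String text = doc.get(field);
            if (text == null) {
                return false;
            }
            return new HashSet<>(Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+")))
                    .containsAll(terms);
        }
    }

    /** SHOULD semantics: the record matches if any clause matches. */
    static boolean anyClauseMatches(Map<String, String> doc, List<FieldClause> clauses) {
        for (FieldClause clause : clauses) {
            if (clause.matches(doc)) {
                return true;
            }
        }
        return false;
    }
}
```

Per-field evaluation is also what keeps "raining beautiful abc" from matching: no single field contains all three words, even though the concatenated contents field does.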


--
Ian.


On Thu, Oct 10, 2013 at 4:54 PM,  <ra...@barclays.com> wrote:
> Hi,
>
> I have implemented Lucene to search for a single keyword across multiple fields and it works great. I did this by concatenating all the fields into a "contents" field and searching against this field.
>
> When I give multiple keywords against this setup, Lucene by default does an OR search, leading to loads of duplicates. This, I understand is an expected behaviour.
>
>
> 1.       Hence the first thing that I am trying to achieve is search functionality for multiple keywords. The most popular suggestion is to implement PhraseQuery. I will try this out, but please let me know if you can provide an example or any suggestions.
>
>
>
> 2.       Once the multiple keywords search is implemented, I need to provide another option to the users. They should be able to check a checkbox "Search in any order". If checked, if the same keywords of the phrase are present "in a particular field" BUT in different order, that should still be a match. I don't know how to implement this without forming all permutations of the phrase and then performing an AND search. This could be very expensive in terms of performance. Please let me know if Lucene provides a way to do this.
>
>
>
> Examples for Item 2:
>
>
>
> 3.       Field1: "RAINING HEAVILY TODAY" Field2: "BEAUTIFUL MORNING" Field3: "ABC CORPORATION LIMITED"
>
>
>
> Search1: "RAINING HEAVILY TODAY" - Should Match
>
> Search2: "RAINING TODAY HEAVILY" - Should Match
>
> Search3: "RAIN TODAY HEAVILY" - Should NOT Match
>
> Search4: "ABC CORPORATION LIMITED" - Should Match
>
> Search5: "ABC CORP LIMITED" - Should NOT Match
>
> Search6: "ABC LIMITED CORPORATION" - Should Match
>
>
>
> I am also not sure if the "contents" field approach will work in this case. Do I need to index the fields separately using "MultiFieldQueryParser" to achieve this?
>
>
> Sorry for the lengthy question. I would greatly appreciate any suggestions or inputs.
>
> Regards,
> Raghu
>
>
> _______________________________________________
>
> This message is for information purposes only, it is not a recommendation, advice, offer or solicitation to buy or sell a product or service nor an official confirmation of any transaction. It is directed at persons who are professionals and is not intended for retail customer use. Intended for recipient only. This message is subject to the terms at: www.barclays.com/emaildisclaimer.
>
> For important disclosures, please see: www.barclays.com/salesandtradingdisclaimer regarding market commentary from Barclays Sales and/or Trading, who are active market participants; and in respect of Barclays Research, including disclosures relating to specific issuers, please see http://publicresearch.barclays.com.
>
> _______________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Multiple Keywords - Regular and Any Order Search

Posted by ra...@barclays.com.
My previous copy-paste of the sysout output got corrupted for some reason. Please see below.

+org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, ABC, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0)) +org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, CORP, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0))


Regards,
Raghu


RE: Multiple Keywords - Regular and Any Order Search

Posted by ra...@barclays.com.
I made some progress and wrote the code below, but I don't get any results when I search for "ABC CORP", even though there are matching records.

SrndQuery srndQuery = org.apache.lucene.queryparser.surround.parser.QueryParser.parse("(ABC AND CORP)");
Query query = srndQuery.makeLuceneQueryField("LUC_FLD_ACTVY_DTLS", new BasicQueryFactory());
			    
System.out.println("query.toString: " + query.toString());
			    
topDocs = isearcher.search(query, filterBookDate, Integer.MAX_VALUE, sortByBookDate);

The above sysout output is shown below.

+org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, ABC, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0)) +org.apache.lucene.queryparser.surround.query.SimpleTermRewriteQuery(unused: )(LUC_FLD_ACTVY_DTLS, CORP, org.apache.lucene.queryparser.surround.query.BasicQueryFactory(maxBasicQueries: 1024, queriesMade: 0))

Please let me know what I am doing wrong. I don't know how to use the BasicQueryFactory() object. Is that the reason?

Regards,
Raghu


RE: Multiple Keywords - Regular and Any Order Search

Posted by ra...@barclays.com.
Ian,

Thank you very much for your valuable input.

The surround parser sounds very powerful, and it may well be the single answer to everything I am looking for. I have been trying hard to find an example of how to use it, but haven't been able to find one online. Could you please help?

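IndexSearcher isearcher = new IndexSearcher(ireader);
// surround parser: org.apache.lucene.queryparser.surround.parser.QueryParser
SrndQuery srndQuery = QueryParser.parse("<my query>");

// does not compile: no IndexSearcher.search overload accepts a SrndQuery
TopFieldDocs topFieldDocs = isearcher.search(srndQuery, filterBookDate, Integer.MAX_VALUE, sortByBookDate);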

This is where I have a problem: I don't know which IndexSearcher.search overload will accept the SrndQuery. If this isn't the right way to use SrndQuery, please suggest an alternative.
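A minimal sketch of the missing step, assuming Lucene 4.4 or a later 4.x release with lucene-core, lucene-analyzers-common and lucene-queryparser on the classpath. A SrndQuery is not a subclass of Lucene's Query, which is why no IndexSearcher.search overload accepts it; SrndQuery.makeLuceneQueryField(fieldName, BasicQueryFactory) converts it into one for a given field. The class SurroundSearchDemo, its helpers buildIndex, toLuceneQuery and countHits, and the field names field1, field2 and field3 are hypothetical, modelled on the example in the original question.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.surround.parser.QueryParser;
import org.apache.lucene.queryparser.surround.query.BasicQueryFactory;
import org.apache.lucene.queryparser.surround.query.SrndQuery;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SurroundSearchDemo {
    static final Version V = Version.LUCENE_44; // assumption: Lucene 4.4+ (4.x API)
    // Hypothetical field names standing in for Field1..Field3 from the question.
    static final String[] FIELDS = {"field1", "field2", "field3"};

    // One-document, in-memory index with the three example fields kept separate.
    static Directory buildIndex() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(V, new StandardAnalyzer(V));
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            Document doc = new Document();
            doc.add(new TextField("field1", "RAINING HEAVILY TODAY", Field.Store.NO));
            doc.add(new TextField("field2", "BEAUTIFUL MORNING", Field.Store.NO));
            doc.add(new TextField("field3", "ABC CORPORATION LIMITED", Field.Store.NO));
            writer.addDocument(doc);
        }
        return dir;
    }

    // Parses the surround syntax, then builds one Lucene query per field and ORs them.
    // The surround parser does not analyze terms, so input must already be lowercase.
    static Query toLuceneQuery(String surroundQuery) throws Exception {
        SrndQuery srnd = QueryParser.parse(surroundQuery); // static parse method
        BasicQueryFactory factory = new BasicQueryFactory(); // limits generated basic queries
        BooleanQuery anyField = new BooleanQuery();
        for (String field : FIELDS) {
            anyField.add(srnd.makeLuceneQueryField(field, factory), BooleanClause.Occur.SHOULD);
        }
        return anyField;
    }

    static int countHits(String surroundQuery) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(buildIndex())) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // The converted Query can equally be passed to search(Query, Filter, int, Sort).
            return searcher.search(toLuceneQuery(surroundQuery), 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits("1N(raining, today, heavily)"));   // N: any order
        System.out.println(countHits("1W(raining, today, heavily)"));   // W: this order only
        System.out.println(countHits("1N(abc, limited, corporation)"));
    }
}
```

In the surround syntax W is ordered and N is unordered; the leading number is the allowed distance, and as far as I understand the parser, a distance of 1 means the terms must be adjacent.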

Regards,
Raghu


-----Original Message-----
From: Ian Lea [mailto:ian.lea@gmail.com] 
Sent: Friday, October 11, 2013 7:05 AM
To: java-user@lucene.apache.org
Subject: Re: Multiple Keywords - Regular and Any Order Search

Looks like you can achieve most of what you want by using AND rather than OR.  I think that all the should/should not examples you give will work if you use AND on your content field.

For ordering, I suggest you look at SpanNearQuery.  That can consider order and slop, the distance between the search terms.

You may also want to consider separate fields if you care whether "raining beautiful abc" should match or not.  You could use MultiFieldQueryParser or build up a BooleanQuery in code, or build a complicated string to parse to the standard query parser.  There are other query parsers as well that might work for you e.g.
org.apache.lucene.queryparser.surround.parser.QueryParser


--
Ian.




Re: Multiple Keywords - Regular and Any Order Search

Posted by Ian Lea <ia...@gmail.com>.
Looks like you can achieve most of what you want by using AND rather
than OR.  I think that all the should/should not examples you give
will work if you use AND on your content field.
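A minimal sketch of the AND approach on the concatenated "contents" field, assuming Lucene 4.4 or a later 4.x release. The classic QueryParser is switched to AND as its default operator, so every keyword becomes a required clause. AndSearchDemo, buildIndex and countHits are hypothetical names, and the single indexed document is the example from the original question.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class AndSearchDemo {
    static final Version V = Version.LUCENE_44; // assumption: Lucene 4.4+ (4.x API)

    // One-document index; "contents" concatenates the three example fields.
    static Directory buildIndex(Analyzer analyzer) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(V, analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("contents",
                    "RAINING HEAVILY TODAY BEAUTIFUL MORNING ABC CORPORATION LIMITED",
                    Field.Store.NO));
            writer.addDocument(doc);
        }
        return dir;
    }

    // Every keyword becomes a required (+) clause instead of an optional one.
    static int countHits(String keywords) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(V);
        Directory dir = buildIndex(analyzer);
        QueryParser parser = new QueryParser(V, "contents", analyzer);
        parser.setDefaultOperator(QueryParser.Operator.AND);
        Query query = parser.parse(keywords); // e.g. +contents:raining +contents:today ...
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).search(query, 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] searches = {
            "RAINING HEAVILY TODAY", "RAINING TODAY HEAVILY", "RAIN TODAY HEAVILY",
            "ABC CORPORATION LIMITED", "ABC CORP LIMITED", "ABC LIMITED CORPORATION"
        };
        for (String s : searches) {
            System.out.println(s + " -> " + countHits(s) + " hit(s)");
        }
    }
}
```

StandardAnalyzer lowercases but does not stem, so RAIN does not match RAINING. Word order is ignored entirely, and words drawn from different source fields also match, because they all live in one concatenated field.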

For ordering, I suggest you look at SpanNearQuery.  That can consider
order and slop, the distance between the search terms.
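A sketch of how the "Search in any order" checkbox could map onto SpanNearQuery, assuming Lucene 4.4 or a later 4.x release and the three example fields indexed separately under the hypothetical names field1, field2 and field3. Each field gets its own SpanNearQuery with slop 0 (terms adjacent) and inOrder driven by the checkbox; the per-field queries are ORed in a BooleanQuery, so all keywords must appear together inside one field. AnyOrderSearchDemo and its helper methods are hypothetical names.

```java
import java.util.Locale;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class AnyOrderSearchDemo {
    static final Version V = Version.LUCENE_44; // assumption: Lucene 4.4+ (4.x API)
    static final String[] FIELDS = {"field1", "field2", "field3"}; // hypothetical names

    static Directory buildIndex() throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(V, new StandardAnalyzer(V));
        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            Document doc = new Document();
            doc.add(new TextField("field1", "RAINING HEAVILY TODAY", Field.Store.NO));
            doc.add(new TextField("field2", "BEAUTIFUL MORNING", Field.Store.NO));
            doc.add(new TextField("field3", "ABC CORPORATION LIMITED", Field.Store.NO));
            writer.addDocument(doc);
        }
        return dir;
    }

    // SpanTermQuery bypasses the analyzer, so words are lowercased here to match what
    // StandardAnalyzer indexed (adequate for plain words like these, not in general).
    static Query buildQuery(String keywords, boolean anyOrder) {
        String[] words = keywords.trim().toLowerCase(Locale.ROOT).split("\\s+");
        BooleanQuery query = new BooleanQuery();
        for (String field : FIELDS) {
            // All clauses of one SpanNearQuery must target the same field.
            SpanQuery[] clauses = new SpanQuery[words.length];
            for (int i = 0; i < words.length; i++) {
                clauses[i] = new SpanTermQuery(new Term(field, words[i]));
            }
            // slop 0: no gaps between the words; inOrder=false accepts any permutation.
            query.add(new SpanNearQuery(clauses, 0, !anyOrder), BooleanClause.Occur.SHOULD);
        }
        return query;
    }

    static int countHits(String keywords, boolean anyOrder) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(buildIndex())) {
            return new IndexSearcher(reader).search(buildQuery(keywords, anyOrder), 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits("RAINING TODAY HEAVILY", false)); // checkbox off
        System.out.println(countHits("RAINING TODAY HEAVILY", true));  // checkbox on
    }
}
```

This avoids generating permutations: a single unordered SpanNearQuery per field covers every ordering of the keywords.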

You may also want to consider separate fields if you care whether
"raining beautiful abc" should match or not.  You could use
MultiFieldQueryParser or build up a BooleanQuery in code, or build a
complicated string to parse to the standard query parser.  There are
other query parsers as well that might work for you e.g.
org.apache.lucene.queryparser.surround.parser.QueryParser
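A sketch of the MultiFieldQueryParser option with separately indexed fields, assuming Lucene 4.4 or a later 4.x release and the hypothetical field names field1, field2 and field3. With AND as the default operator, each keyword must appear in at least one of the fields, but not necessarily in the same field, so "raining beautiful abc" still matches; a per-field proximity query is needed to rule that out. MultiFieldAndDemo and its helpers are hypothetical names.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MultiFieldAndDemo {
    static final Version V = Version.LUCENE_44; // assumption: Lucene 4.4+ (4.x API)
    static final String[] FIELDS = {"field1", "field2", "field3"}; // hypothetical names

    static Directory buildIndex(Analyzer analyzer) throws Exception {
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(V, analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("field1", "RAINING HEAVILY TODAY", Field.Store.NO));
            doc.add(new TextField("field2", "BEAUTIFUL MORNING", Field.Store.NO));
            doc.add(new TextField("field3", "ABC CORPORATION LIMITED", Field.Store.NO));
            writer.addDocument(doc);
        }
        return dir;
    }

    static int countHits(String keywords) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(V);
        Directory dir = buildIndex(analyzer);
        MultiFieldQueryParser parser = new MultiFieldQueryParser(V, FIELDS, analyzer);
        parser.setDefaultOperator(QueryParser.Operator.AND);
        // Each word expands to (field1:w field2:w field3:w), and every group is required.
        Query query = parser.parse(keywords);
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            return new IndexSearcher(reader).search(query, 10).totalHits;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countHits("RAINING TODAY HEAVILY"));
        System.out.println(countHits("RAINING BEAUTIFUL ABC")); // words from different fields
    }
}
```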


--
Ian.

