Posted to user@lucenenet.apache.org by Trevor Watson <tw...@datassimilate.com> on 2009/08/17 22:54:43 UTC

Document equals, not contains

Last question for the day for you folks :)

We've written the code for our webservice to do a "contains" search with 
Lucene.NET (easy thanks to the developers).

Now the next part they added to it (development and planning in one go, 
/slaps project lead's hand) is to do "Is Exactly" and "Is Not Exactly" 
searches.

I know we could do this by running a search and looking for anything with 
a score _near_ 1 (even copied/pasted text gave me scores of 0.99999994; 
my guess is that this is due to punctuation). However, we're using 
Lucene.Net.QueryParsers.QueryParser and Lucene.Net.Search.Query to allow 
multiple clauses in the query, so I can't do "Is exactly "Bob is 
smiling"" and then just loop through the returned values' scores. Instead, 
we want to do "Field1 Is Exactly "Bob is smiling" AND Field2 Contains 
"happy" AND Field3 does not contain "Dancing ensued"".

Is it possible to add anything to the query to restrict the scores 
returned in Lucene.Net?  For example, "Bob is happy"$0.9 would return 
anything with a 0.9 score or higher.

Any advice would be greatly appreciated.

Trevor Watson

RE: Document equals, not contains

Posted by Franklin Simmons <fs...@sccmediaserver.com>.
Trevor,

You might use PhraseQuery and adjust the slop factor.  However, what exactly is meant by "exact"?

For example, the StandardAnalyzer will tokenize the text "Bob is smiling" into two tokens, 'bob' and 'smiling' (note the lowercasing as well as the dropped 'is'). Clearly, "bob is smiling" is not exactly "Bob is smiling", so it all hinges on what is meant by "exact".
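
To make the point concrete, here is a rough sketch in plain Python. It does not call Lucene at all: analyze() is a hypothetical stand-in for an analyzer, and STOP_WORDS is an assumed, much shorter list than the one StandardAnalyzer really uses. It only illustrates why two different strings can look identical once analyzed, while a comparison on the raw, un-analyzed value still tells them apart.

```python
import re

# Assumed stop-word list for illustration only; the real list is longer.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def analyze(text):
    """Toy analyzer: split on non-alphanumerics, lowercase, drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


def matches_after_analysis(query, stored):
    """'Exact' in the sense of identical token sequences after analysis."""
    return analyze(query) == analyze(stored)


def matches_exactly(query, stored):
    """'Exact' in the sense of identical raw strings."""
    return query == stored


stored = "Bob is smiling"
print(analyze(stored))                                # ['bob', 'smiling']
print(matches_after_analysis("bob smiling", stored))  # True
print(matches_exactly("bob smiling", stored))         # False
```

If "exact" has to mean the second kind of match, the raw value needs to be kept somewhere the analyzer does not touch it, which is roughly what an un-analyzed field is for.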







RE: Document equals, not contains

Posted by Digy <di...@gmail.com>.
Hi Trevor,

I am not sure that I understand what you are exactly trying to do.

First of all, Lucene tokenizes the text (splits it into meaningful
parts) and indexes those terms (words?) depending on the analyzer you are
using.

For example, an analyzer that ignores stop words may index [what a feeling] as 2
terms, "what" and "feeling" (an English stemmer might even index it as
"what" and "feel"). So, using the same analyzer, a search for [what feel]
can return hits with very high scores, but this does not mean that you have
an exact match.
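
A small sketch of that effect, in plain Python rather than Lucene. The stop list and the stem() function are made up for illustration (a real English stemmer does far more than strip "ing"); the point is only that the query and the document end up as the same terms.

```python
# Assumed stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is"}


def stem(token):
    """Crude illustrative stemmer: strips a trailing 'ing' from longer words."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    return token


def analyze(text, stemming=False):
    """Lowercase, split on whitespace, drop stop words, optionally stem."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [stem(t) for t in tokens] if stemming else tokens


print(analyze("what a feeling"))                 # ['what', 'feeling']
print(analyze("what a feeling", stemming=True))  # ['what', 'feel']

# The query [what feel] and the document [what a feeling] analyze to the
# same terms, although the original texts are clearly different.
same_terms = analyze("what feel", stemming=True) == analyze("what a feeling", stemming=True)
print(same_terms)                                # True
```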

Also, StartsWith or EndsWith for phrases is not applicable to Lucene.Net. You
can make wildcard searches only on the terms that are indexed. [wha*] and
[fee*] are valid searches, but if you assume that [what a fee*] will return
only the results starting with this text, you are wrong.
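
Here is a toy illustration of the difference, again in plain Python with a made-up two-document index (nothing here is Lucene API). A prefix search works on individual indexed terms, so it cannot know where in the original text a term occurred; a true "starts with" needs the raw text.

```python
# Hypothetical documents, for illustration only.
DOCS = {
    1: "What a feeling",
    2: "The fee for what we ordered",
}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop the stop word 'a'."""
    return [t for t in text.lower().split() if t != "a"]


# Toy index: document id -> list of indexed terms.
INDEX = {doc_id: analyze(text) for doc_id, text in DOCS.items()}


def docs_matching_prefix(prefix):
    """Term-level prefix search, similar in spirit to [fee*]."""
    return sorted(
        doc_id
        for doc_id, terms in INDEX.items()
        if any(term.startswith(prefix) for term in terms)
    )


def docs_starting_with(phrase):
    """True 'starts with' on the raw text, which the term index cannot answer."""
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if text.lower().startswith(phrase.lower())
    )


print(docs_matching_prefix("fee"))       # [1, 2]
print(docs_matching_prefix("wha"))       # [1, 2]
print(docs_starting_with("what a fee"))  # [1]
```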

To inspect what you have indexed, you can use the tool Luke
(http://www.getopt.org/luke/).

Good Luck.

DIGY

