Posted to dev@lucene.apache.org by Xiaozheng Ma <Xi...@redwood.com> on 2004/11/10 17:06:27 UTC

[PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

Hi all,

I sent a patch for wildcard search a couple of days ago (it was my first
time sending anything to the list) and have seen no response so far, so I
am not sure whether any of you received it. From what I have seen over the
last two days, you usually respond to issues promptly.

The problem: if you search on "ca??", the hits include 'cat', 'CA', etc.,
while the user only wants four-letter words starting with CA, such as
'card' and 'cash', to be returned. This happens only when there are
multiple '?' at the end of the search pattern. The solution is to check
whether the word being matched against the search pattern ends while there
is still a '?' left in the pattern. If so, the match should return false.
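
To make the intended behaviour concrete, here is a minimal standalone
sketch. It is not the WildcardTermEnum code and is independent of Lucene;
the class name StrictWildcardSketch and the method matches are made up for
illustration, and only the constant name WILDCARD_CHAR is taken from the
patch. It assumes '?' must consume exactly one character and '*' any number
of characters.

```java
// Hypothetical standalone matcher for illustration only; not Lucene code.
final class StrictWildcardSketch {
    static final char WILDCARD_STRING = '*'; // assumed: zero or more characters
    static final char WILDCARD_CHAR = '?';   // assumed: exactly one character

    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String pattern, int p, String text, int t) {
        while (p < pattern.length()) {
            char c = pattern.charAt(p);
            if (c == WILDCARD_STRING) {
                // Try every possible length for '*', including zero.
                for (int next = t; next <= text.length(); next++) {
                    if (matches(pattern, p + 1, text, next)) {
                        return true;
                    }
                }
                return false;
            }
            // A '?' or a literal needs one more character of text.
            if (t >= text.length()) {
                return false; // the word ended while pattern characters remain
            }
            if (c != WILDCARD_CHAR && c != text.charAt(t)) {
                return false; // literal mismatch
            }
            p++;
            t++;
        }
        // The pattern is used up; the word must be used up as well.
        return t == text.length();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"ca", "cat", "card", "cash", "cards"}) {
            System.out.println(term + " -> " + matches("ca??", term));
        }
    }
}
```

With these assumptions, "ca??" accepts 'card' and 'cash' but rejects 'ca',
'cat' and 'cards'.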

The patch file is attached and here is the text copy:
------------------------------------------------------------------------
--- WildcardTermEnum.org	2004-05-11 11:42:10.000000000 -0400
+++ WildcardTermEnum.java	2004-11-08 14:35:14.823610500 -0500
@@ -132,6 +132,10 @@
             }
             else
             {
+	      //to prevent "cat" matches "ca??"
+	      if(wildchar == WILDCARD_CHAR){
+		return false;
+	      }	      
               // Look at the next character
               wildcardSearchPos++;
             } 

------------------------------------------------------------------------
Thanks!

Xiaozheng

Re: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

Posted by Andrzej Bialecki <ab...@getopt.org>.
Erik Hatcher wrote:

> Xiaozheng,
> 
> I tried your patch locally by adding a test case to testQuestionmark of  
> TestWildcardQuery:
> 
>         Query query6 = new WildcardQuery(new Term("body", "metal??"));
>         assertMatches(searcher, query6, 0);
> 
> I was not able to get it to work properly, as this test case failed  
> after adding your patch.  Could you enhance this test case to include  
> the bug you're fixing so that we can show that your implementation  
> works properly?  I'd commit it if I can get this test case to pass :)

I just wanted to note that this patch redefines the usual meaning of the
'?' wildcard, which is "exactly one or zero characters", and that is the
way it works now. I'm not sure this change is good; it is certainly
surprising...

What the original poster wanted is commonly known as the '.' wildcard,
which means "exactly one character".
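
The difference between the two readings can be sketched with
java.util.regex, where '.' is exactly one character and '.?' is zero or one.
This only illustrates the semantics discussed in this thread and says
nothing about how WildcardTermEnum is implemented; the class name
QuestionMarkSemantics and the hand-written translations of "ca??" are
assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of two readings of the pattern "ca??".
final class QuestionMarkSemantics {
    // '?' read as "zero or one character": ca?? becomes ca.?.?
    static final Pattern LENIENT = Pattern.compile("ca.?.?");
    // '?' read as "exactly one character": ca?? becomes ca..
    static final Pattern STRICT = Pattern.compile("ca..");

    static boolean lenient(String term) {
        return LENIENT.matcher(term).matches();
    }

    static boolean strict(String term) {
        return STRICT.matcher(term).matches();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"ca", "cat", "card", "cards"}) {
            System.out.println(term + ": lenient=" + lenient(term)
                    + ", strict=" + strict(term));
        }
    }
}
```

Under the lenient reading 'ca', 'cat' and 'card' all match; under the strict
reading only 'card' does. Neither reading matches 'cards'.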

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Xiaozheng,

I tried your patch locally by adding a test case to testQuestionmark of  
TestWildcardQuery:

         Query query6 = new WildcardQuery(new Term("body", "metal??"));
         assertMatches(searcher, query6, 0);

I was not able to get it to work properly, as this test case failed  
after adding your patch.  Could you enhance this test case to include  
the bug you're fixing so that we can show that your implementation  
works properly?  I'd commit it if I can get this test case to pass :)

	Erik






Re: [PATCH]multiple wildcards ? at the end of search pattern return incorrect hits

Posted by Bernhard Messer <bm...@apache.org>.
Hi,

Thanks for your work. Bugzilla is usually the best place to submit your
code, so that it doesn't get lost. Open a new entry in Bugzilla, prefix
the summary line with [PATCH], and then attach your code.

Thanks,
Bernhard
