Posted to commits@lucene.apache.org by rm...@apache.org on 2012/09/13 19:42:18 UTC

svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Author: rmuir
Date: Thu Sep 13 17:42:18 2012
New Revision: 1384427

URL: http://svn.apache.org/viewvc?rev=1384427&view=rev
Log:
add note about escaping terms

Modified:
    lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Modified: lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java?rev=1384427&r1=1384426&r2=1384427&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java (original)
+++ lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java Thu Sep 13 17:42:18 2012
@@ -28,7 +28,10 @@ import java.util.List;
 
 /** Implements the wildcard search query. Supported wildcards are <code>*</code>, which
  * matches any character sequence (including the empty one), and <code>?</code>,
- * which matches any single character. Note this query can be slow, as it
+ * which matches any single character. If you want to treat a wildcard as a literal
+ * character instead, escape it with '\'.
+ * <p>
+ * Note this query can be slow, as it
  * needs to iterate over many terms. In order to prevent extremely slow WildcardQueries,
  * a Wildcard term should not start with the wildcard <code>*</code>
  * 
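The new Javadoc says a wildcard character can be matched literally by escaping it with '\'. Below is a minimal sketch of a helper that does this for arbitrary user text by prefixing '*', '?' and '\' with '\'. The class and method names (WildcardEscapeSketch, escapeWildcards) are hypothetical and not Lucene API. The sketch assumes these three characters are the only ones with special meaning at the WildcardQuery level. Query parser escaping, such as for ':', is a separate layer and is not handled here.

```java
public class WildcardEscapeSketch {

    // Characters with special meaning in wildcard term text (assumption):
    // '*' (any sequence), '?' (any single character) and '\' (escape).
    static boolean isWildcardSyntax(int codePoint) {
        return codePoint == '*' || codePoint == '?' || codePoint == '\\';
    }

    // Hypothetical helper: prefixes each special character with '\' so the
    // returned text matches the input literally.
    static String escapeWildcards(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isWildcardSyntax(cp)) {
                sb.append('\\');
            }
            sb.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In Java source each literal backslash is written as "\\".
        String raw = "what?*\\";                  // the 7 characters what?*\
        System.out.println(escapeWildcards(raw)); // prints what\?\*\\
    }
}
```

The helper escapes '\' unconditionally, even in the final position. The result is the same either way, because a trailing '\' is accepted leniently as a literal. Escaping it always keeps the rule simple.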



Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Posted by Jack Krupansky <ja...@basetechnology.com>.
Good enough.

The reason I stumbled into this was the user-reported case of parsing 
"*\:*" (without quotes). The escape is needed at the query parser level so 
that the colon doesn't look like a field reference, but the escape gets passed 
down into WildcardQuery itself. It is a useless escape at that level, but it 
is there nonetheless. I was initially surprised when debugQuery in Solr showed 
the backslash in the parsed query; normally escapes get thrown away by the 
query parser before the query is generated, but not since wildcard escaping 
was added.
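The following rough sketch shows why the leftover escape in "*\:*" is harmless once it reaches WildcardQuery. It translates wildcard term text into a java.util.regex pattern. It assumes the escape handling shown in the WildcardQuery code quoted in this thread: '\' makes the next code point literal, and a trailing '\' is kept as a literal. WildcardRegexSketch, toRegex and matches are hypothetical names. Lucene itself builds an automaton rather than a regex.

```java
import java.util.regex.Pattern;

public class WildcardRegexSketch {

    // Hypothetical translation of wildcard term text into a regex, following
    // the escape handling quoted in this thread.
    static Pattern toRegex(String wildcardText) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcardText.length(); ) {
            int c = wildcardText.codePointAt(i);
            int length = Character.charCount(c);
            if (c == '*') {
                regex.append(".*");                 // any sequence, including empty
            } else if (c == '?') {
                regex.append('.');                  // any single character
            } else if (c == '\\' && i + length < wildcardText.length()) {
                int next = wildcardText.codePointAt(i + length);
                length += Character.charCount(next);
                regex.append(quote(next));          // escaped code point, taken literally
            } else {
                regex.append(quote(c));             // plain character or trailing '\'
            }
            i += length;
        }
        // DOTALL so that '.' also matches line terminators
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // Quotes a single code point so regex metacharacters lose their meaning.
    private static String quote(int codePoint) {
        return Pattern.quote(new String(Character.toChars(codePoint)));
    }

    // The whole term must match, as with a wildcard pattern.
    static boolean matches(String wildcardText, String term) {
        return toRegex(wildcardText).matcher(term).matches();
    }

    public static void main(String[] args) {
        // "*\:*" as it reaches WildcardQuery: "\:" is just a literal ':'
        System.out.println(matches("*\\:*", "id:42")); // true
        System.out.println(matches("*\\:*", "id42"));  // false
        System.out.println(matches("*:*", "id:42"));   // true, same as the escaped form
    }
}
```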

-- Jack Krupansky

-----Original Message----- 
From: Robert Muir
Sent: Thursday, September 13, 2012 3:18 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

On Thu, Sep 13, 2012 at 3:15 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
> So, if the user wants a backslash in their wildcard term, it does need to be
> escaped. I think. If I am wrong, please explain further.
>

It's not necessary if it's at the end (it's lenient). Anyway, I'll just
change it to say '\' is the escape character.

I want to keep it concise: most people don't have these characters in
their terms.

-- 
lucidworks.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org 




Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Sep 13, 2012 at 3:15 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
> So, if the user wants a backslash in their wildcard term, it does need to be
> escaped. I think. If I am wrong, please explain further.
>

It's not necessary if it's at the end (it's lenient). Anyway, I'll just
change it to say '\' is the escape character.

I want to keep it concise: most people don't have these characters in
their terms.

-- 
lucidworks.com



Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Posted by Jack Krupansky <ja...@basetechnology.com>.
I'm just reading the code:

case WILDCARD_ESCAPE:
  // add the next codepoint instead, if it exists
  if (i + length < wildcardText.length()) {
    final int nextChar = wildcardText.codePointAt(i + length);
    length += Character.charCount(nextChar);
    automata.add(BasicAutomata.makeChar(nextChar));
    break;
  } // else fallthru, lenient parsing with a trailing \


So, if the user places a backslash before a non-wildcard character, the 
backslash will be discarded. For example:

    abc\\def-x*y

As is, the code will remove that first backslash and include the next 
character from the pattern string, which happens to be a backslash as well.

If the user merely wrote:

    abc\def-x*y

As is, the code will remove the backslash and merely accept the next 
character, so the pattern behaves like abcdef-x*y.

So, if the user wants a backslash in their wildcard term, it does need to be 
escaped. I think. If I am wrong, please explain further.
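The switch quoted above can be traced on these examples with a stand-alone port. This is a sketch under stated assumptions. WILDCARD_STRING, WILDCARD_CHAR and WILDCARD_ESCAPE are taken to be '*', '?' and '\', as the Javadoc describes. Each automata.add(...) call is replaced by a readable token: "<any>" for '*', "<one>" for '?', or the literal character itself. WildcardTokenSketch and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardTokenSketch {

    // Assumed values of the constants used in the quoted switch.
    static final int WILDCARD_STRING = '*';
    static final int WILDCARD_CHAR = '?';
    static final int WILDCARD_ESCAPE = '\\';

    // Same control flow as the quoted loop, but each automata.add(...) call is
    // replaced by a readable token: "<any>", "<one>" or the literal character.
    static List<String> tokenize(String wildcardText) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < wildcardText.length(); ) {
            final int c = wildcardText.codePointAt(i);
            int length = Character.charCount(c);
            switch (c) {
                case WILDCARD_STRING:
                    tokens.add("<any>");
                    break;
                case WILDCARD_CHAR:
                    tokens.add("<one>");
                    break;
                case WILDCARD_ESCAPE:
                    // add the next code point instead, if it exists
                    if (i + length < wildcardText.length()) {
                        final int nextChar = wildcardText.codePointAt(i + length);
                        length += Character.charCount(nextChar);
                        tokens.add(new String(Character.toChars(nextChar)));
                        break;
                    } // else fall through: lenient parsing of a trailing '\'
                default:
                    tokens.add(new String(Character.toChars(c)));
            }
            i += length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // abc\\def-x*y: the first '\' escapes the second, leaving one literal '\'
        System.out.println(tokenize("abc\\\\def-x*y")); // [a, b, c, \, d, e, f, -, x, <any>, y]
        // abc\def-x*y: the '\' is dropped and 'd' is taken literally
        System.out.println(tokenize("abc\\def-x*y"));   // [a, b, c, d, e, f, -, x, <any>, y]
        // abc\ : a trailing '\' is kept as a literal backslash
        System.out.println(tokenize("abc\\"));          // [a, b, c, \]
    }
}
```

The first two outputs match the reading above: a backslash before a non-final character is always consumed. The third shows the lenient trailing-backslash case that Robert refers to.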

-- Jack Krupansky

-----Original Message----- 
From: Robert Muir
Sent: Thursday, September 13, 2012 3:01 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

But that's not true.

On Thu, Sep 13, 2012 at 2:52 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Technically, it should also indicate that backslashes need to be escaped to
> include them in a wildcard term.
>
> -- Jack Krupansky



-- 
lucidworks.com



Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Posted by Robert Muir <rc...@gmail.com>.
But that's not true.

On Thu, Sep 13, 2012 at 2:52 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Technically, it should also indicate that backslashes need to be escaped to
> include them in a wildcard term.
>
> -- Jack Krupansky



-- 
lucidworks.com



Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java

Posted by Jack Krupansky <ja...@basetechnology.com>.
Technically, it should also indicate that backslashes need to be escaped to 
include them in a wildcard term.

-- Jack Krupansky
