Posted to commits@lucene.apache.org by rm...@apache.org on 2012/09/13 19:42:18 UTC
svn commit: r1384427 -
/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Author: rmuir
Date: Thu Sep 13 17:42:18 2012
New Revision: 1384427
URL: http://svn.apache.org/viewvc?rev=1384427&view=rev
Log:
add note about escaping terms
Modified:
lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Modified: lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
URL: http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java?rev=1384427&r1=1384426&r2=1384427&view=diff
==============================================================================
--- lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java (original)
+++ lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java Thu Sep 13 17:42:18 2012
@@ -28,7 +28,10 @@ import java.util.List;
/** Implements the wildcard search query. Supported wildcards are <code>*</code>, which
* matches any character sequence (including the empty one), and <code>?</code>,
- * which matches any single character. Note this query can be slow, as it
+ * which matches any single character. If you want to treat a wildcard as a literal
+ * character instead, escape it with '\'.
+ * <p>
+ * Note this query can be slow, as it
* needs to iterate over many terms. In order to prevent extremely slow WildcardQueries,
* a Wildcard term should not start with the wildcard <code>*</code>
*
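To make the escaping rule in the javadoc above concrete, here is a minimal sketch that translates a wildcard pattern into a java.util.regex pattern and matches one string against it. The class and method names (WildcardSketch, toRegex, matches) are hypothetical and not part of Lucene. The real WildcardQuery builds an automaton and runs it against indexed terms, so this only models the pattern semantics: '*' for any sequence, '?' for one code point, '\' to make the next character literal, and (following the lenient trunk code discussed later in this thread) a trailing '\' treated as a literal backslash.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene API: models WildcardQuery's pattern rules
// with java.util.regex so they can be tried on plain strings.
public class WildcardSketch {

    // Translate a wildcard pattern into an equivalent regular expression.
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            int c = wildcard.codePointAt(i);
            i += Character.charCount(c);
            switch (c) {
                case '*':
                    regex.append(".*");  // any sequence, including the empty one
                    break;
                case '?':
                    regex.append(".");   // exactly one code point
                    break;
                case '\\':
                    if (i < wildcard.length()) {
                        // escaped: the next code point is matched literally
                        int next = wildcard.codePointAt(i);
                        i += Character.charCount(next);
                        regex.append(literal(next));
                        break;
                    }
                    // a trailing backslash falls through and is matched literally
                default:
                    regex.append(literal(c));
            }
        }
        return regex.toString();
    }

    // Quote a single code point so regex metacharacters lose their meaning.
    private static String literal(int codePoint) {
        return Pattern.quote(String.valueOf(Character.toChars(codePoint)));
    }

    // DOTALL lets '?' and '*' also match line terminators.
    static boolean matches(String wildcard, String text) {
        return Pattern.compile(toRegex(wildcard), Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "foobar"));   // true
        System.out.println(matches("foo\\*", "foo*"));   // true: '*' is literal
        System.out.println(matches("foo\\*", "foobar")); // false
    }
}
```

Note that the Java string "foo\\*" is the wildcard pattern foo\* once the Java-level escape is removed.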
Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Posted by Jack Krupansky <ja...@basetechnology.com>.
Good enough.
The reason I stumbled into this was for the user-reported case of parsing
"*\:*" (without quotes). The escape is needed at the query parser level so
that it doesn't look like a field reference, but the escape gets passed down
into WildcardQuery itself. It is a useless escape down at that level, but
there nonetheless. I was initially surprised when debugQuery in Solr showed
the backslash in the parsed query; normally escapes get thrown away by the
query parser before the query is generated, but not since wildcard escaping was
added.
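The escape described above is harmless at the WildcardQuery level: once the query parser has passed the pattern down, a backslash in front of an ordinary character such as ':' simply yields that character. A small sketch of that observation, assuming the same rules as the trunk code (an escape takes the next code point literally, a trailing backslash is literal), rewrites a pattern so that only '*', '?' and '\' keep a backslash. The class RedundantEscapes and its canonicalize method are made up for illustration; Lucene has no such helper.

```java
// Hypothetical helper, not part of Lucene: drops backslashes that do not
// change what a WildcardQuery pattern matches.
public class RedundantEscapes {

    static String canonicalize(String wildcard) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            int c = wildcard.codePointAt(i);
            i += Character.charCount(c);
            if (c == '*' || c == '?') {
                out.append((char) c);          // a real wildcard, keep as-is
                continue;
            }
            if (c == '\\' && i < wildcard.length()) {
                c = wildcard.codePointAt(i);   // escaped: next code point is literal
                i += Character.charCount(c);
            }
            // c is now a literal (a trailing '\' is also treated literally)
            if (c == '*' || c == '?' || c == '\\') {
                out.append('\\');              // this literal still needs its escape
            }
            out.appendCodePoint(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("*\\:*"));   // *:*
        System.out.println(canonicalize("foo\\*"));  // foo\*
        System.out.println(canonicalize("abc\\"));   // abc\\
    }
}
```

So the pattern *\:* that reaches WildcardQuery matches exactly what *:* would, while an escaped '*' keeps its escape because removing it would turn it back into a wildcard.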
-- Jack Krupansky
-----Original Message-----
From: Robert Muir
Sent: Thursday, September 13, 2012 3:18 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1384427 -
/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
On Thu, Sep 13, 2012 at 3:15 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
> So, if the user wants a backslash in their wildcard term, it does need to be
> escaped. I think. If I am wrong, please explain further.
>
It's not necessary if it's at the end (it's lenient). Anyway, I'll just
change it to say '\' is the escape character.
I want to keep it concise: most people don't have these characters in
their terms.
--
lucidworks.com
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Posted by Robert Muir <rc...@gmail.com>.
On Thu, Sep 13, 2012 at 3:15 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
>
> So, if the user wants a backslash in their wildcard term, it does need to be
> escaped. I think. If I am wrong, please explain further.
>
It's not necessary if it's at the end (it's lenient). Anyway, I'll just
change it to say '\' is the escape character.
I want to keep it concise: most people don't have these characters in
their terms.
--
lucidworks.com
Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Posted by Jack Krupansky <ja...@basetechnology.com>.
I'm just reading the code:
    case WILDCARD_ESCAPE:
      // add the next codepoint instead, if it exists
      if (i + length < wildcardText.length()) {
        final int nextChar = wildcardText.codePointAt(i + length);
        length += Character.charCount(nextChar);
        automata.add(BasicAutomata.makeChar(nextChar));
        break;
      } // else fallthru, lenient parsing with a trailing \
So, if the user places a backslash before a non-wild character, the
backslash will be discarded. For example:
abc\\def-x*y
As is, the code will remove that first backslash and include the next
character from the pattern string, which happens to be a backslash as well.
If the user merely wrote:
abc\def-x*y
As is, the code will remove the backslash and merely accept the next
character.
So, if the user wants a backslash in their wildcard term, it does need to be
escaped. I think. If I am wrong, please explain further.
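The two example patterns above can be checked with a standalone port of the quoted switch. This sketch keeps the original control flow, including the fall-through for a trailing backslash, but as a hypothetical stand-in for BasicAutomata.makeChar and its siblings it records a readable token for each piece instead of building an automaton. The class name EscapeWalkthrough and the token strings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical port of the quoted escape handling; tokens replace automata.
public class EscapeWalkthrough {

    static final int WILDCARD_STRING = '*';
    static final int WILDCARD_CHAR = '?';
    static final int WILDCARD_ESCAPE = '\\';

    static List<String> tokens(String wildcardText) {
        List<String> pieces = new ArrayList<>();
        for (int i = 0; i < wildcardText.length();) {
            final int c = wildcardText.codePointAt(i);
            int length = Character.charCount(c);
            switch (c) {
                case WILDCARD_STRING:
                    pieces.add("ANY_STRING");
                    break;
                case WILDCARD_CHAR:
                    pieces.add("ANY_CHAR");
                    break;
                case WILDCARD_ESCAPE:
                    // add the next codepoint instead, if it exists
                    if (i + length < wildcardText.length()) {
                        final int nextChar = wildcardText.codePointAt(i + length);
                        length += Character.charCount(nextChar);
                        pieces.add("CHAR(" + new String(Character.toChars(nextChar)) + ")");
                        break;
                    } // else fall through: a trailing '\' becomes a literal
                default:
                    pieces.add("CHAR(" + new String(Character.toChars(c)) + ")");
            }
            i += length;
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Java literal for the pattern abc\\def-x*y: first backslash escapes the second
        System.out.println(tokens("abc\\\\def-x*y"));
        // Java literal for the pattern abc\def-x*y: backslash dropped, 'd' kept
        System.out.println(tokens("abc\\def-x*y"));
    }
}
```

With this port, abc\\def-x*y yields a literal backslash token right after c, abc\def-x*y yields a plain d in the same position, and a pattern ending in a lone backslash yields a literal backslash, which is the leniency Robert refers to.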
-- Jack Krupansky
-----Original Message-----
From: Robert Muir
Sent: Thursday, September 13, 2012 3:01 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1384427 -
/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
But that's not true.
On Thu, Sep 13, 2012 at 2:52 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Technically, should also indicate that backslashes need to be escaped to
> include them in a wildcard term.
--
lucidworks.com
Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Posted by Robert Muir <rc...@gmail.com>.
But that's not true.
On Thu, Sep 13, 2012 at 2:52 PM, Jack Krupansky <ja...@basetechnology.com> wrote:
> Technically, should also indicate that backslashes need to be escaped to
> include them in a wildcard term.
--
lucidworks.com
Re: svn commit: r1384427 - /lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/search/WildcardQuery.java
Posted by Jack Krupansky <ja...@basetechnology.com>.
Technically, should also indicate that backslashes need to be escaped to
include them in a wildcard term.
-- Jack Krupansky