Posted to dev@lucene.apache.org by Tiago Silveira <ti...@commworld.de> on 2006/02/22 11:29:46 UTC
wildcard search with variable length
IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
justify keeping the old, undocumented, arguably incorrect behavior.
Regards,
Tiago Silveira
-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: Wednesday, February 22, 2006 05:10
To: java-dev@lucene.apache.org
Subject: Re: Lucene 1.9 RC1 release available
1) Having a simple way to match singular and plural forms of a term with
a single wildcard expression is quite useful.
2) The trailing '?' behavior has been present since that wildcard was
first introduced. Why not provide a flag to allow the original behavior
to optionally be preserved?
3) The fact that virtually no one objected to the original behavior
suggests that few if any were confused by it.
Chris Hostetter wrote:
>: In either case, what I'm arguing is that the current behavior makes more
>: sense in the real world of query expressions (that is, makes the most
>: common query expressions simpler), so why not continue it?
>
>I disagree with that statement. People familiar with shell globbing are
>going to be confused if "riot??????????????????????" matches "riot" and
>"riotXXX".
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: wildcard search with variable length
Posted by DM Smith <dm...@gmail.com>.
John Haxby wrote:
> Doug Cutting wrote:
>
>> DM Smith wrote:
>>
>>> Personally, I don't want an either/or. I want a both/and. Modern
>>> unix shells provide both/and, albeit with different syntax.
>>>
>>> I see this more as a feature request than an argument as to the
>>> usefulness or properness of either. Both are useful. Both are
>>> proper. Both are intuitive. Both are counterintuitive. It all
>>> depends on your "tradition".
>>
>> +1
>>
>> Doug
>>
> Doesn't the RegexQuery do this for you?
>
> jch
I have not looked at it (yet). If so, that would be the "both/and".
Re: wildcard search with variable length
Posted by John Haxby <jc...@scalix.com>.
Doug Cutting wrote:
> DM Smith wrote:
>
>> Personally, I don't want an either/or. I want a both/and. Modern unix
>> shells provide both/and, albeit with different syntax.
>>
>> I see this more as a feature request than an argument as to the
>> usefulness or properness of either. Both are useful. Both are proper.
>> Both are intuitive. Both are counterintuitive. It all depends on your
>> "tradition".
>
> +1
>
> Doug
>
Doesn't the RegexQuery do this for you?
jch
Re: wildcard search with variable length
Posted by Doug Cutting <cu...@apache.org>.
DM Smith wrote:
> Personally, I don't want an either/or. I want a both/and. Modern unix
> shells provide both/and, albeit with different syntax.
>
> I see this more as a feature request than an argument as to the
> usefulness or properness of either. Both are useful. Both are proper.
> Both are intuitive. Both are counterintuitive. It all depends on your
> "tradition".
+1
Doug
Re: wildcard search with variable length
Posted by DM Smith <dm...@gmail.com>.
Andrzej Bialecki wrote:
> Tiago Silveira wrote:
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it
>> doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>>
>
> I have a different view on this issue - IMHO treating "?" as "exactly
> one character" is counterintuitive for people familiar with the use of
> wildcards: in all popular regular expression languages, and also in
> DTD/XML world, a single "?" metacharacter means "zero or one", which
> is probably why the original behavior was introduced (or at least it
> was more compatible with the use of "?" in other contexts).
There are two distinctly different traditions for ?, *, and +. One is
globbing (standard in UNIX shells) and the other is regular expressions.
In globbing, ? has always stood for a single character, * stands for
zero or more characters, and + is not defined. In regular expressions,
these modify the preceding expression to mean 0 or 1; 0 or more; and
1 or more.
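
To make the two readings of "?" concrete, here is a minimal sketch in
plain Java. It does not use Lucene at all: the class and method names
are hypothetical, the glob side handles only "?" (not "*" or "+"), and
the regex side simply defers to java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Lucene.
final class WildcardTraditions {

    // Glob reading: '?' stands for exactly one character; every other
    // character is taken literally. Only '?' is handled in this sketch.
    static boolean globMatches(String glob, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            regex.append(c == '?' ? "." : Pattern.quote(String.valueOf(c)));
        }
        return Pattern.matches(regex.toString(), term);
    }

    // Regex reading: '?' makes the preceding element optional (zero or one).
    static boolean regexMatches(String regex, String term) {
        return Pattern.matches(regex, term);
    }

    public static void main(String[] args) {
        // Same pattern text, two traditions, different answers.
        for (String term : new String[] {"ca", "cat", "cats"}) {
            System.out.println(term
                    + "  glob: " + globMatches("cat?", term)
                    + "  regex: " + regexMatches("cat?", term));
        }
    }
}
```

Read as a glob, "cat?" accepts only "cats" of the three terms; read as
a regex, it accepts "ca" and "cat" but not "cats".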
Lucene seems to support globbing (trailing) and not regex. To me this is
clear in the documentation.
That said, a search seems to be a kind of regex, and blending these two
traditions leads to confusion. The first time I tried Lucene to do a
search, I used these metacharacters as if they were regex modifiers,
not globbing characters. (Natural behavior of a Perl programmer!) It did
not work as expected. This led me to read the docs, and then I understood
the error of my ways.
Personally, I don't want an either/or. I want a both/and. Modern unix
shells provide both/and, albeit with different syntax.
I see this more as a feature request than an argument as to the
usefulness or properness of either. Both are useful. Both are proper.
Both are intuitive. Both are counterintuitive. It all depends on your
"tradition".
Re: wildcard search with variable length
Posted by John Haxby <jc...@scalix.com>.
Andrzej Bialecki wrote:
> Tiago Silveira wrote:
>
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it
>> doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>
> I have a different view on this issue - IMHO treating "?" as "exactly
> one character" is counterintuitive for people familiar with the use of
> wildcards: in all popular regular expression languages, and also in
> DTD/XML world, a single "?" metacharacter means "zero or one", which
> is probably why the original behavior was introduced (or at least it
> was more compatible with the use of "?" in other contexts).
>
Ahh. Well. If "cat?" is a regular expression then it will match "ca"
and "cat". "cat??" is probably not a valid regular expression: the
final ? means "one or zero occurrences of t?" which means that it too
matches "ca" and "cat". However, the javadoc defines "?" and its
definition matches the shell glob definition and it's quite clear that
WildcardQuery is not a RegexQuery just from the docs.
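
For anyone who wants to check the doubled "?" case, here is a small
sketch against java.util.regex only (other regex engines may differ,
and the class name is hypothetical). In that engine "??" is accepted
as the reluctant form of "?", and when the whole term has to match,
"cat??" accepts the same two strings as "cat?".

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; uses only java.util.regex.
final class DoubledQuestionMark {

    // True when the regex matches the entire term, not just a prefix.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        String[] regexes = {"cat?", "cat??"};
        String[] terms = {"ca", "cat", "cats"};
        for (String regex : regexes) {
            for (String term : terms) {
                System.out.println(regex + " vs " + term + ": "
                        + fullMatch(regex, term));
            }
        }
    }
}
```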
I can't comment about the wildcard character in a DTD/XML context, I'm not
that familiar with it.
jch
Re: wildcard search with variable length
Posted by Andrzej Bialecki <ab...@getopt.org>.
Tiago Silveira wrote:
> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
> justify keeping the old, undocumented, arguably incorrect behavior.
>
I have a different view on this issue - IMHO treating "?" as "exactly
one character" is counterintuitive for people familiar with the use of
wildcards: in all popular regular expression languages, and also in
DTD/XML world, a single "?" metacharacter means "zero or one", which is
probably why the original behavior was introduced (or at least it was
more compatible with the use of "?" in other contexts).
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: wildcard search with variable length
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 22, 2006, at 7:22 AM, John Haxby wrote:
> Tiago Silveira wrote:
>
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that
>> it doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>>
> I don't think there's any question of the old behaviour being
> incorrect -- the javadoc says that ? matches a single character,
> not zero or one characters, a single character.
>
> On the other hand, does Erik's new RegexQuery support "cat.?" (the
> ".?" does match zero or one characters)? (Where's the javadoc
> for that? I don't see any comments in the source, let alone
> anything else :-))
You dinged me fairly! Ok, tonight's TODO list, add javadocs to
RegexQuery! The RegexQuery supports _whatever_ syntax the regular
expression implementation you chose supports. It simply enumerates
terms and does the match using the expression you provided using the
desired implementation (JDK 1.4 regex, or ORO, etc).
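
As a rough sketch of the "enumerate terms and match" idea, the
following uses a plain list in place of the index's term dictionary and
java.util.regex as the chosen implementation. None of this is the
actual RegexQuery code: the class and method names are hypothetical,
and matching the whole term (rather than any substring) is an
assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for term enumeration; not Lucene's API.
final class TermEnumerationSketch {

    // Walk every term and keep those the expression matches in full.
    // Whole-term matching is an assumption of this sketch.
    static List<String> matchingTerms(List<String> indexTerms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String term : indexTerms) {
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("ca", "cat", "catch", "cats", "dog");
        // "cat.?" is "cat" followed by zero or one arbitrary character.
        System.out.println(matchingTerms(terms, "cat.?")); // prints [cat, cats]
    }
}
```

With these sample terms, "cat.?" picks up both the singular and the
plural form in one expression.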
Erik
Re: wildcard search with variable length
Posted by John Haxby <jc...@scalix.com>.
Tiago Silveira wrote:
>IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
>justify keeping the old, undocumented, arguably incorrect behavior.
>
>
I don't think there's any question of the old behaviour being incorrect
-- the javadoc says that ? matches a single character, not zero or one
characters, a single character.
On the other hand, does Erik's new RegexQuery support "cat.?" (the ".?"
does match zero or one characters)? (Where's the javadoc for that?
I don't see any comments in the source, let alone anything else :-))
jch