Posted to dev@lucene.apache.org by Tiago Silveira <ti...@commworld.de> on 2006/02/22 11:29:46 UTC

wildcard search with variable length

IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
justify keeping the old, undocumented, arguably incorrect behavior.

Regards,
Tiago Silveira
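
A minimal sketch of the "cat cat?" suggestion above, assuming strict glob semantics ("?" matches exactly one character) and assuming the two clauses are combined with OR. It uses Python's standard fnmatch module rather than Lucene, and the helper name matches_any is hypothetical:

```python
from fnmatch import fnmatchcase

def matches_any(term, patterns):
    """Return True if term matches at least one glob pattern (OR semantics)."""
    return any(fnmatchcase(term, p) for p in patterns)

terms = ["ca", "cat", "cats", "catty"]

# "cat cat?" read as: cat OR cat?
hits = [t for t in terms if matches_any(t, ["cat", "cat?"])]
print(hits)  # ['cat', 'cats']
```

Adding "cat??" as a third pattern would also pick up five-letter terms such as "catty".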

-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com] 
Sent: Wednesday, February 22, 2006 05:10
To: java-dev@lucene.apache.org
Subject: Re: Lucene 1.9 RC1 release available

1) Having a simple way to match singular and plural forms of a term with 
a single wildcard expression is quite useful.
2) The trailing '?' behavior has been present since that wildcard was 
first introduced.  Why not provide a flag to allow the original behavior 
to optionally be preserved?
3) The fact that virtually no one objected to the original behavior 
suggests that few if any were confused by it.

Chris Hostetter wrote:

>: In either case, what I'm arguing is that the current behavior makes more
>: sense in the real world of query expressions (that is, makes the most
>: common query expressions simpler), so why not continue it?
>
>I disagree with that statement.  People familiar with shell globbing are
>going to be confused if "riot??????????????????????" matches "riot" and
>"riotXXX".
>  
>
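
The "riot" example above can be reproduced with a small sketch. It assumes, based only on the description in this thread and not on the Lucene source, that the old behavior let each trailing "?" match zero or one character, while the strict reading requires exactly one character per "?". The function names and the lenient_trailing flag are hypothetical:

```python
import re

def wildcard_to_regex(pattern, lenient_trailing=False):
    """Translate a ?/* wildcard pattern into a compiled regular expression.

    lenient_trailing=True emulates the old behavior as described in the
    thread (an assumption): trailing '?' characters are optional.
    """
    body = pattern.rstrip("?") if lenient_trailing else pattern
    n_trailing = len(pattern) - len(body)
    parts = []
    for ch in body:
        if ch == "?":
            parts.append(".")        # exactly one character
        elif ch == "*":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
    if n_trailing:
        # Up to n_trailing optional characters at the end of the term.
        parts.append(".{0,%d}" % n_trailing)
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, term, lenient_trailing=False):
    regex = wildcard_to_regex(pattern, lenient_trailing)
    return regex.fullmatch(term) is not None

pattern = "riot" + "?" * 22
print(matches(pattern, "riotXXX", lenient_trailing=True))  # True
print(matches(pattern, "riotXXX"))                         # False
```

Under the strict reading the pattern only matches terms that are exactly 26 characters long, which is what someone used to shell globbing would expect.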

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org





Re: wildcard search with variable length

Posted by DM Smith <dm...@gmail.com>.
John Haxby wrote:
> Doug Cutting wrote:
>
>> DM Smith wrote:
>>
>>> Personally, I don't want an either/or. I want a both/and. Modern 
>>> unix shells provide both/and, albeit with different syntax.
>>>
>>> I see this more as a feature request than an argument as to the 
>>> usefulness or properness of either. Both are useful. Both are 
>>> proper. Both are intuitive. Both are counterintuitive. It all 
>>> depends on your "tradition".
>>
>> +1
>>
>> Doug
>>
> Doesn't the RegexQuery do this for you?
>
> jch

I have not looked at it (yet). If so, that would be the "both/and".



Re: wildcard search with variable length

Posted by John Haxby <jc...@scalix.com>.
Doug Cutting wrote:

> DM Smith wrote:
>
>> Personally, I don't want an either/or. I want a both/and. Modern unix 
>> shells provide both/and, albeit with different syntax.
>>
>> I see this more as a feature request than an argument as to the 
>> usefulness or properness of either. Both are useful. Both are proper. 
>> Both are intuitive. Both are counterintuitive. It all depends on your 
>> "tradition".
>
> +1
>
> Doug
>
Doesn't the RegexQuery do this for you?

jch



Re: wildcard search with variable length

Posted by Doug Cutting <cu...@apache.org>.
DM Smith wrote:
> Personally, I don't want an either/or. I want a both/and. Modern unix 
> shells provide both/and, albeit with different syntax.
> 
> I see this more as a feature request than an argument as to the 
> usefulness or properness of either. Both are useful. Both are proper. 
> Both are intuitive. Both are counterintuitive. It all depends on your 
> "tradition".

+1

Doug



Re: wildcard search with variable length

Posted by DM Smith <dm...@gmail.com>.
Andrzej Bialecki wrote:
> Tiago Silveira wrote:
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it 
>> doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>>   
>
> I have a different view on this issue - IMHO treating "?" as "exactly 
> one character" is counterintuitive for people familiar with the use of 
> wildcards: in all popular regular expression languages, and also in 
> DTD/XML world, a single "?" metacharacter means "zero or one", which 
> is probably why the original behavior was introduced (or at least it 
> was more compatible with the use of "?" in other contexts).

There are two distinctly different traditions for ?, *, and +. One is 
globbing (standard in UNIX shells) and the other is regular expressions. 
In globbing, ? has always stood for exactly one character, * stands for 
zero or more characters, and + is not defined. In regular expressions, 
these modify the preceding expression to mean 0 or 1; 0 or more; and 
1 or more.

Lucene seems to support globbing (trailing) and not regex. To me this is 
clear in the documentation.

That said, a search seems to be a kind of regex, and blending these two 
traditions leads to confusion. The first time I tried Lucene to do a 
search, I used these metacharacters as if they were regex modifiers, 
not globbing characters. (Natural behavior of a Perl programmer!) It did 
not work as expected. This led me to read the docs, and then I understood 
the error of my ways.

Personally, I don't want an either/or. I want a both/and. Modern unix 
shells provide both/and, albeit with different syntax.

I see this more as a feature request than an argument as to the 
usefulness or properness of either. Both are useful. Both are proper. 
Both are intuitive. Both are counterintuitive. It all depends on your 
"tradition".
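
The two traditions can be compared side by side in a short sketch. It uses Python's standard fnmatch (shell-style globbing) and re (regular expressions) modules as stand-ins, not anything from Lucene; the term list and helper names are made up for illustration:

```python
import re
from fnmatch import fnmatchcase

terms = ["ca", "cat", "cats", "catalog"]

def glob_hits(pattern):
    """Terms matching the pattern when read as a shell glob."""
    return [t for t in terms if fnmatchcase(t, pattern)]

def regex_hits(pattern):
    """Terms matching the pattern when read as a regular expression."""
    return [t for t in terms if re.fullmatch(pattern, t)]

# The same pattern string selects different terms in each tradition.
print(glob_hits("cat?"))   # ['cats']
print(regex_hits("cat?"))  # ['ca', 'cat']
print(glob_hits("cat*"))   # ['cat', 'cats', 'catalog']
print(regex_hits("cat*"))  # ['ca', 'cat']
```

In the glob reading the metacharacter stands for characters of its own; in the regex reading it modifies the preceding "t".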





Re: wildcard search with variable length

Posted by John Haxby <jc...@scalix.com>.
Andrzej Bialecki wrote:

> Tiago Silveira wrote:
>
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it 
>> doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>
> I have a different view on this issue - IMHO treating "?" as "exactly 
> one character" is counterintuitive for people familiar with the use of 
> wildcards: in all popular regular expression languages, and also in 
> DTD/XML world, a single "?" metacharacter means "zero or one", which 
> is probably why the original behavior was introduced (or at least it 
> was more compatible with the use of "?" in other contexts).
>
Ahh.   Well.   If "cat?" is a regular expression then it will match "ca" 
and "cat".   "cat??" is probably not a valid regular expression: the 
final ? means "one or zero occurrences of t?" which means that it too 
matches "ca" and "cat".   However, the javadoc defines "?" and its 
definition matches the shell glob definition and it's quite clear that 
WildcardQuery is not a RegexQuery just from the docs.

I can't comment about the wildcard character in a DTD/XML context, I'm not 
that familiar with it.

jch
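
The regular-expression readings mentioned above can be checked with a quick sketch using Python's re module (one regex dialect among many; other engines may differ). In this dialect "cat??" is accepted: the second "?" makes the first one non-greedy, so it matches the same whole terms as "cat?". The helper name full is hypothetical:

```python
import re

def full(pattern, term):
    """True if the whole term matches the regular expression."""
    return re.fullmatch(pattern, term) is not None

# "cat??": optional 't', non-greedy; same whole-term matches as "cat?".
print(full("cat??", "ca"), full("cat??", "cat"), full("cat??", "cats"))
# True True False

# "cat.?": "cat" followed by zero or one arbitrary character.
print(full("cat.?", "ca"), full("cat.?", "cat"), full("cat.?", "cats"))
# False True True
```

The "cat.?" form is the regex spelling of "singular or plural", which is what the old trailing-"?" wildcard behavior provided.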




Re: wildcard search with variable length

Posted by Andrzej Bialecki <ab...@getopt.org>.
Tiago Silveira wrote:
> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
> justify keeping the old, undocumented, arguably incorrect behavior.
>   

I have a different view on this issue - IMHO treating "?" as "exactly 
one character" is counterintuitive for people familiar with the use of 
wildcards: in all popular regular expression languages, and also in 
DTD/XML world, a single "?" metacharacter means "zero or one", which is 
probably why the original behavior was introduced (or at least it was 
more compatible with the use of "?" in other contexts).

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Re: wildcard search with variable length

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 22, 2006, at 7:22 AM, John Haxby wrote:
> Tiago Silveira wrote:
>
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that  
>> it doesn't
>> justify keeping the old, undocumented, arguably incorrect behavior.
>>
> I don't think there's any question of the old behaviour being  
> incorrect -- the javadoc says that ? matches a single character,  
> not zero or one characters, a single character.
>
> On the other hand, does Erik's new RegexQuery support "cat.?" (the  
> ".?" does match zero or one characters)?    (Where's the javadoc  
> for that?   I don't see any comments in the source, let alone  
> anything else :-))

You dinged me fairly!   Ok, tonight's TODO list, add javadocs to  
RegexQuery!   The RegexQuery supports _whatever_ syntax your chosen  
regular expression implementation supports.  It simply enumerates  
terms and matches each one against the expression you provided, using  
the desired implementation (JDK 1.4 regex, or ORO, etc).

	Erik
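
The "enumerate terms and match with a pluggable implementation" idea described above can be sketched as follows. This is not the RegexQuery code or its API: the names enumerate_matches, stdlib_matcher, and ignore_case_matcher are hypothetical, Python's re module stands in for the regex implementation, and whole-term matching is an assumption:

```python
import re
from typing import Callable, Iterable, List

# A pluggable "regex implementation": given a pattern, return a predicate
# that says whether a single term matches it.
Matcher = Callable[[str], Callable[[str], bool]]

def stdlib_matcher(pattern: str) -> Callable[[str], bool]:
    compiled = re.compile(pattern)
    return lambda term: compiled.fullmatch(term) is not None

def ignore_case_matcher(pattern: str) -> Callable[[str], bool]:
    compiled = re.compile(pattern, re.IGNORECASE)
    return lambda term: compiled.fullmatch(term) is not None

def enumerate_matches(terms: Iterable[str], pattern: str,
                      matcher: Matcher = stdlib_matcher) -> List[str]:
    """Walk every term and keep those the chosen implementation accepts."""
    accepts = matcher(pattern)
    return [t for t in terms if accepts(t)]

index_terms = ["Cat", "ca", "cat", "cats", "dog"]
print(enumerate_matches(index_terms, "cat.?"))
# ['cat', 'cats']
print(enumerate_matches(index_terms, "cat.?", ignore_case_matcher))
# ['Cat', 'cat', 'cats']
```

Because the matcher is passed in, the supported syntax is whatever the chosen implementation accepts, at the cost of visiting every term.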




Re: wildcard search with variable length

Posted by John Haxby <jc...@scalix.com>.
Tiago Silveira wrote:

>IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it doesn't
>justify keeping the old, undocumented, arguably incorrect behavior.
>  
>
I don't think there's any question of the old behaviour being incorrect 
-- the javadoc says that ? matches a single character, not zero or one 
characters, a single character.

On the other hand, does Erik's new RegexQuery support "cat.?" (the ".?" 
does match zero or one characters)?    (Where's the javadoc for that?   
I don't see any comments in the source, let alone anything else :-))

jch
