Posted to java-user@lucene.apache.org by Morus Walter <mo...@tanto-xipolis.de> on 2003/03/25 12:03:18 UTC

Wildcard Queries

Hi,

is it intentional that '?' matches exactly one character within
wildcard terms but one or zero characters at the end of wildcard terms?

That is:
r?? matches r ra rab ...
whereas
r?b matches rab rbb ... and not rb

AFAIK the common definition of '*' and '?' (e.g. in Unix glob patterns) is
that '?' matches exactly one character, independent of its position.

I think Lucene's behavior comes from WildcardTermEnum.java line 157,
where WILDCARD_CHAR and WILDCARD_STRING are ignored at the end of the
pattern if the strings have matched so far.

greetings
	Morus
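
For reference, the glob-style semantics described above can be sketched in
plain Java. This is only an illustration of the expected behavior, not the
code in WildcardTermEnum; the class and method names are made up for this
example. Under these rules r?? matches rab but neither r nor ra, and r?b
does not match rb.

```java
// Hypothetical sketch of glob-style matching; not Lucene's implementation.
final class GlobWildcardSketch {

    /** Returns true if the whole text matches the pattern ('?' = exactly one char, '*' = zero or more). */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        while (pi < p.length()) {
            char c = p.charAt(pi);
            if (c == '*') {
                // '*' may swallow any number of characters, including none:
                // try every possible continuation point in the text.
                for (int k = ti; k <= t.length(); k++) {
                    if (matches(p, pi + 1, t, k)) {
                        return true;
                    }
                }
                return false;
            }
            // Both '?' and a literal character must consume exactly one
            // character, even when they are the last thing in the pattern.
            if (ti >= t.length()) {
                return false;
            }
            if (c != '?' && c != t.charAt(ti)) {
                return false;
            }
            pi++;
            ti++;
        }
        // The pattern is exhausted; the text must be exhausted as well.
        return ti == t.length();
    }

    public static void main(String[] args) {
        String[] terms = {"r", "ra", "rab", "rb", "rbb"};
        for (String pattern : new String[] {"r??", "r?b", "r*"}) {
            for (String term : terms) {
                System.out.println(pattern + " vs " + term + ": " + matches(pattern, term));
            }
        }
    }
}
```

The point of the sketch is the check before consuming a character: a
trailing '?' still requires one character to be present, which is the case
the current behavior appears to skip.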

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Wildcard Queries

Posted by Otis Gospodnetic <ot...@yahoo.com>.

No, I don't think this is intentional.  Sounds like a bug.  We should
probably add a unit test for WildcardTermEnum that shows this bug and
then fix it to get the test to pass.

Otis




RE: Tokenize negative number

Posted by Otis Gospodnetic <ot...@yahoo.com>.

You are using an Analyzer that throws out non-alphanumeric characters,
most likely StandardAnalyzer.
You can create your own Analyzer to do exactly what you want.  A sample
Analyzer is in the Lucene FAQ at http://jguru.com/.

Otis
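
To illustrate the kind of tokenization rule a custom Analyzer would need,
here is a minimal sketch in plain Java. It deliberately uses only
java.util.regex and does not use the Lucene Analyzer or Tokenizer API; the
class name and the exact rule (keep a '-' only when it directly precedes
digits and does not follow a letter or digit) are assumptions made for this
example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a real solution would wrap a rule like this in a Lucene Analyzer.
final class SignedNumberTokenizerSketch {

    // First alternative: a '-' followed by digits, but only when the '-' is not
    // preceded by a letter or digit (so the hyphen in "well-known" stays a separator).
    // Second alternative: any run of ASCII letters and digits.
    private static final Pattern TOKEN =
            Pattern.compile("(?<!\\p{Alnum})-\\d+|\\p{Alnum}+");

    /** Splits the text into tokens, keeping the sign of negative numbers. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(".... -1234 ...."));
        System.out.println(tokenize("well-known 42"));
    }
}
```

Note that whatever rule is used at index time has to be applied at query
time as well, otherwise a search for -1234 will not find the indexed token.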






RE: Tokenize negative number

Posted by Lixin Meng <li...@fulldegree.com>.

I browsed through some of the previous discussions, but I have to say that I
couldn't find a solution for this. Would you mind providing more of a clue?

Regards,
Lixin

-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: Tuesday, March 25, 2003 7:14 PM
To: Lucene Users List; lixin@fulldegree.com
Subject: Re: Tokenize negative number


Probably tokenized 1234 as a string and treated '-' as a separator.  See
previous discussion on "query".

Regards,

Terry

----- Original Message -----
From: "Lixin Meng" <li...@fulldegree.com>
To: "'Lucene Users List'" <lu...@jakarta.apache.org>
Sent: Tuesday, March 25, 2003 9:16 PM
Subject: Tokenize negative number


> I have a document with content '.... -1234 ....'. However, after calling the
> StandardTokenizer, the token text is only '1234' (the '-' is missing).
>
> Has anyone experienced a similar problem, and is there a workaround?
>
> Regards,
> Lixin
>
>
>




Re: Tokenize negative number

Posted by Terry Steichen <te...@net-frame.com>.

Probably tokenized 1234 as a string and treated '-' as a separator.  See
previous discussion on "query".

Regards,

Terry




Tokenize negative number

Posted by Lixin Meng <li...@fulldegree.com>.

I have a document with content '.... -1234 ....'. However, after calling the
StandardTokenizer, the token text is only '1234' (the '-' is missing).

Has anyone experienced a similar problem, and is there a workaround?

Regards,
Lixin

