Posted to java-user@lucene.apache.org by Morus Walter <mo...@tanto-xipolis.de> on 2003/03/25 12:03:18 UTC
Wildcard Queries
Hi,
Is it intentional that '?' matches exactly one character within
wildcard terms but one or zero characters at the end of wildcard terms?
That is:
r?? matches r ra rab ...
whereas
r?b matches rab rbb ... and not rb
As far as I know, the common definition of '*' and '?' (e.g. in Unix glob
patterns) is that '?' matches exactly one character, independent of its position.
I think Lucene's behavior comes from WildcardTermEnum.java line 157,
where WILDCARD_CHAR and WILDCARD_STRING are ignored at the end of the
pattern if the strings have matched so far.
greetings
Morus
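The position-independent glob rule described above can be sketched as a small Java matcher. This is a minimal, hypothetical example: the class `GlobMatch` is illustrative only and is neither Lucene's API nor the WildcardTermEnum code. Here '?' always consumes exactly one character and '*' consumes zero or more, wherever they appear in the pattern.

```java
// Hypothetical helper, not part of Lucene: glob-style matching in which
// '?' matches exactly one character and '*' matches zero or more
// characters, regardless of their position in the pattern.
public class GlobMatch {

    public static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        // Pattern exhausted: it is a match only if the text is exhausted too.
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // '*' either matches nothing, or swallows one character and retries.
            return matches(p, pi + 1, t, ti)
                    || (ti < t.length() && matches(p, pi, t, ti + 1));
        }
        if (ti == t.length()) {
            // Text exhausted while the pattern still needs a character:
            // a trailing '?' fails here instead of matching nothing.
            return false;
        }
        if (c == '?' || c == t.charAt(ti)) {
            return matches(p, pi + 1, t, ti + 1);
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"r", "ra", "rab", "rb", "rbb"};
        for (String pattern : new String[] {"r??", "r?b"}) {
            StringBuilder sb = new StringBuilder(pattern + " matches:");
            for (String term : terms) {
                if (matches(pattern, term)) {
                    sb.append(' ').append(term);
                }
            }
            System.out.println(sb);
        }
    }
}
```

Running `main` prints `r?? matches: rab rbb` and `r?b matches: rab rbb`, so under these rules neither `r` nor `ra` matches `r??`. The recursive form is chosen for readability; it can backtrack heavily on patterns containing many '*' characters.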
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Wildcard Queries
Posted by Otis Gospodnetic <ot...@yahoo.com>.
No, I don't think this is intentional. Sounds like a bug. We should
probably add a unit test for WildcardTermEnum that shows this bug and
then fix it to get the test to pass.
Otis
RE: Tokenize negative number
Posted by Otis Gospodnetic <ot...@yahoo.com>.
You are using an Analyzer that throws out non-alphanumeric characters,
most likely StandardAnalyzer.
You can create your own Analyzer to do exactly what you want. A sample
Analyzer is in the Lucene FAQ at http://jguru.com/.
Otis
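Terry's explanation in this thread (the '-' is treated as a separator) and the custom-Analyzer suggestion can both be illustrated with a simplified model in plain Java. This is a hypothetical sketch: `SignedNumberTokenizer` and its `keepLeadingMinus` flag are illustrative names, the splitting rule is a toy approximation rather than StandardTokenizer's actual grammar, and wiring the rule into a real Lucene Tokenizer/Analyzer is left out because that API differs between Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: a toy tokenizer that splits on any
// character that is not a letter or digit. With keepLeadingMinus set, a '-'
// that starts a token and is immediately followed by a digit is kept.
public class SignedNumberTokenizer {

    public static List<String> tokenize(String text, boolean keepLeadingMinus) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // A minus sign counts as a token character only at the start of a
            // token and only when a digit follows it.
            boolean startsNegativeNumber = keepLeadingMinus
                    && c == '-'
                    && current.length() == 0
                    && i + 1 < text.length()
                    && Character.isDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c) || startsNegativeNumber) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any other character acts as a separator and ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = ".... -1234 ....";
        System.out.println(tokenize(input, false)); // [1234]: '-' treated as a separator
        System.out.println(tokenize(input, true));  // [-1234]
        System.out.println(tokenize("well-known value -5", true)); // [well, known, value, -5]
    }
}
```

The minus is kept only at the start of a token and only before a digit, so a hyphen inside a word such as `well-known` still splits the word. A per-character rule that simply accepted '-' everywhere would be simpler, but it would also glue hyphenated words together.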
RE: Tokenize negative number
Posted by Lixin Meng <li...@fulldegree.com>.
I browsed through some of the previous discussions, but I have to say that I
couldn't find a solution for this. Would you mind providing more clues?
Regards,
Lixin
-----Original Message-----
From: Terry Steichen [mailto:terry@net-frame.com]
Sent: Tuesday, March 25, 2003 7:14 PM
To: Lucene Users List; lixin@fulldegree.com
Subject: Re: Tokenize negative number
Probably tokenized 1234 as a string and treated '-' as a separator. See
previous discussion on "query".
Regards,
Terry
----- Original Message -----
From: "Lixin Meng" <li...@fulldegree.com>
To: "'Lucene Users List'" <lu...@jakarta.apache.org>
Sent: Tuesday, March 25, 2003 9:16 PM
Subject: Tokenize negative number
> I have a document with content '.... -1234 ....'. However, after calling the
> StandardTokenizer, the token only has '1234' (the '-' is missing) as its token text.
>
> Has anyone experienced a similar problem, and is there a workaround?
>
> Regards,
> Lixin
Re: Tokenize negative number
Posted by Terry Steichen <te...@net-frame.com>.
Probably tokenized 1234 as a string and treated '-' as a separator. See
previous discussion on "query".
Regards,
Terry
Tokenize negative number
Posted by Lixin Meng <li...@fulldegree.com>.
I have a document with content '.... -1234 ....'. However, after calling the
StandardTokenizer, the token only has '1234' (the '-' is missing) as its token text.
Has anyone experienced a similar problem, and is there a workaround?
Regards,
Lixin