Posted to java-user@lucene.apache.org by Aruna Raghavan <Ar...@opin.com> on 2001/12/22 00:05:11 UTC

Limit on number of characters before wildcard?

Hi,
From some testing that I have done, it appears that there is a limit of 3
characters before the wildcard in wildcard queries. In other words, if the
word is dogleash and I search using do*, it returns wrong results (usually
only a subset), whereas if I use dog*, I get correct results.

Also, a wildcard at the beginning of the keyword (*ogleash) does not seem to
be supported.
Can someone confirm this? Is this documented anywhere?

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


RE: Limit on number of characters before wildcard?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Just so that nobody is confused in the future: the PrefixQuery that Dave
mentions is actually the query that handles searches such as 'Consult*',
that is, a wildcard at the end of the term rather than at the start.

See http://jguru.com/faq/view.jsp?EID=480194
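
To make the distinction concrete, here is a small plain-Java sketch of why a
trailing wildcard is cheap and a leading one is not. It is not Lucene code
and does not use the Lucene API: the class name, the method names and the
sample terms are made up for illustration, and a TreeSet merely stands in
for a sorted list of indexed terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PrefixVsSuffixSketch {
    // Trailing wildcard ("consult*"): jump to the prefix in the sorted terms
    // and read forward until a term no longer starts with it.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // matching terms are contiguous in sorted order
            }
            hits.add(term);
        }
        return hits;
    }

    // Leading wildcard ("*ogleash"): the sort order does not help, so every
    // term has to be inspected.
    static List<String> suffixMatches(TreeSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical, already lowercased sample terms.
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
            "consult", "consultant", "consulting", "dog", "dogleash", "door"));
        System.out.println(prefixMatches(terms, "consult")); // [consult, consultant, consulting]
        System.out.println(suffixMatches(terms, "ogleash")); // [dogleash]
    }
}
```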

Otis

--- Dave Kor <da...@nexusedge.com> wrote:
> First character asterisk (eg, *ogleash) is performed by PrefixQuery, which
> executes much faster than WildcardQuery.


RE: Limit on number of characters before wildcard?

Posted by Dave Kor <da...@nexusedge.com>.
First character asterisk (eg, *ogleash) is performed by PrefixQuery, which
executes much faster than WildcardQuery.


Dave Kor Kian Wei
Consultant
Product Engineering
NexusEdge Technologies Pte. Ltd.
6 Aljunied Ave 3, #01-02 (Level 4)
Singapore 389932
Tel : (+65)848-2552
Fax : (+65)747-4536
Web : www.nexusedge.com



Re: Limit on number of characters before wildcard?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

I haven't tested this like you did, but from looking at the query parser
(the QueryParser.jj file in the Lucene distribution), it seems that only a
single character is required before '*' or '?':

...
| <WILDTERM:  <_TERM_START_CHAR> 
              (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
...

_TERM_START_CHAR is defined as:
[ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]

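and as you can see from the first definition above, this character can be
followed by zero or more characters, each of which is a _TERM_CHAR, "*", or
"?".

This also answers your question about using an asterisk as the very first
character in the query: "*" is not in _TERM_START_CHAR, so it seems the
query parser will not accept a term that begins with it.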
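
The token rule quoted above can be tried out in plain Java. This is only a
sketch, not Lucene code: it rebuilds the WILDTERM rule as a regular
expression. The class and method names are made up, and the definition of
_TERM_CHAR (the start characters plus digits) is an assumption, since
_TERM_CHAR is not quoted in this message.

```java
import java.util.regex.Pattern;

class WildTermSketch {
    // _TERM_START_CHAR exactly as quoted from QueryParser.jj above.
    static final String TERM_START_CHAR = "[a-zA-Z_\\u0080-\\uFFFE]";

    // ASSUMPTION: _TERM_CHAR is not shown in the message; it is taken here
    // to be the start characters plus the digits 0-9.
    static final String TERM_CHAR = "[a-zA-Z0-9_\\u0080-\\uFFFE]";

    // <WILDTERM: <_TERM_START_CHAR> (<_TERM_CHAR> | ["*", "?"])* >
    static final Pattern WILDTERM =
        Pattern.compile(TERM_START_CHAR + "(?:" + TERM_CHAR + "|[*?])*");

    // True if the whole string has the shape described by the WILDTERM rule.
    // This checks the shape only; it says nothing about how the parser
    // chooses between token types or how the query is executed.
    static boolean isWildTerm(String s) {
        return WILDTERM.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"d*", "do*", "dog*", "*ogleash"}) {
            System.out.println(q + " -> " + isWildTerm(q));
        }
        // d* -> true
        // do* -> true
        // dog* -> true
        // *ogleash -> false
    }
}
```

Under these assumptions the grammar treats do* and dog* alike, so any
difference in their results would have to come from somewhere other than
the token rule.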

It would be great if Doug or Brian Goetz could confirm or dispute this,
so that I can add it to the Lucene FAQ at jGuru.com.

Otis


case sensitive boolean keywords

Posted by ca...@bookandhammer.com.
Hi,

Is there any reason why the boolean operators (AND, OR and NOT) are case
sensitive?
For example:
Query 1: "test and process"
	removes "and" as a stop word and does an OR search (198 results in my
	test case)

Query 2: "test AND process"
	performs a boolean search with AND (5 results in my test case).

If I change the query parser to also accept lowercase "and", "or" and
"not", will that cause other problems?
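
One possible workaround that avoids touching the grammar, offered only as a
sketch and not as something proposed in this thread: rewrite the query
string before it reaches the parser, upper-casing bare and/or/not outside
double-quoted phrases. The class and method names below are hypothetical,
and the sketch ignores escaped quotes and field syntax.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BooleanKeywordSketch {
    // Whole words only, so "android" or "sand" are left untouched.
    private static final Pattern OPERATOR =
        Pattern.compile("\\b(and|or|not)\\b", Pattern.CASE_INSENSITIVE);

    // Upper-cases and/or/not outside double-quoted phrases.
    static String upperCaseOperators(String query) {
        // After splitting on '"', even indexes hold text outside quotes and
        // odd indexes hold text inside quotes.
        String[] parts = query.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('"');
            }
            out.append(i % 2 == 0 ? upperCaseIn(parts[i]) : parts[i]);
        }
        return out.toString();
    }

    private static String upperCaseIn(String text) {
        Matcher m = OPERATOR.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group().toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseOperators("test and process"));
        // test AND process
        System.out.println(upperCaseOperators("\"test and process\" or demo"));
        // "test and process" OR demo
    }
}
```

The side effect of this approach is that a lowercase "and", "or" or "not"
typed outside a quoted phrase is always treated as an operator, never as a
search word.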

Thanks

--Peter

