Posted to dev@lucene.apache.org by Lukas Zapletal <lz...@root.cz> on 2003/01/31 21:07:52 UTC

Problem with wildchars (bug?)

Hello all,

I have a small problem. Let's say the word 'Microsoft' is indexed in Lucene.
When I query Microsoft, it returns the document, but when I try Micro*,
nothing is found. After lowercasing the first letter (micro*), Lucene
returns the document.

The same thing happens with ?: when I use it, only lowercase words are matched.

Is this a bug or am I missing something?
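The behavior described above can be reproduced with a toy model that uses only the JDK, not Lucene. It assumes, as a simplification, that the analyzer only lowercases tokens at index time, while wildcard patterns (* and ?) are compared against the indexed terms verbatim. `analyze`, `wildcardToRegex` and `search` are hypothetical helpers written for this illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class WildcardCaseDemo {
    // Hypothetical stand-in for an index-time analyzer that only lowercases tokens
    static String analyze(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Turns a wildcard pattern into a regex: '*' = any run of chars, '?' = exactly one char.
    // The pattern itself is NOT analyzed, so its case is kept as typed.
    static Pattern wildcardToRegex(String pattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // Returns every indexed term that matches the wildcard pattern
    static List<String> search(List<String> indexedTerms, String pattern) {
        Pattern p = wildcardToRegex(pattern);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> index = List.of(analyze("Microsoft"));     // stored as "microsoft"
        System.out.println(index.contains(analyze("Microsoft"))); // true: plain term is analyzed too
        System.out.println(search(index, "Micro*"));             // []
        System.out.println(search(index, "micro*"));             // [microsoft]
        System.out.println(search(index, "Micro?oft"));          // []
        System.out.println(search(index, "micro?oft"));          // [microsoft]
    }
}
```

The plain term query succeeds because the same lowercasing is applied to both sides; the wildcard queries succeed only when the user happens to type the pattern in the case the analyzer produced.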

ps - where can I find some information on how Lucene parses the input when
using StandardFilter? I mean, I don't know what is ignored and what is not,
for example acronyms (U.S.A), dates (2002-11-07 or 1. 1. 2003), etc. I
cannot find it in the documentation. There is nothing in the StandardFilter
API docs; it seems to be generated from JavaCC.

-- 
Lukas Zapletal      [lzap@root.cz]
http://www.tanecni-olomouc.cz/lzap




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: Problem with wildchars (bug?)

Posted by Lukas Zapletal <lz...@root.cz>.
Sorry, all. I found the solution in the FAQ:

*Are prefix queries case sensitive?* 
<http://www.jguru.com/faq/view.jsp?EID=538312>
Yes, unlike other types of Lucene queries, prefix queries are case
sensitive. That is because prefix queries are not passed through the
Analyzer...
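Because the prefix skips the Analyzer, one workaround is to apply the same normalization to the prefix yourself before it is expanded against the index. The sketch below uses only the JDK and is not Lucene code: a sorted `TreeSet` stands in for the term dictionary, and it assumes index-time analysis did nothing but lowercase. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class PrefixQuerySketch {
    // Sorted term dictionary, loosely mimicking how an index keeps its terms ordered
    private final TreeSet<String> terms = new TreeSet<>();

    // Hypothetical index-time analysis: only lowercasing is assumed here
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    void addTerm(String rawToken) {
        terms.add(normalize(rawToken));
    }

    // Expands a prefix: seek to the first term >= prefix, then walk forward while
    // terms still start with it. The prefix is used verbatim (NOT analyzed).
    List<String> expandPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;
            matches.add(term);
        }
        return matches;
    }

    // Workaround: normalize the prefix the same way the indexed text was normalized
    List<String> expandPrefixNormalized(String prefix) {
        return expandPrefix(normalize(prefix));
    }

    public static void main(String[] args) {
        PrefixQuerySketch index = new PrefixQuerySketch();
        index.addTerm("Microsoft");
        index.addTerm("Micron");
        index.addTerm("Linux");
        System.out.println(index.expandPrefix("Micro"));           // []
        System.out.println(index.expandPrefixNormalized("Micro")); // [micron, microsoft]
    }
}
```

The lowercasing applied to the prefix must match what the index-time analyzer did; if the analyzer also stems or strips characters, plain lowercasing of the prefix is no longer enough.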

Well that was a rhetorical question... :-)

ps - the FAQ should be included in the releases. I hate this overnight
dial-up at home and searching the internet... :-(





Re: Problem with wildchars (bug?)

Posted by Otis Gospodnetic <ot...@yahoo.com>.
This thread is more suited to lucene-user.

I think that's just a bad word to be indexing....Microsoft...sheesh.
OK, it's not. This is a known issue, even mentioned in the FAQ on
jGuru.
For how StandardFilter works, it's best to look at the source; it's
quite simple. I think I might have mentioned that in the Lucene
article on Onjava.com as well....not 100% sure any more :)
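For readers without the source at hand, the normalization StandardFilter performed in the Lucene 1.x sources can be sketched with plain Java: strip a trailing 's (or 'S) from apostrophe tokens and remove the dots from acronyms, passing everything else through. This is a simplified re-creation, not Lucene's code. It assumes the tokens and their types were already produced by the JavaCC-generated StandardTokenizer; the `Token` class and the type strings are stand-ins modeled on the `<APOSTROPHE>` and `<ACRONYM>` types, so check the source of your own release before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

public class StandardFilterSketch {
    // Placeholder token types, modeled on the types a StandardTokenizer assigns
    static final String APOSTROPHE = "<APOSTROPHE>";
    static final String ACRONYM = "<ACRONYM>";
    static final String ALPHANUM = "<ALPHANUM>";

    // Hypothetical minimal token: the text plus the type assigned by the tokenizer
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
    }

    // Drops a trailing 's / 'S on apostrophe tokens and all dots in acronyms
    static Token normalize(Token t) {
        if (t.type.equals(APOSTROPHE) && (t.text.endsWith("'s") || t.text.endsWith("'S"))) {
            return new Token(t.text.substring(0, t.text.length() - 2), t.type);
        }
        if (t.type.equals(ACRONYM)) {
            return new Token(t.text.replace(".", ""), t.type);
        }
        return t; // all other tokens pass through unchanged; no lowercasing happens here
    }

    static List<String> filter(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) out.add(normalize(t).text);
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
            new Token("Microsoft's", APOSTROPHE),
            new Token("U.S.A.", ACRONYM),
            new Token("Lucene", ALPHANUM));
        System.out.println(filter(tokens)); // [Microsoft, USA, Lucene]
    }
}
```

Note that the filter itself keeps case; in the Lucene 1.x StandardAnalyzer, lowercasing is done by a separate LowerCaseFilter further down the chain, which is why 'Microsoft' is indexed as 'microsoft'.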

Otis





---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org