Posted to dev@lucene.apache.org by Lukas Zapletal <lz...@root.cz> on 2003/01/31 21:07:52 UTC
Problem with wildchars (bug?)
Hello all,
I have a small problem. Suppose the word 'Microsoft' is indexed in Lucene.
When I query Microsoft, it returns the document, but when I try Micro*
then nothing is found. After lowercasing the first letter to micro*
Lucene returns the document.
The same thing happens with ?. When I use it, only lower-cased words are matched.
Is this a bug, or am I missing something?
ps - where can I find some information on how Lucene parses the input when
using StandardFilter? I mean, I don't know what is ignored and what is not.
For example acronyms (U.S.A), dates (2002-11-07 or 1. 1. 2003) etc... I
cannot find it in the documentation. In the StandardFilter API there is
nothing; it seems to be generated from JavaCC.
--
Lukas Zapletal [lzap@root.cz]
http://www.tanecni-olomouc.cz/lzap
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: Problem with wildchars (bug?)
Posted by Lukas Zapletal <lz...@root.cz>.
Sorry, all. I found the solution in the FAQ:
*Are prefix queries case sensitive?*
<http://www.jguru.com/faq/view.jsp?EID=538312>
<http://www.jguru.com/faq/subtopic.jsp?topicID=473821>
Yes, unlike
other types of Lucene queries, prefix queries are case sensitive. That
is because prefix queries are not passed through the Analyzer...
Well that was a rhetorical question... :-)
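
The FAQ answer above points to a workaround on the query side: since prefix
and wildcard terms skip the Analyzer, the caller can lowercase them before
handing the query string to Lucene. Below is a minimal sketch in plain Java,
assuming the index was built with an analyzer that lowercases tokens (which
the behaviour described above suggests). The class WildcardCase and its
method lowercaseWildcardTerms are hypothetical names for illustration and
are not part of Lucene.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Lucene. It lowercases only those
// whitespace-separated terms that contain a wildcard (* or ?), so they
// can match tokens that were lowercased at index time. Other terms,
// including operators such as AND / OR, are left untouched.
final class WildcardCase {
    private WildcardCase() {}

    static String lowercaseWildcardTerms(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            boolean hasWildcard = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
            // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
            out.append(hasWildcard ? term.toLowerCase(Locale.ROOT) : term);
        }
        return out.toString();
    }
}
```

Calling WildcardCase.lowercaseWildcardTerms("Microsoft AND Micro*") would
give "Microsoft AND micro*". Note the limitation: a fielded term such as
Title:Micro* would have its field name lowercased as well, so this sketch
only suits simple, unfielded queries.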
ps - the FAQ should be included in the releases. I hate this overnight
dialing up at home and searching on the internet... :-(
--
Lukas Zapletal [lzap@root.cz]
http://www.tanecni-olomouc.cz/lzap
Re: Problem with wildchars (bug?)
Posted by Otis Gospodnetic <ot...@yahoo.com>.
This thread is more suited for lucene-user.
I think that's just a bad word to be indexing....Microsoft...sheesh.
Ok, it's not. This is a known thing, even mentioned in the FAQ on
jGuru.
To see how StandardFilter works, it's best to look at the source; it's
quite simple. I think I might have mentioned that in the Lucene
article on Onjava.com as well....not 100% sure any more :)
Otis