You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Jan Agermose <ja...@agermose.dk> on 2003/07/16 19:03:51 UTC
keyword indexing
I'm having some problems with chars in keywords that are not a-z0-9 chars...
If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name "Jan Agermose" - well besides the fact I need to lowercase the keywords as the querystring is lowercased by lucene, I still cannot get any hits on the keywords.
"Det Naturvidenskabelige Fakultet" - hits = 0
Det* - hits!
Det Naturvidenskabelige Fakultet - hits = 0
I can understand the last one - but shouldn't the first one return hits? If not, using keywords seems to be limited to keywords composed of [a-z0-9]+ ???
Now I do a string replace on [^a-z0-9]+ / "" (removing all the chars) but this gives the queryparse some problems I would think - unless in my special case where the user is not really free to compose queries on there own - therefore I can do the same stringreplace thing on the input :-D But I would like for the poweruser to input real queries - and this leaves me with the problem of parsing queries. I need to do stringreplace only within double quotes... This should be lucenes problem not mine :-D
Am I missing something ??
Jan Agermose
Re: keyword indexing
Posted by Jan Agermose <ja...@agermose.dk>.
So you cannot use the QueryBuilder if You are using keywords - is that it?
Jan
----- Original Message -----
From: "Aviran Mordo" <am...@infosciences.com>
To: "'Lucene Users List'" <lu...@jakarta.apache.org>
Sent: Wednesday, July 16, 2003 7:23 PM
Subject: RE: keyword indexing
> If you are searching on keyword you might need to use TermQuery in order
> to have an exact match
>
> -----Original Message-----
> From: Jan Agermose [mailto:jan@agermose.dk]
> Sent: Wednesday, July 16, 2003 1:04 PM
> To: lucene-user@jakarta.apache.org
> Subject: keyword indexing
>
>
> I'm having some problems with chars in keywords that are not a-z0-9
> chars...
>
> If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name
> "Jan Agermose" - well besides the fact I need to lowercase the keywords
> as the querystring is lowercased by lucene, I still cannot get any hits
> on the keywords.
>
> "Det Naturvidenskabelige Fakultet" - hits = 0
> Det* - hits!
> Det Naturvidenskabelige Fakultet - hits = 0
>
> I can understand the last one - but shouldn't the first one return hits?
> If not, using keywords seems to be limited to keywords composed of
> [a-z0-9]+ ???
>
> Now I do a string replace on [^a-z0-9]+ / "" (removing all the chars)
> but this gives the queryparse some problems I would think - unless in my
> special case where the user is not really free to compose queries on
> there own - therefore I can do the same stringreplace thing on the input
> :-D But I would like for the poweruser to input real queries - and this
> leaves me with the problem of parsing queries. I need to do
> stringreplace only within double quotes... This should be lucenes
> problem not mine :-D
>
> Am I missing something ??
>
> Jan Agermose
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
RE: keyword indexing
Posted by Aviran Mordo <am...@infosciences.com>.
If you are searching on keyword you might need to use TermQuery in order
to have an exact match
-----Original Message-----
From: Jan Agermose [mailto:jan@agermose.dk]
Sent: Wednesday, July 16, 2003 1:04 PM
To: lucene-user@jakarta.apache.org
Subject: keyword indexing
I'm having some problems with chars in keywords that are not a-z0-9
chars...
If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name
"Jan Agermose" - well besides the fact I need to lowercase the keywords
as the querystring is lowercased by lucene, I still cannot get any hits
on the keywords.
"Det Naturvidenskabelige Fakultet" - hits = 0
Det* - hits!
Det Naturvidenskabelige Fakultet - hits = 0
I can understand the last one - but shouldn't the first one return hits?
If not, using keywords seems to be limited to keywords composed of
[a-z0-9]+ ???
Now I do a string replace on [^a-z0-9]+ / "" (removing all the chars)
but this gives the queryparse some problems I would think - unless in my
special case where the user is not really free to compose queries on
there own - therefore I can do the same stringreplace thing on the input
:-D But I would like for the poweruser to input real queries - and this
leaves me with the problem of parsing queries. I need to do
stringreplace only within double quotes... This should be lucenes
problem not mine :-D
Am I missing something ??
Jan Agermose
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org