Posted to java-user@lucene.apache.org by Jan Agermose <ja...@agermose.dk> on 2003/07/16 19:03:51 UTC

keyword indexing

I'm having some problems with characters in keywords that fall outside a-z0-9...

If I have a keyword like "Det Naturvidenskabelige Fakultet" or a name like "Jan Agermose", I cannot get any hits on it, even after lowercasing the keyword myself (which I need to do because Lucene lowercases the query string).

"Det Naturvidenskabelige Fakultet" - hits = 0
Det* - hits!
Det Naturvidenskabelige Fakultet - hits = 0

I can understand the last one, but shouldn't the first one return hits? If not, keywords seem to be limited to values composed of [a-z0-9]+ ?

For now I do a string replace of [^a-z0-9]+ with "" (removing all other characters) on the keywords. I would think this gives the query parser some problems, except in my special case, where users are not really free to compose queries on their own and I can therefore apply the same string replace to the input :-D But I would like power users to be able to enter real queries, and that leaves me with the problem of parsing queries: I need to do the string replace only within double quotes... This should be Lucene's problem, not mine :-D
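
The quote-only replacement described above could be sketched roughly as follows. This is only an illustration in plain Java, not something Lucene provides: the class and method names are made up, the "faculty" and "year" field names in the usage note are hypothetical, and it assumes double quotes in the query are never escaped or nested.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class QuotedKeywordNormalizer {
    // Matches one double-quoted span; assumes no escaped quotes inside it.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Same normalization as applied to the keywords at index time:
    // lowercase, then drop every run of characters outside a-z0-9.
    static String normalizeKeyword(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "");
    }

    // Rewrites only the text inside double quotes and leaves the rest of
    // the query (field names, operators, wildcards) untouched.
    static String normalizeQuoted(String query) {
        Matcher m = QUOTED.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "\"" + normalizeKeyword(m.group(1)) + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, normalizeQuoted applied to faculty:"Det Naturvidenskabelige Fakultet" AND year:2003 would give faculty:"detnaturvidenskabeligefakultet" AND year:2003, while an unquoted query such as Det* passes through unchanged.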

Am I missing something?

Jan Agermose

Re: keyword indexing

Posted by Jan Agermose <ja...@agermose.dk>.
So you cannot use the query parser if you are using keywords, is that it?

Jan


----- Original Message ----- 
From: "Aviran Mordo" <am...@infosciences.com>
To: "'Lucene Users List'" <lu...@jakarta.apache.org>
Sent: Wednesday, July 16, 2003 7:23 PM
Subject: RE: keyword indexing


> If you are searching on keyword you might need to use TermQuery in order
> to have an exact match

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


RE: keyword indexing

Posted by Aviran Mordo <am...@infosciences.com>.
If you are searching on a keyword field, you might need to use a TermQuery
to get an exact match.
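
To illustrate why an exact term lookup can hit where the quoted phrase does not, here is a toy model in plain Java. It does not use Lucene's API: the class and method names are invented, and it rests on the assumption that a keyword value is stored whole as a single term, while quoted query text is lowercased and split into separate tokens before lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Toy stand-in for an untokenized keyword field; not Lucene code.
final class KeywordFieldSketch {
    // Each keyword value is kept whole, as one term.
    private final TreeSet<String> terms = new TreeSet<>();

    void addKeyword(String value) {
        terms.add(value);
    }

    // Stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A phrase routed through the analyzer can only match if every token
    // is itself an indexed term (positions are ignored in this toy).
    boolean phraseMatches(String phrase) {
        List<String> tokens = analyze(phrase);
        return !tokens.isEmpty() && terms.containsAll(tokens);
    }

    // A prefix query is compared against the stored terms directly.
    boolean prefixMatches(String prefix) {
        String candidate = terms.ceiling(prefix);
        return candidate != null && candidate.startsWith(prefix);
    }

    // An exact term lookup, with no analysis of the search text.
    boolean termMatches(String value) {
        return terms.contains(value);
    }
}
```

After addKeyword("det naturvidenskabelige fakultet"), the phrase lookup finds nothing because "det", "naturvidenskabelige" and "fakultet" are not terms on their own, whereas the prefix "det" and the exact term both match. In Lucene itself the exact lookup would correspond to a TermQuery built from a Term holding the field name and the full keyword value.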
