Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/01/23 02:53:58 UTC

Re: Handling of colons in QueryParserTokenManager

Gwyn - this is a question for the java-user@ list, so I'm answering there.

Perhaps you can write your own QueryParser.jj variant and change it so it has explicit knowledge of your index fields.  Then, if it parses "foo:...." and "foo" is not one of the fields it knows about, it could escape the colon character on behalf of the user.
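
A minimal sketch of that idea, written as a pre-processing step on the query string rather than as a change to the QueryParser.jj grammar. The class name ColonEscaper, the method escapeColons and the knownFields set are hypothetical names chosen for illustration and are not part of Lucene. It assumes terms are separated by whitespace and does not handle quoted phrases, parentheses or +/- prefixes.

```java
import java.util.Set;

// Hypothetical helper, not part of Lucene: escapes colons the user did not
// mean as field separators before the string is handed to QueryParser.
class ColonEscaper {

    static String escapeColons(String query, Set<String> knownFields) {
        StringBuilder out = new StringBuilder();
        int wordStart = 0;          // position in 'out' where the current word begins
        boolean fieldTaken = false; // true once the current word has its field separator
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Already escaped by the user: copy both characters untouched.
                out.append(c).append(query.charAt(++i));
            } else if (Character.isWhitespace(c)) {
                // Whitespace ends the current word (simplifying assumption).
                out.append(c);
                wordStart = out.length();
                fieldTaken = false;
            } else if (c == ':') {
                String prefix = out.substring(wordStart);
                if (!fieldTaken && knownFields.contains(prefix)) {
                    // First colon after a known field name: keep it as the separator.
                    out.append(':');
                    fieldTaken = true;
                } else {
                    // Unknown field, or a second colon in the same word: escape it.
                    out.append("\\:");
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With knownFields containing only "ID", the input ID:CI:123 comes out as ID:CI\:123, and foo:bar comes out as foo\:bar. The escaped string would then be passed to the query parser as usual.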

If you do this, and it works out, please share a patch.  This issue was just raised on one of the projects where I'm using Lucene, and this was going to be my first way of dealing with the problem.

Otis

----- Original Message ----
From: Gwyn Carwardine <gw...@carwardine.net>
To: java-dev@lucene.apache.org
Sent: Sat 21 Jan 2006 08:10:56 AM EST
Subject: Handling of colons in QueryParserTokenManager

Hello, I'm new here. I've actually started using dotLucene but I think I
need to make a change to the QueryParser but it's so complicated to try and
understand what it's doing I thought I'd ask if maybe one of you guys could
point me in the right direction?

In my implementation of Lucene I have the need to store keywords that are of
the form "<key>:<identity>" for example CI:123. Whilst I can store this in
Lucene using Field.Keyword("ID","CI:123") I can't easily look it up by using
QueryParser which I need to do.

Whenever I parse the query ID:CI:123 it parses it as "ID:ci". Now I've
already made a small hack so that non-tokenized values are indexed as
lowercase so at least I can get them back if I use ID:CI\:123 but colons are
commonly used and I really don't want to have to escape them everywhere

What I want to achieve is that query parser will parse ID:CI:123 as
field(ID) value(CI:123). I understand that colon is a special character but
it's only used to delimit fields and values, in which case it makes sense to
react to the first colon only. The second colon should be treated as part of the
text which the analyzer could strip out or keep (in my case because I'm
using a custom analyzer).
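
The behaviour described above, for a single term, can be sketched as a split at the first colon only. This is an illustration in plain Java, not a change to QueryParserTokenManager; the class FirstColonSplitter, the method split and the defaultField parameter are hypothetical names. As an assumption about how it might be used, the resulting pair could be turned into a query directly with new TermQuery(new Term(field, value)), which bypasses both QueryParser and the analyzer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, not part of Lucene: treats only the first colon
// of a term as the field/value separator.
class FirstColonSplitter {

    static Map.Entry<String, String> split(String term, String defaultField) {
        int colon = term.indexOf(':');
        if (colon <= 0) {
            // No field prefix (or an empty one): use the default field
            // and keep the whole term as the value.
            return new SimpleEntry<>(defaultField, term);
        }
        // Everything after the first colon, later colons included, is the value.
        return new SimpleEntry<>(term.substring(0, colon), term.substring(colon + 1));
    }
}
```

For example, split("ID:CI:123", "contents") yields the field ID and the value CI:123.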

Does this make sense? How do I go about changing the QueryParserTokenManager
to achieve this? Perhaps you can point me to some documentation that
describes the code even?

Any help gratefully received!

Thanks,
Gwyn Carwardine


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Handling of colons in QueryParserTokenManager

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 23 January 2006 13:10, Gwyn Carwardine wrote:
...
> 
> And now I've been pointed to QueryParser.jj I wonder what language that is?
> And is QueryParser.java created from it? If so how does it get from one to
> the other?! Help! 
> 

QueryParser.java is generated from QueryParser.jj by javacc:

https://javacc.dev.java.net/

The easiest way to get from one to the other is to install javacc and
use the various javacc ant targets in lucene's build.xml.

Regards,
Paul Elschot



RE: Handling of colons in QueryParserTokenManager

Posted by Gwyn Carwardine <gw...@carwardine.net>.
Hi Otis, sorry if I posted to the wrong group. I thought user was for
usage-type queries and dev was for development-type queries. As I was asking
about changing the code itself (rather than about interfacing with it) I
assumed this was a dev forum issue. I'm still a bit confused. Can you tell
me what the distinction is? I have another issue (where I've hacked the
Lucene code and I want to discuss whether it's a valid hack or not and
possibly how to do it properly) which I would like to raise and I'm confused
as to where to raise it!

Anyway, back to the matter in hand:

At the moment if I use abc:def:123 I would expect my custom analyzer to
receive field(abc) value(def:123) but it's receiving field(abc) value(def).
Somewhere along the line the query parser is throwing away the 123, which I
think is the wrong behaviour. What I don't know is where to make a change so
that (for all fields) the second colon is passed through to the analyzer,
which can then decide what to do with it, just as it would if I entered
abc:def.123 or abc:def;123.

I can override the analyzer but I can't override the query parser behaviour
very easily. I don't understand where the post-colon text is being
discarded. In QueryParser or in QueryParserTokenManager?

And now I've been pointed to QueryParser.jj I wonder what language that is?
And is QueryParser.java created from it? If so how does it get from one to
the other?! Help! 

Cheers, Gwyn
