Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2006/11/21 23:26:09 UTC
Limiting QueryParser
Hi,
I have a search UI that allows search criteria to be input against specific
fields, e.g. Subject.
In order to create a suitable Lucene Query, I must analyze that String so that
it becomes a set of Tokens which I can then turn into Terms. QueryParser seems
to fit the bill for that; however, it is too clever, as it assumes that anything
suffixed with a : is a field reference.
If someone enters
important:conference agenda
in the subject field, I don't want QP to translate this to
+important:conference +defaultfield:agenda
I want to end up with
+subject:important +subject:conference +subject:agenda
I've written something to do this, but I know it is not as clever as QP as
currently it can only create BooleanQueries with TermQueries and cannot handle
PhraseQuery, so would not handle
important:"conference agenda"
correctly. Does anyone have any pointers on how to limit QueryParser so that I
can force it to treat what it thinks are fields as terms?
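A minimal sketch of the term-only behaviour described above, in plain Java with no Lucene dependency. The class and method names are hypothetical, the regex split is only a stand-in for a real Analyzer, and the result is a string in the notation used in this message rather than a Lucene Query object. Like the approach described, it knows nothing about quotes, so important:"conference agenda" also comes out as three separate terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: one required clause per token,
// all on a single fixed field.
final class FixedFieldClauses {

    // Assumed stand-in for an Analyzer: lowercase, then split on anything
    // that is not a letter or digit, so ':' is just another separator.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        for (String piece : input.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Renders every token as a required clause on the given field.
    static String requiredClauses(String field, String input) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens(input)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(field).append(':').append(token);
        }
        return sb.toString();
    }
}
```

Calling FixedFieldClauses.requiredClauses("subject", "important:conference agenda") yields the three required subject clauses listed above.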
Thanks
Antony
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Limiting QueryParser
Posted by Antony Bowesman <ad...@teamware.com>.
Erik Hatcher wrote:
>
> It doesn't seem like you need a "parser" at all for your field-specific
> search fields. Simply tokenize, using a Lucene Analyzer, the text field
> and build up a BooleanQuery of all the tokens.
That's what I'm currently doing, but I was getting bogged down with trying to
support PhraseQueries in case of quoted input. My knowledge of JavaCC is
limited, so I was trying to weigh up the effort of rolling my own or adapting QP.
> QueryParser is over prescribed - and is often not the best fit for the
> job. It's only a few lines of code to tokenize (look at the QueryParser
> code in how it creates a PhraseQuery, for example) and build a Query
> from the tokens.
>
> If you need to support +/-/AND/OR syntax in your field-specific inputs
> that's a different story - though your example does not show this need.
> If so, copying the QueryParser.jj file and removing the "field:" syntax
> and fixing all created Query objects to a specified single field might
> be the trick.
I do have other levels where +- can be used, but that's the easy bit and I'm
just constructing an outer BooleanQuery. Anyway, I'll have a go with QP.jj and
see where I get. BTW, LIA is an excellent book!
Thanks everyone for your comments.
Antony
Re: Limiting QueryParser
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Nov 21, 2006, at 10:34 PM, Antony Bowesman wrote:
> On the field specific fields, I want to control the parsing to
> ensure that the parser will not interpret fields in the user
> entered string, so in those fields it treats : as : all of the
> time. However, in the "free form" field, anything goes and : is a
> field delimiter all of the time. So, a user can search for
>
> Subject - important:conference agenda
> FieldA - blah:abc
> FreeForm - fieldX:Xdata fieldY:Ydata
>
> in the above, the Subject and Field A would have been indexed using
> configurable analysers and would have indexed "important" and
> "blah", so these are relevant to the search.
>
> This should come out as a
>
> (+subject:important +subject:conference +subject:agenda)
> (+fielda:blah +fielda:abc) (+fieldx:xdata +fieldy:ydata)
>
> My framework allows for field specific parsers as well as field
> specific analysers, so having a different query parser for the
> named fields and the free form field is fine.
It doesn't seem like you need a "parser" at all for your field-
specific search fields. Simply tokenize, using a Lucene Analyzer,
the text field and build up a BooleanQuery of all the tokens.
QueryParser is over prescribed - and is often not the best fit for
the job. It's only a few lines of code to tokenize (look at the
QueryParser code in how it creates a PhraseQuery, for example) and
build a Query from the tokens.
If you need to support +/-/AND/OR syntax in your field-specific
inputs that's a different story - though your example does not show
this need. If so, copying the QueryParser.jj file and removing the
"field:" syntax and fixing all created Query objects to a specified
single field might be the trick.
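A rough sketch of the "tokenize and build the query yourself" idea, extended to quoted input, in plain Java with no Lucene dependency. Everything named here is an assumption for illustration: QuotedFieldQuery and its methods are hypothetical, the regex split stands in for a real Analyzer, and the output is a string in the thread's notation where real code would create TermQuery and PhraseQuery objects and add them to a BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class QuotedFieldQuery {

    // Either a double-quoted run, or a bare run containing no whitespace,
    // double quote or colon. Colons therefore never reach the clause builder.
    private static final Pattern CHUNK = Pattern.compile("\"([^\"]*)\"|([^\\s\":]+)");

    // Assumed stand-in for an Analyzer: lowercase and split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Builds required clauses on one fixed field: quoted runs with more than
    // one word become phrase clauses, everything else becomes term clauses.
    static String build(String field, String input) {
        List<String> clauses = new ArrayList<>();
        Matcher m = CHUNK.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                List<String> words = analyze(m.group(1));
                if (words.size() > 1) {
                    clauses.add("+" + field + ":\"" + String.join(" ", words) + "\"");
                } else if (words.size() == 1) {
                    clauses.add("+" + field + ":" + words.get(0));
                }
            } else {
                for (String word : analyze(m.group(2))) {
                    clauses.add("+" + field + ":" + word);
                }
            }
        }
        return String.join(" ", clauses);
    }
}
```

The field name is fixed by the caller, so nothing the user types can select a different field. Supporting +/-/AND/OR inside the input is deliberately left out, in line with the caveat above.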
Erik
Re: Limiting QueryParser
Posted by Antony Bowesman <ad...@teamware.com>.
Chris Hostetter wrote:
> : important:conference agenda
>
> : I want to end up with
> :
> : +subject:important +subject:conference +subject:agenda
> :
> : I've written something to do this, but I know it is not as clever as QP as
> : currently it can only create BooleanQueries with TermQueries and cannot handle
> : PhraseQuery, so would not handle
> :
> : important:"conference agenda"
>
> you're on a slippery slope .. basically what you are saying is you want
> ":" to be treated as a field separator *some* of the time, and as a
> whitespace character the rest of the time ... but it's not really clear if
> the rules you want to use are deterministic -- can you describe what you
> want to do to an *arbitrary* input string?
I have a number of UI search fields that are field specific, e.g. Subject,
Sender, amongst others as well as a "free form" google style field where
anything can be entered.
On the field specific fields, I want to control the parsing to ensure that the
parser will not interpret fields in the user entered string, so in those fields
it treats : as : all of the time. However, in the "free form" field, anything
goes and : is a field delimiter all of the time. So, a user can search for
Subject - important:conference agenda
FieldA - blah:abc
FreeForm - fieldX:Xdata fieldY:Ydata
in the above, the Subject and Field A would have been indexed using configurable
analysers and would have indexed "important" and "blah", so these are relevant
to the search.
This should come out as a
(+subject:important +subject:conference +subject:agenda) (+fielda:blah
+fielda:abc) (+fieldx:xdata +fieldy:ydata)
My framework allows for field specific parsers as well as field specific
analysers, so having a different query parser for the named fields and the free
form field is fine.
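The rules above can be sketched in plain Java without Lucene. This is an illustration under assumptions, not the framework described here: SearchFormQuery and its methods are hypothetical, the regex split stands in for the configurable analysers, "body" is assumed as the free-form default field, and the result is a string in the notation used above rather than a Lucene Query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class SearchFormQuery {

    // Assumed stand-in for an analyser: lowercase and split on non-alphanumerics.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    // Field-specific box: ':' is plain punctuation, every token goes to the box's field.
    static String fixedField(String field, String input) {
        List<String> clauses = new ArrayList<>();
        for (String w : words(input)) {
            clauses.add("+" + field + ":" + w);
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    // Free-form box: the first ':' in each whitespace-separated chunk names the field.
    static String freeForm(String defaultField, String input) {
        List<String> clauses = new ArrayList<>();
        for (String chunk : input.trim().split("\\s+")) {
            int colon = chunk.indexOf(':');
            String field = colon > 0 ? chunk.substring(0, colon).toLowerCase(Locale.ROOT) : defaultField;
            String value = colon > 0 ? chunk.substring(colon + 1) : chunk;
            for (String w : words(value)) {
                clauses.add("+" + field + ":" + w);
            }
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    // One parenthesised group per field-specific box, then the free-form group.
    static String combine(Map<String, String> fixedInputs, String defaultField, String freeFormInput) {
        List<String> groups = new ArrayList<>();
        for (Map.Entry<String, String> e : fixedInputs.entrySet()) {
            groups.add(fixedField(e.getKey(), e.getValue()));
        }
        groups.add(freeForm(defaultField, freeFormInput));
        return String.join(" ", groups);
    }
}
```

Passing the three example inputs above through combine, with an insertion-ordered map such as LinkedHashMap for the Subject and FieldA boxes, gives the three groups shown above. Because each box has its own method, the question of when ":" counts as a field separator is decided by which box the text was typed into, not by the text itself.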
Antony
Re: Limiting QueryParser
Posted by Chris Hostetter <ho...@fucit.org>.
: important:conference agenda
: I want to end up with
:
: +subject:important +subject:conference +subject:agenda
:
: I've written something to do this, but I know it is not as clever as QP as
: currently it can only create BooleanQueries with TermQueries and cannot handle
: PhraseQuery, so would not handle
:
: important:"conference agenda"
you're on a slippery slope .. basically what you are saying is you want
":" to be treated as a field separator *some* of the time, and as a
whitespace character the rest of the time ... but it's not really clear if
the rules you want to use are deterministic -- can you describe what you
want to do to an *arbitrary* input string?
-Hoss
Re: Limiting QueryParser
Posted by Erick Erickson <er...@gmail.com>.
I've also got to ask a similar question to Michael's... Who is the UI
intended for? If it's intended for any type of "end user", even other IT
folks, who aren't Lucene junkies, trying to explain when colons count and
when they don't is going to be a challenge. I predict it will lead to 1>
endless confusion and 2> waaaaaay more time spent than you think and 3>
little value actually added to the app.
I think that rather than spending time worrying about this issue, I'd define
it away by removing colons from my token stream while indexing and causing
every colon in my free-form text to be a field. Unless I just took away the
field:value syntax from the free-form entry field entirely.....
Erick
Re: Limiting QueryParser
Posted by Antony Bowesman <ad...@teamware.com>.
Michael Rusch wrote:
> Sorry if I'm missing the point here, but what about simply replacing colons
> with spaces first?
>
> Michael.
Err, thanks. I've been in too deep at the wrong end :) Wood, trees and
visibility spring to mind!
Antony
RE: Limiting QueryParser
Posted by Michael Rusch <mc...@facstaff.wisc.edu>.
Sorry if I'm missing the point here, but what about simply replacing colons
with spaces first?
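A minimal sketch of that pre-processing step in plain Java. ColonStripper and stripColons are hypothetical names. It assumes the cleaned string is then handed to a parser whose default field is the UI field in question. Whether the resulting clauses are required (+) depends on the parser's default operator, which is not covered here.

```java
// Hypothetical helper, not part of Lucene.
final class ColonStripper {

    // Replaces each ':' with a space so the parser never sees field syntax.
    // Double quotes are left alone, so quoted phrases can still be recognised.
    static String stripColons(String userInput) {
        return userInput.replace(':', ' ');
    }
}
```

For example, important:"conference agenda" becomes important "conference agenda", which keeps the quoted part intact.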
Michael.
Re: Limiting QueryParser
Posted by Antony Bowesman <ad...@teamware.com>.
Mark Miller wrote:
> If you scan the query and escape all colons (i.e. \:) then you should be
> good (I have not verified). Of course you will not be able to do a field
> search, but that seems to be what you're after.
Thanks for that suggestion. However, a standard un-escaped parse gives
Input - important:conference agenda
Query - important:conference body:agenda
Escaping the : gives
Input - important\:conference agenda
Query - subject:"important conference" subject:agenda
which has caused it to generate a PhraseQuery for important conference which is
incorrect.
The following
Input - important\:"conference agenda"
Query - subject:important subject:"conference agenda"
is correct. Is that a bug in the middle one?
Antony
Re: Limiting QueryParser
Posted by Mark Miller <ma...@gmail.com>.
Keep in mind that this would not work for
important:"conference agenda"
as the quotes would be escaped and QueryParser will not generate a phrase query.
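One way around the quote problem is to escape only the colons by hand, sketched below in plain Java. ColonEscaper and escapeColons are hypothetical names and this is not Lucene's own escape method. Note that elsewhere in this thread the escaped, unquoted input important\:conference agenda was reported to come back as a phrase on "important conference", so this step alone may not produce the wanted query.

```java
// Hypothetical helper, not part of Lucene.
final class ColonEscaper {

    // Prefixes each unescaped ':' with a backslash. Every other character,
    // including double quotes, is copied through unchanged.
    static String escapeColons(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Already-escaped pair: copy both characters as they are.
                sb.append(c).append(input.charAt(++i));
            } else if (c == ':') {
                sb.append('\\').append(':');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this, important:"conference agenda" becomes important\:"conference agenda", leaving the quotes available to the parser.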
- Mark
Steven Rowe wrote:
> static String QueryParser.escape(String) should do the trick:
>
> <http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String)>
>
> Look at the bottom of the below-linked page for the list of characters
> that the above method will escape:
>
> <http://lucene.apache.org/java/docs/queryparsersyntax.html>
>
> Steve
Re: Limiting QueryParser
Posted by Steven Rowe <sa...@syr.edu>.
static String QueryParser.escape(String) should do the trick:
<http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String)>
Look at the bottom of the below-linked page for the list of characters
that the above method will escape:
<http://lucene.apache.org/java/docs/queryparsersyntax.html>
Steve
Mark Miller wrote:
> If you scan the query and escape all colons (i.e. \:) then you should be
> good (I have not verified). Of course you will not be able to do a field
> search, but that seems to be what you're after.
Re: Limiting QueryParser
Posted by Mark Miller <ma...@gmail.com>.
If you scan the query and escape all colons (i.e. \:) then you should be
good (I have not verified). Of course you will not be able to do a field
search, but that seems to be what you're after.