Posted to java-user@lucene.apache.org by Antony Bowesman <ad...@teamware.com> on 2006/11/21 23:26:09 UTC

Limiting QueryParser

Hi,

I have a search UI that allows search criteria to be input against specific 
fields, e.g. Subject.

In order to create a suitable Lucene Query, I must analyze that String so that
it becomes a set of Tokens which I can then turn into Terms.  QueryParser seems
to fit the bill for that; however, it is too clever, as it assumes that anything
suffixed with a : is a field reference.

If someone enters

important:conference agenda

in the subject field, I don't want QP to translate this to

+important:conference +defaultfield:agenda

I want to end up with

+subject:important +subject:conference +subject:agenda

I've written something to do this, but I know it is not as clever as QP:
currently it can only create BooleanQueries with TermQueries and cannot handle
PhraseQuery, so it would not handle

important:"conference agenda"

correctly.  Does anyone have any pointers on how to limit QueryParser so that I
can force it to treat what it thinks are fields as terms?

Thanks
Antony


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Limiting QueryParser

Posted by Antony Bowesman <ad...@teamware.com>.

Erik Hatcher wrote:
> 
> It doesn't seem like you need a "parser" at all for your field-specific 
> search fields.  Simply tokenize, using a Lucene Analyzer, the text field 
> and build up a BooleanQuery of all the tokens.

That's what I'm currently doing, but I was getting bogged down with trying to
support PhraseQueries for quoted input.  My knowledge of JavaCC is limited, so
I was trying to weigh up the effort of rolling my own against adapting QP.

> QueryParser is over-prescribed - and is often not the best fit for the
> job.  It's only a few lines of code to tokenize (look at the QueryParser 
> code in how it creates a PhraseQuery, for example) and build a Query 
> from the tokens.
> 
> If you need to support +/-/AND/OR syntax in your field-specific inputs 
> that's a different story - though your example does not show this need.  
> If so, copying the QueryParser.jj file and removing the "field:" syntax 
> and fixing all created Query objects to a specified single field might 
> be the trick.

I do have other levels where +- can be used, but that's the easy bit and I'm 
just constructing an outer BooleanQuery.  Anyway, I'll have a go with QP.jj and 
see where I get.  BTW, LIA is an excellent book!

Thanks everyone for your comments.
Antony

Re: Limiting QueryParser

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Nov 21, 2006, at 10:34 PM, Antony Bowesman wrote:
> On the field specific fields, I want to control the parsing to  
> ensure that the parser will not interpret fields in the user  
> entered string, so in those fields it treats : as : all of the  
> time.  However, in the "free form" field, anything goes and : is a
> field delimiter all of the time.  So, a user can search for
>
> Subject  - important:conference agenda
> FieldA   - blah:abc
> FreeForm - fieldX:Xdata fieldY:Ydata
>
> in the above, the Subject and Field A would have been indexed using  
> configurable analysers and would have indexed "important" and  
> "blah", so these are relevant to the search.
>
> This should come out as a
>
> (+subject:important +subject:conference +subject:agenda)  
> (+fielda:blah +fielda:abc) (+fieldx:xdata +fieldy:ydata)
>
> My framework allows for field specific parsers as well as field  
> specific analysers, so having a different query parser for the  
> named fields and the free form field is fine.

It doesn't seem like you need a "parser" at all for your field-specific
search fields.  Simply tokenize the text field, using a Lucene Analyzer,
and build up a BooleanQuery of all the tokens.

QueryParser is over-prescribed - and is often not the best fit for
the job.  It's only a few lines of code to tokenize (look at the
QueryParser code in how it creates a PhraseQuery, for example) and
build a Query from the tokens.

If you need to support +/-/AND/OR syntax in your field-specific  
inputs that's a different story - though your example does not show  
this need.  If so, copying the QueryParser.jj file and removing the  
"field:" syntax and fixing all created Query objects to a specified  
single field might be the trick.

	Erik
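
A minimal sketch of the "tokenize and build the query yourself" approach
described above, including the quoted-phrase case.  It deliberately avoids the
Lucene API so that it runs on its own: FieldQuerySketch, analyze() and build()
are hypothetical names, analyze() is only a crude stand-in for a real Lucene
Analyzer (lower-case, split on non-alphanumerics), and the result is rendered
as a query string rather than as BooleanQuery/TermQuery/PhraseQuery objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Lucene): turns one field-specific input into required clauses. */
final class FieldQuerySketch {

    /** Stand-in for an Analyzer: lower-cases and splits on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Every token becomes a required term on 'field'; a quoted run becomes a required phrase. */
    static String build(String field, String input) {
        List<String> clauses = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A quote ends the current run; emit it as a phrase only if it was quoted.
                flush(field, buf, inQuotes, clauses);
                inQuotes = !inQuotes;
            } else {
                buf.append(c);
            }
        }
        // Anything left over (including an unterminated quote) is treated as plain terms.
        flush(field, buf, false, clauses);
        return String.join(" ", clauses);
    }

    private static void flush(String field, StringBuilder buf, boolean phrase, List<String> clauses) {
        List<String> tokens = analyze(buf.toString());
        buf.setLength(0);
        if (tokens.isEmpty()) {
            return;
        }
        if (phrase && tokens.size() > 1) {
            clauses.add("+" + field + ":\"" + String.join(" ", tokens) + "\"");
        } else {
            for (String t : tokens) {
                clauses.add("+" + field + ":" + t);
            }
        }
    }
}
```

Because the colon is never given any meaning here, it is simply discarded by
the stand-in analyzer along with other punctuation.  In a real implementation
the clauses would presumably be added to a BooleanQuery as TermQuery and
PhraseQuery instances, using the tokens produced by the field's own Analyzer.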

Re: Limiting QueryParser

Posted by Antony Bowesman <ad...@teamware.com>.

Chris Hostetter wrote:
> : important:conference agenda
> 
> : I want to end up with
> :
> : +subject:important +subject:conference +subject:agenda
> :
> : I've written something to do this, but I know it is not as clever as QP as
> : currently it can only create BooleanQueries with TermQueries and cannot handle
> : PhraseQuery, so would not handle
> :
> : important:"conference agenda"
> 
> you're on a slippery slope ... basically what you are saying is you want
> ":" to be treated as a field separator *some* of the time, and as a
> whitespace character the rest of the time ... but it's not really clear if
> the rules you want to use are deterministic -- can you describe what you
> want to do to an *arbitrary* input string?

I have a number of UI search fields that are field specific, e.g. Subject, 
Sender, amongst others as well as a "free form" google style field where 
anything can be entered.

On the field specific fields, I want to control the parsing to ensure that the 
parser will not interpret fields in the user entered string, so in those fields 
it treats : as : all of the time.  However, in the "free form" field, anything
goes and : is a field delimiter all of the time.  So, a user can search for

Subject  - important:conference agenda
FieldA   - blah:abc
FreeForm - fieldX:Xdata fieldY:Ydata

in the above, the Subject and Field A would have been indexed using configurable 
analysers and would have indexed "important" and "blah", so these are relevant 
to the search.

This should come out as

(+subject:important +subject:conference +subject:agenda) (+fielda:blah 
+fielda:abc) (+fieldx:xdata +fieldy:ydata)

My framework allows for field specific parsers as well as field specific 
analysers, so having a different query parser for the named fields and the free 
form field is fine.

Antony

Re: Limiting QueryParser

Posted by Chris Hostetter <ho...@fucit.org>.

: important:conference agenda

: I want to end up with
:
: +subject:important +subject:conference +subject:agenda
:
: I've written something to do this, but I know it is not as clever as QP as
: currently it can only create BooleanQueries with TermQueries and cannot handle
: PhraseQuery, so would not handle
:
: important:"conference agenda"

you're on a slippery slope ... basically what you are saying is you want
":" to be treated as a field separator *some* of the time, and as a
whitespace character the rest of the time ... but it's not really clear if
the rules you want to use are deterministic -- can you describe what you
want to do to an *arbitrary* input string?



-Hoss


Re: Limiting QueryParser

Posted by Erick Erickson <er...@gmail.com>.

I've also got to ask a similar question to Michael's... Who is the UI
intended for? If it's intended for any type of "end user", even other IT
folks, who aren't Lucene junkies, trying to explain when colons count and
when they don't is going to be a challenge. I predict it will lead to 1>
endless confusion and 2> waaaaaay more time spent than you think and 3>
little value actually added to the app.

I think that rather than spending time worrying about this issue, I'd define
it away by removing colons from my token stream while indexing and causing
every colon in my free-form text to be a field delimiter. Unless I just took
away the field:value syntax from the free-form entry field entirely.....

Erick

On 11/22/06, Michael Rusch <mc...@facstaff.wisc.edu> wrote:
>
> Sorry if I'm missing the point here, but what about simply replacing
> colons with spaces first?
>
> Michael.

Re: Limiting QueryParser

Posted by Antony Bowesman <ad...@teamware.com>.

Michael Rusch wrote:
> Sorry if I'm missing the point here, but what about simply replacing colons
> with spaces first?
> 
> Michael.

Err, thanks.  I've been in too deep at the wrong end :) Wood, trees and 
visibility spring to mind!

Antony


RE: Limiting QueryParser

Posted by Michael Rusch <mc...@facstaff.wisc.edu>.

Sorry if I'm missing the point here, but what about simply replacing colons
with spaces first?

Michael.
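
A minimal sketch of that pre-processing step, assuming it is applied only to
the field-specific inputs (where a colon should never mean "field") before the
text is handed to the parser.  ColonStripper and stripColons() are hypothetical
names; no Lucene calls are made here.

```java
/** Hypothetical helper (not part of Lucene): neutralises ':' before the text reaches the parser. */
final class ColonStripper {

    /** Replaces every colon with a space; quotes and all other characters are left untouched. */
    static String stripColons(String input) {
        return input.replace(':', ' ');
    }
}
```

Since double quotes survive untouched, important:"conference agenda" becomes
important "conference agenda", so a parser configured with subject as its
default field should still be able to see the quoted phrase.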

> -----Original Message-----
> From: Antony Bowesman [mailto:adb@teamware.com]
> Sent: Tuesday, November 21, 2006 10:01 PM
> To: java-user@lucene.apache.org
> Subject: Re: Limiting QueryParser
> 
> Mark Miller wrote:
> > if you scan the query and escape all colons (i.e. \:) then you should be
> > good (I have not verified). Of course you will not be able to do a field
> > search, but that seems to be what you're after.
> 
> Thanks for that suggestion.  However, a standard un-escaped parse gives
> 
> Input - important:conference agenda
> Query - important:conference body:agenda
> 
> Escaping the : gives
> 
> Input - important\:conference agenda
> Query - subject:"important conference" subject:agenda
> 
> which has caused it to generate a PhraseQuery for important conference
> which is incorrect.
> 
> The following
> 
> Input - important\:"conference agenda"
> Query - subject:important subject:"conference agenda"
> 
> is correct.  Is that a bug in the middle one?
> Antony

Re: Limiting QueryParser

Posted by Antony Bowesman <ad...@teamware.com>.

Mark Miller wrote:
> if you scan the query and escape all colons (i.e. \:) then you should be
> good (I have not verified). Of course you will not be able to do a field
> search, but that seems to be what you're after.

Thanks for that suggestion.  However, a standard un-escaped parse gives

Input - important:conference agenda
Query - important:conference body:agenda

Escaping the : gives

Input - important\:conference agenda
Query - subject:"important conference" subject:agenda

which has caused it to generate a PhraseQuery for important conference which is 
incorrect.

The following

Input - important\:"conference agenda"
Query - subject:important subject:"conference agenda"

is correct.  Is that a bug in the middle one?
Antony



Re: Limiting QueryParser

Posted by Mark Miller <ma...@gmail.com>.

Keep in mind that this would not work for

important:"conference agenda"

as the quotes would be escaped and QueryParser would not generate a phrase query.

- Mark


Steven Rowe wrote:
> static String QueryParser.escape(String) should do the trick:
>
> <http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String)>
>
> Look at the bottom of the below-linked page for the list of characters
> that the above method will escape:
>
> <http://lucene.apache.org/java/docs/queryparsersyntax.html>
>
> Steve

Re: Limiting QueryParser

Posted by Steven Rowe <sa...@syr.edu>.

static String QueryParser.escape(String) should do the trick:

<http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html#escape(java.lang.String)>

Look at the bottom of the below-linked page for the list of characters
that the above method will escape:

<http://lucene.apache.org/java/docs/queryparsersyntax.html>

Steve

Mark Miller wrote:
> if you scan the query and escape all colons (i.e. \:) then you should be
> good (I have not verified). Of course you will not be able to do a field
> search, but that seems to be what you're after.

Re: Limiting QueryParser

Posted by Mark Miller <ma...@gmail.com>.

if you scan the query and escape all colons (i.e. \:) then you should be
good (I have not verified). Of course you will not be able to do a field
search, but that seems to be what you're after.
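
A minimal sketch of the colon-only escaping suggested above, written without
any Lucene calls.  ColonEscaper and escapeColons() are hypothetical names; the
only assumptions are that a backslash is the parser's escape character and that
a colon which is already escaped should be left alone.

```java
/** Hypothetical helper (not part of Lucene): escapes ':' and nothing else. */
final class ColonEscaper {

    /** Prefixes each unescaped colon with a backslash; quotes and other syntax are not touched. */
    static String escapeColons(String input) {
        StringBuilder out = new StringBuilder(input.length() + 4);
        boolean escaped = false; // true when the previous character was an active backslash
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ':' && !escaped) {
                out.append('\\');
            }
            out.append(c);
            // A backslash escapes the next character unless it was itself escaped.
            escaped = (c == '\\') && !escaped;
        }
        return out.toString();
    }
}
```

Leaving the double quotes unescaped is the point of doing this by hand: the
parser can still recognise a quoted phrase, while the colon loses its
field-separator meaning.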
