
Posted to dev@lucene.apache.org by Tommaso Teofili <to...@gmail.com> on 2014/03/06 12:23:23 UTC

Suggestions about writing / extending QueryParsers

Hi all,

I'm thinking about writing/extending a QueryParser for MoreLikeThis (MLT)
queries. I've never really looked into that code much, and now that I'm
doing so, I'm wondering if anyone has suggestions on how to get started.
Should I write a new grammar for that? Or can I just extend an existing
grammar / class?

Thanks in advance,
Tommaso

Re: Suggestions about writing / extending QueryParsers

Posted by Tommaso Teofili <to...@gmail.com>.

Hi Tim,

2014-03-07 15:20 GMT+01:00 Allison, Timothy B. <ta...@mitre.org>:

>  Tommaso,
>
>   Ah, now I see. If you want to add new operators, you'll have to modify
> the JavaCC grammar files. For the SpanQueryParser, I added a handful of new
> operators and chose to go with regexes instead of JavaCC. I'm not sure that was
> the right decision, but given my lack of knowledge of JavaCC, it was
> expedient. If you have time or already know JavaCC, it shouldn't be
> difficult.
>

Thanks, I've used JavaCC in the past, but I'm definitely not experienced
with it; I'll see what fits best.


>    As for the no-brainer on the Solr side, yes, it shouldn't be a problem.
> However, as of now the basic query parser is a copy-and-paste job between
> Lucene and Solr, so you'll just have to redo your code in Solr... unless you
> do something smarter.
>

Uh, OK, that seems like something to fix, though; I don't know if there are
specific reasons for copy-pasting instead of reusing the code.


>    If you'd be willing to wait for LUCENE-5205 to be brought into Lucene,
> I'd consider adding this functionality into the SpanQueryParser as a later
> step.
>

Cool, thanks Tim, that'd be really nice.
Thanks,
Tommaso

RE: Suggestions about writing / extending QueryParsers

Posted by "Allison, Timothy B." <ta...@mitre.org>.

Tommaso,
  Ah, now I see. If you want to add new operators, you'll have to modify the JavaCC grammar files. For the SpanQueryParser, I added a handful of new operators and chose to go with regexes instead of JavaCC. I'm not sure that was the right decision, but given my lack of knowledge of JavaCC, it was expedient. If you have time or already know JavaCC, it shouldn't be difficult.
  As for the no-brainer on the Solr side, yes, it shouldn't be a problem. However, as of now the basic query parser is a copy-and-paste job between Lucene and Solr, so you'll just have to redo your code in Solr... unless you do something smarter.
  If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd consider adding this functionality into the SpanQueryParser as a later step.
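A minimal sketch of the regex route, assuming the "..."%N syntax Tommaso proposed: the hypothetical Java class below pulls such clauses out of a query string and leaves the rest for an ordinary parser. The class and method names are made up for illustration, the % operator is only a proposal, and the pattern ignores escaped quotes and field-qualified phrases. Turning each extracted clause into a MoreLikeThisQuery and combining it with the parsed remainder is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-parser for the proposed "..."%N clause; not part of Lucene. */
public final class MltClauseScanner {

    /** One recognized clause: its +/- prefix (possibly empty), the quoted text and the number after '%'. */
    public static final class MltClause {
        public final String prefix;
        public final String likeText;
        public final int param;

        MltClause(String prefix, String likeText, int param) {
            this.prefix = prefix;
            this.likeText = likeText;
            this.param = param;
        }
    }

    /** Value used when '%' is not followed by a number (an arbitrary choice for this sketch). */
    public static final int DEFAULT_PARAM = 1;

    // Optional +/- prefix, a quoted phrase, then '%' and optional digits.
    // Escaped quotes and field-qualified phrases (field:"..."%2) are not handled.
    private static final Pattern MLT_CLAUSE = Pattern.compile("([+-]?)\"([^\"]*)\"%(\\d+)?");

    private MltClauseScanner() {}

    /** Returns every MLT clause in the query, in order of appearance. */
    public static List<MltClause> scan(String query) {
        List<MltClause> clauses = new ArrayList<>();
        Matcher m = MLT_CLAUSE.matcher(query);
        while (m.find()) {
            int param = m.group(3) == null ? DEFAULT_PARAM : Integer.parseInt(m.group(3));
            clauses.add(new MltClause(m.group(1), m.group(2), param));
        }
        return clauses;
    }

    /** Returns the query with MLT clauses removed and whitespace collapsed, for a regular parser. */
    public static String remainder(String query) {
        return MLT_CLAUSE.matcher(query).replaceAll("").trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String q = "+title:foo +\"text for similar docs\"%2";
        for (MltClause c : scan(q)) {
            System.out.println("MLT prefix=" + c.prefix + " text=\"" + c.likeText + "\" param=" + c.param);
        }
        System.out.println("rest: " + remainder(q));
    }
}
```

Run as is, main prints the single extracted clause (prefix +, the phrase, parameter 2) followed by the leftover +title:foo, which could then go to the classic parser unchanged. A proximity phrase such as "near words"~2 is not matched, so existing syntax is left alone.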

  Cheers,

             Tim

From: Tommaso Teofili [mailto:tommaso.teofili@gmail.com]
Sent: Friday, March 07, 2014 3:17 AM
To: dev@lucene.apache.org
Subject: Re: Suggestions about writing / extending QueryParsers

Thanks Tim and Upayavira for your replies.

I still need to decide on the final syntax. Generally speaking, though, the ideal would be to extend the current Lucene syntax with a new expression that triggers the creation of a MoreLikeThis query, with something like +title:foo +"text for similar docs"%2 where the quoted phrase generates a MoreLikeThisQuery on that text if it is followed by the % character, and the number (2) may control the MLT configuration (e.g. min document freq = min term freq = 2), similarly to what is done for proximity search (I'm not sure about using %; it's just a syntax example).
I guess I'd then need to extend the classic query parser, as per Tim's suggestions, and I'd assume that if this goes into the classic QP it should be a no-brainer on the Solr side.
Does that sound correct / feasible?

Regards,
Tommaso
2014-03-06 15:08 GMT+01:00 Upayavira <uv...@odoko.co.uk>:
Tommaso,

Do say more about what you're thinking of. I'm currently getting my dev environment up to look into enhancing the MoreLikeThisHandler to be able to handle function query boosts. This should be eminently possible from my initial research. However, if you're thinking of something more powerful, perhaps we can work together.

Upayavira

Re: Suggestions about writing / extending QueryParsers

Posted by Tommaso Teofili <to...@gmail.com>.

Thanks Tim and Upayavira for your replies.

I still need to decide on the final syntax. Generally speaking, though, the
ideal would be to extend the current Lucene syntax with a new expression that
triggers the creation of a MoreLikeThis query, with something like

+title:foo +"text for similar docs"%2

where the quoted phrase generates a MoreLikeThisQuery on that text if it is
followed by the % character, and the number (2) may control the MLT
configuration (e.g. min document freq = min term freq = 2), similarly to
what is done for proximity search (I'm not sure about using %; it's just a
syntax example).
I guess I'd then need to extend the classic query parser, as per Tim's
suggestions, and I'd assume that if this goes into the classic QP it should
be a no-brainer on the Solr side.
Does that sound correct / feasible?
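To make concrete what a parameter such as the 2 in "..."%2 could control, here is a simplified, hypothetical Java model of MoreLikeThis-style term selection: count term frequencies in the like-text, discard terms below a minimum term frequency or minimum document frequency, rank the rest by tf * idf, and keep the top few. The idf formula (in the style of Lucene's classic TF-IDF similarity), the class name, and the tie-breaking rule are assumptions for illustration. Lucene's MoreLikeThis exposes comparable settings (setMinTermFreq, setMinDocFreq, setMaxQueryTerms), but its actual implementation differs in detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of MoreLikeThis-style term selection (not Lucene's code). */
public final class MltTermSelector {

    private MltTermSelector() {}

    /** Assumed idf in the style of classic Lucene TF-IDF: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /**
     * Picks up to maxQueryTerms terms from the tokenized like-text.
     *
     * @param likeTokens    already-analyzed tokens of the like-text
     * @param docFreqs      document frequency of each term in the index (missing = 0)
     * @param numDocs       number of documents in the index
     * @param minTermFreq   drop terms occurring fewer times than this in the like-text
     * @param minDocFreq    drop terms occurring in fewer documents than this
     * @param maxQueryTerms maximum number of terms to keep
     */
    public static List<String> select(List<String> likeTokens, Map<String, Integer> docFreqs,
                                      int numDocs, int minTermFreq, int minDocFreq, int maxQueryTerms) {
        // Term frequencies within the like-text.
        Map<String, Integer> tf = new HashMap<>();
        for (String token : likeTokens) {
            tf.merge(token, 1, Integer::sum);
        }

        // Apply both thresholds, then score the surviving terms by tf * idf.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTermFreq || df < minDocFreq) {
                continue;
            }
            scores.put(e.getKey(), e.getValue() * idf(df, numDocs));
        }

        // Highest score first; ties broken alphabetically so the result is deterministic.
        List<String> ranked = new ArrayList<>(scores.keySet());
        Comparator<String> byScoreDesc =
                Comparator.comparingDouble((String t) -> scores.get(t)).reversed();
        ranked.sort(byScoreDesc.thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(maxQueryTerms, ranked.size())));
    }
}
```

With minTermFreq = minDocFreq = 2, a term that appears only once in the like-text, or in only one indexed document, never reaches the generated query, which is the kind of knob the number after % was meant to expose.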

Regards,
Tommaso

Re: Suggestions about writing / extending QueryParsers

Posted by Upayavira <uv...@odoko.co.uk>.

Tommaso,



Do say more about what you're thinking of. I'm currently getting my dev
environment up to look into enhancing the MoreLikeThisHandler to be
able to handle function query boosts. This should be eminently possible
from my initial research. However, if you're thinking of something more
powerful, perhaps we can work together.



Upayavira

RE: Suggestions about writing / extending QueryParsers

Posted by "Allison, Timothy B." <ta...@mitre.org>.

Hi Tommaso,

  It will depend on how different your target syntax will be. If you extend the classic parser (or QueryParserBase), there is a fair amount of overhead and extras that you might not want or need. On the other hand, the query syntax and the methods will be familiar to the Lucene community, and there is a large number of test cases already built for you. On the third hand, if you need to modify the low-level parsing code, you'll have to be familiar with JavaCC.

  There's also the "flexible" family, which should allow for easy modifications, and the "xml" family, which could offer an easy interface between a custom lexer and a parser. The SimpleQueryParser offers a model for building something fairly simple yet very elegant from scratch.

  In deciding where to start, another consideration is how easy it will be to integrate at the Solr level. Make sure to include field-based hooks for processing multiterm, prefix, and range queries.

  For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to override a fair amount of code because every terminal had to be a SpanQuery; most of the query parser infrastructure is built for traditional queries.

  So, what features do you want to add for MLT? What capabilities do you need?

          Cheers,

                      Tim

Re: Suggestions about writing / extending QueryParsers

Posted by Tommaso Teofili <to...@gmail.com>.

Thanks, Jack, for the reference; I didn't know about it.
Regards,
Tommaso


2014-03-08 1:25 GMT+01:00 Jack Krupansky <ja...@basetechnology.com>:

>   For reference, the LucidWorks Search query parser has two MLT features:
>
> 1. Like terms - does MLT on a list of terms.
>
> For example:
>
> like:(Four score "and" seven years ago our fathers brought forth)
>
> See:
> http://docs.lucidworks.com/display/lweug/Like+Term+Keyword+Option
>
> This is effectively an OR operator on the terms.
>
> 2. Like document - does MLT on a Solr document, given its id:
>
> For example:
>
> Washington like:"http://cnn.com" -"New York"
>
> See:
> http://docs.lucidworks.com/display/lweug/Like+Document+Term+Keyword+Option
>
> -- Jack Krupansky

Re: Suggestions about writing / extending QueryParsers

Posted by Jack Krupansky <ja...@basetechnology.com>.

For reference, the LucidWorks Search query parser has two MLT features:

1. Like terms – does MLT on a list of terms.

For example:

like:(Four score "and" seven years ago our fathers brought forth)

See:
http://docs.lucidworks.com/display/lweug/Like+Term+Keyword+Option

This is effectively an OR operator on the terms.
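Taking that description literally, a like:(...) term list could be rewritten into an OR of its terms before the query reaches an ordinary parser. The hypothetical Java sketch below does only that string rewrite: the class name is made up, nested parentheses inside the group are not supported, and it does not reproduce the LucidWorks implementation or any MLT-style term weighting or selection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rewrite of like:(...) groups into a plain OR of their terms. */
public final class LikeTermsRewriter {

    // like:( ... ) with no nested parentheses inside the group.
    private static final Pattern LIKE_GROUP = Pattern.compile("like:\\(([^)]*)\\)");
    // A term is a quoted string (kept with its quotes) or a run of non-whitespace characters.
    private static final Pattern TERM = Pattern.compile("\"[^\"]*\"|\\S+");

    private LikeTermsRewriter() {}

    /** Splits the inside of a like:(...) group into terms. */
    public static List<String> terms(String group) {
        List<String> out = new ArrayList<>();
        Matcher m = TERM.matcher(group);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** Replaces every like:(t1 t2 ...) group with (t1 OR t2 OR ...), leaving the rest untouched. */
    public static String rewrite(String query) {
        Matcher m = LIKE_GROUP.matcher(query);
        // StringBuffer keeps this compatible with the pre-Java 9 appendReplacement signature.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String or = "(" + String.join(" OR ", terms(m.group(1))) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(or));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("like:(Four score \"and\" seven years ago)"));
    }
}
```

Here main prints (Four OR score OR "and" OR seven OR years OR ago); the quoted "and" stays quoted so it is treated as a term rather than as the AND operator.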

2. Like document – does MLT on a Solr document, given its id:

For example:

Washington like:"http://cnn.com" -"New York"

See:
http://docs.lucidworks.com/display/lweug/Like+Document+Term+Keyword+Option

-- Jack Krupansky