
Posted to dev@lucene.apache.org by Matthew Denner <ma...@codemonkeyconsultancy.net> on 2005/02/07 17:56:30 UTC

Refactoring suggestion for query parsing and creation

Hi,

I have been working with Lucene for about a month now and I am really 
impressed by the work of the people involved.  I did, however, come across 
something that I thought might be better refactored: extending the QueryParser 
to add your own handling of various Query implementations.  So I had a go at 
introducing a QueryFactory interface that classes can implement to provide 
construction of these Query implementations, and then an instance can be 
passed to a QueryParser instance for it to use.  I have a patch that provides 
this if the developers of Lucene are interested, but because it is quite a 
dramatic change (it removes a lot of deprecated methods, which I was very 
worried about), I would prefer someone to take a look at it and see if they 
think it is worthwhile.
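As a rough illustration of the idea (not code from the patch): below is a cut-down, hypothetical QueryFactory that builds queries as query-syntax strings instead of Lucene Query objects, plus a tiny parser that is handed a factory rather than being subclassed. The names DefaultQueryFactory and TinyParser are made up for this sketch.

```java
// Hypothetical, cut-down model of the proposal: queries are represented
// as query-syntax strings rather than Lucene Query objects.
interface QueryFactory {
  String getFieldQuery(String field, String queryText);
}

// Stands in for QueryFactoryImpl: the default "field:text" behaviour.
class DefaultQueryFactory implements QueryFactory {
  public String getFieldQuery(String field, String queryText) {
    return field + ":" + queryText;
  }
}

// The parser is given a factory (composition) instead of being subclassed.
class TinyParser {
  private final String defaultField;
  private final QueryFactory factory;

  TinyParser(String defaultField, QueryFactory factory) {
    this.defaultField = defaultField;
    this.factory = factory;
  }

  // Handles a single clause of the form "text" or "field:text".
  String parseClause(String clause) {
    int colon = clause.indexOf(':');
    if (colon < 0) {
      return factory.getFieldQuery(defaultField, clause);
    }
    return factory.getFieldQuery(clause.substring(0, colon), clause.substring(colon + 1));
  }
}
```

Changing how queries are built then means passing a different factory to the same parser, instead of writing another parser subclass.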

The reason I made this patch is that I wanted to deal with integer ranges for 
a particular field in my application and, like I said, extending QueryParser 
felt wrong to me (and it took me almost a week to find out that this was the 
way to do it!).  With the patch I write:

QueryParser parser = 
  new QueryParser(
    "description", 
    new StandardAnalyzer(),
    new SpecialQueryFactory(new QueryFactoryImpl()));

And my specialised QueryFactory implementation gets used during parsing.  The 
implementation of QueryFactoryImpl was created by moving those methods that 
created Query instances in the original QueryParser into a separate class.  
I also created a MultiFieldQueryFactory implementation that takes the methods 
from MultiFieldQueryParser (effectively making that class redundant).
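A hedged sketch of how MultiFieldQueryParser-style behaviour might look as a decorating factory. The interface, the string representation of queries, and the rule "expand only queries on the default field" are assumptions for illustration, not the MultiFieldQueryFactory in the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, cut-down factory interface (queries as query-syntax strings).
interface QueryFactory {
  String getFieldQuery(String field, String queryText);
}

class DefaultQueryFactory implements QueryFactory {
  public String getFieldQuery(String field, String queryText) {
    return field + ":" + queryText;
  }
}

// Decorator: a query on the default field becomes an OR of the same query
// on each listed field; other fields pass straight through to the delegate.
class MultiFieldQueryFactory implements QueryFactory {
  private final String defaultField;
  private final QueryFactory delegate;
  private final List<String> fields;

  MultiFieldQueryFactory(String defaultField, QueryFactory delegate, String... fields) {
    this.defaultField = defaultField;
    this.delegate = delegate;
    this.fields = Arrays.asList(fields);
  }

  public String getFieldQuery(String field, String queryText) {
    if (!defaultField.equals(field)) {
      return delegate.getFieldQuery(field, queryText);
    }
    StringBuilder sb = new StringBuilder("(");
    for (int i = 0; i < fields.size(); i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(delegate.getFieldQuery(fields.get(i), queryText));
    }
    return sb.append(')').toString();
  }
}
```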

Personally, I prefer composition over inheritance in this circumstance, but I 
can understand why other people would not.
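Because factories compose, the integer-range handling that motivated the patch could be a small decorator. This sketch assumes non-negative int values that were zero-padded to ten digits at index time; padding matters because Lucene range queries compare terms as strings, so "10" sorts before "9". IntegerRangeQueryFactory and the field name "price" are hypothetical.

```java
import java.util.Locale;

// Hypothetical, cut-down factory interface (queries as query-syntax strings).
interface QueryFactory {
  String getFieldQuery(String field, String queryText);
  String getRangeQuery(String field, String lower, String upper, boolean inclusive);
}

class DefaultQueryFactory implements QueryFactory {
  public String getFieldQuery(String field, String queryText) {
    return field + ":" + queryText;
  }

  public String getRangeQuery(String field, String lower, String upper, boolean inclusive) {
    return field + ":" + (inclusive ? "[" : "{") + lower + " TO " + upper + (inclusive ? "]" : "}");
  }
}

// Decorator for one integer field: zero-pads values so that string order
// matches numeric order. Assumes non-negative ints padded the same way
// when the field was indexed.
class IntegerRangeQueryFactory implements QueryFactory {
  private static final int WIDTH = 10; // Integer.MAX_VALUE has 10 digits
  private final String intField;
  private final QueryFactory delegate;

  IntegerRangeQueryFactory(String intField, QueryFactory delegate) {
    this.intField = intField;
    this.delegate = delegate;
  }

  static String pad(String value) {
    int n = Integer.parseInt(value.trim());
    if (n < 0) {
      throw new IllegalArgumentException("negative values not supported: " + n);
    }
    return String.format(Locale.ROOT, "%0" + WIDTH + "d", n);
  }

  public String getFieldQuery(String field, String queryText) {
    return delegate.getFieldQuery(field, intField.equals(field) ? pad(queryText) : queryText);
  }

  public String getRangeQuery(String field, String lower, String upper, boolean inclusive) {
    if (intField.equals(field)) {
      return delegate.getRangeQuery(field, pad(lower), pad(upper), inclusive);
    }
    return delegate.getRangeQuery(field, lower, upper, inclusive);
  }
}
```

It would be wired up like the SpecialQueryFactory example, wrapping the default factory: `new IntegerRangeQueryFactory("price", new DefaultQueryFactory())`.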

Anyway, if someone would like to see the patch (made against the HEAD of the 
Lucene CVS code) I can provide it; or you can tell me where to go!  Either 
way, Lucene has still made searching so much easier and I thank you.

Matt

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

Re: Refactoring suggestion for query parsing and creation

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

The MockObjects dependency is fine with me - it is a nice way to unit 
test for sure.  The important thing is that it is Apache Software 
Licensed, which MO is.

	Erik



On Feb 7, 2005, at 3:33 PM, Matthew Denner wrote:

> On Monday 07 February 2005 19:19, Daniel Naber wrote:
>> On Monday 07 February 2005 17:56, Matthew Denner wrote:
>>> QueryParser parser =
>>>   new QueryParser(
>>>     "description",
>>>     new StandardAnalyzer(),
>>>     new SpecialQueryFactory(new QueryFactoryImpl()));
>>
>> This sounds interesting; could you create a bug report (see "Lucene Bugs"
>> on the homepage) and then attach your patch?
>
> I will do this; I just need to finish off the unit tests and work the changes
> into the QueryParser.jj file (I did them directly on the Java source).  Would
> anyone mind if I added the MockObjects (http://www.mockobjects.com/) JAR as a
> dependency, as it makes testing the LowerCaseQueryFactory and
> MultiFieldQueryFactory implementations slightly easier?
>
> Matt



Re: Refactoring suggestion for query parsing and creation

Posted by Matthew Denner <ma...@codemonkeyconsultancy.net>.

On Monday 07 February 2005 19:19, Daniel Naber wrote:
> On Monday 07 February 2005 17:56, Matthew Denner wrote:
> > QueryParser parser =
> >   new QueryParser(
> >     "description",
> >     new StandardAnalyzer(),
> >     new SpecialQueryFactory(new QueryFactoryImpl()));
>
> This sounds interesting; could you create a bug report (see "Lucene Bugs"
> on the homepage) and then attach your patch?

I will do this; I just need to finish off the unit tests and work the changes 
into the QueryParser.jj file (I did them directly on the Java source).  Would 
anyone mind if I added the MockObjects (http://www.mockobjects.com/) JAR as a 
dependency, as it makes testing the LowerCaseQueryFactory and 
MultiFieldQueryFactory implementations slightly easier?
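For comparison, the same kind of test can be written without MockObjects by using a hand-rolled recording fake. This sketch assumes, hypothetically, that LowerCaseQueryFactory lower-cases wildcard terms (which QueryParser does not pass through the analyzer) before delegating; the interface and class names are made up, and the MockObjects API itself is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical, cut-down factory interface (queries as query-syntax strings).
interface QueryFactory {
  String getWildcardQuery(String field, String termText);
}

// Decorator under test: lower-cases the wildcard term, then delegates.
class LowerCaseQueryFactory implements QueryFactory {
  private final QueryFactory delegate;

  LowerCaseQueryFactory(QueryFactory delegate) {
    this.delegate = delegate;
  }

  public String getWildcardQuery(String field, String termText) {
    return delegate.getWildcardQuery(field, termText.toLowerCase(Locale.ROOT));
  }
}

// Hand-rolled fake standing in for a MockObjects mock: records each call
// it receives and returns a fixed result.
class RecordingQueryFactory implements QueryFactory {
  final List<String> calls = new ArrayList<>();

  public String getWildcardQuery(String field, String termText) {
    calls.add(field + "|" + termText);
    return "canned";
  }
}

class LowerCaseQueryFactoryTest {
  // Throws AssertionError unless the decorator lower-cases the term,
  // delegates exactly once, and returns the delegate's result unchanged.
  static void testLowerCasesBeforeDelegating() {
    RecordingQueryFactory fake = new RecordingQueryFactory();
    String result = new LowerCaseQueryFactory(fake).getWildcardQuery("title", "LuCeNe*");
    if (!fake.calls.equals(Collections.singletonList("title|lucene*"))) {
      throw new AssertionError("unexpected calls: " + fake.calls);
    }
    if (!"canned".equals(result)) {
      throw new AssertionError("unexpected result: " + result);
    }
  }
}
```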

Matt

Re: Refactoring suggestion for query parsing and creation

Posted by Daniel Naber <da...@t-online.de>.

On Monday 07 February 2005 17:56, Matthew Denner wrote:

> QueryParser parser =
>   new QueryParser(
>     "description",
>     new StandardAnalyzer(),
>     new SpecialQueryFactory(new QueryFactoryImpl()));

This sounds interesting; could you create a bug report (see "Lucene Bugs" 
on the homepage) and then attach your patch?

Regards
 Daniel

-- 
http://www.danielnaber.de


Re: Refactoring suggestion for query parsing and creation

Posted by Matthew Denner <ma...@codemonkeyconsultancy.net>.

On Monday 07 February 2005 19:17, David Spencer wrote:
> This is interesting and may match what I've wanted - a way to register
> customized query expansion modules w/ the query parser so that you can
> experiment with different ways of expanding the query e.g. I wrote a
> "WordNet" package that's in the sandbox that expands terms by adding
> their synonyms e.g. "big" might expand to "big large^.9 huge^.9".

I took a quick look at the WordNet code in CVS, but I'm not quite sure of the 
details, so I'm working from your description below!

> So is it safe to assume I could use your code like this (with 2 lines
> changed from your example):
>
> QueryParser parser =
>    new QueryParser(
>       "synonym",                 // pseudo field?
>       new StandardAnalyzer(),
>       new SpecialQueryFactory(new WordNetQueryFactoryImpl())); // wordnet factory?

It would probably be something like:

QueryParser parser = 
  new QueryParser(
    "defaultSearchField",			// default search field (as normal)
    new StandardAnalyzer(),
    new WordNetQueryFactoryImpl("synonym", new QueryFactoryImpl()));

I'm thinking that an instance of your QueryFactory implementation would 
effectively decorate another instance and might have your pseudo-field passed 
as a parameter.  It would then be implemented to catch any pseudo-field (in 
this case "synonym") queries and process the text itself.  

> Then a query can use the pseudo-field "synonym" any place:
>
> "+synonym:big dog"
>
> expands to:
>
> "+(big large^.9 huge^.9) dog"

Your WordNetQueryFactoryImpl would be something like this:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public class WordNetQueryFactoryImpl implements QueryFactory {
  private final QueryFactory m_delegate;
  private final String m_pseudoField;

  public WordNetQueryFactoryImpl(String pseudoField, QueryFactory delegate) {
    m_pseudoField = pseudoField;
    m_delegate = delegate;
  }

  // ... a few other interface methods here but cut for clarity ...

  public Query getFieldQuery(
      String field,
      Analyzer analyzer,
      String queryText,
      int slop)
  {
    // If the field is our pseudo-field and the value for it is 'big'...
    if (m_pseudoField.equals(field) && "big".equals(queryText)) {
      // ...use the delegate to build one optional clause per term, with the
      // synonyms boosted down to 0.9: (big large^.9 huge^.9)
      BooleanQuery tmp = new BooleanQuery();
      tmp.add(
          m_delegate.getFieldQuery(field, analyzer, "big", slop),
          BooleanClause.Occur.SHOULD);
      tmp.add(
          boost(m_delegate.getFieldQuery(field, analyzer, "large", slop), 0.9f),
          BooleanClause.Occur.SHOULD);
      tmp.add(
          boost(m_delegate.getFieldQuery(field, analyzer, "huge", slop), 0.9f),
          BooleanClause.Occur.SHOULD);
      return tmp;
    } else {
      // ... otherwise delegate the handling to the other QueryFactory
      return m_delegate.getFieldQuery(field, analyzer, queryText, slop);
    }
  }

  // Sets the boost on a query and returns it, so it can be used inline.
  private static Query boost(Query query, float boost) {
    query.setBoost(boost);
    return query;
  }
}

Just as an example, I'm sure you'll work out what I mean!
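Generalising the hard-coded "big" above, here is a self-contained sketch that looks synonyms up in a map and boosts them to 0.9, using query-syntax strings in place of Lucene Query objects. The SynonymQueryFactory name and the targetField parameter are assumptions: since the pseudo-field is not indexed, the expanded terms presumably have to be searched in some real field.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, cut-down factory interface (queries as query-syntax strings).
interface QueryFactory {
  String getFieldQuery(String field, String queryText);
}

class DefaultQueryFactory implements QueryFactory {
  public String getFieldQuery(String field, String queryText) {
    return field + ":" + queryText;
  }
}

// Decorator: queries on the pseudo-field expand to the original term plus
// its synonyms (boosted to 0.9), all searched in a real target field.
class SynonymQueryFactory implements QueryFactory {
  private final String pseudoField;
  private final String targetField;
  private final Map<String, List<String>> synonyms;
  private final QueryFactory delegate;

  SynonymQueryFactory(String pseudoField, String targetField,
                      Map<String, List<String>> synonyms, QueryFactory delegate) {
    this.pseudoField = pseudoField;
    this.targetField = targetField;
    this.synonyms = synonyms;
    this.delegate = delegate;
  }

  public String getFieldQuery(String field, String queryText) {
    if (!pseudoField.equals(field)) {
      return delegate.getFieldQuery(field, queryText);
    }
    StringBuilder sb = new StringBuilder("(");
    sb.append(delegate.getFieldQuery(targetField, queryText));
    List<String> alternatives = synonyms.getOrDefault(queryText, Collections.emptyList());
    for (String synonym : alternatives) {
      sb.append(' ').append(delegate.getFieldQuery(targetField, synonym)).append("^0.9");
    }
    return sb.append(')').toString();
  }
}
```

A real WordNet-backed version would fill the map from the WordNet index rather than by hand.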

> If so, cool! I've wanted an extensible query parser for a while.

I don't think it quite fits an "extensible query parser", but I think it might 
get you somewhere.

Matt


Re: Refactoring suggestion for query parsing and creation

Posted by David Spencer <da...@tropo.com>.

This is interesting and may match what I've wanted - a way to register 
customized query expansion modules w/ the query parser so that you can 
experiment with different ways of expanding the query e.g. I wrote a 
"WordNet" package that's in the sandbox that expands terms by adding 
their synonyms e.g. "big" might expand to "big large^.9 huge^.9".

So is it safe to assume I could use your code like this (with 2 lines 
changed from your example):

QueryParser parser =
   new QueryParser(
      "synonym",                 // pseudo field?
      new StandardAnalyzer(),
      new SpecialQueryFactory(new WordNetQueryFactoryImpl())); // wordnet factory?

Then a query can use the pseudo-field "synonym" any place:

"+synonym:big dog"

expands to:

"+(big large^.9 huge^.9) dog"


If so, cool! I've wanted an extensible query parser for a while.

thx,
  Dave

PS
  When I wrote up my ideas on this a while ago I thought the pseudo-field 
should look different from the normal fields, e.g. it should have 2 
colons, not 1, but it's not a huge issue.
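A sketch of how a parser might tell the suggested double-colon pseudo-fields apart from ordinary fields when splitting a single clause. The Clause class and its rules are hypothetical, not part of QueryParser or the patch.

```java
// Hypothetical result of splitting one clause such as "synonym::big",
// "title:big" or "big" (which falls back to the default field).
final class Clause {
  final String field;
  final String text;
  final boolean pseudo; // true for the double-colon form

  private Clause(String field, String text, boolean pseudo) {
    this.field = field;
    this.text = text;
    this.pseudo = pseudo;
  }

  static Clause parse(String clause, String defaultField) {
    int colon = clause.indexOf(':');
    if (colon <= 0) {
      // No field prefix (or a leading colon): search the default field.
      return new Clause(defaultField, clause, false);
    }
    // A second colon straight after the first marks a pseudo-field.
    boolean pseudo = colon + 1 < clause.length() && clause.charAt(colon + 1) == ':';
    String text = clause.substring(colon + (pseudo ? 2 : 1));
    return new Clause(clause.substring(0, colon), text, pseudo);
  }
}
```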
