
Posted to dev@lucene.apache.org by markharw00d <ma...@yahoo.co.uk> on 2006/02/22 00:13:25 UTC

XML based Query Parser

Further to our discussions some time ago I've had some time to put 
together an XML-based query parser with support for many "advanced" 
query types not supported in the current Query parser.

More details and code here: http://www.inperspective.com/lucene/LXQuery2.htm

Cheers
Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: XML based Query Parser

Posted by Doug Cutting <cu...@apache.org>.

markharw00d wrote:
> Further to our discussions some time ago I've had some time to put 
> together an XML-based query parser with support for many "advanced" 
> query types not supported in the current Query parser.
> 
> More details and code here: 
> http://www.inperspective.com/lucene/LXQuery2.htm

Mark,

I think this could make a good contrib module.  I'd move the new query 
classes into a separate contrib/queries module (where perhaps a number 
of queries now in the core might also better live).  And it would be 
good to have a utility that builds a parser with builders for all of the 
core query classes and perhaps also all of the contrib queries. 
Overall, Bravo.

A related, useful, contribution would be a servlet that parses posted 
xml queries, runs them against an index, and returns the hit documents 
as xml.  We don't want this to compete with Solr, but, for many folks, a 
barebones solution like this might be sufficient.

Doug


Re: XML based Query Parser

Posted by Chris Hostetter <ho...@fucit.org>.

: : DOMUtils.getAttributeWithInheritance instead. My one scenario I came
: : across where I wanted some context passed down was "fieldName" and this
: : is handled simply by leaf nodes walking up the w3c.dom.Node tree until
: : you find an Element with this attribute set.
:
: Hmm, i can see how that would work in some cases, but it's not very
: general purpose.  Builders deep in the tree can "look up" at attributes
: higher in the tree, but that assumes the state info they want is already
: in the XML Elements (although i suppose builders higher up could
: mutate the attributes on their Element for discovery by lower level
: builders ... but that seems sketchy).
:
: It also doesn't address the problem of passing state up the tree ... which
: may be a less common case, but it still seems important to support it.

I should have thought about this more before i replied.  Two things
have occurred to me in the last 30 minutes...

1) the idea i had once upon a time for maintaining stack frames of
"state" would have required builders that wanted to propagate state up
from their children to do so explicitly -- there's no reason why the same
thing couldn't be done by checking the attribute values of their child
Elements and setting them on their own Element before returning.  So
yeah ... using attributes to pass state up/down would probably work in all
the cases i can think of..
	....BUT...
it still seems sketchy to me to use attributes, because it could be
important to be able to tell the difference between "true" attributes
from the XML and "state" attributes passed up/down from other builders.

2) In Java 1.5, org.w3c.dom.Node has setUserData/getUserData methods
specifically for the purpose of letting you annotate nodes as you move
around the tree.

So yeah ... that's something to think about.
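
As a rough illustration of point 2, here is a minimal sketch of annotating
nodes with setUserData/getUserData so that state stays out of the XML
attributes. The class name BuilderState, its method names, and the key
string are hypothetical; only the org.w3c.dom.Node calls come from the JDK.

```java
import org.w3c.dom.Node;

/** Illustrative sketch only; not part of the posted parser code. */
final class BuilderState {
    // Hypothetical key; any string unique to the parser would do.
    static final String KEY = "xmlparser.builder.state";

    /** Attaches state to a node without touching its XML attributes. */
    static void put(Node node, Object state) {
        node.setUserData(KEY, state, null); // null: no UserDataHandler callbacks
    }

    /** Passing state up: a parent builder reads what a child builder left behind. */
    static Object get(Node node) {
        return node.getUserData(KEY);
    }

    /** Passing state down: returns the nearest state on node or its ancestors. */
    static Object findInherited(Node node) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            Object state = n.getUserData(KEY);
            if (state != null) {
                return state;
            }
        }
        return null; // nothing set anywhere on the path to the Document
    }
}
```

A builder higher up would call BuilderState.put(element, value) before
descending, and a leaf would call BuilderState.findInherited(element);
going the other way, a child calls put on its own Element and the parent
calls get on that child after the child's builder returns.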


Perhaps attribute namespaces could be used to distinguish genuine
attributes from state information ... of course that raises a whole new
question about whether the builderfactories should support namespaces when
registering builders.
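
A minimal sketch of the namespace idea, assuming builders agree on one
reserved namespace for state. The namespace URI, the prefix, and the
StateAttributes helper are made up for illustration; none of this is in
the posted parser code.

```java
import org.w3c.dom.Element;

/** Illustrative sketch only: keeps builder state in its own attribute namespace. */
final class StateAttributes {
    // Hypothetical namespace URI and prefix reserved for builder state.
    static final String STATE_NS = "urn:example:builder-state";
    static final String PREFIX = "state";

    /** Records a state value on e without overwriting a genuine attribute of the same name. */
    static void setState(Element e, String name, String value) {
        e.setAttributeNS(STATE_NS, PREFIX + ":" + name, value);
    }

    /** Returns the state value, or null when no builder has set it. */
    static String getState(Element e, String name) {
        return e.hasAttributeNS(STATE_NS, name) ? e.getAttributeNS(STATE_NS, name) : null;
    }
}
```

A genuine fieldName attribute and a state fieldName value can then sit on
the same Element, since e.getAttribute("fieldName") and
StateAttributes.getState(e, "fieldName") look in different places.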


-Hoss



Re: XML based Query Parser

Posted by Chris Hostetter <ho...@fucit.org>.

: But doesn't sticking with w3c.dom.Element allow the possibility of
: standards based tools (eg XPath implementations) to be used by builders
: if they so wish?

Hmmm... that isn't something i'd considered.  You've convinced me.
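
For instance, a builder could select its child clauses with the JDK's
javax.xml.xpath API. This is only a sketch: the XPathInBuilder class and
the occurs attribute are assumptions made for the example, not something
taken from the posted code.

```java
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative sketch only: a builder using XPath on the Element it is handed. */
final class XPathInBuilder {

    /** Returns the direct child Clause elements of e whose occurs attribute is "must". */
    static NodeList mustClauses(Element e) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The relative path is evaluated with e as the context node.
        return (NodeList) xpath.evaluate("Clause[@occurs='must']", e, XPathConstants.NODESET);
    }
}
```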

: >3) I'm still confused about how state information could/would be
: >   passed up or down the tree by builders.  you mentioned something

: I ripped the ThreadLocal thing out and opted for
: DOMUtils.getAttributeWithInheritance instead. My one scenario I came
: across where I wanted some context passed down was "fieldName" and this
: is handled simply by leaf nodes walking up the w3c.dom.Node tree until
: you find an Element with this attribute set.

Hmm, i can see how that would work in some cases, but it's not very
general purpose.  Builders deep in the tree can "look up" at attributes
higher in the tree, but that assumes the state info they want is already
in the XML Elements (although i suppose builders higher up could
mutate the attributes on their Element for discovery by lower level
builders ... but that seems sketchy).

It also doesn't address the problem of passing state up the tree ... which
may be a less common case, but it still seems important to support it.

: The addBuilder call shown above should probably be changed to just take
: a Builder object and it should get the root tag name from the
: Builder.getTagName you propose.

Hmmm... can you remind me what i proposed again?

: I am still of the opinion that a self-documenting parser configuration
: ("what XML can I write: is it <tf> or <TermQuery>?") is important,
: otherwise users have to examine parser source code to determine its
: capabilities. If we make the step of allowing the parser configuration
: to provide metadata about what is required/optional etc we can not only
: produce documentation but also drive a query-building GUI.

that's an interesting idea.  i suppose if each builder had a way to return
info about the tags/attrs they supported then a recursive "generateDTD()"
method could be implemented (there would be some trickiness in dealing
with the "classes" of tags ... but maybe it just seems harder to me since
i've never really bothered trying to understand DTDs).

that certainly seems like a more easily solvable problem than the
toXmlString(Query) issue (since multiple tags could generate Queries of
the same type).
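
A rough sketch of what such a recursive generateDTD() might look like,
assuming each builder can publish a small metadata record. The TagInfo
type, the registry map, and the DtdSketch class are all hypothetical, and
the sketch sidesteps the "classes of tags" problem by listing allowed
child tags explicitly.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: derives DTD declarations from builder metadata. */
final class DtdSketch {

    /** Hypothetical metadata a builder could publish about the tag it handles. */
    static final class TagInfo {
        final List<String> requiredAttrs;
        final List<String> optionalAttrs;
        final List<String> childTags; // empty means text content only

        TagInfo(List<String> requiredAttrs, List<String> optionalAttrs, List<String> childTags) {
            this.requiredAttrs = requiredAttrs;
            this.optionalAttrs = optionalAttrs;
            this.childTags = childTags;
        }
    }

    /** Emits declarations for root and every registered tag reachable from it. */
    static String generateDTD(String root, Map<String, TagInfo> registry) {
        StringBuilder out = new StringBuilder();
        declare(root, registry, new HashSet<String>(), out);
        return out.toString();
    }

    private static void declare(String tag, Map<String, TagInfo> registry,
                                Set<String> seen, StringBuilder out) {
        TagInfo info = registry.get(tag);
        if (info == null || !seen.add(tag)) {
            return; // unregistered tag, or already declared (queries can nest recursively)
        }
        String content = info.childTags.isEmpty()
                ? "(#PCDATA)"
                : "(" + String.join("|", info.childTags) + ")*";
        out.append("<!ELEMENT ").append(tag).append(' ').append(content).append(">\n");
        for (String attr : info.requiredAttrs) {
            out.append("<!ATTLIST ").append(tag).append(' ').append(attr).append(" CDATA #REQUIRED>\n");
        }
        for (String attr : info.optionalAttrs) {
            out.append("<!ATTLIST ").append(tag).append(' ').append(attr).append(" CDATA #IMPLIED>\n");
        }
        for (String child : info.childTags) {
            declare(child, registry, seen, out);
        }
    }
}
```

The same metadata could in principle feed generated documentation or a
query-building GUI, since it records which attributes are required and
which are optional for each tag.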



-Hoss



Re: XML based Query Parser

Posted by markharw00d <ma...@yahoo.co.uk>.

Hi Chris,
Thanks for taking the time to look at this in detail.


>1) The factory classes should have "removeBuilder" methods so people
>   subclassing parsers can flat out remove support for a particular
>   tag, not just replace it.
>  
>
Can do.

>2) This DOM version definitely seems easier to follow/use than the SAX
>   version (although that could just be because i'm more familiar
>   with DOM than SAX) .. but it still seems like an intermediate
>   representation that wasn't org.w3c.dom.Element would be handy -- if
>   for no other reason than so the methods in DOMUtils could be put
>   directly in the "Element" class.
>  
>
But doesn't sticking with w3c.dom.Element allow the possibility of 
standards based tools (eg XPath implementations) to be used by builders 
if they so wish?

>3) I'm still confused about how state information could/would be
>   passed up or down the tree by builders.  you mentioned something
>   before about using ThreadLocal (which i'm not very familiar with)
>   but i don't see any examples of that here.
>  
>
I ripped the ThreadLocal thing out and opted for 
DOMUtils.getAttributeWithInheritance instead. My one scenario I came 
across where I wanted some context passed down was "fieldName" and this 
is handled simply by leaf nodes walking up the w3c.dom.Node tree until 
you find an Element with this attribute set.
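
A minimal sketch of that walk up the tree might look like the following.
It is an illustration, not the DOMUtils source; the signature and the
choice to return null when nothing is found are assumptions.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Illustrative sketch only; not the actual DOMUtils source. */
final class InheritedAttributes {

    /**
     * Returns the value of attributeName on element, or on the nearest
     * ancestor Element that sets it, or null if none of them does.
     */
    static String getAttributeWithInheritance(Element element, String attributeName) {
        Node current = element;
        while (current instanceof Element) {
            Element e = (Element) current;
            if (e.hasAttribute(attributeName)) {
                return e.getAttribute(attributeName);
            }
            current = current.getParentNode(); // the Document node ends the walk
        }
        return null;
    }
}
```

With this, a fieldName set once on an enclosing element applies to every
leaf beneath it, and a leaf can still override it by setting its own
fieldName.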

>4) The various getQuery (and getFilter and getSpanQuery) methods
>   should use e.getTagName when throwing ParserExceptions about not
>   finding what they're looking for.  That way, if i write a
>   parser that has...
>      queryFactory.addBuilder("tf",new TermQueryObjectBuilder());
>   ... the ParserExceptions thrown by TermQueryObjectBuilder.getQuery
>   when i forget to include a value="foo" can reference the tag name
>   i'm using (tf), and not the hardcoded constant "TermQuery"
>  
>
Agreed. In addition, all nested tag names (eg BooleanQuery's  "Clause") 
should be configured as properties that can be changed if so desired. 
The addBuilder call shown above should probably be changed to just take 
a Builder object and it should get the root tag name from the 
Builder.getTagName you propose.


>5) it seems like a lot of redundancy could be removed between the
>   various builders by refactoring some more utility functions --
>   particularly in the error cases  (see attached patch)
>  
>

Thanks, this tidies things up nicely.

I am still of the opinion that a self-documenting parser configuration 
("what XML can I write: is it <tf> or <TermQuery>?") is important, 
otherwise users have to examine parser source code to determine its 
capabilities. If we make the step of allowing the parser configuration 
to provide metadata about what is required/optional etc we can not only 
produce documentation but also drive a query-building GUI.

Cheers
Mark



Re: XML based Query Parser

Posted by Chris Hostetter <ho...@fucit.org>.

: Further to our discussions some time ago I've had some time to put
: together an XML-based query parser with support for many "advanced"
: query types not supported in the current Query parser.
:
: More details and code here: http://www.inperspective.com/lucene/LXQuery2.htm

So I *finally* got a chance to look at this in depth.  In general, i think
it looks great, but I have a few questions/comments...


1) The factory classes should have "removeBuilder" methods so people
   subclassing parsers can flat out remove support for a particular
   tag, not just replace it.

2) This DOM version definitely seems easier to follow/use than the SAX
   version (although that could just be because i'm more familiar
   with DOM than SAX) .. but it still seems like an intermediate
   representation that wasn't org.w3c.dom.Element would be handy -- if
   for no other reason than so the methods in DOMUtils could be put
   directly in the "Element" class.

3) I'm still confused about how state information could/would be
   passed up or down the tree by builders.  you mentioned something
   before about using ThreadLocal (which i'm not very familiar with)
   but i don't see any examples of that here.

4) The various getQuery (and getFilter and getSpanQuery) methods
   should use e.getTagName when throwing ParserExceptions about not
   finding what they're looking for.  That way, if i write a
   parser that has...
      queryFactory.addBuilder("tf",new TermQueryObjectBuilder());
   ... the ParserExceptions thrown by TermQueryObjectBuilder.getQuery
   when i forget to include a value="foo" can reference the tag name
   i'm using (tf), and not the hardcoded constant "TermQuery"

5) it seems like a lot of redundancy could be removed between the
   various builders by refactoring some more utility functions --
   particularly in the error cases  (see attached patch)
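
Points 1 and 4 could be sketched roughly as follows. Every name here
(BuilderFactorySketch, Factory, Builder, requiredAttribute, and this
ParserException) is a hypothetical stand-in rather than the actual class
from the posted code, and the sketch uses generics, which the posted code
may not.

```java
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Element;

/** Illustrative sketch only: a tag-name-to-builder registry with removal. */
final class BuilderFactorySketch {

    /** Hypothetical stand-in for the parser's exception type. */
    static final class ParserException extends Exception {
        ParserException(String message) {
            super(message);
        }
    }

    /** Hypothetical builder contract; T would be Query, Filter or SpanQuery. */
    interface Builder<T> {
        T build(Element e) throws ParserException;
    }

    static final class Factory<T> {
        private final Map<String, Builder<T>> builders = new HashMap<>();

        void addBuilder(String tagName, Builder<T> builder) {
            builders.put(tagName, builder);
        }

        /** Point 1: lets a subclass drop support for a tag entirely. */
        void removeBuilder(String tagName) {
            builders.remove(tagName);
        }

        T build(Element e) throws ParserException {
            Builder<T> builder = builders.get(e.getTagName());
            if (builder == null) {
                throw new ParserException("No builder registered for <" + e.getTagName() + ">");
            }
            return builder.build(e);
        }
    }

    /** Point 4: the message names the tag as written in the XML, not a hardcoded constant. */
    static String requiredAttribute(Element e, String name) throws ParserException {
        if (!e.hasAttribute(name)) {
            throw new ParserException("<" + e.getTagName() + "> requires attribute \"" + name + "\"");
        }
        return e.getAttribute(name);
    }
}
```

A shared helper like requiredAttribute is also the kind of utility that
point 5 has in mind: each builder calls it instead of repeating its own
error handling.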


-Hoss