Posted to dev@lucene.apache.org by mark harwood <ma...@yahoo.co.uk> on 2005/12/02 16:03:22 UTC

"Advanced" query language

There seems to be a growing gap between Lucene
functionality and the query language offered by
QueryParser (eg no support for regex queries, span
queries, "more like this", filter queries,
minNumShouldMatch etc etc).

Closing this gap is hard when:
a) The availability of JavaCC+Lucene skills is a
bottleneck 
b) The syntax of the query language makes it difficult
to add new features eg rapidly running out of "special
characters"

I don't think extending the existing query
parser/language is necessarily useful and I see it
being used purely to support the classic "simple
search engine" syntax. 

Unfortunately the fall-back position for applications
which require more complex queries is to "just write
some Java code to instantiate the Query objects
programmatically." This is OK but I think there is
value in having an advanced search syntax capable of
supporting the latest Lucene features and expressed in
XML. It's worth considering why it's useful to have a
String-representable form for queries:
1) Queries can be stored eg in audit logs or "saved
queries" used for tasks like auto-categorization
2) Clients built in languages other than Java can
issue queries to a Lucene server
3) I can decouple a request from the code that
implements the query when distributing software, e.g. my
applet may not want Lucene dragged down to the client

Currently we cannot easily do the above for any
"complex" queries  because they are not easily
persisted (yes, we could serialize Query objects but
that seems messy and does not solve points 2 and 3).

We can potentially use XML in the same way ANT does
i.e. a declarative way of invoking an extensible list
of Java-implemented features. A query interpreter is
used to instantiate the configured Java Query objects
and populate them with settings from the XML in a
generic fashion (using reflection) eg:
....
   <MoreLikeThis minNumberShouldMatch="3" maxQueryTerms="30">
      <text>
    Lorem ipsum dolor sit amet, consectetuer adipiscing
    elit. Morbi eget ante blandit quam faucibus posuere. Vivamus
    porta, elit fringilla venenatis consequat, neque lectus
    gravida dolor, sed cursus nunc elit non lorem. Nullam congue
    orci id eros. Nunc aliquet posuere enim.
      </text>
   </MoreLikeThis>
</BooleanClause>
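
The reflection-based wiring described above could look roughly like the following. This is only a minimal sketch under stated assumptions: MoreLikeThisBean, XmlQueryWirer and the registry map are hypothetical names invented for illustration, not Lucene classes, and only attributes on the root element are handled.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a query class; not a real Lucene type.
class MoreLikeThisBean {
    private int minNumberShouldMatch;
    private int maxQueryTerms;
    public void setMinNumberShouldMatch(int n) { minNumberShouldMatch = n; }
    public void setMaxQueryTerms(int n) { maxQueryTerms = n; }
    public int getMinNumberShouldMatch() { return minNumberShouldMatch; }
    public int getMaxQueryTerms() { return maxQueryTerms; }
}

class XmlQueryWirer {
    // Instantiates the class registered for the root element name and
    // copies each XML attribute onto the setter with the matching name.
    static Object wire(String xml, Map<String, Class<?>> registry) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            Class<?> type = registry.get(root.getTagName());
            if (type == null) {
                throw new IllegalArgumentException("unknown element: " + root.getTagName());
            }
            Object bean = type.getDeclaredConstructor().newInstance();
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                String name = attr.getNodeName();
                String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
                for (Method m : type.getMethods()) {
                    if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                        m.invoke(bean, convert(attr.getNodeValue(), m.getParameterTypes()[0]));
                    }
                }
            }
            return bean;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not wire query", e);
        }
    }

    // Only the conversions this sketch needs; a real interpreter would cover more types.
    static Object convert(String value, Class<?> type) {
        if (type == int.class) return Integer.valueOf(value);
        if (type == float.class) return Float.valueOf(value);
        return value;
    }
}
```

New query types would then be added by registering another element name and class, without touching the parsing code.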

Do people feel this would be a worthwhile endeavour?
I'm not sure if enough people feel pain around the
points 1-3 outlined above to make it worth pursuing.


Cheers
Mark



		

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: "Advanced" query language

Posted by Paul Elschot <pa...@xs4all.nl>.
On Friday 02 December 2005 16:03, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).
> 
> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck 
> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"
> 
> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax. 
> 
> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically." This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client
> 
> Currently we cannot easily do the above for any
> "complex" queries  because they are not easily
> persisted (yes, we could serialize Query objects but
> that seems messy and does not solve points 2 and 3).
> 
> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features. A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
>    <MoreLikeThis minNumberShouldMatch="3" maxQueryTerms="30">
>       <text>
>     Lorem ipsum dolor sit amet, consectetuer adipiscing
>     elit. Morbi eget ante blandit quam faucibus posuere. Vivamus
>     porta, elit fringilla venenatis consequat, neque lectus
>     gravida dolor, sed cursus nunc elit non lorem. Nullam congue
>     orci id eros. Nunc aliquet posuere enim.
>       </text>
>    </MoreLikeThis>
> </BooleanClause>

Quidquid id est ...
Do we have a Latin analyzer?

> 
> Do people feel this would be a worthwhile endeavour?
> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

There are at least two more issues:

Some queries can be nested inside others, and some
nesting combinations can not be searched. For example it is
not possible to have a BooleanQuery inside a PhraseQuery.
How to deal with these?

XML is not readable/writable by most of the humans who could
make good use of the extra power in the gap left open
by the default query language. See also this:
http://ciir.cs.umass.edu/irdemo/inqinfo/inqueryhelp.html
Do you want to decouple (as above) at the human interface?


There is also the contrib/surround query language.
This language avoids using special characters by using prefix
operators. Adding prefix operators like this is straightforward:

moreLikeThis(3,  30,  termList(Lorem ipsum dolor sit amet))

For practical use, this could be simplified to:

mlt(3,  30,  (Lorem ipsum dolor sit amet))

Such additions are a bit of work, but the query possibilities of Lucene
do not change that fast.
Adding infix operators (operators placed between their
arguments) is a bit more involved.
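
To illustrate why prefix operators are cheap to add, here is a minimal sketch of a parser for the function-call style shown above. It is not the contrib/surround parser: PrefixQueryParser and Node are hypothetical names, commas and whitespace are both treated as separators, and characters outside letters, digits, "_", "." and parentheses are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixQueryParser {
    // A parsed operator call such as mlt(...) or a bare term/number (no args).
    static final class Node {
        final String name;
        final List<Node> args = new ArrayList<>();
        Node(String name) { this.name = name; }

        @Override public String toString() {
            if (args.isEmpty()) return name;
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(args.get(i));
            }
            return sb.append(')').toString();
        }
    }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private PrefixQueryParser(String input) {
        // Words and parentheses become tokens; commas and spaces only separate them.
        Matcher m = Pattern.compile("[A-Za-z0-9_.]+|[()]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    static Node parse(String input) {
        PrefixQueryParser p = new PrefixQueryParser(input);
        Node n = p.expr();
        if (p.pos != p.tokens.size()) throw new IllegalArgumentException("trailing input");
        return n;
    }

    // expr := NAME [ '(' expr* ')' ]
    private Node expr() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        String tok = tokens.get(pos++);
        if (tok.equals("(") || tok.equals(")")) {
            throw new IllegalArgumentException("unexpected " + tok);
        }
        Node node = new Node(tok);
        if (pos < tokens.size() && tokens.get(pos).equals("(")) {
            pos++; // consume '('
            while (pos < tokens.size() && !tokens.get(pos).equals(")")) {
                node.args.add(expr());
            }
            if (pos >= tokens.size()) throw new IllegalArgumentException("missing )");
            pos++; // consume ')'
        }
        return node;
    }
}
```

Because every operator has the same NAME(args) shape, a new operator needs no grammar change, only a new name for the layer that turns Node trees into queries.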

Regards,
Paul Elschot




Re: "Advanced" query language

Posted by Daniel Naber <lu...@danielnaber.de>.
On Saturday 03 December 2005 03:57, Yonik Seeley wrote:

> It would be nice to resolve/fix the whole "JavaCC using an exception
> for flow control" issue too.

Has anybody had a look yet at JavaCC 4.0beta1? Does it perhaps fix that
problem?

Regards
 Daniel

-- 
http://www.danielnaber.de



Re: "Advanced" query language

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 04 December 2005 22:32, markharw00d wrote:
> I think I'm with Erik on this - I generally don't see end users keen to 
> type anything other than "words with spaces" as queries. 

I think/hope that XSL allows a simplified front end that would fit
my needs.

> I do see them commonly using GUI forms with multiple inputs and behind 
> the scenes application code assembling the query - the same way just 
> about every web app in the world has forms that create SQL on the user's 
> behalf.
> Like SQL, I do see this proposed new query syntax as a language for 
> developers.
> 
> Aside from the debate over choice of query syntax we would also need to 
> consider the impact such a language has on the query objects it 
> instantiates.

For the surround language I made a layer of classes between the
parser and Lucene in the org.apache.lucene.queryParser.surround.query
package. This layer exists mainly because not all the operators in
the surround language directly match to Lucene classes. Also,
term expansion of truncations is refactored more than in Lucene
to allow for a maximum on the total number of expanded terms,
regardless of the query structure.
I don't know whether such a layer would be needed for an xml
based parser.
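
The idea of one maximum across the whole query, regardless of its structure, can be sketched with a shared counter that every truncation expansion charges. This is an illustration only and not the actual surround code: ExpansionBudget and TruncationExpander are hypothetical names, and the index terms are a plain list.

```java
import java.util.ArrayList;
import java.util.List;

// One budget object is shared by all truncations in a query.
class ExpansionBudget {
    private int remaining;
    ExpansionBudget(int max) { remaining = max; }

    // Claims one slot; fails once the query as a whole has expanded too many terms.
    void claim() {
        if (remaining-- <= 0) {
            throw new IllegalStateException("too many expanded terms");
        }
    }
}

class TruncationExpander {
    // Expands a prefix such as "ho" (for ho*) against the index terms,
    // charging the shared budget for every match.
    static List<String> expand(String prefix, List<String> indexTerms, ExpansionBudget budget) {
        List<String> out = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                budget.claim();
                out.add(term);
            }
        }
        return out;
    }
}
```

Passing the same ExpansionBudget to every nested subquery is what makes the limit independent of how the truncations are nested.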

> I like the Spring/Ant approach which uses reflection to wire up beans 
> generically because this allows new objects to be plugged in to the 
> framework without having to rewrite the parser.
> This "generic wirer" approach requires the wirable objects to obey 
> JavaBean conventions (zero arg constructor and public getters/setters 
> for properties). Many existing Lucene Query objects have their mandatory 
> properties passed into their constructors and so would not directly fit 
> into such a framework. I can see that changing existing query classes to 
> provide a no-arg constructor would be a contentious move because it 
> would make it possible for developers using them directly to mistakenly 
> instantiate Query objects without passing mandatory parameters. Perhaps 
> in these cases it would be better to preserve the existing class and 
> provide a "parser wrapper bean" used purely to integrate the existing 
> Query class with the new parser framework.

That sounds like some good reasons for a layer between the parser
and Lucene.

Regards,
Paul Elschot




Re: "Advanced" query language

Posted by markharw00d <ma...@yahoo.co.uk>.
I think I'm with Erik on this - I generally don't see end users keen to 
type anything other than "words with spaces" as queries. 
I do see them commonly using GUI forms with multiple inputs and behind 
the scenes application code assembling the query - the same way just 
about every web app in the world has forms that create SQL on the user's 
behalf.
Like SQL, I do see this proposed new query syntax as a language for 
developers.

Aside from the debate over choice of query syntax we would also need to 
consider the impact such a language has on the query objects it 
instantiates.
I like the Spring/Ant approach which uses reflection to wire up beans 
generically because this allows new objects to be plugged in to the 
framework without having to rewrite the parser.
This "generic wirer" approach requires the wirable objects to obey 
JavaBean conventions (zero arg constructor and public getters/setters 
for properties). Many existing Lucene Query objects have their mandatory 
properties passed into their constructors and so would not directly fit 
into such a framework. I can see that changing existing query classes to 
provide a no-arg constructor would be a contentious move because it 
would make it possible for developers using them directly to mistakenly 
instantiate Query objects without passing mandatory parameters. Perhaps 
in these cases it would be better to preserve the existing class and 
provide a "parser wrapper bean" used purely to integrate the existing 
Query class with the new parser framework.
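
A minimal sketch of such a "parser wrapper bean" follows. FieldTermQuery stands in for an existing query class with mandatory constructor arguments; both class names are hypothetical and neither is a Lucene type.

```java
// Hypothetical existing query class: mandatory properties go through the
// constructor, so it cannot be created in a half-initialised state.
final class FieldTermQuery {
    private final String field;
    private final String term;

    FieldTermQuery(String field, String term) {
        if (field == null || term == null) {
            throw new IllegalArgumentException("field and term are mandatory");
        }
        this.field = field;
        this.term = term;
    }

    @Override public String toString() { return field + ":" + term; }
}

// Wrapper used only by the parser framework: it follows JavaBean conventions
// (no-arg constructor, public setters) and defers to the real constructor.
class FieldTermQueryBean {
    private String field;
    private String term;

    public FieldTermQueryBean() {}
    public void setField(String field) { this.field = field; }
    public void setTerm(String term) { this.term = term; }

    // Called once wiring is complete; validation stays in FieldTermQuery.
    public FieldTermQuery build() { return new FieldTermQuery(field, term); }
}
```

Developers using FieldTermQuery directly still cannot omit the mandatory parameters, while the generic wirer only ever sees the bean.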



Cheers,
Mark


		


Re: "Advanced" query language

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 4, 2005, at 11:02 AM, Paul Elschot wrote:
> Are there XML editors that can limit their output to a given  
> stylesheet?
> In that case one only needs to predefine a style sheet for queries.

Yes, there are many sophisticated XML editors.  I'm not quite sure  
where you're going with this though.

>> Almost all users want to enter "words separated by spaces", and very
>> little else.  QueryParser succeeds fine for this purpose.
>
> Those are not the users that I'm thinking of.

Those are some highly specialized users :)

>> I think we should focus on the machine-to-machine use case of
>> communicating a Query in this discussion.
>
> That's ok, but when a few simple constraints are enough to
> make it useful for humans that need the extra query power
> enough to be willing to enter more syntax, then why not?

I agree with the sentiment, truly.  And I'm quite open to QueryParser
itself being expanded to support more sophisticated queries if the
additional syntax still allows the more common simpler
TermQuery/PhraseQuery/BooleanQuery cases.

I just don't think it's very practical to come up with such a syntax and
have any kind of consensus on it across the majority of Lucene
users.  QueryParser is surely embedded in many applications and
exposes querying capabilities that the application developers may not
be aware of.  Field selection itself is even questionable in
the general sense.

In short, QueryParser is a double-edged sword - powerful, but perhaps  
too powerful.  Simple in one sense, but too complicated when digging  
deeper.  I could almost be bold enough to claim that each application  
should build this kind of parsing in a custom way.

>>> I don't know XML that well. Does it have a facility to allow
>>> different roles
>>> for nested constructs?
>>
>> I'm not following what you mean by different roles.  Could you
>> provide an example.
>
> For example the clauses in a boolean query can have these roles:
> required, optional, and excluded.
> Thinking about it, this would probably map to sth like:
>
> <BooleanQuery>
>   <BooleanClause role="required">
>     <SomeSubQuery/>
>   </BooleanClause>
>   <!-- more clauses -->
> </BooleanQuery>
>
> Is it possible in XML to predefine rc so that <rc>...</rc> means:
> <BooleanClause role="required">...</BooleanClause> ?

Only via DTD/XSD could this be done, but that is way overkill and too  
complicated for our purposes.  For the general XML case, make it  
verbose specifying everything with no shortcuts like this.  The flags  
should be spelled out explicitly for <BooleanClause>.  If someone  
wants a shortcut XML syntax, that is where XSLT could come in, possibly.
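
As a rough sketch of that XSLT route, the stylesheet below copies everything unchanged except the shortcut element, which it expands to the verbose form. It uses only the javax.xml.transform API shipped with the JDK; ShortcutExpander is a hypothetical name and <rc> is the shortcut from the question above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ShortcutExpander {
    // Identity transform plus one rule that rewrites the <rc> shortcut.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='rc'>"
      + "<BooleanClause role='required'>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</BooleanClause>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the verbose query XML for a document that may contain <rc> shortcuts.
    static String expand(String shortXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(shortXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not expand query XML", e);
        }
    }
}
```

The query interpreter itself would then only ever see the verbose form, so shortcuts stay a front-end concern.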

"role" seems too generic here.  I recommend something like  
occur="must/should/mustnot" which maps to the API more precisely.
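
A small sketch of how such an occur attribute might be mapped when the XML is interpreted. The enum here is a local stand-in defined for this example (its constant names are an assumption), and OccurAttribute is a hypothetical helper.

```java
import java.util.Locale;

class OccurAttribute {
    // Local stand-in for the three clause flags; not imported from Lucene.
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Maps the attribute text must/should/mustnot to a flag, rejecting anything else.
    static Occur parse(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "must":    return Occur.MUST;
            case "should":  return Occur.SHOULD;
            case "mustnot": return Occur.MUST_NOT;
            default:
                throw new IllegalArgumentException("unknown occur value: " + value);
        }
    }
}
```

Rejecting unknown values keeps the verbose XML form unambiguous, which is the point of spelling the flags out.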

	Erik




Re: "Advanced" query language

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 04 December 2005 15:26, Erik Hatcher wrote:
> 
> On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
> > I tried rewriting the XML query in exactly this way, with a
> > few property=.. constructs:
> >
> > boostingQuery(
> >   matchQuery=moreLikeThis(
> >                             percentTermsToMatch="0.25",
> >                             docId="44",
> >                             compareField("contents"),
> >                             compareField("title")),
> >  downGradeQuery=simpleQuery("contents")
> > ....
> > etc.
> >
> > But then I concluded that a GUI would be better for human input.
> > Nonetheless, this syntax is simpler than XML, so it might
> > be more acceptable than XML for human input.
> 
> I cannot at all fathom a use case where anything like this would be  
> human enterable.  I realize, Paul, that you're after a human- 
> enterable syntax that can create sophisticated queries, but XML  
> certainly is not appropriate, or even a short-cut of XML (see YAML -   
> http://www.yaml.org/).  It's a shame there isn't (that I can find) a  
> decent YAML parser in Java.

Are there XML editors that can limit their output to a given stylesheet?
In that case one only needs to predefine a style sheet for queries.
 
> Almost all users want to enter "words separated by spaces", and very  
> little else.  QueryParser succeeds fine for this purpose.

Those are not the users that I'm thinking of.

> I think we should focus on the machine-to-machine use case of  
> communicating a Query in this discussion.

That's ok, but when a few simple constraints are enough to
make it useful for humans that need the extra query power
enough to be willing to enter more syntax, then why not?
 
> > The problem is that query language operators form queries and have
> > properties and subqueries with possibly different roles.
> > The subqueries cause the need for nesting and the properties and roles
> > cause the need for the property=... syntax.
> 
> > I don't know XML that well. Does it have a facility to allow  
> > different roles
> > for nested constructs?
> 
> I'm not following what you mean by different roles.  Could you  
> provide an example.

For example the clauses in a boolean query can have these roles:
required, optional, and excluded.
Thinking about it, this would probably map to something like:

<BooleanQuery>
  <BooleanClause role="required">
    <SomeSubQuery/>
  </BooleanClause>
  <!-- more clauses -->
</BooleanQuery>

Is it possible in XML to predefine rc so that <rc>...</rc> means:
<BooleanClause role="required">...</BooleanClause> ?

Regards,
Paul Elschot




Re: "Advanced" query language

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 4, 2005, at 6:52 AM, Paul Elschot wrote:
> I tried rewriting the XML query in exactly this way, with a
> few property=.. constructs:
>
> boostingQuery(
>   matchQuery=moreLikeThis(
>                             percentTermsToMatch="0.25",
>                             docId="44",
>                             compareField("contents"),
>                             compareField("title")),
>  downGradeQuery=simpleQuery("contents")
> ....
> etc.
>
> But then I concluded that a GUI would be better for human input.
> Nonetheless, this syntax is simpler than XML, so it might
> be more acceptable than XML for human input.

I cannot at all fathom a use case where anything like this would be
human enterable.  I realize, Paul, that you're after a
human-enterable syntax that can create sophisticated queries, but XML
certainly is not appropriate, or even a short-cut of XML (see YAML -
http://www.yaml.org/).  It's a shame there isn't (that I can find) a
decent YAML parser in Java.

Almost all users want to enter "words separated by spaces", and very  
little else.  QueryParser succeeds fine for this purpose.

I think we should focus on the machine-to-machine use case of  
communicating a Query in this discussion.

> The problem is that query language operators form queries and have
> properties and subqueries with possibly different roles.
> The subqueries cause the need for nesting and the properties and roles
> cause the need for the property=... syntax.

> I don't know XML that well. Does it have a facility to allow  
> different roles
> for nested constructs?

I'm not following what you mean by different roles.  Could you  
provide an example.

	Erik




Re: "Advanced" query language

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 04 December 2005 05:17, Yonik Seeley wrote:
> On 12/3/05, Paul Elschot <pa...@xs4all.nl> wrote:
> > Indeed, this is a disadvantage of the "function call" syntax.
> 
> It depends on the language.  Take Python for example:
> 
> >>> def foo(a,b): print a,b
> >>> foo(1,2)
> 1 2
> >>> foo(a=1,b=2)
> 1 2
> >>> foo(b=2,a=1)
> 1 2
> >>>

I tried rewriting the XML query in exactly this way, with a
few property=.. constructs:

boostingQuery(
  matchQuery=moreLikeThis(
                            percentTermsToMatch="0.25",
                            docId="44",
                            compareField("contents"),
                            compareField("title")),
 downGradeQuery=simpleQuery("contents")
....
etc.

But then I concluded that a GUI would be better for human input.
Nonetheless, this syntax is simpler than XML, so it might
be more acceptable than XML for human input.
When the property=... syntax is optional (as it is in Python),
and when meaningful abbreviations for the longNames above can be found,
it might actually be feasible.

The problem is that query language operators form queries and have
properties and subqueries with possibly different roles.
The subqueries cause the need for nesting and the properties and roles
cause the need for the property=... syntax.

XML already has the property=... syntax, and there are good GUIs available
for manually creating nested XML constructs.
Also I think we can safely assume that the users that can benefit from
more complex query facilities will be able to provide queries in XML.

I don't know XML that well. Does it have a facility to allow different roles
for nested constructs?

That only leaves the longNames in the examples above, but these
can be avoided by allowing short forms.

So I think using XML for an advanced query language is a good idea when:
- short forms are provided for the most common names to be used,
  much like <a href="...">...</a>, <p>  and <h3>...</h3>  in HTML, and
- it has an easy to use facility to allow different roles for
  nested constructs.

Regards,
Paul Elschot




Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
On 12/3/05, Paul Elschot <pa...@xs4all.nl> wrote:
> Indeed, this is a disadvantage of the "function call" syntax.

It depends on the language.  Take Python for example:

>>> def foo(a,b): print a,b
>>> foo(1,2)
1 2
>>> foo(a=1,b=2)
1 2
>>> foo(b=2,a=1)
1 2
>>>


-Yonik


Re: "Advanced" query language

Posted by markharw00d <ma...@yahoo.co.uk>.
Paul Elschot wrote:

>Would it be possible to provide such a GUI automatically
>(by introspection) given a set of Query classes of which objects
>can be mixed to form a query?
>  
>

Certainly possible - I've seen app servers with automatic GUI test 
clients which can introspect an EJB interface and let you construct 
instances of the data objects that need to be passed. As generic tools 
they can be clunky to use so it's definitely a developer-level tool, 
(Luke ?)  not an end-user level tool. I wonder if it's worth considering 
when developers have IDEs with decent autocomplete/integrated Javadoc 
hints.

If you were to provide an end user-friendly generic client I suspect 
you'd need metadata about not just the Query objects but also the 
documents in the index e.g to offer drop-down lists of values for 
certain fields in the GUI. Again, possible, but you'd have to ask 
yourself if it would just be simpler to code a custom GUI for your users 
in each case.
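
For the developer-level case, the introspection step could be sketched as below: list the writable properties of a query bean so that a generic form can offer one input per property. Plain reflection is used here; MoreLikeThisForm and FormFieldLister are hypothetical names, with property names borrowed from the MoreLikeThis examples in this thread.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical query bean; not a real Lucene class.
class MoreLikeThisForm {
    private int maxQueryTerms;
    private float percentTermsToMatch;
    public int getMaxQueryTerms() { return maxQueryTerms; }
    public void setMaxQueryTerms(int v) { maxQueryTerms = v; }
    public float getPercentTermsToMatch() { return percentTermsToMatch; }
    public void setPercentTermsToMatch(float v) { percentTermsToMatch = v; }
}

class FormFieldLister {
    // Returns property name -> parameter type for every public single-argument
    // setter, sorted by property name so a form renders in a stable order.
    static Map<String, String> writableProperties(Class<?> beanClass) {
        Map<String, String> fields = new TreeMap<>();
        for (Method m : beanClass.getMethods()) {
            String name = m.getName();
            if (name.startsWith("set") && name.length() > 3 && m.getParameterCount() == 1) {
                String property = Character.toLowerCase(name.charAt(3)) + name.substring(4);
                fields.put(property, m.getParameterTypes()[0].getSimpleName());
            }
        }
        return fields;
    }
}
```

This only covers the Query side; the index metadata mentioned above (for example drop-down values for a field) would have to come from somewhere else.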


	
	
		


Re: "Advanced" query language

Posted by Paul Elschot <pa...@xs4all.nl>.
On Saturday 03 December 2005 19:00, markharw00d wrote:
> Erik Hatcher wrote:
> 
...
> parameters that tweak its behaviour. If I don't have a query language 
> that names the parameters explicitly (say, XML) I end up having to 
> define what looks like a function with a long list of parameters: "like 
> (123,,,4,,,)". Ack.
> 

Indeed, this is a disadvantage of the "function call" syntax.
Human input of the arguments of more complex queries
is best supported by a GUI, much like existing SQL query
front ends.
Would it be possible to provide such a GUI automatically
(by introspection) given a set of Query classes of which objects
can be mixed to form a query?

Regards,
Paul Elschot




RE: "Advanced" query language

Posted by Pasha Bizhan <lu...@lucenedotnet.com>.
Hi, 

> From: markharw00d [mailto:markharw00d@yahoo.co.uk] 
> Re: MoreLikeThis queries.
> Yes, they can be usefully wrapped as queries (see attached simple 
> example). In fact it was  my attempts at bastardising QueryParser to 
> support them that brought home its limitations. I ended up with a 
> subclass hack that (mis)used the field name to parse a query string 
> "like:123" where 123 was a doc id. With the QueryParser 
> syntax I was not  able to pass other parameters which MoreLikeThis could 
> usefully use to  control the behaviour of this query type eg choice of 
> fieldname(s) used,  max number of terms generated, minNumberShouldTerms to
> match etc etc.

With the _current_ QP syntax. 

As described in my previous message about syntax handlers, you would be able to
pass the parameters to the handler.

	string query = "like(param1, param2,...): (bla-bla-bla)";

The syntax of the parameters isn't significant to QP. QP does not need to know
anything about the parameters' syntax.

	string query = "like(percentTermsToMatch=\"0.25f\",docId=\"44\",...):...";
Or
	string query = "like(0.25f,44): ...";


> This is not unusual, each query type has potentially multiple 
> optional 
> parameters that tweak its behaviour. If I don't have a query 
> language 
> that names the parameters explicitly (say, XML) I end up having to 
> define what looks like a function with a long list of 
> parameters: "like 
> (123,,,4,,,)". Ack.
 
Exactly. 
 
Pasha Bizhan




Re: "Advanced" query language

Posted by markharw00d <ma...@yahoo.co.uk>.
Erik Hatcher wrote:

> Rest assured that human-readable query expressions aren't going away  
> at all.  I don't think Mark even implied that.


That's right. The proposal is *not* to replace what is already there - 
QueryParser will always have a useful role to play supporting the 
"Google-like" query syntax familiar to millions.
I'd just like to see another full-featured query representation for the 
reasons already outlined.

Picking up on some points raised:

Re: MoreLikeThis queries.
Yes, they can be usefully wrapped as queries (see attached simple 
example). In fact it was  my attempts at bastardising QueryParser to 
support them that brought home its limitations. I ended up with a 
subclass hack that (mis)used the field name to parse a query string 
"like:123" where 123 was a doc id. With the QueryParser syntax I was not 
able to pass other parameters which MoreLikeThis could usefully use to 
control the behaviour of this query type eg choice of fieldname(s) used, 
max number of terms generated, minNumberShouldTerms to match etc etc.
This is not unusual: each query type has potentially multiple optional 
parameters that tweak its behaviour. If I don't have a query language 
that names the parameters explicitly (say, XML) I end up having to 
define what looks like a function with a long list of parameters: "like 
(123,,,4,,,)". Ack.

Here's a pseudo-code example that throws together some of the more 
obscure parts of Lucene not represented in the existing QueryParser as 
an illustration of how this could look in a more wide-reaching parser.
Imagine the user has selected an example doc #44 as something they are 
interested in, on the subject of "hockey" but they prefer to see 
documents that don't talk about ice hockey

<BoostingQuery>
    <MatchQuery>
        <MoreLikeThisQuery percentTermsToMatch="0.25f" docId="44">
            <CompareField name="contents"/>
            <CompareField name="title"/>
        </MoreLikeThisQuery>
    </MatchQuery>
    <DowngradeQuery demoteValue="0.5">
        <SimpleQuery defaultField="contents">
            <queryText>"ice hockey" OR puck OR rink</queryText>
        </SimpleQuery>
    </DowngradeQuery>
</BoostingQuery>

BoostingQuery is a class that can use a second query to demote the 
results of a first query if it matches (see here: 
http://wiki.apache.org/jakarta-lucene/CommunityContributions)
For this and other forms of query to be able to plug into the new parser, the 
Query objects just need to adhere to bean conventions to be 
automatically wired in an ANT/Spring like way using reflection.
For example,  the implementation of BoostingQuery would need to have 
getter/setter properties for "matchQuery" and "downgradeQuery".
Note in this example that the existing QueryParser syntax is usefully 
used in "SimpleQuery" to avoid making the XML too verbose.

There's much detail to be added in how this would work in practice but I 
thought I'd post it here to show the general shape of one possible 
direction.







Re: "Advanced" query language

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Rest assured that human-readable query expressions aren't going away  
at all.  I don't think Mark even implied that.  The idea is to have a  
way to communicate a query electronically in a precise way that  
avoids parser syntax and the awkwardness this could have with  
analysis.  This seems reasonable.

I can't imagine humans typing <BooleanQuery><TermQuery field="title"
term="foobar"/><WildcardQuery field="body"
expression="*foo*"/></BooleanQuery>  :))

	Erik


On Dec 2, 2005, at 9:57 PM, Yonik Seeley wrote:
> Just as a clarification, human-readable strings for queries are
> essential for how we do things at CNET.
>
> In addition to Mark's comments:
> - standard logging mechanisms such as the access log of an app server
> are readable
> - easily human typable one-off queries during development and for
> troubleshooting + support are essential.
> - the speed at which a query can be parsed is important... in some
> systems, it's part of the transfer syntax from client to server and is
> an integral part of the system (again, analogy to SQL).
>
> That doesn't mean I fully support the XML idea, nor am I ready to
> abandon the current query syntax.  I have contemplated XML in the past
> as a way to support templating of queries... a way for a user to say,
> when someone queries field "x", expand this to this type of
> arbitrarily complex query involving fields a,s,d,f.  There might be a
> place for both LXQ (Lucene XML Query?) and the current query syntax.
>
> My (very long) todo list has support for DisjunctionMax and
> minNrShouldMatch on it, and I have worked in JavaCC in the past (an
> ASN.1 compiler, circa 1998).  No timeline promises though.  Also need
> to look closer at Paul's surround query language... I looked very
> briefly, but not enough to "get" it.
>
> It would be nice to resolve/fix the whole "JavaCC using an exception
> for flow control" issue too.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: "Advanced" query language

Posted by Wolfgang Hoschek <wo...@mac.com>.
Right now the Sun STAX impl is decidedly buggy compared to xerces SAX  
(and it's not faster either). The most complete, reliable and  
efficient STAX impl seems to be woodstox.

Wolfgang.

On Dec 15, 2005, at 7:22 PM, Yonik Seeley wrote:

> Agreed, that is a significant downside.
> StAX is included in Java 6, but that doesn't help too much given the
> Java 1.4 req.
>
> -Yonik
>
> On 12/15/05, Wolfgang Hoschek <wo...@mac.com> wrote:
>
>> STAX would probably make coding easier, but unfortunately complicates
>> the packaging side: one must ship at least two additional external
>> jars (stax interfaces and impl) for it to become usable. Plus, STAX
>> is quite underspecified (I wrote a STAX parser + serializer impl
>> lately), so there's room for runtime surprises with different impls.
>> The primary advantage of SAX is that everything is included in JDK >=
>> 1.4, and that impls tend to be more mature. SAX bottom line: more
>> hassle early on, less hassle later.
>>
>> Wolfgang.
>>
>> On Dec 15, 2005, at 5:47 PM, Yonik Seeley wrote:
>>
>>
>>> On 12/15/05, markharw00d <ma...@yahoo.co.uk> wrote:
>>>
>>>
>>>> At this stage I am more interested in feedback on parser design/
>>>> approach
>>>>
>>>>
>>>
>>> Excellent idea.
>>> While SAX is fast, I've found callback interfaces more difficult to
>>> deal with while generating nested object graphs... it normally
>>> requires one to maintain state in stack(s).
>>>
>>> Have you considered a pull-parser like StAX or XPP?  They are as  
>>> fast
>>> as SAX, and allow you to ask for the next XML event you are  
>>> interested
>>> in, eliminating the need to keep track of where you are by other  
>>> means
>>> (the place in your own code and normal variables do that).  It
>>> normally turns into much more natural code.
>>>
>>> -Yonik
>>>


Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
Agreed, that is a significant downside.
StAX is included in Java 6, but that doesn't help too much given the
Java 1.4 req.

-Yonik

On 12/15/05, Wolfgang Hoschek <wo...@mac.com> wrote:
> STAX would probably make coding easier, but unfortunately complicates
> the packaging side: one must ship at least two additional external
> jars (stax interfaces and impl) for it to become usable. Plus, STAX
> is quite underspecified (I wrote a STAX parser + serializer impl
> lately), so there's room for runtime surprises with different impls.
> The primary advantage of SAX is that everything is included in JDK >=
> 1.4, and that impls tend to be more mature. SAX bottom line: more
> hassle early on, less hassle later.
>
> Wolfgang.
>
> On Dec 15, 2005, at 5:47 PM, Yonik Seeley wrote:
>
> > On 12/15/05, markharw00d <ma...@yahoo.co.uk> wrote:
> >
> >> At this stage I am more interested in feedback on parser design/
> >> approach
> >>
> >
> > Excellent idea.
> > While SAX is fast, I've found callback interfaces more difficult to
> > deal with while generating nested object graphs... it normally
> > requires one to maintain state in stack(s).
> >
> > Have you considered a pull-parser like StAX or XPP?  They are as fast
> > as SAX, and allow you to ask for the next XML event you are interested
> > in, eliminating the need to keep track of where you are by other means
> > (the place in your own code and normal variables do that).  It
> > normally turns into much more natural code.
> >
> > -Yonik


Re: "Advanced" query language

Posted by Wolfgang Hoschek <wo...@mac.com>.
STAX would probably make coding easier, but unfortunately complicates  
the packaging side: one must ship at least two additional external  
jars (stax interfaces and impl) for it to become usable. Plus, STAX  
is quite underspecified (I wrote a STAX parser + serializer impl  
lately), so there's room for runtime surprises with different impls.  
The primary advantage of SAX is that everything is included in JDK >=  
1.4, and that impls tend to be more mature. SAX bottom line: more  
hassle early on, less hassle later.
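
For illustration only, here is a minimal sketch of the pull style under discussion, assuming the javax.xml.stream API (StAX) that ships with Java 6 and later. The Element class and the method names are made up for this example; they stand in for whatever Query objects a real builder would create. The point is that the nesting of the XML is tracked by ordinary recursion rather than by an explicit stack in a callback handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseSketch {

    /** Minimal tree node; a hypothetical stand-in for real Query objects. */
    static final class Element {
        final String name;
        final List<Element> children = new ArrayList<Element>();
        Element(String name) { this.name = name; }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    /** Parses the document and returns a tree rooted at the first element. */
    static Element parse(String xml) {
        try {
            XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // Skip anything (comments, whitespace) before the root element.
            while (r.next() != XMLStreamConstants.START_ELEMENT) { }
            return readElement(r);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    /** Precondition: the reader is positioned on a START_ELEMENT event. */
    private static Element readElement(XMLStreamReader r) throws XMLStreamException {
        Element e = new Element(r.getLocalName());
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                e.children.add(readElement(r));   // recursion mirrors the XML nesting
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return e;                          // our own end tag
            }
        }
        throw new XMLStreamException("unexpected end of document");
    }
}
```

The packaging concern raised above still applies on JDKs older than Java 6, where the StAX interfaces and an implementation have to be shipped separately.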

Wolfgang.

On Dec 15, 2005, at 5:47 PM, Yonik Seeley wrote:

> On 12/15/05, markharw00d <ma...@yahoo.co.uk> wrote:
>
>> At this stage I am more interested in feedback on parser design/ 
>> approach
>>
>
> Excellent idea.
> While SAX is fast, I've found callback interfaces more difficult to
> deal with while generating nested object graphs... it normally
> requires one to maintain state in stack(s).
>
> Have you considered a pull-parser like StAX or XPP?  They are as fast
> as SAX, and allow you to ask for the next XML event you are interested
> in, eliminating the need to keep track of where you are by other means
> (the place in your own code and normal variables do that).  It
> normally turns into much more natural code.
>
> -Yonik


Re: "Advanced" query language

Posted by Paul Smith <ps...@aconex.com>.
Hey all,

I haven't been paying real close attention to this thread, but if any  
of you are looking for something that has _easy_ Object->XML->Object  
you should seriously try XStream (http://xstream.codehaus.org).  
Simplest/easiest api I've seen.  BSD licensed too (Apache friendly).   
One can register a Converter class to assist with anything the built-in  
converters don't handle well. The Converter code is nice and elegant.

Just something to think about maybe?

cheers,

Paul
On 22/12/2005, at 11:20 AM, Chris Hostetter wrote:

>
> I finally got a chance to look at this code today (the best part  
> about the
> last day before vacation, is no one expects you to get anything  
> done, so
> you can ignore your "real work" and spend time on things that are more
> important in the long run) and while I still haven't wrapped my head
> around all of it, I wanted to share my thoughts so far on the API...
>
> 1) I applaud the pluggable nature of your solution. Looking at the Test
> Case, it is easy to see exactly how a service provider could
> do things like override the behavior of a <PhraseQuery> to be  
> implemented
> as a SpanQuery without their clients being affected at all.  Kudos.
>
> 2) Digging into what was involved in writing an ObjectBuilder, I  
> found
> the api somewhat confusing.  I was reminded of this exchange you  
> had with
> Yonik...
>
> : > While SAX is fast, I've found callback interfaces
> : > more difficult to
> : > deal with while generating nested object graphs...
> : > it normally
> : > requires one to maintain state in stack(s).
> :
> : I've gone to some trouble to avoid the effects of this
> : on the programming model.
>
> As someone who feels very comfortable with Lucene, but has no
> practical experience with SAX, I have to say that I don't really  
> feel like
> the API has a very clean separation from SAX.
>
> I think that the ideal API wouldn't require people writing  
> ObjectBuilders
> to know anything about sax, or to ever need to import anything from
> org.xml.** or javax.xml.**
>
>
> 3) While the *need* to maintain/pass state information should be  
> avoided,
> I can definitely think of uses for this framework that may *want*  
> to pass
> state information -- both down to the ObjectBuilders that get used in
> inner nodes, as well as up to wrapping nodes, and there doesn't  
> seem to be
> an easy way to do that.  (it could just be my lack of SAX knowledge  
> though)
>
> The best example i can give is if someone (ie: me) wanted to use this
> framework to allow boolean queries to be written like this...
>
>    <BooleanQuery>
>       <TermQuery occurs="mustNot" field="contents" value="mustNot"/>
>       <UserInput occurs="must">"a phrase" fuzzy~</UserInput>
>    </BooleanQuery>
>
> ...i want to be able to write a  
> "BooleanClauseWrapperObjectBuilder" that
> can be wrapped around any other ObjectBuilder and will return whatever
> object it does, but will also check for an "occurs" attribute, and  
> put
> that in a state bucket somewhere that the BooleanQuery has access  
> to it
> when adding the Query it gets back.
>
> Going the opposite direction, I'd like to be able to have tags that  
> set
> state which is accessible to descendant tags (even if the tags in the
> middle don't know anything about that bit of state).  for example:
> specifying how much slop should be used by default in phrase  
> queries...
>
>    <StateModifier defaultPhraseSlop="100">
>       ...
>       <BooleanQuery>
>          <PhraseQuery occurs="mustNot" field="contents">
>             How Now Brown Cow?
>          </PhraseQuery>
>          ...
>       </BooleanQuery>
>    </StateModifier>
>
>
> I haven't had a chance to try implementing this, but at a high  
> level, it
> seems like all of this should be possible and still easy to use.
> Here's a real rough cut at what i've had floating around in the back
> of my head (I'm doing this straight into email, pardon any typos or
> pseudo code) ...
>
>
>
> /** could be implemented with SAX, or DOM, or Pull */
> public interface LuceneXmlParser {
>     /** this method will call setParser(this) on each handler */
>     public void registerHandler(String tag, LuceneXmlHandler h);
>     /**
>      primary method for clients, parses the xml and calls processNode
>      on the root node
>      */
>     public Query parse(InputStream xml);
>     /**
>      dispatches to the appropriate handler's process method based
>      on the Node name, may be called by handlers for recursion of
>      children nodes
>      */
>     public Query processNode(LuceneXmlNode n, State s);
> }
> public interface LuceneXmlHandler {
>     public void setParser(LuceneXmlParser p);
>     /**
>      should return a Query that corresponds to the specified node.
>      may read/modify state in any way it wants ... it is recommended
>      that all implementing methods wrap their state before passing it
>      on when processing children.
>      */
>     public Query process(LuceneXmlNode n, State s);
> }
> /**
>  A State is a stack frame that can delegate read operations to another
>  State it wraps (if there is one).  but it cannot delegate modifying
>  operations.
>  Classes implementing State should provide a constructor that takes
>  another State to wrap.
> */
> public interface State extends Map<String,Object> {
>    /**
>     for callers that want to know what's in the immediate stack
>     frame without any delegation
>     */
>    public Map<String,Object> getOuterFrame();
>    /* should return a new state that wraps the current state */
>    public State wrapCurrentState();
> }
> /** a very simple api around the most basic xml concepts */
> public interface LuceneXmlNode {
>    public CharSequence getNodeName();
>    public Map<String,String> getAttributes();
>    public CharSequence getBodyText();
>    /* Iterable (not Iterator) so handlers can use a for-each loop */
>    public Iterable<LuceneXmlNode> getChildren();
> }
> /** an example handler for TermQuery */
> public class TermQueryHandler implements LuceneXmlHandler {
>    LuceneXmlParser p;
>    public void setParser(LuceneXmlParser q) { p=q; }
>    public Query process(LuceneXmlNode n, State s) {
>      Map<String,String> attrs = n.getAttributes();
>      return new TermQuery(new Term(attrs.get("field"),
>                                    attrs.get("value")));
>    }
> }
> /** an example handler for BooleanQuery */
> public class BooleanQueryHandler implements LuceneXmlHandler {
>    LuceneXmlParser p;
>    public void setParser(LuceneXmlParser q) { p=q; }
>    public Query process(LuceneXmlNode n, State s) {
>      BooleanQuery r = new BooleanQuery();
>      int minShouldMatch =
>          Integer.parseInt(n.getAttributes().get("minShouldMatch"));
>      r.setMinimumNumberShouldMatch(minShouldMatch);
>      for (LuceneXmlNode kid : n.getChildren()) {
>         State kidState = s.wrapCurrentState();
>         Query b = p.processNode(kid,kidState);
>         BooleanClause.Occur o = BooleanClause.Occur.SHOULD;
>         if (kidState.getOuterFrame().containsKey("occurs")) {
>             o = (BooleanClause.Occur)
>                 kidState.getOuterFrame().get("occurs");
>         }
>         r.add(b,o);
>      }
>      return r;
>    }
> }
> /**
>  an example handler that can wrap any other handler and give it
>  BooleanClause.Occur awareness
> */
> public class BooleanClauseWrapperHandler implements LuceneXmlHandler {
>    LuceneXmlParser p;
>    LuceneXmlHandler inner;
>    public BooleanClauseWrapperHandler(LuceneXmlHandler i) { inner = i; }
>    public void setParser(LuceneXmlParser q) { p=q; }
>    public Query process(LuceneXmlNode n, State s) {
>       Query q = inner.process(n, s);
>       if (n.getAttributes().containsKey("occurs")) {
>         /* glossing over string parsing to object construction here */
>         s.put("occurs",n.getAttributes().get("occurs"));
>       }
>       return q;
>    }
> }
>
>
> ...does that make sense?
>
>
>
> -Hoss
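
To make the State idea in the quoted proposal more concrete, here is a minimal sketch in which reads fall through to the wrapped frame while writes stay in the current frame. It is a simplification and an assumption about what was meant: the class below does not implement java.util.Map as the quoted interface does, and the keys used are only examples.

```java
import java.util.HashMap;
import java.util.Map;

class ScopedStateSketch {

    /** One stack frame of parser state; reads delegate to the parent, writes never do. */
    static final class State {
        private final State parent;   // null for the outermost frame
        private final Map<String, Object> frame = new HashMap<String, Object>();

        State(State parent) { this.parent = parent; }

        /** Looks in this frame first, then in the wrapped frames. */
        Object get(String key) {
            if (frame.containsKey(key)) return frame.get(key);
            return parent == null ? null : parent.get(key);
        }

        /** Stores the value in this frame only. */
        void put(String key, Object value) { frame.put(key, value); }

        /** The immediate frame, without any delegation. */
        Map<String, Object> getOuterFrame() { return frame; }

        /** Returns a new frame that wraps this one. */
        State wrapCurrentState() { return new State(this); }
    }
}
```

With this shape, a handler for an enclosing tag could put a default such as a phrase slop into its frame, and a wrapper handler could report an "occurs" value upward by writing into the frame its parent created for it.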


Re: "Advanced" query language

Posted by Chris Hostetter <ho...@fucit.org>.
: This example code looks interesting. If I understand
: correctly using this approach requires that builders
: like the "q" QueryObjectBuilder instance must be
: explicitly registered with each and every builder that
: consumes its type of output eg BQOB and FQOB. An

correct.

: provider for the class (Query) at runtime. I presume
: doing it your way is a deliberate design choice in
: order to validate at compile time that a particular
: parser configuration has all of the necessary builders
: in place to support incoming XML. That seems like a

err... more to validate at compile time that some Builder which
wraps another Builder isn't expecting the inner builder to return a Query
when the inner builder is guaranteed to only ever return a Filter.  the
situation might still come up where the incoming XML doesn't match the
expectation of the code, at which point the inner builder can say "i don't
have the attributes/data i need to construct an object, so i'm throwing an
exception"

: reasonable approach.
: If I get some time I'll look at implementing something
: based on this.

two other things about an approach like this that occurred to me while
i was trying to sleep are:

1) the Builder interfaces don't need to exactly mirror the object
hierarchy, in some cases what really makes sense is an interface per
"trait" so an interface like a "SloppyBuilder" can be implemented by the
PhraseQueryObjectBuilder and a SpanNearQueryObjectBuilder (which also
implements SpannyBuilder) so builders that expect the nested XML they get
to contain stuff that has slop can require you register a SloppyBuilder
with them.

2) requiring that each builder be explicitly passed a reference to a
builder of each type it wants to know about is a pain .. and it might be a
dangerous pain if you're trying to "replace" one builder with another one
-- ie: consider the case where a default set of builders exists, and you
want to replace all uses of PhraseQuery with a SpanNearQuery ... you
remember to replace/override the delegation in QueryBuilder .. but maybe
there's some other Builder that wraps PhraseQueries you forgot about.

A static pointer could be stored in the interface itself, so that if you
want to redefine the builder that gets used when you want a
"SloppyBuilder" you just access SloppyBuilder.INSTANCE....

public interface QueryBuilder extends ObjectBuilder {
    public static QueryBuilder INSTANCE;
    public Query process(Node n);
}
public interface SloppyBuilder extends QueryBuilder {
    public static SloppyBuilder INSTANCE;
}

... but that requires a single set of registered ObjectBuilders per JVM,
so i'm not fond of the idea ... but perhaps someone can think of a way
that the same overall concept could be applied in a more general way.
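
One possible direction, offered only as a sketch and not as anything proposed in this thread: keep the "one builder per trait interface" lookup, but store it in a registry object owned by each parser instead of in a static field (Java interface fields are implicitly final, so a static INSTANCE pointer could not be reassigned in any case). The Registry class and the simplified QueryBuilder signature below are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

class BuilderRegistrySketch {

    /** Simplified stand-ins: a real builder would take a Node and return a Query. */
    interface QueryBuilder { String build(String node); }
    interface SloppyBuilder extends QueryBuilder { }

    /** A per-parser lookup from trait interface to the builder that provides it. */
    static final class Registry {
        private final Map<Class<?>, Object> builders = new HashMap<Class<?>, Object>();

        /** Registers (or replaces) the builder used for the given trait. */
        <T> void register(Class<T> role, T builder) { builders.put(role, builder); }

        /** Returns the builder for the trait, typed, with no cast at the call site. */
        <T> T lookup(Class<T> role) {
            Object b = builders.get(role);
            if (b == null) {
                throw new IllegalStateException("no builder registered for " + role.getSimpleName());
            }
            return role.cast(b);
        }
    }
}
```

Because each parser owns its own Registry, two configurations in the same JVM can map SloppyBuilder to different implementations, and replacing a builder in one place affects every builder that looks it up through that registry.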




-Hoss


Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
This example code looks interesting. If I understand
correctly using this approach requires that builders
like the "q" QueryObjectBuilder instance must be
explicitly registered with each and every builder that
consumes its type of output eg BQOB and FQOB. An
alternative would be to register "q" just once with
the parser as the provider of Query objects and let
BQOB and FQOB and others look up the registered
provider for the class (Query) at runtime. I presume
doing it your way is a deliberate design choice in
order to validate at compile time that a particular
parser configuration has all of the necessary builders
in place to support incoming XML. That seems like a
reasonable approach.
If I get some time I'll look at implementing something
based on this.

Cheers
Mark


		
Re: "Advanced" query language

Posted by Chris Hostetter <ho...@fucit.org>.
: I'd still like to keep the parser core reasonably generic (ie
: java.lang.Object rather than Query or Filter) because I can see it being
: used for instantiating many different types of objects eg requests for
: GroupBy , highlighting,  indexing, testing etc.
: As for your type-safety requirement, one approach I considered which
: supports an extensible list of types with type safety was to use a
: reflection-based API like this:

Interesting approach.  I'm generally leary of reflection, but it's just my
prejudice, i have no legitimate objection to the idea.

: However, I'm not sure that that really buys us a lot more than the
: existing approach where java.lang.Object is used in ObjectConsumer and
: the calling ObjectBuilder has to explicitly cast a java.lang.Object to
: the required type. We still don't find out until runtime that something
: doesn't work.

well, ultimately what we're doing is parsing text supplied by the user,
there's really no way to know for certain if it's going to work until
runtime anyway -- my main concern was in eliminating ClassCastExceptions,
without requiring every ObjectBuilder to do their own little
instanceof/exception/cast dance...

      ...
      Object o = parser.processChild(...);
      if (!(o instanceof FooBar)) {
         throw new ParseException("child element is not a FooBar", ...);
      }
      FooBar f = (FooBar) o;
      o = parser.processChild(...);
      if (!(o instanceof Query)) {
         throw new ParseException("child element is not a Query", ...);
      }
      Query q = (Query) o;
      ...
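
As a small illustration of how that dance could be written once instead of in every builder, here is a sketch of a generic helper. The name expect and the use of IllegalArgumentException are assumptions for this example; a real parser would presumably throw its own ParseException.

```java
class TypedChildSketch {

    /**
     * Returns child cast to the wanted type, or throws with a readable
     * message if it is null or of some other type.
     */
    static <T> T expect(Object child, Class<T> type) {
        if (!type.isInstance(child)) {
            throw new IllegalArgumentException(
                "child element is not a " + type.getSimpleName());
        }
        return type.cast(child);
    }
}
```

A builder would then write something like Query q = expect(parser.processChild(...), Query.class); the failure is still only detected at runtime, but the check and the error message live in one place.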

That said, I think it's possible to eliminate some surprises at compile
time using a variation of the approach i described before (where a
separate interface/register/process method is used for each type).  If the
ObjectBuilders have init methods that take in the specific interface of
the parser they expect to deal with, then at compile time you'll at least
know if you're trying to deal with a parser that can only handle Query and
Filter objects and an ObjectHandler for building test indexes that's
expecting its child nodes to be Documents.

Unfortunately I can't really picture in my mind how it could work without
having a buttload of marker interfaces and methods on the parsers.


Perhaps some logic inversion is in order? ...

          For the record: it's very late, and i'm totally
          winging this off the cuff, so forgive me if it makes
          no sense what so ever.

...what if instead of registering ObjectBuilders with the parser, the
ObjectBuilders are explicitly registered with each other in a parent child
relationship -- possibly with wild carded names for "any child"; and the
"parser" just reads the InputStream and produces an intermediate
representation (either generic SAX like events or generic dom like
objects) and the user then interacts with the ObjectBuilder of the
outermost type they expect.

The interfaces the ObjectBuilders implement can be organized in a
hierarchy that mirrors the hierarchy of the objects they produce, and
generic ObjectBuilder classes for high level interfaces can be made to
"shortcut" common sets of Builders (ie: all supported Query types)

Something like...

public interface FilterObjectBuilder extends ObjectBuilder {
   public Filter process(Node n);
}
public class FOB implements FilterObjectBuilder {
   public FOB(/* no children allowed */) { ... }
   public Filter process(Node n) { ... }
}
public interface QueryObjectBuilder extends ObjectBuilder {
   public Query process(Node n) throws UnexpectedNodeNameException;
}
public interface TermQueryObjectBuilder extends QueryObjectBuilder {}
public class TQOB implements TermQueryObjectBuilder {
   public TQOB(/* no children allowed */) { ... }
   public Query process(Node n) { ... }
}
public interface FilterQueryObjectBuilder extends QueryObjectBuilder {}
public class FQOB implements FilterQueryObjectBuilder {
   public FQOB(FilterObjectBuilder fkid, QueryObjectBuilder qkid) { ... }
   public Query process(Node n) { ... }
}
public interface BooleanQueryObjectBuilder extends QueryObjectBuilder {}
public class BQOB implements BooleanQueryObjectBuilder {
   public BQOB(BooleanClauseObjectBuilder childBuilder) { ... }
   public Query process(Node n) { ... }
}
public interface BooleanClauseObjectBuilder extends ObjectBuilder {
   public BooleanClause process(Node n);
}
public class BCOB implements BooleanClauseObjectBuilder {
   public BCOB(QueryObjectBuilder childBuilder) { ... }
   public BooleanClause process(Node n) { ... }
}
...
public class QOB implements QueryObjectBuilder {
   public QOB() { ... }
   /** adds a node name => QueryObjectBuilder that can be delegated to */
   public void addDelegate(String name, QueryObjectBuilder builder) {...}
   /** throws exception if n.getName() isn't a known delegate */
   public Query process(Node n) throws UnexpectedNodeNameException { ... }
}
...
   FilterObjectBuilder f = new FOB();
   QOB q = new QOB();
   /* the node names used here are only illustrative */
   q.addDelegate("BooleanQuery", new BQOB(new BCOB(q)));
   q.addDelegate("FilteredQuery", new FQOB(f, q));
   q.addDelegate("TermQuery", new TQOB());
   ...
   /* how people get a Query object from some XML... */
   Node queryNode = parser.parse(myXmlInputStream);
   Query myQuery = q.process(queryNode);
   /* how people get a Filter object from some XML... */
   Node filterNode = parser.parse(myXmlInputStream);
   Filter myFilter = f.process(filterNode);

...even better, people could subclass the Parser, and put those
constructor/addDelegate calls in their parser's constructor, and
expose new type specific "parse" methods.

(NOTE: I've glossed over the other issues i mentioned in my earlier email
about passing up/down state, and being able to "decorate" existing
ObjectBuilders so things like <BooleanClause> could be
replaced with clause="required" in any sub tag of a <BooleanQuery> ... but
i *think* all of those things could still be done with an approach like
this.)




-Hoss


Re: "Advanced" query language

Posted by Paul Smith <ps...@aconex.com>.
On 03/01/2006, at 11:08 AM, markharw00d wrote:

>
>> I thought
>> you said you "didn't really want to have to design a general API for
>> parsing XML as part of this project" ?   :)
>>
>
> Having grown tired of messing with my own solution I tried using  
> commons Digester with my example XML but ran into issues so I'm  
> back looking at a custom solution.

Seriously... Did you try out XStream?  Digester is just too hard,  
XStream will work so easily you'll be pleasantly surprised..

Paul


Re: "Advanced" query language

Posted by markharw00d <ma...@yahoo.co.uk>.
>I thought
>you said you "didn't really want to have to design a general API for
>parsing XML as part of this project" ?   :)
>  
>

Having grown tired of messing with my own solution I tried using commons 
Digester with my example XML but ran into issues so I'm back looking at 
a custom solution.
I'd still like to keep the parser core reasonably generic (ie 
java.lang.Object rather than Query or Filter) because I can see it being 
used for instantiating many different types of objects eg requests for 
GroupBy , highlighting,  indexing, testing etc.
As for your type-safety requirement, one approach I considered which 
supports an extensible list of types with type safety was to use a 
reflection-based API like this:
class MyObjectBuilder
.....
  parser.processChildElements(new Object(){
         public void consume(Query query)
         {
                     //do something with query
         }
       }
  );
....
class Parser
    public void processChildElements(Object consumer) ...

This is how an ObjectBuilder could hand-off to the parser to get 
something nested built eg a BooleanQuery builder wanting to get a 
clause's choice of Query built.
The generic parser knows nothing about the "Query" type - it introspects 
the supplied consumer and (by convention) sees what type of object the 
"consume" method takes. It then ensures that any child elements are 
built by a registered ObjectBuilder that supports that type and then 
passes the resulting object to the consumer by dynamically invoking the 
consume method.
However, I'm not sure that that really buys us a lot more than the 
existing approach where java.lang.Object is used in ObjectConsumer and 
the calling ObjectBuilder has to explicitly cast a java.lang.Object to 
the required type. We still don't find out until runtime that something 
doesn't work.
This type-safety requirement and the general-purpose-parser requirement 
seem to be at odds with each other.
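
To show what the introspection step described above might look like, here is a minimal sketch using java.lang.reflect. The class and method names (ConsumerDispatchSketch, consumedType, deliver) are invented for this example, and a String stands in for a Query so that the code runs without Lucene on the classpath.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ConsumerDispatchSketch {

    /** Returns the parameter type of the consumer's one-argument consume method. */
    static Class<?> consumedType(Object consumer) {
        return findConsume(consumer).getParameterTypes()[0];
    }

    /** Passes a built object to the consumer after checking that its type matches. */
    static void deliver(Object consumer, Object built) {
        Method consume = findConsume(consumer);
        Class<?> wanted = consume.getParameterTypes()[0];
        if (!wanted.isInstance(built)) {
            throw new IllegalArgumentException(
                "consumer expects " + wanted.getSimpleName() + " but the builder produced "
                + (built == null ? "null" : built.getClass().getSimpleName()));
        }
        try {
            // Anonymous classes are not public, so the method must be made accessible.
            consume.setAccessible(true);
            consume.invoke(consumer, built);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("consume call failed", e);
        }
    }

    /** By convention the consumer declares exactly one method named consume(X). */
    private static Method findConsume(Object consumer) {
        for (Method m : consumer.getClass().getDeclaredMethods()) {
            if (m.getName().equals("consume") && m.getParameterCount() == 1) {
                return m;
            }
        }
        throw new IllegalArgumentException("consumer has no consume(X) method");
    }
}
```

As the message says, a mismatch between what a child builder produces and what the consumer accepts is still only detected at runtime; the sketch just moves that check into the parser so individual builders do not repeat it.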



		
Re: "Advanced" query language

Posted by Chris Hostetter <ho...@fucit.org>.
: I'm personally happier to stick with one approach,
: preferably with an existing, standardized interface
: which lets me switch implementations. I didn't really
: want to have to design a general API for parsing XML
: as part of this project.

I'm not suggesting that, I'm just saying that the API people use when
writing ObjectBuilders should be agnostic of the underlying
implementation -- and a good way to ensure that is to think about how it
*could* be implemented using different parsing methodologies.

: The parser framework was (apart from an annoying bug)
: letting me construct and run this collection of
: objects to create a RAMIndex , populate it, run
: queries and test results.
:
: In this scenario the parser is used as a generic
: instantiator of different objects using configurable
: choice of ObjectBuilders. That's why I used
: "ObjectBuilder" as the building block not just
: "QueryBuilder".

whoa.  I hadn't even considered the possibility of using the same
parser/handler registry for doing things like index building.  I thought
you said you "didn't really want to have to design a general API for
parsing XML as part of this project" ?   :)

: ie should we offer:
: 1)  XML Parser implementation independence (via SAX,
: DOM, other interface?)

I think the API should be parser independent.  but that doesn't mean
there has to be multiple implementations.

: 3) Support for builders to produce *any* object
: construction (not just queries/filters)?

There's a difference between producing any java.lang.Object and any Lucene
related "object" (ie: query, filter, document, directory) ... I don't
think it's necessary to support any java.lang.Object, but I can get on
board the idea of supporting any lucene related objects.  That said, i
still really, Really, REALLY like type safety, and the space of lucene
objects is small enough that having separate registries and "process"
methods seems practical.  As I said regarding Queries/Filters -- the
caller is going to know what they are expecting, so they can call the
specific method for the return object they want.

: 4) Ability for Queries to write to XML (choice of
: parser configs can be used to write Query/Filter
: objects as well as read them?)

I'm in favor of this ... but I think it's orthogonal to the issue of
parsing.

: 5) Ability for Parser configurations to
: "self-document" the XML structures they are capable of
: parsing? ie produce a schema

I have no opinion on this.



-Hoss


Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
I suspect it's a little too ambitious to provide a
unifying common abstraction which wraps event based
*and* "pull" parser approaches. 

I'm personally happier to stick with one approach,
preferably with an existing, standardized interface
which lets me switch implementations. I didn't really
want to have to design a general API for parsing XML
as part of this project. 

It feels like we should probably try to define parser
scope a little more clearly at this stage before
diving into implementation details.
As an example, I was running a config last night that
let me do this:
<QueryTest>
  <Index type="RAM">
    <Document>
      <Field name="title">My report 1</Field>
    </Document>
    <Document>
      <Field name="title">My report 2</Field>
    </Document>
   </Index>
   <Query>
      <BooleanQuery>
      .....
      </BooleanQuery>
   </Query>
   <ExpectedResults docs="1,2"/>
</QueryTest>
The parser framework was (apart from an annoying bug)
letting me construct and run this collection of
objects to create a RAMIndex, populate it, run
queries and test results.

In this scenario the parser is used as a generic
instantiator of different objects using configurable
choice of ObjectBuilders. That's why I used
"ObjectBuilder" as the building block not just
"QueryBuilder".

Maybe this is overstepping the mark but it certainly
seemed useful. I would be interested to confirm the
scope a little more.

ie should we offer:
1)  XML Parser implementation independence (via SAX,
DOM, other interface?) 
2) Pluggable choice of builders 
3) Support for builders to produce *any* object
construction (not just queries/filters)?
4) Ability for Queries to write to XML (choice of
parser configs can be used to write Query/Filter
objects as well as read them?)
5) Ability for Parser configurations to
"self-document" the XML structures they are capable of
parsing? ie produce a schema

There's a lot of ground that *could* be covered so it
would be good to get some consensus on where we might
be heading.









		


Re: "Advanced" query language

Posted by Chris Hostetter <ho...@fucit.org>.
: > I think that the ideal API wouldn't require people
: > writing ObjectBuilders
: > to know anything about sax, or to ever need to
: > import anything from
: > org.xml.** or javax.xml.**
:
: Fair enough. I presume we want to maintain the
: position that Lucene should not have any dependencies
: other than JDK1.4?

As I understand it that's the current consensus -- but that's not really
my concern.  Whether lucene starts shipping with an xml library, or this
parser gets put into contrib with a caveat that it only works if you have
some specified xml library, is a policy/packaging issue .... i was more
worried about the API for people who want to develop new types of queries --
and new converters for building those queries from XML.  I'm scared of
tying that API to a particular method of parsing XML that would make it
hard to change the underlying implementation down the road.  (ie: having
all the methods throw SAXException would suck if 2 years from now we want
to re-implement it using XPP)

In my twisted Utopian imagination - lucene core would ship with an
implementation of the XML->Query parser/converter API that was 100% pure
java1.4; but alternate implementations (that used XPP or whatever the
flavor of the week was) would live in contrib for people who were willing
to trade the extra dependency for some trivial performance gain -- but the
same converters (aka: ObjectBuilders) would work with either
implementation.

: State "passed down" is something I saw as a potential
: addition to the "Parser" object shared by all
: ObjectBuilders eg a Map that was associated with
: stack level.

If you put the state in the parser, then I can't imagine any
implementation could ever be thread safe.  I also can't really picture
what the API would be like without just making it a free for all of
essentially global variables -- ie: how does a parent ensure that state
info returned from one child doesn't pollute the next child? (unless the
parent wants it to) ...

: Although the "occurs" info could be set in the child
: object as in your example that pushes some parsing
: responsibility down into child elements and I feel
: slightly uncomfortable about that as a technique. It

I wouldn't want to put any requirements on query builders just to support
being wrapped in a BooleanQuery like that -- i was just trying to
illustrate why i thought a mechanism that allowed for "decorating"
handlers with other handlers would be very useful.  BooleanQuery is the
prime example to illustrate my goal.  While the default instance might
look something like...

      public interface LuceneXmlParser {
         public static LuceneXmlParser DEFAULT = ...
         static {
            DEFAULT.register("TermQuery",new TermQueryXmlBuilder());
            DEFAULT.register("PhraseQuery",new PhraseQueryXmlBuilder());
            ...
            DEFAULT.register("BooleanQuery", new BooleanQueryXmlBuilder());
            DEFAULT.register("Occurs", new OccursXmlBuilder());
            ...

...(and the term/phrase builders wouldn't know anything about
BooleanQueries) people who want a shorter syntax could do something
like...

     private LuceneXmlParser myParser = ...
     myParser.register("TermQuery",
                       new BooleanClauseWrapperXmlBuilder
                       (new TermQueryXmlBuilder()));
     myParser.register("PhraseQuery",
                       new BooleanClauseWrapperXmlBuilder
                       (new PhraseQueryXmlBuilder()));
     ...
     myParser.register("BooleanQuery", new BooleanQueryXmlBuilder());

...still reusing the original builders for all of the various types of
queries, with their special decorator wrapped around them.


: I'll spend some time studying your psuedo code in more
: detail later.

last night on the place, i realized Filters were a glaring omission.
Since the "parent" always needs to know what it expects back from its
children, i think it makes sense to have separate handler interfaces, and for
the parser to have separate methods for each (depending on what you are
expecting ... where by "you" i mean either you the person asking it to
parse a raw bit of xml, or you the person implementing a parent
handler who's going to have to know at compile time whether you expect
a child to be a filter or a query)

in other words, amend what i sent out before, something like this...


public interface LuceneXmlParser {
    public void registerQueryHandler(String tag, LuceneXmlQueryHandler h);
    public void registerFilterHandler(String tag, LuceneXmlFilterHandler h);
    /* Java can't overload on return type alone, hence the distinct names */
    public Query parseQuery(InputStream xml);
    public Filter parseFilter(InputStream xml);
    public Filter processFilterNode(LuceneXmlNode n, State s);
    public Query processQueryNode(LuceneXmlNode n, State s);
}
public interface LuceneXmlHandler {
    public void setParser(LuceneXmlParser p);
}
public interface LuceneXmlQueryHandler extends LuceneXmlHandler {
    public Query process(LuceneXmlNode n, State s);
}
public interface LuceneXmlFilterHandler extends LuceneXmlHandler {
    public Filter process(LuceneXmlNode n, State s);
}






-Hoss




Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
Sorry, slip of keyboard meant I posted last message
mid-edit.

Hi Chris,
Thanks for taking the time to review this.
> 1) I aplaud the plugable nature of your solution.

I think that's definitely a worthwhile objective.

> 2) Digging into what was involved in writting an
> ObjectBuilder, I found...
> don't really feel like
> the API has a very clean seperation from SAX.


True. The efforts to remove state management were
entirely around the hand-off between one ObjectBuilder
and any "child" Object Builders - ie thinking of the
processing chain like the cartoon where one big fish
is about to eat a smaller fish, which is about to
eat a smaller fish which.... etc. The parser handles
the stack of individual ObjectBuilders and their
consumption thus relieving one level of SAX-based
state management that a "just one big fish" approach
to parsing would take. 
However within each individual ObjectBuilder they have
responsibility for handling SAX apis to configure
themselves. My assumption was SAX would be a
familiar/useful API but I guess that may be wrong.

> I think that the ideal API wouldn't require people
> writing ObjectBuilders
> to know anything about sax, or to ever need to
> import anything from
> org.xml.** or javax.xml.**
 
Fair enough. I presume we want to maintain the
position that Lucene should not have any dependencies
other than JDK1.4?
I did look at Commons Digester but that seemed to want
to suck in logging, beanutils etc so embarked on my
own lightweight SAX-based implementation.

> 
> 3) While the *need* to maintaing/pass state
> information should be avoided.
> I can definitely think of uses for this framework
> that may *want* to pass
> state information -- both down to the ObjectBuilders
> that get used in
> inner nodes, as well as up to wrapping nodes, and
> there doesn't seem to be
> an easy way to that. 
 
State "passed down" is something I saw as a potential
addition to the "Parser" object shared by all
ObjectBuilders eg a Map that was associated with
stack level.

> The best example i can give is if someone (ie: me)
> wanted to use this
> framework to allow boolean queries to be written
> like this...
> 
>    <BooleanQuery>
>       <TermQuery occurs="mustNot" field="contents"
> value="mustNot"/>
>       <UserInput occurs="must">"a phrase"
> fuzzy~</UserInput>
>    </BooleanQuery

I did consider that my earlier version of BooleanQuery
could be simplified to:
 
<BooleanQuery>
  <MustNot><TermQuery field="contents" value="foo"/>
  </MustNot>
  <Must><UserInput>"a phrase" fuzzy~</UserInput>
  </Must>
</BooleanQuery>

Although the "occurs" info could be set in the child
object as in your example, that pushes some parsing
responsibility down into child elements and I feel
slightly uncomfortable about that as a technique. It
introduces the potential for name clashes when mixing
various object builders and complicates documentation.
It's the same uncomfortable feeling I get from multiple
inheritance.
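
As a rough illustration of the wrapper-element syntax above, here is a minimal
sketch in which the parent decides the clause type from the wrapper tag, so the
child builder never sees an "occurs" attribute. It is not the LXQuery code: the
class name OccurWrapperSketch, the "Should" tag and the "+/-" string rendering
(used instead of real Lucene Query objects) are all assumptions made so the
example runs on a plain JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OccurWrapperSketch {
    /** Maps a wrapper tag to a clause prefix; the parent owns this decision. */
    static String prefixFor(String wrapperTag) {
        switch (wrapperTag) {
            case "Must":    return "+";
            case "MustNot": return "-";
            case "Should":  return "";   // hypothetical third wrapper, not in the thread
            default: throw new IllegalArgumentException("unknown wrapper: " + wrapperTag);
        }
    }

    /** Stand-in for a child builder: only TermQuery is understood here. */
    static String buildChild(Element e) {
        if (!"TermQuery".equals(e.getTagName())) {
            throw new IllegalArgumentException("unsupported query: " + e.getTagName());
        }
        return e.getAttribute("field") + ":" + e.getAttribute("value");
    }

    /** Collects the element children of a node, skipping text and comments. */
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    /** Renders a BooleanQuery document as prefixed clauses, e.g. "-a:b +c:d". */
    static String render(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse query XML", ex);
        }
        List<String> clauses = new ArrayList<>();
        for (Element wrapper : childElements(root)) {
            String prefix = prefixFor(wrapper.getTagName());
            for (Element child : childElements(wrapper)) {
                clauses.add(prefix + buildChild(child));
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(render(
            "<BooleanQuery>"
            + "<MustNot><TermQuery field=\"contents\" value=\"foo\"/></MustNot>"
            + "<Must><TermQuery field=\"title\" value=\"report\"/></Must>"
            + "</BooleanQuery>"));
    }
}
```

The point of the design is that only prefixFor knows about clause types; any
child builder can be dropped inside a wrapper without reserving attribute names.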

> Going the opposite direction, I'd like to be able
> to have tags that set state which is accessible to
> descendant tags

Seems entirely reasonable. - See earlier comments
about Parser Stack.

I'll spend some time studying your pseudo code in more
detail later.

Cheers,

Mark


	
	
		


Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
Hi Chris,
Thanks for taking the time to review this.
> 1) I aplaud the plugable nature of your solution.

That's definitely a worthwhile objective.

> 2) Digging into what was involved in writting an
> ObjectBuilder, I found...
> don't really feel like
> the API has a very clean seperation from SAX.


True. The efforts to remove state management were
entirely around the hand-off between one ObjectBuilder
and any "child" Object Builders - ie thinking of the
processing chain like the cartoon where one big fish
is about to eat a smaller fish, which is about to eat
a smaller fish which.... etc. The parser handles the
stack of individual ObjectBuilders and their
consumption thus relieving one level of SAX-based
state management that a "just one big fish" approach
to parsing would take. However within each individual
ObjectBuilder they have responsibility for handling
SAX apis to configure themselves. My assumption was
SAX would be a familiar API but I guess that may be
wrong.

> I think that the ideal API wouldn't require people
> writing ObjectBuilders
> to know anything about sax, or to ever need to
> import anything from
> org.xml.** or javax.xml.**

Fair enough. I presume we want to maintain the
position that Lucene should not have any dependencies
other than JDK1.4?
I did look at Commons Digester but that seemed to want
to suck in logging, beanutils etc so embarked on my
own lightweight SAX-based implementation.

> 
> 3) While the *need* to maintaing/pass state
> information should be avoided.
> I can definitely think of uses for this framework
> that may *want* to pass
> state information -- both down to the ObjectBuilders
> that get used in
> inner nodes, as well as up to wrapping nodes, and
> there doesn't seem to be
> an easy way to that. 

State "passed down" is something I saw as a potential
addition to the "Parser" object shared by all
ObjectBuilders eg a Map that was associated with stack
level.

> The best example i can give is if someone (ie: me)
> wanted to use this
> framework to allow boolean queries to be written
> like this...
> 
>    <BooleanQuery>
>       <TermQuery occurs="mustNot" field="contents"
> value="mustNot"/>
>       <UserInput occurs="must">"a phrase"
> fuzzy~</UserInput>
>    </BooleanQuery

I did consider that my version of BooleanQuery could
be written slightly more succinctly as:

<BooleanQuery>
       <MustNot><TermQuery field="contents"
value="foo"/>
</MustNot>
>       <UserInput occurs="must">"a phrase"
> fuzzy~</UserInput>
>    </BooleanQuery



> 
> ...i want to be able to write an
> "BooleanClauseWrapperObjectBuilder" that
> can be wrapped around any other ObjectBuilder and
> will return whatever
> object it does, but will also check for and "occurs"
> attribute, and put
> that in a state bucket somewhere that the
> BooleanQuery has access to it
> when adding the Query it gets back.
> 
> Going the ooposite direction, I'd like to be able to
> have tags that set
> state which is accesible to descendent tags (even if
> hte tags in teh
> middle don't know anything about that bit of state. 
> for example:
> specifying how much slop should be used by default
> in phrase queries...
> 
>    <StateModifier defaultPhraseSlop="100">
>       ...
>       <BooleanQuery>
>          <PhraseQuery occurs="mustNot"
> field="contents">
>             How Now Brown Cow?
>          </PhraseQuery>
>          ...
>       </BooleanQuery
>    <StateModifier defaultPhraseSlop="100">
> 
> 
> I haven't had a chance to try implimenting this, but
> at a high level, it
> seems like all of this should be possible and still
> easy to use.
> Here's a real rough cut at what i've had floating
> arround in the back
> of my head (I'm doing this straight into email,
> pardon any typo's or
> psuedo code) ...
> 
> 
> 
> /** could be implimented with SAX, or DOM, or Pull
> */
> public interface LuceneXmlParser {
>     /** this method will call setParser(this) on
> each handler */
>     public void registerHandler(String tag,
> LuceneXmlHandler h);
>     /**
>      primary method for clients, parses the xml and
> calls processNode
>      on the root node
>      */
>     public Query parse(InputStream xml);
>     /**
>      dispatches to the appropriate handler's process
> method based
>      on the Node name, may be called by handlers for
> recursion of children
>      nodes
>      */
>     public Query processNode(LuceneXmlNode n, State
> s)
> }
> public interface LuceneXmlHandler {
>     public void setParser(LuceneXmlParser p)
>     /**
>      should return a Query that corrisponds to the
> specified node.
>      may rea/modify state in any way it wants ... it
> is recommended that
>      all implimenting methods wrap their state
> before passing it on when
>      processing children.
>      */
>     public Query process(LuceneXmlNode n, State s)
> }
> /**
>  A State is a stack frame that can delegate read
> operations to another
>  State it wraps (if there is one).  but it cannot
> delegate modifying
>  operations.
>  Classes implimenting State should provide a
> constructor that takes
>  another State to wrap.
> */
> public interface State extends Map<String,Object> {
>    /**
>     for callers that wnat to know what's in the
> immeidate stack
>     frame without any delegation
>     */
>    public Map<String,Object> getOuterFrame();
>    /* should return a new state that wraps the
> current state */
>    public State wrapCurrentState();
> }
> /** a very simple api arround the most basic xml
> concepts */
> public interface LuceneXmlNode {
>    public CharSequence getNodeName();
>    public Map<String,String> getAttributes()
>    public CharSequence getBodyText();
>    public Iterator<LuceneXmlNode> getChildren()
> }
> /** an example handler for TermQuery */
> public class BooleanQueryHandler impliments
> LuceneXmlHandler {
>    LuceneXmlParser p;
>    public void setParser(LuceneXmlParser q) { p=q; }
>    public Query process(LuceneXmlNode n, State s) {
>      Map<String,String> attrs = getAttributes()
>      return new TermQuery(new
> Term(attrs.get("field"),attrs.get("value"))
>    }
> }
> /** an example handler for BooleanQuery */
> public class BooleanQueryHandler impliments
> LuceneXmlHandler {
>    LuceneXmlParser p;
>    public void setParser(LuceneXmlParser q) { p=q; }
>    public Query process(LuceneXmlNode n, State s) {
>      BooleanQuery r = new BooleanQuery;
> 
=== message truncated ===



		


Re: "Advanced" query language

Posted by Chris Hostetter <ho...@fucit.org>.
I finally got a chance to look at this code today (the best part about the
last day before vacation is no one expects you to get anything done, so
you can ignore your "real work" and spend time on things that are more
important in the long run) and while I still haven't wrapped my head
around all of it, I wanted to share my thoughts so far on the API...

1) I applaud the pluggable nature of your solution. Looking at the Test
Case, it is easy to see exactly how a service provider could
do things like override the behavior of a <PhraseQuery> to be implemented
as a SpanQuery without their clients being affected at all.  Kudos.

2) Digging into what was involved in writing an ObjectBuilder, I found
the api somewhat confusing.  I was reminded of this exchange you had with
Yonik...

: > While SAX is fast, I've found callback interfaces
: > more difficult to
: > deal with while generating nested object graphs...
: > it normally
: > requires one to maintain state in stack(s).
:
: I've gone to some trouble to avoid the effects of this
: on the programming model.

As someone who feels very comfortable with Lucene, but has no
practical experience with SAX, I have to say that I don't really feel like
the API has a very clean separation from SAX.

I think that the ideal API wouldn't require people writing ObjectBuilders
to know anything about sax, or to ever need to import anything from
org.xml.** or javax.xml.**


3) While the *need* to maintain/pass state information should be avoided,
I can definitely think of uses for this framework that may *want* to pass
state information -- both down to the ObjectBuilders that get used in
inner nodes, as well as up to wrapping nodes, and there doesn't seem to be
an easy way to do that.  (it could just be my lack of SAX knowledge though)

The best example i can give is if someone (ie: me) wanted to use this
framework to allow boolean queries to be written like this...

   <BooleanQuery>
      <TermQuery occurs="mustNot" field="contents" value="mustNot"/>
      <UserInput occurs="must">"a phrase" fuzzy~</UserInput>
   </BooleanQuery>

...i want to be able to write a "BooleanClauseWrapperObjectBuilder" that
can be wrapped around any other ObjectBuilder and will return whatever
object it does, but will also check for an "occurs" attribute, and put
that in a state bucket somewhere that the BooleanQuery has access to
when adding the Query it gets back.

Going the opposite direction, I'd like to be able to have tags that set
state which is accessible to descendant tags (even if the tags in the
middle don't know anything about that bit of state).  for example:
specifying how much slop should be used by default in phrase queries...

   <StateModifier defaultPhraseSlop="100">
      ...
      <BooleanQuery>
         <PhraseQuery occurs="mustNot" field="contents">
            How Now Brown Cow?
         </PhraseQuery>
         ...
      </BooleanQuery>
   </StateModifier>


I haven't had a chance to try implementing this, but at a high level, it
seems like all of this should be possible and still easy to use.
Here's a real rough cut at what i've had floating around in the back
of my head (I'm doing this straight into email, pardon any typos or
pseudo code) ...



/** could be implemented with SAX, or DOM, or Pull */
public interface LuceneXmlParser {
    /** this method will call setParser(this) on each handler */
    public void registerHandler(String tag, LuceneXmlHandler h);
    /**
     primary method for clients, parses the xml and calls processNode
     on the root node
     */
    public Query parse(InputStream xml);
    /**
     dispatches to the appropriate handler's process method based
     on the Node name, may be called by handlers for recursion of children
     nodes
     */
    public Query processNode(LuceneXmlNode n, State s);
}
public interface LuceneXmlHandler {
    public void setParser(LuceneXmlParser p);
    /**
     should return a Query that corresponds to the specified node.
     may read/modify state in any way it wants ... it is recommended that
     all implementing methods wrap their state before passing it on when
     processing children.
     */
    public Query process(LuceneXmlNode n, State s);
}
/**
 A State is a stack frame that can delegate read operations to another
 State it wraps (if there is one).  but it cannot delegate modifying
 operations.
 Classes implementing State should provide a constructor that takes
 another State to wrap.
*/
public interface State extends Map<String,Object> {
   /**
    for callers that want to know what's in the immediate stack
    frame without any delegation
    */
   public Map<String,Object> getOuterFrame();
   /* should return a new state that wraps the current state */
   public State wrapCurrentState();
}
/** a very simple api around the most basic xml concepts */
public interface LuceneXmlNode {
   public CharSequence getNodeName();
   public Map<String,String> getAttributes();
   public CharSequence getBodyText();
   public Iterator<LuceneXmlNode> getChildren();
}
/** an example handler for TermQuery */
public class TermQueryHandler implements LuceneXmlHandler {
   LuceneXmlParser p;
   public void setParser(LuceneXmlParser q) { p=q; }
   public Query process(LuceneXmlNode n, State s) {
     Map<String,String> attrs = n.getAttributes();
     return new TermQuery(new Term(attrs.get("field"),attrs.get("value")));
   }
}
/** an example handler for BooleanQuery */
public class BooleanQueryHandler implements LuceneXmlHandler {
   LuceneXmlParser p;
   public void setParser(LuceneXmlParser q) { p=q; }
   public Query process(LuceneXmlNode n, State s) {
     BooleanQuery r = new BooleanQuery();
     int minShouldMatch = Integer.parseInt(n.getAttributes().get("minShouldMatch"));
     r.setMinimumNumberShouldMatch(minShouldMatch);
     Iterator<LuceneXmlNode> kids = n.getChildren();
     while (kids.hasNext()) {
        /* each child gets its own frame, so siblings can't see each other's state */
        State kidState = s.wrapCurrentState();
        Query b = p.processNode(kids.next(),kidState);
        BooleanClause.Occur o = BooleanClause.Occur.SHOULD;
        if (kidState.getOuterFrame().containsKey("occurs")) {
            o = (BooleanClause.Occur) kidState.getOuterFrame().get("occurs");
        }
        r.add(b,o);
     }
     return r;
   }
}
/**
 an example handler that can wrap any other handler and give it
 BooleanClause.Occur awareness
*/
public class BooleanClauseWrapperHandler implements LuceneXmlHandler {
   LuceneXmlParser p;
   LuceneXmlHandler inner;
   public BooleanClauseWrapperHandler(LuceneXmlHandler i) { inner = i; }
   public void setParser(LuceneXmlParser q) { p=q; }
   public Query process(LuceneXmlNode n, State s) {
      Query q = inner.process(n, s);
      if (n.getAttributes().containsKey("occurs")) {
        /* glossing over string parsing to object construction here */
        s.put("occurs",n.getAttributes().get("occurs"));
      }
      return q;
   }
}


...does that make sense?
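
To make the "stack frame that delegates reads but not writes" idea concrete,
here is a minimal sketch of one way such a State could behave. It is a
simplification under stated assumptions: the class name FrameState is made up,
and it exposes only get/put rather than implementing the full Map interface
proposed above.

```java
import java.util.HashMap;
import java.util.Map;

class FrameState {
    private final FrameState parent;   // null for the root frame
    private final Map<String, Object> frame = new HashMap<>();

    FrameState(FrameState parent) { this.parent = parent; }

    /** Reads fall through to the wrapped state when this frame has no entry. */
    Object get(String key) {
        if (frame.containsKey(key)) return frame.get(key);
        return parent == null ? null : parent.get(key);
    }

    /** Writes always land in this frame and never touch the wrapped state. */
    void put(String key, Object value) { frame.put(key, value); }

    /** The immediate frame only, without delegation. */
    Map<String, Object> getOuterFrame() { return frame; }

    /** A new child frame whose reads delegate to this one. */
    FrameState wrapCurrentState() { return new FrameState(this); }

    public static void main(String[] args) {
        FrameState root = new FrameState(null);
        root.put("defaultPhraseSlop", 100);

        FrameState kid = root.wrapCurrentState();
        kid.put("occurs", "mustNot");

        System.out.println(kid.get("defaultPhraseSlop"));              // inherited from root
        System.out.println(root.get("occurs"));                        // kid's write stays in kid
        System.out.println(root.wrapCurrentState().get("occurs"));     // a sibling starts clean
    }
}
```

Because each child is handed a fresh frame, a parent can read what one child
left in its outer frame without that value leaking into the next child, and no
state has to live in the shared parser object.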



-Hoss




Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
Personally, I tend to use DOM for config type stuff where performance
doesn't matter.  I tend to avoid it for per-request XML processing
when you want potentially thousands per second.  Besides being slower,
it generates more garbage.

-Yonik

On 12/16/05, Erik Hatcher <er...@ehatchersolutions.com> wrote:
> Why wouldn't simply using DOM be sufficient?   Is it envisioned that
> a query XML would be large enough to prohibit RAM DOM loading of the
> entire document?
>
>         Erik



Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
I don't think DOM and RAM is necessarily an issue. 
The object construction process accesses the content
in the same order that a SAX based path takes so that
just seems an appropriate approach. There is no need
to leap around the structure in any other way from
what I can see, which is where DOM would be more
appropriate.

Cheers
Mark


		


Re: "Advanced" query language

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Why wouldn't simply using DOM be sufficient?   Is it envisioned that  
a query XML would be large enough to prohibit RAM DOM loading of the  
entire document?

	Erik


On Dec 16, 2005, at 2:51 AM, mark harwood wrote:

>> While SAX is fast, I've found callback interfaces
>> more difficult to
>> deal with while generating nested object graphs...
>> it normally
>> requires one to maintain state in stack(s).
>
> I've gone to some trouble to avoid the effects of this
> on the programming model.
> Stack management is handled by the parser and the
> builder at each level in the stack can delegate
> control and consume output of delegate builders
> without maintaining complex state.
> For example, the builder for FilteredQuerys has to
> handle this:
> <FilteredQuery>
>  <Filter>
>   <RangeFilter fieldName="price" lowerTerm="10"
> pperTerm="20"/>
>   </Filter>
>  <Query><TermQuery field="contents"
> value="car"/></Query>
> </FilteredQuery>
>
> To delegate control to any choice of filter or query
> parser builder and consume its output it simply does
> this:
>
> if (localName.equals("Filter")) {
>  parser.delegateChildElements(new
> ObjectBuilderFinder(),
>    new ObjectConsumer() {
> 	public void setObject(Object objectValue) {
>           filter = (Filter) objectValue;
> 	}
> });
>
> It looks like procedural code calling, then setting
> instance data ("filter") but is actually still all
> event based SAX under the covers.
>
> Cheers
> Mark
>
>
> 		




Re: "Advanced" query language

Posted by mark harwood <ma...@yahoo.co.uk>.
> While SAX is fast, I've found callback interfaces
> more difficult to
> deal with while generating nested object graphs...
> it normally
> requires one to maintain state in stack(s).

I've gone to some trouble to avoid the effects of this
on the programming model.
Stack management is handled by the parser and the
builder at each level in the stack can delegate
control and consume output of delegate builders
without maintaining complex state.
For example, the builder for FilteredQuerys has to
handle this:
<FilteredQuery>
 <Filter>
  <RangeFilter fieldName="price" lowerTerm="10" upperTerm="20"/>
 </Filter>
 <Query><TermQuery field="contents" value="car"/></Query>
</FilteredQuery>

To delegate control to any choice of filter or query
parser builder and consume its output it simply does
this:

if (localName.equals("Filter")) {
    parser.delegateChildElements(new ObjectBuilderFinder(),
        new ObjectConsumer() {
            public void setObject(Object objectValue) {
                filter = (Filter) objectValue;
            }
        });
}

It looks like procedural code calling, then setting
instance data ("filter") but is actually still all
event based SAX under the covers.
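
For comparison, this is roughly the stack bookkeeping a plain SAX handler has
to do by hand when building a nested result, which is the work the parser
described above takes on for the builders. It is only a sketch and not the
LXQuery implementation: the class name SaxStackSketch and the string rendering
of queries are assumptions chosen so the example runs on a bare JDK.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStackSketch extends DefaultHandler {
    // One frame per open element; each frame collects its rendered children.
    private final Deque<List<String>> stack = new ArrayDeque<>();
    private String result;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        List<String> frame = new ArrayList<>();
        if ("TermQuery".equals(qName)) {
            // Attributes are only available here, so stash what endElement will need.
            frame.add(atts.getValue("field") + ":" + atts.getValue("value"));
        }
        stack.push(frame);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        List<String> frame = stack.pop();
        String rendered = "TermQuery".equals(qName) ? frame.get(0) : qName + frame;
        if (stack.isEmpty()) {
            result = rendered;              // root element finished
        } else {
            stack.peek().add(rendered);     // hand the result to the parent frame
        }
    }

    static String parse(String xml) {
        try {
            SaxStackSketch handler = new SaxStackSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse query XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "<BooleanQuery>"
            + "<TermQuery field=\"contents\" value=\"car\"/>"
            + "<TermQuery field=\"title\" value=\"report\"/>"
            + "</BooleanQuery>"));
    }
}
```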

Cheers
Mark


		


Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
On 12/15/05, markharw00d <ma...@yahoo.co.uk> wrote:
> At this stage I am more interested in feedback on parser design/approach

Excellent idea.
While SAX is fast, I've found callback interfaces more difficult to
deal with while generating nested object graphs... it normally
requires one to maintain state in stack(s).

Have you considered a pull-parser like StAX or XPP?  They are as fast
as SAX, and allow you to ask for the next XML event you are interested
in, eliminating the need to keep track of where you are by other means
(the place in your own code and normal variables do that).  It
normally turns into much more natural code.
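
A small sketch of what that looks like with StAX (javax.xml.stream, bundled
with the JDK since Java 6): the nesting is tracked by ordinary recursion, so
there is no explicit stack. The class name PullParseSketch and the string
rendering of queries are assumptions for illustration only, not part of any
proposed Lucene API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullParseSketch {
    /**
     * Precondition: the reader is positioned on a START_ELEMENT.
     * Postcondition: the reader is positioned on the matching END_ELEMENT.
     */
    static String readQuery(XMLStreamReader r) throws XMLStreamException {
        // Attributes must be read before advancing past the start tag.
        String name = r.getLocalName();
        String field = r.getAttributeValue(null, "field");
        String value = r.getAttributeValue(null, "value");

        List<String> kids = new ArrayList<>();
        while (r.next() != XMLStreamConstants.END_ELEMENT) {
            if (r.getEventType() == XMLStreamConstants.START_ELEMENT) {
                kids.add(readQuery(r));   // the call stack remembers where we are
            }
        }
        return "TermQuery".equals(name) ? field + ":" + value : name + kids;
    }

    static String parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag();                  // advance to the root start tag
            String out = readQuery(r);
            r.close();
            return out;
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("could not parse query XML", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "<BooleanQuery>"
            + "<TermQuery field=\"contents\" value=\"car\"/>"
            + "<TermQuery field=\"title\" value=\"report\"/>"
            + "</BooleanQuery>"));
    }
}
```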

-Yonik



"Advanced" query language

Posted by markharw00d <ma...@yahoo.co.uk>.
After our recent discussions on this topic I've found some time to put 
together a first cut of a SAX based Query parser, see here:

http://www.inperspective.com/lucene/LXQueryV0_1.zip

I've implemented just a few queries (Boolean, Term, FilteredQuery, 
BoostingQuery ...) but other queries are fairly trivial to add.
At this stage I am more interested in feedback on parser design/approach 
rather than trying to achieve complete coverage of all the Lucene Query 
types or debating the choice of tag names.

Please see the readme.txt in the package for more details.

Cheers
Mark


		


Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
Just as a clarification, human-readable strings for queries are
essential for how we do things at CNET.

In addition to Mark's comments:
- standard logging mechanisms such as the access log of an app server
are readable
- easily typed one-off queries during development and for
troubleshooting + support are essential.
- the speed at which a query can be parsed is important... in some
systems, it's part of the transfer syntax from client to server and is
an integral part of the system (again, analogy to SQL).

That doesn't mean I fully support the XML idea, nor am I ready to
abandon the current query syntax.  I have contemplated XML in the past
as a way to support templating of queries... a way for a user to say,
when someone queries field "x", expand this to this type of
arbitrarily complex query involving fields a,s,d,f.  There might be a
place for both LXQ (Lucene XML Query?) and the current query syntax.

My (very long) todo list has support for DisjunctionMax and
minNrShouldMatch on it, and I have worked in JavaCC in the past (an
ASN.1 compiler, circa 1998).  No timeline promises though.  Also need
to look closer at Paul's surround query language... I looked very
briefly, but not enough to "get" it.

It would be nice to resolve/fix the whole "JavaCC using an exception
for flow control" issue too.

-Yonik



RE: "Advanced" query language

Posted by Pasha Bizhan <lu...@lucenedotnet.com>.
Hi, 

> From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 

> >    <MoreLikeThis minNumberShouldMatch="3"
> > maxQueryTerms="30">
> 
> We're back to MoreLikeThis - it's not currently a Query subclass.   
> How do you envision this sort of thing fitting in if it's not a Query?

But the MoreLikeThis class produces a Query. It's similar to Google's "define:"
search.
I think Google handles such queries specially and then redirects the search
somewhere else.
QueryParser could handle such searches too and use alternative logic to
create the Query.

For example, we could extend QueryParser with special (syntax) handlers
which would create the Query.

Something like this:
------
	class LikeHandler {};
	LikeHandler likeHandler = new LikeHandler(...);	
	string queryString = "like:(red quick fox)"; 
	Query q = QueryParser.parse(queryString, analyzer, likeHandler);
------

QueryParser scans the input, finds a special command (like:) and then looks
up the handler for this command.
If the handler exists, QueryParser calls it to create the Query.

This approach has disadvantages too.
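
A rough sketch of that dispatch step, under stated assumptions: the class and method names are hypothetical, strings stand in for Lucene Query objects, and nothing here is part of the actual QueryParser API. It only recognises the form command:(body) and otherwise falls through to a placeholder for the normal parsing path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of prefix-command dispatch; strings stand in for Lucene Query objects. */
public class CommandDispatchSketch {

    // Maps a command name such as "like" to the handler that builds its query.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    public void register(String command, Function<String, String> handler) {
        handlers.put(command, handler);
    }

    /** Routes "command:(body)" to a registered handler, otherwise falls back. */
    public String parse(String queryString) {
        int colon = queryString.indexOf(':');
        if (colon > 0 && queryString.startsWith("(", colon + 1) && queryString.endsWith(")")) {
            String command = queryString.substring(0, colon);
            Function<String, String> handler = handlers.get(command);
            if (handler != null) {
                // Strip the "command:(" prefix and the trailing ")"
                String body = queryString.substring(colon + 2, queryString.length() - 1);
                return handler.apply(body);
            }
        }
        return "default[" + queryString + "]";  // stand-in for the ordinary parsing path
    }

    public static void main(String[] args) {
        CommandDispatchSketch parser = new CommandDispatchSketch();
        parser.register("like", body -> "moreLikeThis[" + body + "]");
        System.out.println(parser.parse("like:(red quick fox)"));  // moreLikeThis[red quick fox]
        System.out.println(parser.parse("title:(red fox)"));       // default[title:(red fox)]
    }
}
```

One disadvantage shows up even in this small sketch: a command name shares the field:value syntax, so an index with a real field called "like" would collide with the handler.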

Pasha Bizhan





Re: "Advanced" query language

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Dec 2, 2005, at 10:03 AM, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).

At least with a couple of these it would be sensible to subclass  
QueryParser and override some getters to create other types of  
queries.  For example, if you need ordered sloppy phrase queries you  
could create a SpanNearQuery instead of a PhraseQuery.  Likewise with  
RegexQuery instead of WildcardQuery.

Question - since when is "more like this" a Query?  Should it be?

Your points below are well taken though....

> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck

job security?!  :)

I've been doing a lot of JavaCC work this year, and it has been a  
humbling learning curve, and I barely feel capable with it.

One interesting project I just came across is JParsec:
http://jparsec.codehaus.org - perhaps this could be a much simpler way than
using JavaCC.

> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"

This is the biggest issue of all.  What do humans want to type in order
to achieve sophisticated queries?

Apple has it pretty nicely implemented with additive builders (such
as with Finder, Mail rules, and smart playlists in iTunes) but they
don't support nested expressions, only "all" or "any" of the
criteria.

> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax.

I concur. Tacking more into QueryParser is not going to make most  
users happy.  I think there may be too many bells and whistles in it  
already.

> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically."

I've not found a generalization of how queries are entered into the  
system across the applications I've worked on, though.  Every query  
interface has been custom.

> This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client

This is an interesting proposal, and one that has a lot of merit in  
how you've explained it.

> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features.

I've told many developers that the answer to almost all Java  
questions lies within the source code to Ant :)

> A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
>    <MoreLikeThis minNumberShouldMatch="3"
> maxQueryTerms="30">

We're back to MoreLikeThis - it's not currently a Query subclass.   
How do you envision this sort of thing fitting in if it's not a Query?
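
For readers unfamiliar with the Ant-style approach being quoted, here is a minimal sketch of populating an object from XML attributes by reflection. MoreLikeThisSettings is a hypothetical bean invented for this example (it is not a Lucene class), and the attribute map stands in for whatever the XML parser would deliver; only int, float, boolean and String setters are handled.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: apply XML attribute values to an object through its setters. */
public class ReflectiveConfigSketch {

    /** Hypothetical settings holder; not a Lucene class. */
    public static class MoreLikeThisSettings {
        private int minNumberShouldMatch;
        private int maxQueryTerms;
        public void setMinNumberShouldMatch(int n) { minNumberShouldMatch = n; }
        public void setMaxQueryTerms(int n) { maxQueryTerms = n; }
        public int getMinNumberShouldMatch() { return minNumberShouldMatch; }
        public int getMaxQueryTerms() { return maxQueryTerms; }
    }

    /** Calls setXxx(value) on target for every attribute named xxx. */
    public static void configure(Object target, Map<String, String> attributes)
            throws ReflectiveOperationException {
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            String name = attr.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            Method match = null;
            // Find a public single-argument method with the expected setter name
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    match = m;
                    break;
                }
            }
            if (match == null) {
                throw new NoSuchMethodException("No setter for attribute: " + name);
            }
            match.invoke(target, convert(attr.getValue(), match.getParameterTypes()[0]));
        }
    }

    // Converts the attribute text to the setter's parameter type.
    private static Object convert(String value, Class<?> type) {
        if (type == int.class) return Integer.valueOf(value);
        if (type == float.class) return Float.valueOf(value);
        if (type == boolean.class) return Boolean.valueOf(value);
        if (type == String.class) return value;
        throw new IllegalArgumentException("Unsupported setter type: " + type);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("minNumberShouldMatch", "3");
        attrs.put("maxQueryTerms", "30");
        MoreLikeThisSettings settings = new MoreLikeThisSettings();
        configure(settings, attrs);
        System.out.println(settings.getMinNumberShouldMatch() + " " + settings.getMaxQueryTerms());  // 3 30
    }
}
```

Because the target is just an object with setters, the same mechanism would work whether the configured class is a Query subclass or a helper that merely produces one.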

> Do people feel this would be a worthwhile endeavour?

I think a way to get a query to/from XML is a good idea.  Perhaps the
XML serialization feature of JDK 1.4 (or is it 1.5?) is sufficient
for this?  Maybe not though - and there are plenty of handy helpers,
from just doing raw reflection tricks like Ant, to using something
like Digester or Castor.  I wouldn't recommend reinventing the XML
de/serialization aspect of this.

> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

I don't see where I would use this capability just yet, but I do see  
it as useful in the contexts you provided.

I'd also be interested in effort towards an Apple-like query builder.

	Erik




Re: "Advanced" query language

Posted by Yonik Seeley <ys...@gmail.com>.
> It's worth considering why it's useful to have a
> String-representable form for queries:

Absolutely.  A quickly parseable string representation for queries is
essential in so many contexts, for the reasons you brought out.  Think
what SQL does for the database.

-Yonik
