Posted to dev@lucene.apache.org by Pier Fumagalli <pi...@betaversion.org> on 2002/11/28 17:24:27 UTC

XML search language...

Heyoo to all...

    Simple question, I have to create a "service" for searching throughout
our database of articles (funny enough) on www.vnunet.com...

The idea is to send a search query in XML, and return the matched items in
XML again, and then styling them...

Question: is there a standard dtd/schema for those kinds of XML
"transactions" ??? Or should I just invent my own? Pointers, anyone?

    Pier


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: XML search language...

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 29/11/02 4:09 "Otis Gospodnetic" <ot...@yahoo.com> wrote:

> Oh, one more thing - depending on the nature of this service, you may
> want to consider not using XML, and just writing to and reading from
> streams.

We have "tree-like" data to think about, hence XML (and our partners want
that as well :-)

    Pier




Re: XML search language...

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 29/11/02 4:28 "Nathan Ander" <na...@yahoo.com> wrote:
> 
> Take a look at the SOAP messaging protocol:
> 
> http://xml.apache.org/soap/docs/index.html
> http://www.w3.org/TR/SOAP/

Never! :-) It's overkill... What I need to do can be done with ONE
servlet generating/reading SAX events! :-)

    Pier




Re: XML search language...

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Pier Fumagalli wrote:

>><?xml version="1.0" encoding="UTF-8"?>
>><query>
>> <boolean type="and">
>>   <term field="character">Bird</term>
>>   <group>
>>     <term field="category">Cartoon</term>
>>     <boolean type="not">
>>       <term field="name">Roadrunner</term>
>>     </boolean>
>>   </group>
>> </boolean>
>></query>
> 
> 
> That _is_ a beauty.. Plain, simple, and doing exactly what I need! :-)
> 
> No weird XML-RPC/SOAP/QUERY stuff, just one tiny little thing that does the
> job I require... :-)
> 
> Only thing I don't "like" is how you group up the terms, for example, I
> don't quite get the distinction between "boolean / and" and group...
> 
> In theory, a binary operation can always be reducible to its minimal
> configuration of two terms, depending on what precedence we give to the (for
> instance) "and" "or" and "not" operations... So, I don't see why group is
> actually there! :-)
> 

I was trying to allow for precedence that may be determined somehow by 
"()"s. I think you're right, and I guess it could be simpler to base it on 
nesting and just use a stack (or queue) and recursion to deal with 
precedence:

With precedence ordered (AND, OR), i.e. AND binding tighter than OR.

The following *do* appear the same to me.

field1:foo AND field2:bar OR field3:bim AND field4:bam
(field1:foo AND field2:bar) OR (field3:bim AND field4:bam)

*last one in XML*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">	
    <or>
       <and>
          <term field="field1">foo</term>
          <term field="field2">bar</term>
       </and>
       <and>
          <term field="field3">bim</term>
          <term field="field4">bam</term>
       </and>
    </or>
</query>


The following *don't* appear the same to me.

field1:foo AND field2:bar OR field3:bim AND field4:bam
field1:foo AND (field2:bar OR field3:bim) AND field4:bam

*but the last one can still be captured in XML without a group tag*
<?xml version="1.0" encoding="ISO-8859-1"?>
<query index="Articles">
    <and>
       <term field="field1">foo</term>
       <and>
          <or>
             <term field="field2">bar</term>
             <term field="field3">bim</term>
          </or>
          <term field="field4">bam</term>
       </and>
    </and>
</query>
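
The nesting-plus-recursion idea can be sketched in a few lines of Java. This is only an illustration, not code from the thread: the class name NestedQuery and its methods are hypothetical, it uses the JDK's DOM parser rather than SAX for brevity, and it assumes the only elements are <query>, <and>, <or> and <term>.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: renders nested and/or/term elements as a query string. */
class NestedQuery {

    /** Parses the XML and renders the first child element of the root. */
    static String toQueryString(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return render(childElements(root).get(0), true);
    }

    /** The recursion mirrors the nesting: each nested operator gets parentheses. */
    static String render(Element e, boolean top) {
        String name = e.getTagName();
        if (name.equals("term")) {
            String field = e.getAttribute("field"); // empty string when absent
            String text = e.getTextContent().trim();
            return field.isEmpty() ? text : field + ":" + text;
        }
        String op = name.toUpperCase(Locale.ROOT); // "and" -> "AND", "or" -> "OR"
        List<String> parts = new ArrayList<>();
        for (Element child : childElements(e)) {
            parts.add(render(child, false));
        }
        String joined = String.join(" " + op + " ", parts);
        return top ? joined : "(" + joined + ")";
    }

    /** Collects element children only, skipping whitespace text nodes. */
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                out.add((Element) n);
            }
        }
        return out;
    }
}
```

Fed the second document above, this sketch yields field1:foo AND ((field2:bar OR field3:bim) AND field4:bam), so the parentheses fall out of the element nesting and no <group> tag is needed.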





> And also, one other thing is that since we have the flexibility of XML, why
> not using specific tags, such as  <and/> or <or/> and <not/>...
> 

I was trying to make it more extensible and generic. But that may be 
overkill as well. The idea is that services could define their own 
operations without building "new xml tags" because the op was just an 
attribute. Thus:

  <and> could be <boolean type="AND|and|+|&&|&">
  <or> could be <boolean type="OR|or|'||'|'|'">
  <not> could be <boolean type="NOT|not|-|!">

  Then we know it's a boolean relation; we just don't care what 
characters are representing it in the long run. The only risk this 
brings up (and thus the need for <group> tags) is the precedence of the 
unknown character representations.

Also note that <term field="xxx">test</term> can carry another attribute, 
"op", that defines the operation being performed on that term. For example:

<term field="date" op="gt">1996</term>

> That is because, if you process SAX events, you can easily trigger on those
> names which are unique in your tag, while if you do use attributes, well,
> the whole thing gets a little bit messed up in terms of parsing/checking and
> slower because you have to analyze every single attribute to get the "type"
> of your boolean operation....
> 
> I'm thinking about something like:
> 
> <?xml version="1.0"?>
> <query index="Articles">
>   <and>
>     <term field="subject">Microsoft</term>
>     <or>
>        <term>Lawsuit</term>
>        <term>Court</term>
>     </or>
>   </and>
> </query>
> 
> Does it make sense????
> 
>     Pier


Re: XML search language...

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Pier, I think some of this stuff, esp. boolean operators already exists
in Ant.
So you can look there for ideas, if not code.

Otis



Re: XML search language...

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 8/12/02 17:13 "Mark R. Diggory" <md...@latte.harvard.edu> wrote:

> You're not providing enough info about what you're trying to do, thus you
> get generalized responses to basic messaging technologies available.

Yeah, sorry Mark, but I'm overwhelmed by work ATM, and this project will
start early next January... So, I can only dedicate a few minutes a day...

Couple of days ago, I flagged a message in which you said:

> <?xml version="1.0" encoding="UTF-8"?>
> <query>
>  <boolean type="and">
>    <term field="character">Bird</term>
>    <group>
>      <term field="category">Cartoon</term>
>      <boolean type="not">
>        <term field="name">Roadrunner</term>
>      </boolean>
>    </group>
>  </boolean>
> </query>

That _is_ a beauty.. Plain, simple, and doing exactly what I need! :-)

No weird XML-RPC/SOAP/QUERY stuff, just one tiny little thing that does the
job I require... :-)

Only thing I don't "like" is how you group up the terms, for example, I
don't quite get the distinction between "boolean / and" and group...

In theory, a binary operation can always be reduced to its minimal
configuration of two terms, depending on what precedence we give to the (for
instance) "and" "or" and "not" operations... So, I don't see why group is
actually there! :-)

And also, one other thing is that since we have the flexibility of XML, why
not use specific tags, such as <and/> or <or/> and <not/>...

That is because, if you process SAX events, you can easily trigger on those
names which are unique in your tag, while if you do use attributes, well,
the whole thing gets a little bit messed up in terms of parsing/checking and
slower because you have to analyze every single attribute to get the "type"
of your boolean operation....

I'm thinking about something like:

<?xml version="1.0"?>
<query index="Articles">
  <and>
    <term field="subject">Microsoft</term>
    <or>
       <term>Lawsuit</term>
       <term>Court</term>
    </or>
  </and>
</query>
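
A rough sketch of what triggering on element names could look like with SAX, using a stack of operand lists. Everything here is an assumption for illustration: the class name SaxQueryHandler, the rendering as a Lucene-style query string, and the requirement that <query> is the root element. It is not the servlet discussed in this thread.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: dispatches on element names, one operand list per open operator. */
class SaxQueryHandler extends DefaultHandler {
    private final Deque<List<String>> operands = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private String field;
    private String result;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        switch (qName) {
            case "query": case "and": case "or": case "not":
                operands.push(new ArrayList<>()); // open a new operand list
                break;
            case "term":
                field = atts.getValue("field");   // null when the attribute is absent
                text.setLength(0);
                break;
            default:
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        switch (qName) {
            case "term": {
                String t = text.toString().trim();
                operands.peek().add(field == null ? t : field + ":" + t);
                break;
            }
            case "and": case "or": {
                List<String> parts = operands.pop();
                String joined = String.join(" " + qName.toUpperCase(Locale.ROOT) + " ", parts);
                boolean topLevel = operands.size() == 1; // only the <query> list is left
                operands.peek().add(topLevel ? joined : "(" + joined + ")");
                break;
            }
            case "not": {
                // Assumes <not> wraps a single operand.
                List<String> parts = operands.pop();
                operands.peek().add("NOT " + String.join(" ", parts));
                break;
            }
            case "query":
                result = String.join(" ", operands.pop());
                break;
            default:
                break;
        }
    }

    /** Convenience entry point: parses the XML and returns the rendered query. */
    static String parse(String xml) throws Exception {
        SaxQueryHandler handler = new SaxQueryHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.result;
    }
}
```

For the document above, SaxQueryHandler.parse(...) gives subject:Microsoft AND (Lawsuit OR Court). No attribute has to be inspected to find out which boolean operation is being closed; the element name alone decides.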

Does it make sense????

    Pier




Re: XML search language...

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
You're not providing enough info about what you're trying to do, thus you 
get generalized responses about the basic messaging technologies available.

1.) Does your Lucene query really have to be in XML, e.g. <query><term 
name="foo">bar</term></query>? Or are you just wrapping the query in an 
XML wrapper, e.g. <query>foo:bar</query>?

Can you settle for HTTP params? http://host/servlet?query="foo:bar"

2.) If what you're looking for is just to generate an XML response from 
the Lucene results, that's a pretty trivial SAX filtering strategy.

Lucene resultset --> SAXFilter -->SAX Events --> Serialize to response
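
The pipeline above might look roughly like the following, using only the JDK's JAXP classes. The Lucene side is left out on purpose: the hits are modelled as a plain List of Maps, and the class name ResultSerializer and the element names results/hit/field are invented for this sketch.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical serializer: search hits -> SAX events -> XML text. */
class ResultSerializer {

    static String toXml(List<Map<String, String>> hits) throws Exception {
        // An identity TransformerHandler turns incoming SAX events into serialized XML.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler out = factory.newTransformerHandler();
        out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        out.setResult(new StreamResult(buffer));

        out.startDocument();
        out.startElement("", "results", "results", new AttributesImpl());
        for (Map<String, String> hit : hits) {
            out.startElement("", "hit", "hit", new AttributesImpl());
            for (Map.Entry<String, String> f : hit.entrySet()) {
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "name", "name", "CDATA", f.getKey());
                out.startElement("", "field", "field", atts);
                char[] chars = f.getValue().toCharArray();
                out.characters(chars, 0, chars.length); // escaping is left to the serializer
                out.endElement("", "field", "field");
            }
            out.endElement("", "hit", "hit");
        }
        out.endElement("", "results", "results");
        out.endDocument();
        return buffer.toString();
    }
}
```

In a servlet, the StreamResult would presumably wrap the response's output stream instead of a StringWriter, and an XSLT stylesheet could be plugged in at the same point for the styling step.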


-Mark



Re: XML search language...

Posted by petite_abeille <pe...@mac.com>.
On Sunday, Dec 8, 2002, at 02:10 Europe/Zurich, Pier Fumagalli wrote:

> Nanananana... Way out... :-)

XQuery 1.0: An XML Query Language

http://www.w3.org/TR/xquery/




Re: XML search language...

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 29/11/02 8:15 "petite_abeille" <pe...@mac.com> wrote:

> 
> On Friday, Nov 29, 2002, at 05:28 Europe/Zurich, Nathan Ander wrote:
> 
>> Take a look at the SOAP messaging protocol:
> 
> Or XML-RPC if you don't feel like having a headache:

Nanananana... Way out... :-)

    Pier




Re: XML search language...

Posted by petite_abeille <pe...@mac.com>.
On Friday, Nov 29, 2002, at 05:28 Europe/Zurich, Nathan Ander wrote:

> Take a look at the SOAP messaging protocol:

Or XML-RPC if you don't feel like having a headache:

http://www.xmlrpc.com/spec

In any case, those are simply messaging protocols. They don't address 
"searching" per se.

What about (ab)using XPath as a query language:

http://www.w3.org/TR/xpath

Or taking your chance with XML Query Language (XQL):

http://www.w3.org/TandS/QL/QL98/pp/xql.html

PA.




Re: XML search language...

Posted by Nathan Ander <na...@yahoo.com>.
Take a look at the SOAP messaging protocol:

http://xml.apache.org/soap/docs/index.html
http://www.w3.org/TR/SOAP/
-Nathan
 

Problems when querying with " (just one)

Posted by Cesar Estevez <ce...@hotmail.com>.
Hello!

I have built a search engine using Lucene, and while testing it I noticed
that when the SFA enters a single quotation mark (just one) as the text to
search for, it gives the following error:

org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 2. Encountered: after : ""
    at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown Source)
    at org.apache.lucene.queryParser.QueryParser.jj_ntk(Unknown Source)
    at org.apache.lucene.queryParser.QueryParser.Modifiers(Unknown Source)
    at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
    at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
    at org.apache.jsp.resultado$jsp._jspService(resultado$jsp.java:252)



I need to know the cause of this and the possible solutions, so that I can
implement a fix.

thx!

César Estévez Álvarez.
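
One possible workaround, sketched here without any Lucene dependency, is to sanitize the user's text before it reaches QueryParser.parse. The class QuoteGuard and its method are hypothetical, and dropping the quotes is just one policy among several (rejecting the input with a message to the user would be another). Note also that a JavaCC-generated TokenMgrError is an Error rather than an Exception, so a catch block for ParseException alone would presumably not intercept the failure shown above.

```java
/** Hypothetical pre-processing step applied to user input before query parsing. */
class QuoteGuard {

    /**
     * Returns the input unchanged when it contains an even number of double
     * quotes; otherwise removes every double quote, so that no phrase is left
     * unterminated.
     */
    static String dropUnbalancedQuotes(String input) {
        int quotes = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 == 0 ? input : input.replace("\"", "");
    }
}
```

A lone " becomes the empty string under this policy, so the caller would still need to treat an empty query as "nothing to search for" instead of passing it on to the parser.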



Re: XML search language...

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Oh, one more thing - depending on the nature of this service, you may
want to consider not using XML, and just writing to and reading from
streams.

Otis



Re: XML search language...

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
I expressed the idea of formalizing an XML spec for defining search 
queries. My idea is that it would look something like the following:

<?xml version="1.0" encoding="UTF-8"?>
<query>
   <boolean type="and">
     <term field="character">Bird</term>
     <group>
       <term field="category">Cartoon</term>
       <boolean type="not">
         <term field="name">Roadrunner</term>
       </boolean>
     </group>
   </boolean>
</query>

This could be used to express the following Lucene query:

character:Bird AND (category:Cartoon NOT name:Roadrunner)

The reason for expressing the syntax in XML is to facilitate the use of 
generic technologies for filtering and manipulating the query (SAX, XSLT, 
etc).

So, say I have 4 different "Index Services" that have similar 
information stored under different fields. I might then want to translate 
the query above into the following:

Server 1:

character:Bird AND (category:Cartoon NOT name:Roadrunner)

Server 2:

character:Bird AND (grouping:Cartoon NOT title:Roadrunner)

Server 3:

character=Bird AND (grouping=Cartoon -name=Roadrunner)

Server 4:

(& character=Bird (grouping=Cartoon (!name=Roadrunner)))

Now I can write SAX Filters that will generate the appropriate query for 
each "Index Service".

This makes XML the stepping stone to easily get from one syntax to 
another. All that needs to be written to map each syntax is:
    1.) A SAX parser to generate SAX events from the query string.
    2.) A ContentHandler to generate a query string from SAX events.

these can then be combined with other parsers/handlers to translate from 
any syntax to another mapped syntax.
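
As a rough illustration of one such filter, the sketch below renames the field attribute of <term> elements while the SAX events pass through, which is the kind of mapping Server 2 needs (category -> grouping, name -> title). The class FieldRenamingFilter and the helper renamedFields are made up for this example; a real chain would end in a ContentHandler that emits the target query string rather than in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX filter: maps field names to another index's vocabulary. */
class FieldRenamingFilter extends XMLFilterImpl {
    private final Map<String, String> renames;

    FieldRenamingFilter(XMLReader parent, Map<String, String> renames) {
        super(parent);
        this.renames = renames;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts);
        int i = copy.getIndex("field");
        if ("term".equals(qName) && i >= 0 && renames.containsKey(copy.getValue(i))) {
            copy.setValue(i, renames.get(copy.getValue(i)));
        }
        super.startElement(uri, local, qName, copy); // forward the (possibly rewritten) event
    }

    /** Demo helper: returns the field names seen downstream of the filter. */
    static List<String> renamedFields(String xml, Map<String, String> renames)
            throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        FieldRenamingFilter filter = new FieldRenamingFilter(reader, renames);
        final List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                if (a.getValue("field") != null) {
                    seen.add(a.getValue("field"));
                }
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

With the QueryML example above and the Server 2 mapping, the downstream handler sees the fields character, grouping and title, in that order. Because the filter only touches attributes, it can be stacked with other filters without any of them knowing about the others.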

I know this is probably outside the scope of Lucene itself as a project. 
But it would be interesting if such a standard arose for query syntax; 
then, if Lucene supported such features, it would make it even easier for 
others to integrate it into their present "legacy" Search Services. 
Imagine, you could install a Lucene Service and write a Parser/Handler 
to map your "legacy" system's queries directly into Lucene syntax, then 
it could start acting just like your "legacy" service.

-Mark Diggory

p.s.

I've attached an example schema representation of the syntax (dubbed 
QueryML).

Of course this does not explore other issues that may arise when trying 
to collect a multitude of results from different sources (like merging...).




Re: XML search language...

Posted by Otis Gospodnetic <ot...@yahoo.com>.
There is no such standard that I know of, although somebody mentioned
inventing(?) Query Markup Language (QML, an abbreviation already used
for things other than Query Markup Language) the other day on either
-user or -dev.

Otis
