Posted to java-user@lucene.apache.org by "Wu, Stephen T., Ph.D." <Wu...@mayo.edu> on 2012/12/07 22:47:09 UTC

Semi-structured queries

I’ve been trying to do semi-structured queries & query parsing.  In other words, you could have XML snippets mixed in with plain terms, e.g. a query like:

      christmas tree <store loc="abc" close_hour="2200">

where you’re looking for a document with the terms “christmas” and “tree” but also some structured data about where (practically) you could buy the tree.  Additionally, I’d like to be able to write functions relating multiple items, sort of like predicate logic or database-like queries:

      christmas tree NEARBY( <store close_hour="2200">, <restaurant close_hour="2400"> )

which would only find you places to buy a christmas tree that had stores and restaurants in close proximity to each other.  Finally, we would eventually be interested in doing something similar to org.apache.lucene.queries.CustomScoreQuery, where you can put in several different criteria and weight them separately per document.
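A minimal sketch of the first step, in plain Java with no Lucene dependency: split a query like the first example into free-text terms and pseudo-XML elements with their attributes. The class name and regular expressions are illustrative assumptions. It expects straight double quotes, ignores nesting and closing tags, and does not handle function syntax such as NEARBY(...), which would need a real grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-parser: splits a semi-structured query into plain terms
// and pseudo-XML elements. Assumes straight double quotes, no nesting, no
// closing tags, and no function syntax such as NEARBY(...).
class SemiStructuredQuerySplitter {

    // One pseudo-XML element, e.g. name "store" with {loc=abc, close_hour=2200}.
    static final class Element {
        final String name;
        final Map<String, String> attributes;

        Element(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        @Override
        public String toString() {
            return name + attributes;
        }
    }

    // '<', an element name, then everything up to the next '>'.
    private static final Pattern ELEMENT = Pattern.compile("<\\s*(\\w+)([^>]*)>");
    // A single name="value" pair inside an element.
    private static final Pattern ATTRIBUTE = Pattern.compile("(\\w+)\\s*=\\s*\"([^\"]*)\"");

    final List<String> terms = new ArrayList<>();
    final List<Element> elements = new ArrayList<>();

    SemiStructuredQuerySplitter(String query) {
        Matcher m = ELEMENT.matcher(query);
        int last = 0;
        while (m.find()) {
            addTerms(query.substring(last, m.start()));   // free text before the element
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher a = ATTRIBUTE.matcher(m.group(2));
            while (a.find()) {
                attrs.put(a.group(1), a.group(2));
            }
            elements.add(new Element(m.group(1), attrs));
            last = m.end();
        }
        addTerms(query.substring(last));                  // free text after the last element
    }

    // Whitespace-splits free text into terms, skipping empty pieces.
    private void addTerms(String text) {
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
    }

    public static void main(String[] args) {
        SemiStructuredQuerySplitter s = new SemiStructuredQuerySplitter(
                "christmas tree <store loc=\"abc\" close_hour=\"2200\">");
        System.out.println(s.terms);    // [christmas, tree]
        System.out.println(s.elements); // [store{loc=abc, close_hour=2200}]
    }
}
```

The terms could then go to an ordinary query parser while the elements are turned into field constraints separately.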

I’ve been poking around at a lot of places and would appreciate some help on where I should extend, an existing walkthrough or example, etc.  Here’s what I’ve been considering:

  *   org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.java: modifying this to add another group-like QueryNode, modifying the processor pipeline to include it, and modifying the definition of a TERM so it can deal with attribute="value" pairs in pseudo-XML.  I read through the QueryParser documentation but quickly got lost in the implementation.
  *   org/apache/lucene/queryparser/xml/CorePlusExtensionsParser.java: this seems to cover a lot of what I want, but I can’t tell.  I hadn’t originally thought of the query coming in as an XML stream.  I think I would still need to define some new Query types... Perhaps a lot?  One for each type of thing (“store”, in the above) I’d search for?
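On the question of whether each element type needs its own Query class: one possible design (an assumption, not something either parser prescribes) is a single generic convention that flattens any element into field/value pairs and is applied identically at index time and at query time. The field names `element` and `store.loc` below are hypothetical. Each resulting pair could then become, for example, a TermQuery added as a required clause of a BooleanQuery, so no per-element Query type is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flattening convention, applied the same way when indexing
// documents and when building queries: <store loc="abc"> becomes the
// field/value pairs element:store and store.loc:abc.
class ElementFlattener {

    // One field:value constraint (or one indexed field).
    static final class FieldClause {
        final String field;
        final String value;

        FieldClause(String field, String value) {
            this.field = field;
            this.value = value;
        }

        @Override
        public String toString() {
            return field + ":" + value;
        }
    }

    // The marker field name "element" is an assumption; it lets an element
    // with no attributes, such as <store>, still constrain the match.
    static final String ELEMENT_FIELD = "element";

    static List<FieldClause> flatten(String elementName, Map<String, String> attributes) {
        List<FieldClause> clauses = new ArrayList<>();
        clauses.add(new FieldClause(ELEMENT_FIELD, elementName));
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // Namespace each attribute by its element, e.g. store.close_hour
            clauses.add(new FieldClause(elementName + "." + e.getKey(), e.getValue()));
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, String> store = new LinkedHashMap<>();
        store.put("loc", "abc");
        store.put("close_hour", "2200");
        System.out.println(flatten("store", store));
        // [element:store, store.loc:abc, store.close_hour:2200]
    }
}
```

One caveat of flattening: if a document contains several stores, store.loc and store.close_hour may match attributes of different stores. Lucene's join module (block-join queries) is one way to keep an element's attributes together.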

Thanks!

stephen

Re: Re: Group by on multi fields

Posted by Martijn v Groningen <ma...@gmail.com>.
Not that I know of. The easiest way to functionally group across
multiple fields is to concatenate multiple field values into
a special group field. This isn't a flexible solution, but it doesn't
require modifying the Lucene code.
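A minimal sketch of the concatenation approach, assuming field values are non-null strings: build one group key per document from the fields to group by (dizh's a1 and a2), index it as an extra single-valued, untokenized field (for example a StringField in Lucene 4.x), and group on that field with the existing single-field grouping. The '|' separator and backslash escaping are assumptions; escaping matters because otherwise ("a|b", "c") and ("a", "b|c") would produce the same key.

```java
import java.util.Objects;

// Index-time sketch: joins several field values into one group-key value.
// The separator and escape characters are assumptions.
class GroupKeys {

    private static final char SEP = '|';
    private static final char ESC = '\\';

    // Joins values with SEP, escaping SEP and ESC so distinct tuples never collide.
    static String groupKey(String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(SEP);
            }
            String v = Objects.requireNonNull(values[i], "group field value");
            for (int j = 0; j < v.length(); j++) {
                char c = v.charAt(j);
                if (c == SEP || c == ESC) {
                    sb.append(ESC);
                }
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(groupKey("beijing", "2012")); // beijing|2012
        // Without escaping, both of these would become a|b|c:
        System.out.println(groupKey("a|b", "c"));        // a\|b|c
        System.out.println(groupKey("a", "b|c"));        // a|b\|c
    }
}
```

The downside Martijn mentions remains: the set of grouped fields is fixed when documents are indexed.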

The other solution you can try is to group by a ValueSource (which is
like a function) instead of by a field. At the moment there isn't a
ValueSource implementation that concatenates field values, but you can
easily create one yourself. This solution would be flexible (you can
change the fields you want to group by at runtime), but the grouping
itself will be more expensive (execution-wise).
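To illustrate the trade-off of the runtime approach, here is a plain-Java analogy (not Lucene's ValueSource API): the group key is computed, for every matching document, from fields chosen at call time, and a3 is summed per group, mirroring dizh's select a1,a2,sum(a3) group by a1,a2. Documents are modeled as hypothetical string maps, and the sum field is assumed to be present and numeric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of grouping by a function evaluated at search time.
// This is NOT Lucene's ValueSource API; documents are modeled as string maps.
class RuntimeGrouping {

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> d = new LinkedHashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            d.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // Like SELECT <groupFields>, SUM(sumField) ... GROUP BY <groupFields>.
    // The grouping fields are chosen per call, but the key is rebuilt for
    // every matching document, which is the extra execution cost.
    static Map<List<String>, Long> groupAndSum(List<Map<String, String>> hits,
                                               List<String> groupFields,
                                               String sumField) {
        Map<List<String>, Long> sums = new LinkedHashMap<>();
        for (Map<String, String> hit : hits) {
            List<String> key = new ArrayList<>();
            for (String field : groupFields) {
                key.add(hit.get(field)); // null when the document lacks the field
            }
            sums.merge(key, Long.parseLong(hit.get(sumField)), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = Arrays.asList(
                doc("a1", "x", "a2", "p", "a3", "1"),
                doc("a1", "x", "a2", "p", "a3", "2"),
                doc("a1", "x", "a2", "q", "a3", "5"),
                doc("a1", "y", "a2", "p", "a3", "4"));
        System.out.println(groupAndSum(hits, Arrays.asList("a1", "a2"), "a3"));
        // {[x, p]=3, [x, q]=5, [y, p]=4}
        System.out.println(groupAndSum(hits, Arrays.asList("a1"), "a3"));
        // {[x]=8, [y]=4}
    }
}
```

Using a List as the key avoids the separator question of the concatenation approach, at the cost of building a key per hit.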

I hope this gives you some pointers to start with.

Martijn

On 11 December 2012 02:55, dizh <di...@neusoft.com> wrote:
> Thanks to Martijn v Groningen.
>
> But is there anyone who has implemented this feature? Something like the SQL: select a1,a2,sum(a3) group by a1,a2;
>
> BTW: please show me how to do it as well. Thank you.
> ---------------------------------------------------------------------------------------------------
> Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s)
> is intended only for the use of the intended recipient and may be confidential and/or privileged of
> Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is
> not the intended recipient, unauthorized use, forwarding, printing,  storing, disclosure or copying
> is strictly prohibited, and may be unlawful.If you have received this communication in error,please
> immediately notify the sender by return e-mail, and delete the original message and all copies from
> your system. Thank you.
> ---------------------------------------------------------------------------------------------------



-- 
Kind regards,

Martijn van Groningen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Re: Group by on multi fields

Posted by dizh <di...@neusoft.com>.
Thanks to Martijn v Groningen.

But is there anyone who has implemented this feature? Something like the SQL: select a1,a2,sum(a3) group by a1,a2;

BTW: please show me how to do it as well. Thank you.


Re: Group by on multi fields

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi dizh,

Grouping on multiple fields or a field that has multiple tokens (or
values) per document hasn't been implemented yet.

Martijn

On 10 December 2012 07:12, dizh <di...@neusoft.com> wrote:
> Hi All:
> I want to ask how to do "Group by on multi fields".
> The Lucene Javadoc only gives an example of grouping by a single field.
>
> Would anybody give me a hint?



-- 
Kind regards,

Martijn van Groningen



Group by on multi fields

Posted by dizh <di...@neusoft.com>.
Hi All:
I want to ask how to do "Group by on multi fields".
The Lucene Javadoc only gives an example of grouping by a single field.

Would anybody give me a hint?

Re: Semi-structured queries

Posted by lukai <lu...@gmail.com>.
Wrap your own parser.

For example: org/apache/lucene/queryparser/classic/QueryParser.jj.


