You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "J. Delgado" <jo...@gmail.com> on 2012/02/21 09:39:23 UTC

Indexing Boolean Expressions

Hi,

I would like to propose implementing "Indexing Boolean Expressions" (See
http://www.vldb.org/pvldb/2/vldb09-83.pdf) as a Lucene-based project for
GSoC.

Here is a snippet from the Abstract of the paper:
"We consider the problem of efficiently indexing Disjunctive Normal Form
(DNF) and Conjunctive Normal Form (CNF) Boolean expressions over a
high-dimensional multi-valued attribute space. The goal is to rapidly find
the set of Boolean expressions that evaluate to true for a given assignment
of values to attributes. A solution to this problem has applications in
online advertising (where a Boolean expression represents an advertiser’s
user targeting requirements, and an assignment of values to attributes
represents the characteristics of a user visiting an online page) and in
general any publish/subscribe system (where a Boolean expression
represents a subscription, and an assignment of values to attributes
represents an event)."

Any interest?

-- J

Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
Yes and no.

Most of the sponsored search and display ad systems have a distributed
"matching service" that select valid ads from each of the distributed node
and then computes actual bids locally for each ad based on bidding models.
Only the top bids per node are sent over to the "auction service"

Now, there has been some work on how to bake the bid model into a relevance
score to be able to select top N from the index, however it is not as
simple as it seems as some of these prediction/bidding models
use sophisticated machine learning algorithms based on training samples and
somewhat complex objective functions.

-- Joaquin


On Mon, Feb 11, 2013 at 10:02 AM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Yeah.... except with Percolator (which uses MemoryIndex under the hood, I
> believe), there is no relevancy score, so you just get match/no match, and
> if you need to figure out top N best matching ads against queries derived
> from page content, you need that relevancy score to get not all matching
> docs, but just those top N.
>
> No?
>
> Otis
> --
> http://sematext.com/
>
>
>
>
>
> On Mon, Feb 11, 2013 at 11:22 AM, J. Delgado <jo...@gmail.com>wrote:
>
>> I guess ElasticSearch went ahead of SOLR with the percolate API, which is
>> exactly what is needed for two-way constraint+doc matching problem present
>> in Advertising systems and other use cases:
>>
>> http://www.elasticsearch.org/guide/reference/api/percolate.html
>>
>> Cheers,
>>
>> Joaquin Delgado, PhD.
>> http://www.linkedin.com/pub/profile/0/04b/277
>>
>> On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <wunder@wunderwood.org
>> > wrote:
>>
>>> Efficient rule matching goes further back, at least to "alerting" in
>>> Verity K2.
>>>
>>> wunder
>>> Search Guy, Chegg
>>>
>>> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>>>
>>> BTW, the idea of indexing Boolean Expressions inside a text indexing
>>> engine is not new. For example Oracle Text provides the CTXRULE index and
>>> the MATCHES operator within their indexing stack, which is primarily used
>>> for Rule-based text classification.
>>>
>>> See:
>>>
>>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>>>
>>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>>>
>>> -- J
>>>
>>> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <jo...@gmail.com>wrote:
>>>
>>>> In full dislosure, there is a patent application that Yahoo! has filed
>>>> for the use of inverted indexes for using complex  predicates for matching
>>>> contracts and opportunities in advertising:
>>>>
>>>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>>>
>>>> However I believe there are many more applications that can benefit
>>>> from similar matching techniques (i.e. recommender systems,
>>>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
>>>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>>>
>>>> As a Yahoo! employee, I might not be able to directly contribute to
>>>> this project but will be happy to point to any publicly available pointer
>>>> that can help.
>>>>
>>>> Cheers,
>>>>
>>>> -- Joaquin
>>>>
>>>>
>>>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>>>> mkhludnev@griddynamics.com> wrote:
>>>>
>>>>> Hello Joaquin,
>>>>>
>>>>> I looked through the paper several times, and see no problem to
>>>>> implement it in Lucene (the trivial case at least):
>>>>>
>>>>> Let's index conjunctive condition as
>>>>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>>>
>>>>> then, form query from the incoming fact (event):
>>>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>>>
>>>>> to enforce overlap between condition and event, wrap the query above
>>>>> into own query whose scorer will check that numClauses for the matched doc
>>>>> is equal to number of matched clauses.
>>>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>>>> damn fast; and "number of matched clauses" can be obtained from
>>>>> DisjunctionSumScorer.nrMatchers()
>>>>>
>>>>> Negative clauses, and multivalue can be covered also, I believe.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>
>>>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <joaquin.delgado@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> I looked at LUCENE-2987 and its work on the query side (changes to
>>>>>> the accepted syntax to accept lower case 'or' and 'and'), which isn't
>>>>>> really related to my proposal.
>>>>>>
>>>>>> What I'm proposing is to be able to index complex boolean expressions
>>>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>>>> task. The objective here is find a set of relevant queries given a document
>>>>>> (assignment of values to fields).
>>>>>>
>>>>>> This by itself may not sound that interesting but its a key piece
>>>>>> to efficiently implementing any MATCHING system which is effectively a
>>>>>> two-way search where constraints are defined both-ways. An example of this
>>>>>> would be:
>>>>>>
>>>>>> 1) Job matching: Potential employers define their "job posting" as a
>>>>>> documents along with complex boolean expressions used to narrow potential
>>>>>> candidates. Job searchers upload their "profile" and may formulate complex
>>>>>> queries when executing a search. Once a is search initiated from any of the
>>>>>> sides constraints need to satisfied both ways.
>>>>>> 2) Advertising: Publishers define constraints on the type of
>>>>>> advertisers/ads they are willing to show in their sites. On the other hand,
>>>>>> advertisers define constraints (typically at the campaign level) on
>>>>>> publisher sites they want their ads to show at as well as on the user
>>>>>> audiences they are targeting to. While some attribute values are known at
>>>>>> definition time, others are only instantiated once the user visits a given
>>>>>> page which triggers a matching request that must be satisfied in
>>>>>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>>>>>
>>>>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>>>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>>>>> expression (query) that goes against a double index also built on tuples
>>>>>> that  simultaneously boolean expressions and associated documents.
>>>>>>
>>>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>>>> given a doc (typically few attributes with values assigned), which is the
>>>>>> core of what is described in this paper: "Indexing Boolean
>>>>>> Expressions" (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>>>
>>>>>> -- J
>>>>>>
>>>>>>
>>>>>> So to effectively resolve the problem of realtime matching one can
>>>>>>
>>>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>>>>>>
>>>>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>  So if Aayush Kothari is interested in working on this as a
>>>>>>>> Student, all we need is a formal mentor (I can be the informal one).
>>>>>>>>
>>>>>>>>  Anyone up for the task?
>>>>>>>>
>>>>>>>>
>>>>>>>>   Completely interested in working for and learning about the
>>>>>>> aforementioned subject/project. +1.
>>>>>>>
>>>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>>>> Basically changing the grammar to accepts conjunctions AND and OR in
>>>>>>> the query text.
>>>>>>> I would be interested in working with you on some of the details.
>>>>>>>
>>>>>>> However, I too am not a formal committer.
>>>>>>>
>>>>>>> --
>>>>>>> Joe Cabreraeminorlabs.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sincerely yours
>>>>> Mikhail Khludnev
>>>>> Lucid Certified
>>>>> Apache Lucene/Solr Developer
>>>>> Grid Dynamics
>>>>>
>>>>>
>>>
>>>
>>>
>>
>

Re: Indexing Boolean Expressions

Posted by Otis Gospodnetic <ot...@gmail.com>.
Yeah.... except with Percolator (which uses MemoryIndex under the hood, I
believe), there is no relevancy score, so you just get match/no match, and
if you need to figure out top N best matching ads against queries derived
from page content, you need that relevancy score to get not all matching
docs, but just those top N.

No?

Otis
--
http://sematext.com/





On Mon, Feb 11, 2013 at 11:22 AM, J. Delgado <jo...@gmail.com>wrote:

> I guess ElasticSearch went ahead of SOLR with the percolate API, which is
> exactly what is needed for two-way constraint+doc matching problem present
> in Advertising systems and other use cases:
>
> http://www.elasticsearch.org/guide/reference/api/percolate.html
>
> Cheers,
>
> Joaquin Delgado, PhD.
> http://www.linkedin.com/pub/profile/0/04b/277
>
> On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <wu...@wunderwood.org>wrote:
>
>> Efficient rule matching goes further back, at least to "alerting" in
>> Verity K2.
>>
>> wunder
>> Search Guy, Chegg
>>
>> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>>
>> BTW, the idea of indexing Boolean Expressions inside a text indexing
>> engine is not new. For example Oracle Text provides the CTXRULE index and
>> the MATCHES operator within their indexing stack, which is primarily used
>> for Rule-based text classification.
>>
>> See:
>>
>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>>
>> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>>
>> -- J
>>
>> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <jo...@gmail.com>wrote:
>>
>>> In full dislosure, there is a patent application that Yahoo! has filed
>>> for the use of inverted indexes for using complex  predicates for matching
>>> contracts and opportunities in advertising:
>>>
>>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>>
>>> However I believe there are many more applications that can benefit from
>>> similar matching techniques (i.e. recommender systems,
>>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
>>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>>
>>> As a Yahoo! employee, I might not be able to directly contribute to this
>>> project but will be happy to point to any publicly available pointer that
>>> can help.
>>>
>>> Cheers,
>>>
>>> -- Joaquin
>>>
>>>
>>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>>> mkhludnev@griddynamics.com> wrote:
>>>
>>>> Hello Joaquin,
>>>>
>>>> I looked through the paper several times, and see no problem to
>>>> implement it in Lucene (the trivial case at least):
>>>>
>>>> Let's index conjunctive condition as
>>>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>>
>>>> then, form query from the incoming fact (event):
>>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>>
>>>> to enforce overlap between condition and event, wrap the query above
>>>> into own query whose scorer will check that numClauses for the matched doc
>>>> is equal to number of matched clauses.
>>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>>> damn fast; and "number of matched clauses" can be obtained from
>>>> DisjunctionSumScorer.nrMatchers()
>>>>
>>>> Negative clauses, and multivalue can be covered also, I believe.
>>>>
>>>> WDYT?
>>>>
>>>>
>>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com>wrote:
>>>>
>>>>> I looked at LUCENE-2987 and its work on the query side (changes to the
>>>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>>>>> related to my proposal.
>>>>>
>>>>> What I'm proposing is to be able to index complex boolean expressions
>>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>>> task. The objective here is find a set of relevant queries given a document
>>>>> (assignment of values to fields).
>>>>>
>>>>> This by itself may not sound that interesting but its a key piece
>>>>> to efficiently implementing any MATCHING system which is effectively a
>>>>> two-way search where constraints are defined both-ways. An example of this
>>>>> would be:
>>>>>
>>>>> 1) Job matching: Potential employers define their "job posting" as a
>>>>> documents along with complex boolean expressions used to narrow potential
>>>>> candidates. Job searchers upload their "profile" and may formulate complex
>>>>> queries when executing a search. Once a is search initiated from any of the
>>>>> sides constraints need to satisfied both ways.
>>>>> 2) Advertising: Publishers define constraints on the type of
>>>>> advertisers/ads they are willing to show in their sites. On the other hand,
>>>>> advertisers define constraints (typically at the campaign level) on
>>>>> publisher sites they want their ads to show at as well as on the user
>>>>> audiences they are targeting to. While some attribute values are known at
>>>>> definition time, others are only instantiated once the user visits a given
>>>>> page which triggers a matching request that must be satisfied in
>>>>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>>>>
>>>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>>>> expression (query) that goes against a double index also built on tuples
>>>>> that  simultaneously boolean expressions and associated documents.
>>>>>
>>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>>> given a doc (typically few attributes with values assigned), which is the
>>>>> core of what is described in this paper: "Indexing Boolean Expressions"
>>>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>>
>>>>> -- J
>>>>>
>>>>>
>>>>> So to effectively resolve the problem of realtime matching one can
>>>>>
>>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>>>>>
>>>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>  So if Aayush Kothari is interested in working on this as a
>>>>>>> Student, all we need is a formal mentor (I can be the informal one).
>>>>>>>
>>>>>>>  Anyone up for the task?
>>>>>>>
>>>>>>>
>>>>>>>   Completely interested in working for and learning about the
>>>>>> aforementioned subject/project. +1.
>>>>>>
>>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>>> Basically changing the grammar to accepts conjunctions AND and OR in
>>>>>> the query text.
>>>>>> I would be interested in working with you on some of the details.
>>>>>>
>>>>>> However, I too am not a formal committer.
>>>>>>
>>>>>> --
>>>>>> Joe Cabreraeminorlabs.com
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Lucid Certified
>>>> Apache Lucene/Solr Developer
>>>> Grid Dynamics
>>>>
>>>>
>>
>>
>>
>

Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
I guess ElasticSearch went ahead of SOLR with the percolate API, which is
exactly what is needed for two-way constraint+doc matching problem present
in Advertising systems and other use cases:

http://www.elasticsearch.org/guide/reference/api/percolate.html

Cheers,

Joaquin Delgado, PhD.
http://www.linkedin.com/pub/profile/0/04b/277

On Mon, Mar 26, 2012 at 10:17 AM, Walter Underwood <wu...@wunderwood.org>wrote:

> Efficient rule matching goes further back, at least to "alerting" in
> Verity K2.
>
> wunder
> Search Guy, Chegg
>
> On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:
>
> BTW, the idea of indexing Boolean Expressions inside a text indexing
> engine is not new. For example Oracle Text provides the CTXRULE index and
> the MATCHES operator within their indexing stack, which is primarily used
> for Rule-based text classification.
>
> See:
>
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
>
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
>
> -- J
>
> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <jo...@gmail.com>wrote:
>
>> In full dislosure, there is a patent application that Yahoo! has filed
>> for the use of inverted indexes for using complex  predicates for matching
>> contracts and opportunities in advertising:
>>
>> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>>
>> However I believe there are many more applications that can benefit from
>> similar matching techniques (i.e. recommender systems,
>> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
>> exposed in the original VLDB'09 paper (which is public) in Lucene.
>>
>> As a Yahoo! employee, I might not be able to directly contribute to this
>> project but will be happy to point to any publicly available pointer that
>> can help.
>>
>> Cheers,
>>
>> -- Joaquin
>>
>>
>> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
>> mkhludnev@griddynamics.com> wrote:
>>
>>> Hello Joaquin,
>>>
>>> I looked through the paper several times, and see no problem to
>>> implement it in Lucene (the trivial case at least):
>>>
>>> Let's index conjunctive condition as
>>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>>
>>> then, form query from the incoming fact (event):
>>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>>
>>> to enforce overlap between condition and event, wrap the query above
>>> into own query whose scorer will check that numClauses for the matched doc
>>> is equal to number of matched clauses.
>>> To get "numClauses for the matched doc" you can use FieldCache that's
>>> damn fast; and "number of matched clauses" can be obtained from
>>> DisjunctionSumScorer.nrMatchers()
>>>
>>> Negative clauses, and multivalue can be covered also, I believe.
>>>
>>> WDYT?
>>>
>>>
>>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com>wrote:
>>>
>>>> I looked at LUCENE-2987 and its work on the query side (changes to the
>>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>>>> related to my proposal.
>>>>
>>>> What I'm proposing is to be able to index complex boolean expressions
>>>> using Lucene. This can be viewed as the opposite of the regular search
>>>> task. The objective here is find a set of relevant queries given a document
>>>> (assignment of values to fields).
>>>>
>>>> This by itself may not sound that interesting but its a key piece
>>>> to efficiently implementing any MATCHING system which is effectively a
>>>> two-way search where constraints are defined both-ways. An example of this
>>>> would be:
>>>>
>>>> 1) Job matching: Potential employers define their "job posting" as a
>>>> documents along with complex boolean expressions used to narrow potential
>>>> candidates. Job searchers upload their "profile" and may formulate complex
>>>> queries when executing a search. Once a is search initiated from any of the
>>>> sides constraints need to satisfied both ways.
>>>> 2) Advertising: Publishers define constraints on the type of
>>>> advertisers/ads they are willing to show in their sites. On the other hand,
>>>> advertisers define constraints (typically at the campaign level) on
>>>> publisher sites they want their ads to show at as well as on the user
>>>> audiences they are targeting to. While some attribute values are known at
>>>> definition time, others are only instantiated once the user visits a given
>>>> page which triggers a matching request that must be satisfied in
>>>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>>>
>>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>>> expression (query) that goes against a double index also built on tuples
>>>> that  simultaneously boolean expressions and associated documents.
>>>>
>>>> To do this efficiently we need to be able to build indexes on Boolean
>>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>>> given a doc (typically few attributes with values assigned), which is the
>>>> core of what is described in this paper: "Indexing Boolean Expressions"
>>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>>
>>>> -- J
>>>>
>>>>
>>>> So to effectively resolve the problem of realtime matching one can
>>>>
>>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>>>>
>>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>  So if Aayush Kothari is interested in working on this as a Student,
>>>>>> all we need is a formal mentor (I can be the informal one).
>>>>>>
>>>>>>  Anyone up for the task?
>>>>>>
>>>>>>
>>>>>>   Completely interested in working for and learning about the
>>>>> aforementioned subject/project. +1.
>>>>>
>>>>> This may be related to the work I'm doing with LUCENE-2987
>>>>> Basically changing the grammar to accepts conjunctions AND and OR in
>>>>> the query text.
>>>>> I would be interested in working with you on some of the details.
>>>>>
>>>>> However, I too am not a formal committer.
>>>>>
>>>>> --
>>>>> Joe Cabreraeminorlabs.com
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Lucid Certified
>>> Apache Lucene/Solr Developer
>>> Grid Dynamics
>>>
>>>
>
>
>

Re: Indexing Boolean Expressions

Posted by Walter Underwood <wu...@wunderwood.org>.
Efficient rule matching goes further back, at least to "alerting" in Verity K2.

wunder
Search Guy, Chegg

On Mar 26, 2012, at 10:15 AM, J. Delgado wrote:

> BTW, the idea of indexing Boolean Expressions inside a text indexing engine is not new. For example Oracle Text provides the CTXRULE index and the MATCHES operator within their indexing stack, which is primarily used for Rule-based text classification.
> 
> See:
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
> 
> http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013
> 
> -- J
> 
> On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <jo...@gmail.com> wrote:
> In full dislosure, there is a patent application that Yahoo! has filed for the use of inverted indexes for using complex  predicates for matching contracts and opportunities in advertising:
> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
> 
> However I believe there are many more applications that can benefit from similar matching techniques (i.e. recommender systems, e-commerce, recruiting,etc) to make it worthwhile implementing the ideas exposed in the original VLDB'09 paper (which is public) in Lucene.
> 
> As a Yahoo! employee, I might not be able to directly contribute to this project but will be happy to point to any publicly available pointer that can help.
> 
> Cheers,
> 
> -- Joaquin
> 
> 
> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <mk...@griddynamics.com> wrote:
> Hello Joaquin,
> 
> I looked through the paper several times, and see no problem to implement it in Lucene (the trivial case at least):
> 
> Let's index conjunctive condition as
>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
> 
> then, form query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
> 
> to enforce overlap between condition and event, wrap the query above into own query whose scorer will check that numClauses for the matched doc is equal to number of matched clauses. 
> To get "numClauses for the matched doc" you can use FieldCache that's damn fast; and "number of matched clauses" can be obtained from DisjunctionSumScorer.nrMatchers()
> 
> Negative clauses, and multivalue can be covered also, I believe.  
> 
> WDYT?
> 
> 
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com> wrote:
> I looked at LUCENE-2987 and its work on the query side (changes to the accepted syntax to accept lower case 'or' and 'and'), which isn't really related to my proposal.
> 
> What I'm proposing is to be able to index complex boolean expressions using Lucene. This can be viewed as the opposite of the regular search task. The objective here is find a set of relevant queries given a document (assignment of values to fields).
> 
> This by itself may not sound that interesting but its a key piece to efficiently implementing any MATCHING system which is effectively a two-way search where constraints are defined both-ways. An example of this would be:
> 
> 1) Job matching: Potential employers define their "job posting" as a documents along with complex boolean expressions used to narrow potential candidates. Job searchers upload their "profile" and may formulate complex queries when executing a search. Once a is search initiated from any of the sides constraints need to satisfied both ways. 
> 2) Advertising: Publishers define constraints on the type of advertisers/ads they are willing to show in their sites. On the other hand, advertisers define constraints (typically at the campaign level) on publisher sites they want their ads to show at as well as on the user audiences they are targeting to. While some attribute values are known at definition time, others are only instantiated once the user visits a given page which triggers a matching request that must be satisfied in few milliseconds to select "valid" ads and then scored based on "relevance".
> 
> So in a matching system a MATCH QUERY is considered to to be a tuple that consists of a value assignment to attributes/fields (doc) + a boolean expression (query) that goes against a double index also built on tuples that  simultaneously boolean expressions and associated documents.
> 
> To do this efficiently we need to be able to build indexes on Boolean expressions (Lucene queries) and retrieve the set of matching expressions given a doc (typically few attributes with values assigned), which is the core of what is described in this paper: "Indexing Boolean Expressions" (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
> 
> -- J
> 
> 
> So to effectively resolve the problem of realtime matching one can 
> 
> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com> wrote:
> On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>> 
>> 
>> 
>> 
>> So if Aayush Kothari is interested in working on this as a Student, all we need is a formal mentor (I can be the informal one). 
>> 
>> Anyone up for the task?
>> 
>> 
>> Completely interested in working for and learning about the aforementioned subject/project. +1.  
> This may be related to the work I'm doing with LUCENE-2987
> Basically changing the grammar to accepts conjunctions AND and OR in the query text.
> I would be interested in working with you on some of the details.
> 
> However, I too am not a formal committer.
> 
> -- 
> Joe Cabrera
> eminorlabs.com
> 
> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
> 





Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
BTW, the idea of indexing Boolean Expressions inside a text indexing engine
is not new. For example Oracle Text provides the CTXRULE index and the
MATCHES operator within their indexing stack, which is primarily used for
Rule-based text classification.

See:

http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8

http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013

-- J

On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado <jo...@gmail.com>wrote:

> In full dislosure, there is a patent application that Yahoo! has filed for
> the use of inverted indexes for using complex  predicates for matching
> contracts and opportunities in advertising:
>
> http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false
>
> However I believe there are many more applications that can benefit from
> similar matching techniques (i.e. recommender systems,
> e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
> exposed in the original VLDB'09 paper (which is public) in Lucene.
>
> As a Yahoo! employee, I might not be able to directly contribute to this
> project but will be happy to point to any publicly available pointer that
> can help.
>
> Cheers,
>
> -- Joaquin
>
>
> On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> Hello Joaquin,
>>
>> I looked through the paper several times, and see no problem to implement
>> it in Lucene (the trivial case at least):
>>
>> Let's index conjunctive condition as
>>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>>
>> then, form query from the incoming fact (event):
>> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>>
>> to enforce overlap between condition and event, wrap the query above into
>> own query whose scorer will check that numClauses for the matched doc is
>> equal to number of matched clauses.
>> To get "numClauses for the matched doc" you can use FieldCache that's
>> damn fast; and "number of matched clauses" can be obtained from
>> DisjunctionSumScorer.nrMatchers()
>>
>> Negative clauses, and multivalue can be covered also, I believe.
>>
>> WDYT?
>>
>>
>> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com>wrote:
>>
>>> I looked at LUCENE-2987 and its work on the query side (changes to the
>>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>>> related to my proposal.
>>>
>>> What I'm proposing is to be able to index complex boolean expressions
>>> using Lucene. This can be viewed as the opposite of the regular search
>>> task. The objective here is find a set of relevant queries given a document
>>> (assignment of values to fields).
>>>
>>> This by itself may not sound that interesting but its a key piece
>>> to efficiently implementing any MATCHING system which is effectively a
>>> two-way search where constraints are defined both-ways. An example of this
>>> would be:
>>>
>>> 1) Job matching: Potential employers define their "job posting" as a
>>> documents along with complex boolean expressions used to narrow potential
>>> candidates. Job searchers upload their "profile" and may formulate complex
>>> queries when executing a search. Once a is search initiated from any of the
>>> sides constraints need to satisfied both ways.
>>> 2) Advertising: Publishers define constraints on the type of
>>> advertisers/ads they are willing to show in their sites. On the other hand,
>>> advertisers define constraints (typically at the campaign level) on
>>> publisher sites they want their ads to show at as well as on the user
>>> audiences they are targeting to. While some attribute values are known at
>>> definition time, others are only instantiated once the user visits a given
>>> page which triggers a matching request that must be satisfied in
>>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>>
>>> So in a matching system a MATCH QUERY is considered to to be a tuple
>>> that consists of a value assignment to attributes/fields (doc) + a boolean
>>> expression (query) that goes against a double index also built on tuples
>>> that  simultaneously boolean expressions and associated documents.
>>>
>>> To do this efficiently we need to be able to build indexes on Boolean
>>> expressions (Lucene queries) and retrieve the set of matching expressions
>>> given a doc (typically few attributes with values assigned), which is the
>>> core of what is described in this paper: "Indexing Boolean Expressions"
>>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>>
>>> -- J
>>>
>>>
>>> So to effectively resolve the problem of realtime matching one can
>>>
>>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>>>
>>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>  So if Aayush Kothari is interested in working on this as a Student,
>>>>> all we need is a formal mentor (I can be the informal one).
>>>>>
>>>>>  Anyone up for the task?
>>>>>
>>>>>
>>>>>   Completely interested in working for and learning about the
>>>> aforementioned subject/project. +1.
>>>>
>>>> This may be related to the work I'm doing with LUCENE-2987
>>>> Basically changing the grammar to accepts conjunctions AND and OR in
>>>> the query text.
>>>> I would be interested in working with you on some of the details.
>>>>
>>>> However, I too am not a formal committer.
>>>>
>>>> --
>>>> Joe Cabreraeminorlabs.com
>>>>
>>>>
>>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Lucid Certified
>> Apache Lucene/Solr Developer
>> Grid Dynamics
>>
>> <http://www.griddynamics.com>
>>  <mk...@griddynamics.com>
>>
>>
>

Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
In full dislosure, there is a patent application that Yahoo! has filed for
the use of inverted indexes for using complex  predicates for matching
contracts and opportunities in advertising:
http://www.google.com/patents/US20110016109?printsec=abstract#v=onepage&q&f=false

However I believe there are many more applications that can benefit from
similar matching techniques (i.e. recommender systems,
e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
exposed in the original VLDB'09 paper (which is public) in Lucene.

As a Yahoo! employee, I might not be able to directly contribute to this
project but will be happy to point to any publicly available pointer that
can help.

Cheers,

-- Joaquin


On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Hello Joaquin,
>
> I looked through the paper several times, and see no problem to implement
> it in Lucene (the trivial case at least):
>
> Let's index conjunctive condition as
>  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}
>
> then, form query from the incoming fact (event):
> fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD
>
> to enforce overlap between condition and event, wrap the query above into
> own query whose scorer will check that numClauses for the matched doc is
> equal to number of matched clauses.
> To get "numClauses for the matched doc" you can use FieldCache that's damn
> fast; and "number of matched clauses" can be obtained from
> DisjunctionSumScorer.nrMatchers()
>
> Negative clauses, and multivalue can be covered also, I believe.
>
> WDYT?
>
>
> On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com>wrote:
>
>> I looked at LUCENE-2987 and its work on the query side (changes to the
>> accepted syntax to accept lower case 'or' and 'and'), which isn't really
>> related to my proposal.
>>
>> What I'm proposing is to be able to index complex boolean expressions
>> using Lucene. This can be viewed as the opposite of the regular search
>> task. The objective here is find a set of relevant queries given a document
>> (assignment of values to fields).
>>
>> This by itself may not sound that interesting but its a key piece
>> to efficiently implementing any MATCHING system which is effectively a
>> two-way search where constraints are defined both-ways. An example of this
>> would be:
>>
>> 1) Job matching: Potential employers define their "job posting" as a
>> documents along with complex boolean expressions used to narrow potential
>> candidates. Job searchers upload their "profile" and may formulate complex
>> queries when executing a search. Once a is search initiated from any of the
>> sides constraints need to satisfied both ways.
>> 2) Advertising: Publishers define constraints on the type of
>> advertisers/ads they are willing to show in their sites. On the other hand,
>> advertisers define constraints (typically at the campaign level) on
>> publisher sites they want their ads to show at as well as on the user
>> audiences they are targeting to. While some attribute values are known at
>> definition time, others are only instantiated once the user visits a given
>> page which triggers a matching request that must be satisfied in
>> few milliseconds to select "valid" ads and then scored based on "relevance".
>>
>> So in a matching system a MATCH QUERY is considered to to be a tuple that
>> consists of a value assignment to attributes/fields (doc) + a boolean
>> expression (query) that goes against a double index also built on tuples
>> that  simultaneously boolean expressions and associated documents.
>>
>> To do this efficiently we need to be able to build indexes on Boolean
>> expressions (Lucene queries) and retrieve the set of matching expressions
>> given a doc (typically few attributes with values assigned), which is the
>> core of what is described in this paper: "Indexing Boolean Expressions"
>> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>>
>> -- J
>>
>>
>> So to effectively resolve the problem of realtime matching one can
>>
>> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>>
>>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>>
>>>
>>>
>>>
>>>>  So if Aayush Kothari is interested in working on this as a Student,
>>>> all we need is a formal mentor (I can be the informal one).
>>>>
>>>>  Anyone up for the task?
>>>>
>>>>
>>>>   Completely interested in working for and learning about the
>>> aforementioned subject/project. +1.
>>>
>>> This may be related to the work I'm doing with LUCENE-2987
>>> Basically changing the grammar to accepts conjunctions AND and OR in the
>>> query text.
>>> I would be interested in working with you on some of the details.
>>>
>>> However, I too am not a formal committer.
>>>
>>> --
>>> Joe Cabreraeminorlabs.com
>>>
>>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Lucid Certified
> Apache Lucene/Solr Developer
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>
>

Re: Indexing Boolean Expressions

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Joaquin,

I looked through the paper several times, and see no problem to implement
it in Lucene (the trivial case at least):

Let's index conjunctive condition as
 {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}

then, form query from the incoming fact (event):
fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD

to enforce overlap between condition and event, wrap the query above into
own query whose scorer will check that numClauses for the matched doc is
equal to number of matched clauses.
To get "numClauses for the matched doc" you can use FieldCache that's damn
fast; and "number of matched clauses" can be obtained from
DisjunctionSumScorer.nrMatchers()

Negative clauses, and multivalue can be covered also, I believe.

WDYT?

On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado <jo...@gmail.com>wrote:

> I looked at LUCENE-2987 and its work on the query side (changes to the
> accepted syntax to accept lower case 'or' and 'and'), which isn't really
> related to my proposal.
>
> What I'm proposing is to be able to index complex boolean expressions
> using Lucene. This can be viewed as the opposite of the regular search
> task. The objective here is find a set of relevant queries given a document
> (assignment of values to fields).
>
> This by itself may not sound that interesting but its a key piece
> to efficiently implementing any MATCHING system which is effectively a
> two-way search where constraints are defined both-ways. An example of this
> would be:
>
> 1) Job matching: Potential employers define their "job posting" as a
> documents along with complex boolean expressions used to narrow potential
> candidates. Job searchers upload their "profile" and may formulate complex
> queries when executing a search. Once a is search initiated from any of the
> sides constraints need to satisfied both ways.
> 2) Advertising: Publishers define constraints on the type of
> advertisers/ads they are willing to show in their sites. On the other hand,
> advertisers define constraints (typically at the campaign level) on
> publisher sites they want their ads to show at as well as on the user
> audiences they are targeting to. While some attribute values are known at
> definition time, others are only instantiated once the user visits a given
> page which triggers a matching request that must be satisfied in
> few milliseconds to select "valid" ads and then scored based on "relevance".
>
> So in a matching system a MATCH QUERY is considered to to be a tuple that
> consists of a value assignment to attributes/fields (doc) + a boolean
> expression (query) that goes against a double index also built on tuples
> that  simultaneously boolean expressions and associated documents.
>
> To do this efficiently we need to be able to build indexes on Boolean
> expressions (Lucene queries) and retrieve the set of matching expressions
> given a doc (typically few attributes with values assigned), which is the
> core of what is described in this paper: "Indexing Boolean Expressions"
> (See http://www.vldb.org/pvldb/2/vldb09-83.pdf)
>
> -- J
>
>
> So to effectively resolve the problem of realtime matching one can
>
> On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com>wrote:
>
>>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>>
>>
>>
>>
>>>  So if Aayush Kothari is interested in working on this as a Student,
>>> all we need is a formal mentor (I can be the informal one).
>>>
>>>  Anyone up for the task?
>>>
>>>
>>>   Completely interested in working for and learning about the
>> aforementioned subject/project. +1.
>>
>> This may be related to the work I'm doing with LUCENE-2987
>> Basically changing the grammar to accepts conjunctions AND and OR in the
>> query text.
>> I would be interested in working with you on some of the details.
>>
>> However, I too am not a formal committer.
>>
>> --
>> Joe Cabreraeminorlabs.com
>>
>>
>


-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
I looked at LUCENE-2987 and its work on the query side (changes to the
accepted syntax to accept lower case 'or' and 'and'), which isn't really
related to my proposal.

What I'm proposing is to be able to index complex boolean expressions using
Lucene. This can be viewed as the opposite of the regular search task. The
objective here is find a set of relevant queries given a document
(assignment of values to fields).

This by itself may not sound that interesting but its a key piece
to efficiently implementing any MATCHING system which is effectively a
two-way search where constraints are defined both-ways. An example of this
would be:

1) Job matching: Potential employers define their "job posting" as a
documents along with complex boolean expressions used to narrow potential
candidates. Job searchers upload their "profile" and may formulate complex
queries when executing a search. Once a is search initiated from any of the
sides constraints need to satisfied both ways.
2) Advertising: Publishers define constraints on the type of
advertisers/ads they are willing to show in their sites. On the other hand,
advertisers define constraints (typically at the campaign level) on
publisher sites they want their ads to show at as well as on the user
audiences they are targeting to. While some attribute values are known at
definition time, others are only instantiated once the user visits a given
page which triggers a matching request that must be satisfied in
few milliseconds to select "valid" ads and then scored based on "relevance".

So in a matching system a MATCH QUERY is considered to to be a tuple that
consists of a value assignment to attributes/fields (doc) + a boolean
expression (query) that goes against a double index also built on tuples
that  simultaneously boolean expressions and associated documents.

To do this efficiently we need to be able to build indexes on Boolean
expressions (Lucene queries) and retrieve the set of matching expressions
given a doc (typically few attributes with values assigned), which is the
core of what is described in this paper: "Indexing Boolean Expressions"
(See http://www.vldb.org/pvldb/2/vldb09-83.pdf)

-- J


So to effectively resolve the problem of realtime matching one can

On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera <ca...@gmail.com> wrote:

>  On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>
>
>
>
>>  So if Aayush Kothari is interested in working on this as a Student, all
>> we need is a formal mentor (I can be the informal one).
>>
>>  Anyone up for the task?
>>
>>
>>   Completely interested in working for and learning about the
> aforementioned subject/project. +1.
>
> This may be related to the work I'm doing with LUCENE-2987
> Basically changing the grammar to accepts conjunctions AND and OR in the
> query text.
> I would be interested in working with you on some of the details.
>
> However, I too am not a formal committer.
>
> --
> Joe Cabreraeminorlabs.com
>
>

Re: Indexing Boolean Expressions

Posted by Joe Cabrera <ca...@gmail.com>.
On 02/21/2012 12:15 PM, Aayush Kothari wrote:
>
>
>
>     So if Aayush Kothari is interested in working on this as a
>     Student, all we need is a formal mentor (I can be the informal one).
>
>     Anyone up for the task?
>
>
> Completely interested in working for and learning about the 
> aforementioned subject/project. +1.
This may be related to the work I'm doing with LUCENE-2987
Basically changing the grammar to accepts conjunctions AND and OR in the 
query text.
I would be interested in working with you on some of the details.

However, I too am not a formal committer.

-- 
Joe Cabrera
eminorlabs.com


Re: Indexing Boolean Expressions

Posted by Aayush Kothari <aa...@gmail.com>.
> So if Aayush Kothari is interested in working on this as a Student, all we
> need is a formal mentor (I can be the informal one).
>
> Anyone up for the task?
>
>
> Completely interested in working for and learning about the aforementioned
subject/project. +1.

Re: Indexing Boolean Expressions

Posted by "J. Delgado" <jo...@gmail.com>.
According to http://community.apache.org/mentoringprogramme.html I'm not
allowed to be a Mentor, because I'm not a committer. However, I believe
this can be a really interesting (and useful) project as it has a variety
of applications, including advertising, recommender systems, matching
engines, information filtering, pub-sub systems, etc.

Here is an interesting quote off the paper:

"IR systems [21, 26], which efficiently search documents given a
query, have been heavily studied. Our application is different in that
we are searching for queries (BEs) given the data (instead of the
other way around), and that we exploit the syntax of the complex
queries in order to exactly find the satisfied BEs"

So if Aayush Kothari is interested in working on this as a Student, all we
need is a formal mentor (I can be the informal one).

Anyone up for the task?

-- J

On Tue, Feb 21, 2012 at 8:28 AM, Aayush Kothari
<aa...@gmail.com>wrote:

> That's a really nice application of DNF and CNF. I'd be happy to work at
> it if it gets approved in GSoC.
>
>
> On 21 February 2012 14:09, J. Delgado <jo...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like to propose implementing "Indexing Boolean Expressions" (See
>> http://www.vldb.org/pvldb/2/vldb09-83.pdf) as a Lucene-based project for
>> GSoC.
>>
>> Here is a snippet from the Abstract of the paper:
>> "We consider the problem of efficiently indexing Disjunctive Normal Form
>> (DNF) and Conjunctive Normal Form (CNF) Boolean expressions over a
>> high-dimensional multi-valued attribute space. The goal is to rapidly find
>> the set of Boolean expressions that evaluate to true for a given assignment
>> of values to attributes. A solution to this problem has applications in
>> online advertising (where a Boolean expression represents an advertiser’s
>> user targeting requirements, and an assignment of values to attributes
>> represents the characteristics of a user visiting an online page) and in
>> general any publish/subscribe system (where a Boolean expression
>> represents a subscription, and an assignment of values to attributes
>> represents an event)."
>>
>> Any interest?
>>
>> -- J
>>
>
>

Re: Indexing Boolean Expressions

Posted by Aayush Kothari <aa...@gmail.com>.
That's a really nice application of DNF and CNF. I'd be happy to work at it
if it gets approved in GSoC.

On 21 February 2012 14:09, J. Delgado <jo...@gmail.com> wrote:

> Hi,
>
> I would like to propose implementing "Indexing Boolean Expressions" (See
> http://www.vldb.org/pvldb/2/vldb09-83.pdf) as a Lucene-based project for
> GSoC.
>
> Here is a snippet from the Abstract of the paper:
> "We consider the problem of efficiently indexing Disjunctive Normal Form
> (DNF) and Conjunctive Normal Form (CNF) Boolean expressions over a
> high-dimensional multi-valued attribute space. The goal is to rapidly find
> the set of Boolean expressions that evaluate to true for a given assignment
> of values to attributes. A solution to this problem has applications in
> online advertising (where a Boolean expression represents an advertiser’s
> user targeting requirements, and an assignment of values to attributes
> represents the characteristics of a user visiting an online page) and in
> general any publish/subscribe system (where a Boolean expression
> represents a subscription, and an assignment of values to attributes
> represents an event)."
>
> Any interest?
>
> -- J
>