Posted to solr-dev@lucene.apache.org by "J. Delgado" <jo...@gmail.com> on 2009/11/16 08:05:07 UTC

Efficient Query Evaluation using a Two-Level Retrieval Process

Please find attached the paper on "Efficient Query Evaluation using a
Two-Level Retrieval Process". I believe that such an approach may improve
the way Lucene/Solr evaluates queries today.

Cheers,

-- Joaquin

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by Paul Elschot <pa...@xs4all.nl>.
On Monday 16 November 2009 19:09:52, J. Delgado wrote:
> On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
> > This algo is strictly tied to sort-by-score, if I understand it correctly.
> > Lucene has queries and sorting decoupled (except for allowOutOfOrder
> > mess), so implementing it would require some really fat hacks.
> >
> 
> According to the paper on Indexing Boolean Expression (using the WAND
> algo), sorting can be done based on scores that are determined based
> weight assignment to key-value pairs:
> 
> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
> 
> So I believe this can be generalized to sorting by any doc attributes
> given the proper weight assignment model
> 
> Of course, the devil-is-in-the-details :-(

Certainly. There is also the somewhat related issue of avoiding the use
of positions when not all required terms are present. For the moment
that issue contains only ideas:
https://issues.apache.org/jira/browse/LUCENE-1252

Regards,
Paul Elschot



> 
> -- Joaquin
> 
> 
> > On Mon, Nov 16, 2009 at 20:26, J. Delgado <jo...@gmail.com> wrote:
> >> As I understood it setMinimumNumberShouldMatch(int min) Is used to
> >> specify a minimum number of the optional BooleanClauses which must be
> >> satisfied.
> >>
> >> I haven't seen the implementation of setMinimumNumberShouldMatch but
> >> it seems a bit different than what is intended with the WAND operator,
> >> which can take any real number as threshold θ
> >>
> >> As stated in the paper:
> >>
> >> WAND(X1, w1, ..., Xk, wk, θ) is true iff Σ_{1≤i≤k} xi·wi ≥ θ
> >>
> >> where xi is the indicator variable for Xi, that is, xi = 1 if Xi is
> >> true and 0 otherwise.
> >>
> >> Observe that WAND can be used to implement AND
> >> and OR via
> >> AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k),
> >> and
> >> OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1).
> >>
> >> What I find interesting is the idea of using a first pass using the
> >> upper bound (maximal) contribution of a term on any document score and
> >> the dynamic setting of the threshold θ to skip or to fully evaluate a
> >> document..
> >>
> >> As stated in the paper:
> >>
> >> "Given this setup our preliminary scoring consists of evaluating
> >> for each document d
> >> WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
> >> where Xi is an indicator variable for the presence of query term i in
> >> document d and the threshold θ is varied during
> >> the algorithm as explained below. If WAND evaluates to true, then the
> >> document d undergoes a full evaluation.
> >> The threshold θ is set dynamically by the algorithm based on the
> >> minimum score m among the top n results found so
> >> far, where n is the number of requested documents. The larger the
> >> threshold, the more documents will be skipped
> >> and thus we will need to compute full scores for fewer documents."
> >>
> >> I think its worth a try...
> >>
> >> -- Joaquin
> >>
> >> On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
> >>>
> >>> J. Delgado wrote:
> >>>>
> >>>> Here is the link to the paper.
> >>>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
> >>>>
> >>>> A more recent application of the use and extension of the WAND operator for
> >>>> indexing of Boolean expressions:
> >>>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
> >>>>
> >>>> -- Joaquin
> >>>>
> >>>>
> >>>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
> >>>> shalinmangar@gmail.com> wrote:
> >>>>
> >>>>> Hey Joaquin,
> >>>>>
> >>>>> The mailing list strips off attachments. Can you please upload it somewhere
> >>>>> and give us the link?
> >>>>>
> >>>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
> >>>>>>
> >>>>>> wrote:
> >>>>>> Please find attached the paper on "Efficient Query Evaluation using a
> >>>>>> Two-Level Retrieval Process". I believe that such approach may improve
> >>>>>
> >>>>> the
> >>>>>>
> >>>>>> way Lucene/Solr evaluates queries today.
> >>>
> >>> The functionality of WAND (weak AND) is already implemented in Lucene, if I understand it correctly - this is BooleanQuery.setMinimumNumberShouldMatch(int). Lucene probably implements this differently from the algorithm described in the paper, so there may still be some benefit in comparing the algorithms in Lucene's BooleanScorer[2] with this one ...
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>> Andrzej Bialecki     <><
> >>>  ___. ___ ___ ___ _ _   __________________________________
> >>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> >>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> >>> http://www.sigram.com  Contact: info at sigram dot com
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>
> >>
> >
> >
> >
> > --
> > Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> > Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> > ICQ: 104465785
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by "J. Delgado" <jo...@gmail.com>.
On Mon, Nov 16, 2009 at 9:44 AM, Earwin Burrfoot <ea...@gmail.com> wrote:
> This algo is strictly tied to sort-by-score, if I understand it correctly.
> Lucene has queries and sorting decoupled (except for allowOutOfOrder
> mess), so implementing it would require some really fat hacks.
>

According to the paper on indexing Boolean expressions (using the WAND
algorithm), sorting can be done on scores that are determined by
assigning weights to key-value pairs:

http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf

So I believe this can be generalized to sorting by any doc attributes,
given the proper weight assignment model.

Of course, the devil is in the details :-(

-- Joaquin


> On Mon, Nov 16, 2009 at 20:26, J. Delgado <jo...@gmail.com> wrote:
>> As I understood it setMinimumNumberShouldMatch(int min) Is used to
>> specify a minimum number of the optional BooleanClauses which must be
>> satisfied.
>>
>> I haven't seen the implementation of setMinimumNumberShouldMatch but
>> it seems a bit different than what is intended with the WAND operator,
>> which can take any real number as threshold θ
>>
>> As stated in the paper:
>>
>> WAND(X1, w1, ..., Xk, wk, θ) is true iff Σ_{1≤i≤k} xi·wi ≥ θ
>>
>> where xi is the indicator variable for Xi, that is, xi = 1 if Xi is
>> true and 0 otherwise.
>>
>> Observe that WAND can be used to implement AND
>> and OR via
>> AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k),
>> and
>> OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1).
>>
>> What I find interesting is the idea of using a first pass using the
>> upper bound (maximal) contribution of a term on any document score and
>> the dynamic setting of the threshold θ to skip or to fully evaluate a
>> document..
>>
>> As stated in the paper:
>>
>> "Given this setup our preliminary scoring consists of evaluating
>> for each document d
>> WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
>> where Xi is an indicator variable for the presence of query term i in
>> document d and the threshold θ is varied during
>> the algorithm as explained below. If WAND evaluates to true, then the
>> document d undergoes a full evaluation.
>> The threshold θ is set dynamically by the algorithm based on the
>> minimum score m among the top n results found so
>> far, where n is the number of requested documents. The larger the
>> threshold, the more documents will be skipped
>> and thus we will need to compute full scores for fewer documents."
>>
>> I think its worth a try...
>>
>> -- Joaquin
>>
>> On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>>
>>> J. Delgado wrote:
>>>>
>>>> Here is the link to the paper.
>>>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
>>>>
>>>> A more recent application of the use and extension of the WAND operator for
>>>> indexing of Boolean expressions:
>>>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
>>>>
>>>> -- Joaquin
>>>>
>>>>
>>>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
>>>> shalinmangar@gmail.com> wrote:
>>>>
>>>>> Hey Joaquin,
>>>>>
>>>>> The mailing list strips off attachments. Can you please upload it somewhere
>>>>> and give us the link?
>>>>>
>>>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
>>>>>>
>>>>>> wrote:
>>>>>> Please find attached the paper on "Efficient Query Evaluation using a
>>>>>> Two-Level Retrieval Process". I believe that such approach may improve
>>>>>
>>>>> the
>>>>>>
>>>>>> way Lucene/Solr evaluates queries today.
>>>
>>> The functionality of WAND (weak AND) is already implemented in Lucene, if I understand it correctly - this is BooleanQuery.setMinimumNumberShouldMatch(int). Lucene probably implements this differently from the algorithm described in the paper, so there may still be some benefit in comparing the algorithms in Lucene's BooleanScorer[2] with this one ...
>>>
>>>
>>> --
>>> Best regards,
>>> Andrzej Bialecki     <><
>>>  ___. ___ ___ ___ _ _   __________________________________
>>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>>> http://www.sigram.com  Contact: info at sigram dot com
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
>
> --
> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
> ICQ: 104465785
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>


Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by Earwin Burrfoot <ea...@gmail.com>.
This algo is strictly tied to sort-by-score, if I understand it correctly.
Lucene has queries and sorting decoupled (except for allowOutOfOrder
mess), so implementing it would require some really fat hacks.

On Mon, Nov 16, 2009 at 20:26, J. Delgado <jo...@gmail.com> wrote:
> As I understood it setMinimumNumberShouldMatch(int min) Is used to
> specify a minimum number of the optional BooleanClauses which must be
> satisfied.
>
> I haven't seen the implementation of setMinimumNumberShouldMatch but
> it seems a bit different than what is intended with the WAND operator,
> which can take any real number as threshold θ
>
> As stated in the paper:
>
> WAND(X1, w1, ..., Xk, wk, θ) is true iff Σ_{1≤i≤k} xi·wi ≥ θ
>
> where xi is the indicator variable for Xi, that is, xi = 1 if Xi is
> true and 0 otherwise.
>
> Observe that WAND can be used to implement AND
> and OR via
> AND(X1,X2, . . .Xk) ≡ WAND(X1, 1,X2, 1, . . . Xk, 1, k),
> and
> OR(X1,X2, . ..Xk) ≡ WAND(X1, 1,X2, 1, . ..Xk, 1, 1).
>
> What I find interesting is the idea of using a first pass using the
> upper bound (maximal) contribution of a term on any document score and
> the dynamic setting of the threshold θ to skip or to fully evaluate a
> document..
>
> As stated in the paper:
>
> "Given this setup our preliminary scoring consists of evaluating
> for each document d
> WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
> where Xi is an indicator variable for the presence of query term i in
> document d and the threshold θ is varied during
> the algorithm as explained below. If WAND evaluates to true, then the
> document d undergoes a full evaluation.
> The threshold θ is set dynamically by the algorithm based on the
> minimum score m among the top n results found so
> far, where n is the number of requested documents. The larger the
> threshold, the more documents will be skipped
> and thus we will need to compute full scores for fewer documents."
>
> I think its worth a try...
>
> -- Joaquin
>
> On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>>
>> J. Delgado wrote:
>>>
>>> Here is the link to the paper.
>>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
>>>
>>> A more recent application of the use and extension of the WAND operator for
>>> indexing of Boolean expressions:
>>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
>>>
>>> -- Joaquin
>>>
>>>
>>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
>>> shalinmangar@gmail.com> wrote:
>>>
>>>> Hey Joaquin,
>>>>
>>>> The mailing list strips off attachments. Can you please upload it somewhere
>>>> and give us the link?
>>>>
>>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
>>>>>
>>>>> wrote:
>>>>> Please find attached the paper on "Efficient Query Evaluation using a
>>>>> Two-Level Retrieval Process". I believe that such approach may improve
>>>>
>>>> the
>>>>>
>>>>> way Lucene/Solr evaluates queries today.
>>
>> The functionality of WAND (weak AND) is already implemented in Lucene, if I understand it correctly - this is BooleanQuery.setMinimumNumberShouldMatch(int). Lucene probably implements this differently from the algorithm described in the paper, so there may still be some benefit in comparing the algorithms in Lucene's BooleanScorer[2] with this one ...
>>
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by "J. Delgado" <jo...@gmail.com>.
As I understand it, setMinimumNumberShouldMatch(int min) is used to
specify the minimum number of optional BooleanClauses that must be
satisfied.

I haven't seen the implementation of setMinimumNumberShouldMatch, but
it seems a bit different from what is intended with the WAND operator,
which can take any real number as its threshold θ.

As stated in the paper:

WAND(X1, w1, ..., Xk, wk, θ) is true iff Σ_{1≤i≤k} xi·wi ≥ θ

where xi is the indicator variable for Xi, that is, xi = 1 if Xi is
true and 0 otherwise.

Observe that WAND can be used to implement AND
and OR via
AND(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, k),
and
OR(X1, X2, ..., Xk) ≡ WAND(X1, 1, X2, 1, ..., Xk, 1, 1).
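
To make the definition concrete, here is a minimal sketch in Python. It
is not Lucene code: the function names (wand, wand_and, wand_or) are
made up for illustration, and the Boolean inputs stand in for "term i
matches the document".

```python
from typing import Sequence


def wand(terms_present: Sequence[bool], weights: Sequence[float], theta: float) -> bool:
    """True iff the weights of the terms that are present sum to at least theta."""
    if len(terms_present) != len(weights):
        raise ValueError("need exactly one weight per term")
    # x_i is 1 when X_i is true and 0 otherwise, so only present terms contribute.
    total = sum(w for present, w in zip(terms_present, weights) if present)
    return total >= theta


def wand_and(terms_present: Sequence[bool]) -> bool:
    """AND(X1..Xk) expressed as WAND with unit weights and threshold k."""
    k = len(terms_present)
    return wand(terms_present, [1.0] * k, float(k))


def wand_or(terms_present: Sequence[bool]) -> bool:
    """OR(X1..Xk) expressed as WAND with unit weights and threshold 1."""
    return wand(terms_present, [1.0] * len(terms_present), 1.0)
```

With non-unit weights the threshold can sit anywhere between those two
extremes, e.g. wand([True, False, True], [0.5, 2.0, 1.0], 1.5) holds
because 0.5 + 1.0 reaches the threshold.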

What I find interesting is the idea of a first pass that uses the
upper bound (maximal) contribution of each term to any document's
score, together with a dynamically set threshold θ, to decide whether
to skip a document or to fully evaluate it.

As stated in the paper:

"Given this setup our preliminary scoring consists of evaluating
for each document d
WAND(X1,UB1,X2,UB2, . . .,Xk,UBk, θ),
where Xi is an indicator variable for the presence of query term i in
document d and the threshold θ is varied during
the algorithm as explained below. If WAND evaluates to true, then the
document d undergoes a full evaluation.
The threshold θ is set dynamically by the algorithm based on the
minimum score m among the top n results found so
far, where n is the number of requested documents. The larger the
threshold, the more documents will be skipped
and thus we will need to compute full scores for fewer documents."
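
The threshold logic in that passage can be sketched as follows. This is
only my reading of the quoted text, under loud assumptions: the index is
a plain in-memory dict, documents are visited one by one, and the names
(two_level_top_n, docs, upper_bound) are invented for the example. The
paper's actual algorithm walks sorted posting lists and skips ahead in
them, which this sketch does not attempt.

```python
import heapq
from typing import Dict, List, Tuple


def two_level_top_n(
    docs: Dict[int, Dict[str, float]],  # doc id -> {term: real score contribution}
    query_terms: List[str],
    n: int,
) -> Tuple[List[Tuple[float, int]], int]:
    """Return the top-n (score, doc_id) pairs and the number of full evaluations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # UB_t: maximal contribution of term t to any document score.
    upper_bound = {
        t: max((contrib.get(t, 0.0) for contrib in docs.values()), default=0.0)
        for t in query_terms
    }
    heap: List[Tuple[float, int]] = []  # min-heap holding the best n results so far
    theta = 0.0                         # stays 0 until n results have been found
    full_evaluations = 0

    for doc_id in sorted(docs):
        contrib = docs[doc_id]
        # First level: WAND(X1, UB1, ..., Xk, UBk, theta) using term presence only.
        bound = sum(upper_bound[t] for t in query_terms if t in contrib)
        if bound == 0.0 or bound < theta:
            continue  # no query term present, or it cannot reach the threshold
        # Second level: full evaluation with the real contributions.
        full_evaluations += 1
        score = sum(contrib[t] for t in query_terms if t in contrib)
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
        if len(heap) == n:
            theta = heap[0][0]  # minimum score m among the top n found so far
    return sorted(heap, reverse=True), full_evaluations
```

The second return value is there only to show the effect the paper
describes: the higher θ climbs, the fewer documents reach the full
evaluation step.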

I think it's worth a try...

-- Joaquin

On Mon, Nov 16, 2009 at 2:54 AM, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> J. Delgado wrote:
>>
>> Here is the link to the paper.
>> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
>>
>> A more recent application of the use and extension of the WAND operator for
>> indexing of Boolean expressions:
>> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
>>
>> -- Joaquin
>>
>>
>> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
>> shalinmangar@gmail.com> wrote:
>>
>>> Hey Joaquin,
>>>
>>> The mailing list strips off attachments. Can you please upload it somewhere
>>> and give us the link?
>>>
>>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
>>>>
>>>> wrote:
>>>> Please find attached the paper on "Efficient Query Evaluation using a
>>>> Two-Level Retrieval Process". I believe that such approach may improve
>>>
>>> the
>>>>
>>>> way Lucene/Solr evaluates queries today.
>
> The functionality of WAND (weak AND) is already implemented in Lucene, if I understand it correctly - this is BooleanQuery.setMinimumNumberShouldMatch(int). Lucene probably implements this differently from the algorithm described in the paper, so there may still be some benefit in comparing the algorithms in Lucene's BooleanScorer[2] with this one ...
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by Andrzej Bialecki <ab...@getopt.org>.
J. Delgado wrote:
> Here is the link to the paper.
> http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf
> 
> A more recent application of the use and extension of the WAND operator for
> indexing of Boolean expressions:
> http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf
> 
> -- Joaquin
> 
> 
> On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
> shalinmangar@gmail.com> wrote:
> 
>> Hey Joaquin,
>>
>> The mailing list strips off attachments. Can you please upload it somewhere
>> and give us the link?
>>
>> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
>>> wrote:
>>> Please find attached the paper on "Efficient Query Evaluation using a
>>> Two-Level Retrieval Process". I believe that such approach may improve
>> the
>>> way Lucene/Solr evaluates queries today.

The functionality of WAND (weak AND) is already implemented in Lucene,
if I understand it correctly - this is
BooleanQuery.setMinimumNumberShouldMatch(int). Lucene probably implements
this differently from the algorithm described in the paper, so there may
still be some benefit in comparing the algorithms in Lucene's
BooleanScorer[2] with this one ...
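
For comparison, a small Python sketch of the two notions side by side.
These are plain stand-in functions, not Lucene's API or its scorer
implementation; min_should_match, wand and depends_only_on_count are
names invented here. The assumption illustrated is that a
minimum-should-match count behaves like WAND with unit weights and an
integer threshold, while real-valued weights can express conditions
that no count can.

```python
from itertools import product
from typing import Sequence


def min_should_match(clause_matches: Sequence[bool], minimum: int) -> bool:
    """Counting semantics: at least `minimum` optional clauses match."""
    return sum(1 for m in clause_matches if m) >= minimum


def wand(clause_matches: Sequence[bool], weights: Sequence[float], theta: float) -> bool:
    """Weighted semantics: matching clauses' weights sum to at least theta."""
    return sum(w for m, w in zip(clause_matches, weights) if m) >= theta


def depends_only_on_count(weights: Sequence[float], theta: float) -> bool:
    """True iff the WAND outcome is determined by how many clauses match."""
    outcome_by_count = {}
    for combo in product([False, True], repeat=len(weights)):
        count = sum(combo)
        result = wand(combo, weights, theta)
        # Two combinations with the same match count must agree.
        if outcome_by_count.setdefault(count, result) != result:
            return False
    return True
```

With weights [1.0, 1.0, 1.0] and θ = 2.0 the outcome depends only on
the match count, which is what a minimum of 2 expresses. With weights
[3.0, 1.0, 1.0] and the same θ it does not: the first clause alone
passes, while the second alone fails.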


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by "J. Delgado" <jo...@gmail.com>.
Here is the link to the paper.
http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf

A more recent paper that applies and extends the WAND operator to the
indexing of Boolean expressions:
http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf

-- Joaquin


On Sun, Nov 15, 2009 at 11:12 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> Hey Joaquin,
>
> The mailing list strips off attachments. Can you please upload it somewhere
> and give us the link?
>
> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <joaquin.delgado@gmail.com
> >wrote:
>
> > Please find attached the paper on "Efficient Query Evaluation using a
> > Two-Level Retrieval Process". I believe that such approach may improve
> the
> > way Lucene/Solr evaluates queries today.
> >
> > Cheers,
> >
> > -- Joaquin
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by "J. Delgado" <jo...@gmail.com>.
Here is the link to the paper.
http://cis.poly.edu/westlab/papers/cntdstrb/p426-broder.pdf

A more recent paper that applies and extends the WAND operator to the
indexing of Boolean expressions:
http://ilpubs.stanford.edu:8090/927/2/wand_vldb.pdf

-- Joaquin

On Sun, Nov 15, 2009 at 11:15 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

>  I see the attachment... (in java-dev)
>
>
>
> Uwe
>
>
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>   ------------------------------
>
> *From:* Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com]
> *Sent:* Monday, November 16, 2009 8:13 AM
> *To:* solr-dev@lucene.apache.org
> *Cc:* java-dev@lucene.apache.org
> *Subject:* Re: Efficient Query Evaluation using a Two-Level Retrieval
> Process
>
>
>
> Hey Joaquin,
>
>
>
> The mailing list strips off attachments. Can you please upload it somewhere
> and give us the link?
>
> On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <jo...@gmail.com>
> wrote:
>
> Please find attached the paper on "Efficient Query Evaluation using a
> Two-Level Retrieval Process". I believe that such approach may improve the
> way Lucene/Solr evaluates queries today.
>
> Cheers,
>
> -- Joaquin
>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

RE: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by Uwe Schindler <uw...@thetaphi.de>.
I see the attachment... (in java-dev)

 

Uwe

 

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

  _____  

From: Shalin Shekhar Mangar [mailto:shalinmangar@gmail.com] 
Sent: Monday, November 16, 2009 8:13 AM
To: solr-dev@lucene.apache.org
Cc: java-dev@lucene.apache.org
Subject: Re: Efficient Query Evaluation using a Two-Level Retrieval Process

 

Hey Joaquin,

 

The mailing list strips off attachments. Can you please upload it somewhere
and give us the link?

On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <jo...@gmail.com>
wrote:

Please find attached the paper on "Efficient Query Evaluation using a
Two-Level Retrieval Process". I believe that such approach may improve the
way Lucene/Solr evaluates queries today.

Cheers,

-- Joaquin



  




-- 
Regards,
Shalin Shekhar Mangar.


Re: Efficient Query Evaluation using a Two-Level Retrieval Process

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hey Joaquin,

The mailing list strips off attachments. Can you please upload it somewhere
and give us the link?

On Mon, Nov 16, 2009 at 12:35 PM, J. Delgado <jo...@gmail.com>wrote:

> Please find attached the paper on "Efficient Query Evaluation using a
> Two-Level Retrieval Process". I believe that such approach may improve the
> way Lucene/Solr evaluates queries today.
>
> Cheers,
>
> -- Joaquin
>



-- 
Regards,
Shalin Shekhar Mangar.
