Posted to dev@lucene.apache.org by Peng Cheng <pe...@sciencescape.net> on 2014/01/14 03:59:12 UTC

(Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Hi developers,

I've recently found a few bugs in advanced features of Lucene-core 4.6
(which is perfectly normal, as those features are less likely to be used
and tested). The most serious one has rendered my
ToParentBlockJoinCollector close to useless:

In the scorer generation stage, the ToParentBlockJoinCollector
automatically rewrites all the associated ToParentBlockJoinQuery instances
(and their subqueries) and saves them in its in-memory lookup table, namely
joinQueryID (see the enroll() method for details). Unfortunately, in the
getTopGroups method, the new ToParentBlockJoinQuery parameter is not
rewritten (at least, users are not expected to do so). When the new one is
looked up in the old table (considering the impact of rewrite() on
hashCode()), the lookup (namely of _slot) always fails, and the result is a
topGroup collection consisting of only empty groups (their hitCounts are
guaranteed to be zero).
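
The failure mode described above can be reproduced in miniature. The
following is a toy sketch, not Lucene code: ToyQuery, lookupSlot and the
local joinQueryID map are hypothetical stand-ins. It assumes only what the
paragraph states, namely that the table is keyed by the rewritten query and
that a rewritten query is not equal to the original.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model only: none of these types are Lucene classes.
final class RewriteLookupDemo {

    /** Hypothetical query whose equals/hashCode change once it is rewritten. */
    static final class ToyQuery {
        final String text;
        final boolean rewritten;

        ToyQuery(String text, boolean rewritten) {
            this.text = text;
            this.rewritten = rewritten;
        }

        /** Returns a new query that is not equal to the original. */
        ToyQuery rewrite() {
            return new ToyQuery(text, true);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyQuery)) {
                return false;
            }
            ToyQuery other = (ToyQuery) o;
            return text.equals(other.text) && rewritten == other.rewritten;
        }

        @Override
        public int hashCode() {
            return Objects.hash(text, rewritten);
        }
    }

    /** Enrolls the REWRITTEN form of 'enrolled', then looks up 'probe'. */
    static Integer lookupSlot(ToyQuery enrolled, ToyQuery probe) {
        Map<ToyQuery, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(enrolled.rewrite(), 0);
        return joinQueryID.get(probe);
    }

    public static void main(String[] args) {
        ToyQuery original = new ToyQuery("child:foo*", false);
        System.out.println(lookupSlot(original, original));           // null (miss)
        System.out.println(lookupSlot(original, original.rewrite())); // 0 (hit)
    }
}
```

In this model, probing with the un-rewritten query misses the table, while
probing with its rewritten form finds slot 0.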

I'm not positive about whether rewrite() should preserve the Query's
hashCode, as I've found many counterexamples already. If it is not required
to, then this problem can be solved by rewriting the original
BlockJoinQuery before invoking the getTopGroups method. However, users are
not expected to do so, so I would suggest submitting a hotfix that adds the
described rewrite step.

If rewrite() must preserve the hashCode, then this is a problem in the
various rewrite() implementations, and the fix will be much harder.

This bug has caused widespread panic in my company and I would like to see
it fixed ASAP. Please give me some suggestions so I know which hotfix I
should be working on.

All the best,

Yours Peng

RE: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Thanks!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

Re: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Peng Cheng <pe...@sciencescape.net>.
opened as https://issues.apache.org/jira/browse/LUCENE-5409

RE: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Yes, open an issue!

Uwe

Re: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Peng Cheng <pe...@sciencescape.net>.
Do you suggest that I open a Jira ticket for this? I think it's a bug,
considering common interface conventions (rewrite should not be exposed to
the end user), the documentation, and runtime efficiency (as you said,
rewrite is slow).



Re: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Peng Cheng <pe...@sciencescape.net>.
I see. Perhaps the best solution is to put the un-rewritten
blockJoinQueries into joinQueryID? The result will be the same. Right now
the code behaves very strangely if rewrite is not called beforehand: it
gives empty groups or correct results at random.

It's a great pleasure to read your reply; I never expected someone to
respond that fast.

Yours Peng




RE: (Lucene-core) Is Query's rewrite method mandated to preserve original Query's hashcode?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Peng,

rewrite() returns a different query that will definitely not preserve the hashCode() of, or be equals() to, the original query or any other rewritten one. The reason is that a rewritten query is a new query containing information about the index it will be executed on (e.g., it references terms from that index), so it *cannot* be equal to the original one. If it cannot be equal, its hashCode should differ as well. If you execute the query at a later stage, you have to rewrite the original query again, because the index may have changed. And take care: this rewrite may produce a completely different query (again with a new hashCode) if the index changed in the meantime.
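
To make the index dependence concrete, here is a toy sketch in plain Java. It is not Lucene code: rewritePrefix and the TreeSet standing in for an index's term dictionary are hypothetical. It assumes only that a rewrite expands a prefix into the concrete terms currently present, so the same original query yields different rewritten forms as the index changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model only: the "index" is a sorted set of terms, not a Lucene index.
final class IndexDependentRewriteDemo {

    /** Hypothetical rewrite: expand a prefix into the terms found in the index. */
    static List<String> rewritePrefix(String prefix, TreeSet<String> indexTerms) {
        List<String> expanded = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix.
        for (String term : indexTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            expanded.add(term);
        }
        return expanded;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(Arrays.asList("foo", "food", "bar"));
        List<String> first = rewritePrefix("foo", index);

        index.add("fool"); // the index changes between the two rewrites
        List<String> second = rewritePrefix("foo", index);

        System.out.println(first);                // [foo, food]
        System.out.println(second);               // [foo, food, fool]
        System.out.println(first.equals(second)); // false
    }
}
```

The two rewritten forms are unequal even though the original query never changed, which is why a rewritten query makes a poor long-lived lookup key.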

 

There is a workaround (it looks to me as though the code is simply missing documentation): you can manually rewrite the query before invoking getTopGroups(), using IndexSearcher#rewrite(query). Why is a hotfix needed?

Also, rewriting the query on every call to getTopGroups would be a major overhead (most queries' rewrites are very expensive and take as long as executing the query itself, e.g. for MultiTermQueries), so it should be done only once, not on every call. Maybe that's the reason it was left out, but it was not documented.
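
The "rewrite once, then reuse" pattern can be sketched as follows. This is a toy model in plain Java, not Lucene code: RewriteOnceDemo, enroll, slotFor and the string-based rewrite are hypothetical names, and the counter exists only to make the cost of each rewrite visible. In real code, the single rewrite would be the manual rewrite call mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: queries are plain strings and rewrite() is a stand-in.
final class RewriteOnceDemo {
    private final Map<String, Integer> slots = new HashMap<>();
    private int rewriteCalls = 0;

    /** Hypothetical expensive rewrite; every call is counted. */
    String rewrite(String query) {
        rewriteCalls++;
        return "rewritten(" + query + ")";
    }

    /** The collector side: registers the rewritten form under the next free slot. */
    void enroll(String query) {
        slots.put(rewrite(query), slots.size());
    }

    /** Lookup by an already rewritten key; performs no rewrite itself. */
    Integer slotFor(String rewrittenKey) {
        return slots.get(rewrittenKey);
    }

    int rewriteCalls() {
        return rewriteCalls;
    }

    public static void main(String[] args) {
        RewriteOnceDemo demo = new RewriteOnceDemo();
        demo.enroll("child:foo*");

        // The caller rewrites once and keeps the result for all later lookups.
        String key = demo.rewrite("child:foo*");
        for (int i = 0; i < 3; i++) {
            System.out.println(demo.slotFor(key)); // 0 each time
        }
        System.out.println(demo.rewriteCalls());   // 2: one by enroll, one by the caller
    }
}
```

Three lookups cost no additional rewrites here, whereas rewriting inside every lookup would add one per call.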

 

Uwe