You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2009/05/21 22:47:45 UTC

[jira] Created: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Enhancements to Scorers following the changes to DocIdSetIterator
-----------------------------------------------------------------

                 Key: LUCENE-1652
                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Shai Erera
             Fix For: 3.0


In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
# Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
# Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
# Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
# Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
# Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
# Add start()/finish() to DISI?

A snippet from LUCENE-1614 regarding the change in BooleanScorer

{code}
int doc = sub.done ? -1 : scorer.doc();
while (!sub.done && doc < end) {
  sub.collector.collect(doc);
  doc = scorer.nextDoc();
  sub.done = doc < 0;
}
{code}

To this:

{code}
int doc = scorer.doc();
while (doc < end) {
  sub.collector.collect(doc);
  doc = scorer.nextDoc();
}
{code}

And in ConjunctionScorer, change this:

{code}
while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
  more = firstScorer.advance(lastDoc) >= 0;
  lastScorer = firstScorer;
  first = (first == (scorers.length-1)) ? 0 : first+1;
}
return more;
{code}

To this:

{code}
while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
  firstScorer.advance(lastDoc);
  lastScorer = firstScorer;
  first = (first == (scorers.length-1)) ? 0 : first+1;
}
return lastDoc != DOC_SENTINEL;
{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712838#action_12712838 ] 

Shai Erera commented on LUCENE-1652:
------------------------------------

Good. I'll work on it and post the patch in 1614

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712782#action_12712782 ] 

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

Good point, I agree. How about docID()?

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712549#action_12712549 ] 

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

I hate to say this (heaping yet more limitations on our back-compat
"constraints"), but... it makes me nervous making a runtime-only
semantic change to an API (DISI in this case), even in 3.0.

Likewise, the "doc() returns -1 before next/advance have been
called" would be a runtime only change.

If we did these, you could upgrade to 2.9, fix all deprecations, then
upgrade to 3.0, recompile just fine, and hit weird problems since
Lucene is suddenly expecting different behavior from your DISI.doc().

Such "semantics-only" changes invite subtle bugs.  I'd much prefer to
find a migration path that's based on static checking, ie you get
catastrophic compilation errors if you've failed to migrate.

If external code is iterating through a Lucene DISI, these
semantics-only changes are harmless, since we are only defining
behavior "outside" the bounds of what's currently defined.  But if
Lucene is interacting w/ an external DISI, then we are in trouble.

However, it's not clear to me what's the best way to make this
migration "catastrophic" ... maybe we add DISI.document(), with
the new semantics, and with a default impl in DISI that overlays our
new semantics?  (And deprecate doc()).  We could do this for 2.9.


> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712745#action_12712745 ] 

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

bq. I'm not sure about it. In 3.0, we'll make nextDoc() abstract (for sure, since the default impl calls next()) and probably advance() also. So when you upgrade to 2.9, you can switch to calling nextDoc() and advance(), but if you implemented DISI, you won't be required to implement nextDoc() and/or advance(), so when you upgrade to 3.0 your code won't compile.

You're right -- on making nextDoc & advance abstract in 3.0, your code
won't compile on upgrading to 3.0 and you'd have to go fix any custom
DISIs you have.  But: if we leave doc() as is, you wouldn't be forced
to do anything on that.  You just implement nextDoc/advance and think
you're done...

bq. When upgrading, I think we should assume (or even require) users reading CHANGES. When they notice that DISI has changed and that they need to implement two new methods, they should also notice the change in semantics of doc().

Relying only on this (seeing CHANGES.txt) is what makes me nervous.

bq. I take it that by "catastrophic" you mean that you're ok with people upgrading to 3.0 and don't compile, since that will force them to read CHANGES or javadocs and understand what they are now supposed to implement. Therefore if document() documents the new semantics, it is ok for us to rely on that, and if something fails, it's the user's problem.

Right that's what I mean by "catastrophic" (note: Marvin used it
first, but I like it ;) ) But: I want the catastrophe specifically to
apply to doc() as well, so that you are forced to make that a new
method.  Ie, I'm hoping that the extra step of having a newly named
method is enough to get you to go and understand that we subtly
changed its semantics.

bq. If we add document() (note the longer method name, compared to doc()) we can implement it following the new semantics and take advantage of that in 2.9 already (I think?).

Exactly, another benefit of this approach (besides bringing
catastrophe) is that we can do all of this in 2.9, including taking
advantage of the new semantics.  Which is great.

bq. If this indeed should work, where should I do it - in this issue (I need 1614 to be committed first) or in 1614?

I think do this as another iteration of the patch on LUCENE-1614?


> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1652.
----------------------------------------

    Resolution: Invalid

The ideas in this issue were largely fixed under other issues...

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723282#action_12723282 ] 

Shai Erera commented on LUCENE-1652:
------------------------------------

It looks like everything we mentioned here was taken care of as part of LUCENE-1614 and LUCENE-1630.

Over in LUCENE-1630 we discussed TopScorer. Do you still think it's something we should do. If so, we can do it as part of this issue.

I don't remember anything else that came up in the latest issues that we didn't list here. Do you?

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712778#action_12712778 ] 

Shai Erera commented on LUCENE-1652:
------------------------------------

Ok. However, I don't like the name document() too much - it gives me a feeling as if I'm going to get a Document back. How about docid, curdoc, docId (or any other suggestion that would hint you're going to get a document ID and not a Document)?

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Shai Erera (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712733#action_12712733 ] 

Shai Erera commented on LUCENE-1652:
------------------------------------

bq. If we did these, you could upgrade to 2.9, fix all deprecations, then upgrade to 3.0, recompile just fine ...

I'm not sure about it. In 3.0, we'll make nextDoc() abstract (for sure, since the default impl calls next()) and probably advance() also. So when you upgrade to 2.9, you can switch to calling nextDoc() and advance(), but if you implemented DISI, you won't be required to implement nextDoc() and/or advance(), so when you upgrade to 3.0 your code won't compile.

When upgrading, I think we should assume (or even require) users reading CHANGES. When they notice that DISI has changed and that they need to implement two new methods, they should also notice the change in semantics of doc().

I take it that by "catastrophic" you mean that you're ok with people upgrading to 3.0 and don't compile, since that will force them to read CHANGES or javadocs and understand what they are now supposed to implement. Therefore if document() documents the new semantics, it is ok for us to rely on that, and if something fails, it's the user's problem.

bq. maybe we add DISI.document(), with the new semantics

If we add document() (note the longer method name, compared to doc()) we can implement it following the new semantics and take advantage of that in 2.9 already (I think?). For example:

{code}
public abstract class DocIdSetIterator {

  private int doc = -1;

  public int document() { return doc; }

  public int nextDoc() throws IOException {
    if (next()) {
       doc = doc();
    } else {
      doc = NO_MORE_DOCS;
    }
    return doc;
  }

  public int advance() throws IOException {
    while ((doc = nextDoc()) < target) {}
    return doc;
  }
}
{code}

We also move to call document() internally. I think this should work?

If this indeed should work, where should I do it - in this issue (I need 1614 to be committed first) or in 1614?

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Commented: (LUCENE-1652) Enhancements to Scorers following the changes to DocIdSetIterator

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723511#action_12723511 ] 

Michael McCandless commented on LUCENE-1652:
--------------------------------------------

I think TopScorer is the only possibility... but let's hold off on that for now?  The only other things I see was maybe adding a DISI.start/done, but I think we should hold off on that too?  So I'll resolve this on as invalid...

> Enhancements to Scorers following the changes to DocIdSetIterator
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1652
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1652
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 3.0
>
>
> In LUCENE-1614, we changed the semantics of DocIdSetIterator's methods to return a sentinel NO_MORE_DOCS (= Integer.MAX_VALUE) when the iterator has exhausted. Due to backward compatibility issues, we couldn't implement that semantics in doc(). Therefore this issue, which can be introduced in 3.0 only will:
> # Implement the new semantics in all extending classes, such that doc() will return NO_MORE_DOCS when the iterator has exhausted.
> # Change BooleanScorer to take advantage of that by removing sub.done from SubScorer and operate under the assumption that NO_MORE_DOCS is larger than any doc ID (Integer.MAX_VALUE).
> # Change ConjunctionScorer to operate under the same assumptions and remove 'more'.
> # Change ReqExclScorer to not rely on reqScorer in doc(), since the latter may be null.
> # Make more changes to ConjunctionScorer's init() and remove 'firstTime' to improve the performance of nextDoc(), score(), advance().
> # Add start()/finish() to DISI?
> A snippet from LUCENE-1614 regarding the change in BooleanScorer
> {code}
> int doc = sub.done ? -1 : scorer.doc();
> while (!sub.done && doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
>   sub.done = doc < 0;
> }
> {code}
> To this:
> {code}
> int doc = scorer.doc();
> while (doc < end) {
>   sub.collector.collect(doc);
>   doc = scorer.nextDoc();
> }
> {code}
> And in ConjunctionScorer, change this:
> {code}
> while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   more = firstScorer.advance(lastDoc) >= 0;
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return more;
> {code}
> To this:
> {code}
> while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
>   firstScorer.advance(lastDoc);
>   lastScorer = firstScorer;
>   first = (first == (scorers.length-1)) ? 0 : first+1;
> }
> return lastDoc != DOC_SENTINEL;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org