Posted to dev@lucene.apache.org by "Paul Elschot (JIRA)" <ji...@apache.org> on 2006/11/27 21:33:20 UTC

[jira] Created: (LUCENE-730) Restore top level disjunction performance

Restore top level disjunction performance
-----------------------------------------

                 Key: LUCENE-730
                 URL: http://issues.apache.org/jira/browse/LUCENE-730
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Paul Elschot
            Priority: Minor


This patch restores the performance of top level disjunctions. 
The introduction of BooleanScorer2 had impacted this as reported
on java-user on 21 Nov 2006 by Stanislav Jordanov.


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-730:
---------------------------------

    Fix Version/s: 2.2



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499365 ] 

Hoss Man commented on LUCENE-730:
---------------------------------


> The latest patch defaults to docs in order above performance,
> but my personal taste is to have performance by default.

I think it makes more sense to "default" to the most consistent rigidly defined behavior (docs in order), since that behavior will work (by definition) for any caller regardless of whether the caller expects the docs in order or not.

people who find performance lacking can then assess their needs and make a conscious choice to change the setting, and see if it actually improves performance in their use cases.

(ie: "avoid premature optimization" and all that)



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499293 ] 

Paul Elschot commented on LUCENE-730:
-------------------------------------

The patch applies cleanly here, all core tests pass.
And I like the allowDocsOutOfOrder approach.




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488889 ] 

Otis Gospodnetic commented on LUCENE-730:
-----------------------------------------

Just a quick note that I contacted Stanislav Jordanov about Paul's patch here.  Stanislav only used BooleanQuery.setUseScorer14() and that restored performance for him, but he did not try this patch (and won't be doing that as he's not working with Lucene at the moment).




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499372 ] 

Michael Busch commented on LUCENE-730:
--------------------------------------

> With this patch the class BooleanWeight is not
> in (direct) use anymore - it is extended by 
> BooleanWeight2 and then only the latter is used, 
> and creates either Scorer2 or Scorer. We could 
> get rid of BooleanWeight2, and have a single 
> class BooleanWeight.

Agree. Will do.

> Javadocs for useScorer14 methods:

This is good! Thanks Doron, I will add the javadocs
to my patch.



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Doron Cohen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499362 ] 

Doron Cohen commented on LUCENE-730:
------------------------------------

Two comments: 

With this patch the class BooleanWeight is not
in (direct) use anymore - it is extended by 
BooleanWeight2 and then only the latter is used, 
and creates either Scorer2 or Scorer. We could 
get rid of BooleanWeight2, and have a single 
class BooleanWeight.

Javadocs for useScorer14 methods:
  /**
   * Indicates that 1.4 BooleanScorer should be used.
   * Being static, this setting is system wide.
   * Scoring in 1.4 mode may be faster.
   * But note that unlike the default behavior, it does 
   * not guarantee that docs are collected in docid
   * order. In other words, with this setting, 
   * {@link HitCollector#collect(int,float)} might be
   * invoked first for docid N and only later for docid N-1. 
   */
  public static void setUseScorer14(boolean use14) {

  /**
   * Whether 1.4 BooleanScorer should be used.
   * @see #setUseScorer14(boolean)
   */
  public static boolean getUseScorer14() {




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499371 ] 

Michael Busch commented on LUCENE-730:
--------------------------------------

> The latest patch defaults to docs in order above performance,
> but my personal taste is to have performance by default.

I agree with Hoss here. IMO allowing docs out of order is a big
API change. I think if people switch to 2.2 they just want it
to work as before without having to add special settings. If 
they need better performance for certain types of queries and 
they know that their application can deal with docs out of order
they can enable the faster scoring. 
So my vote is +1 for docs in order by default.

> Some performance tests with prohibited scorers could still
> be needed to find out which of the boolean scorers does better
> on them. 

That'd be helpful. However, I'm currently working on some other
issues. Maybe you or others would have some time to run those
tests?



[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-730:
------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [New])



[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-730:
---------------------------------

    Attachment: lucene-730.patch

New patch with the following changes:

- Removes BooleanWeight2
- Javadocs for useScorer14 methods provided by Doron



[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-730:
---------------------------------

    Attachment: lucene-730.patch

With this patch the old BooleanScorer is only used if BooleanQuery.setUseScorer14(true) is set. It also enables the tests in QueryUtils again that check if the docs are returned in order.

All tests pass.



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499320 ] 

Michael Busch commented on LUCENE-730:
--------------------------------------

Thanks for reviewing, Paul!

I will commit this soon if nobody objects...



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499400 ] 

Paul Elschot commented on LUCENE-730:
-------------------------------------

(Is the patch reversed? It did not apply at the first attempt,
probably because my working copy is not the same as the trunk.)
After ant clean, the boolean tests still pass here:
ant -Dtestcase='TestBool*' test-core

A slight improvement for the javadocs of BooleanQuery.java.
In the javadocs of the scorer() method it is indicated that a BooleanScorer2
will always be used, so it is better to mention here that BooleanScorer2
delegates to a 1.4 scorer in some cases:

  /**
   * Indicates that BooleanScorer2 will delegate
   * the scoring to a 1.4 BooleanScorer
   * for most queries without required clauses.
   * Being static, this setting is system wide.
   * Scoring in 1.4 mode may be faster.
   * But note that unlike the default behavior, it does
   * not guarantee that docs are collected in docid
   * order. In other words, with this setting,
   * {@link HitCollector#collect(int,float)} might be
   * invoked first for docid N and only later for docid N-1.
   */
  public static void setUseScorer14(boolean use14) {
    useScorer14 = use14;
  }
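
The ordering caveat in this javadoc can be checked from the caller's side. The following is only a minimal sketch: OrderCheckingCollector is a hypothetical stand-in that mirrors the collect(int,float) signature named above but does not extend Lucene's HitCollector, so that it compiles on its own.

```java
/**
 * Illustrative stand-in for a collector. It mirrors the collect(int, float)
 * signature from the javadoc but is NOT Lucene's HitCollector; the class
 * name and fields are assumptions made for this sketch.
 */
final class OrderCheckingCollector {
    private int lastDoc = -1;              // doc id seen by the previous collect() call
    private boolean sawOutOfOrder = false; // set once a lower doc id follows a higher one

    /** Records whether this doc id is lower than the one collected just before it. */
    void collect(int doc, float score) {
        if (doc < lastDoc) {
            sawOutOfOrder = true;
        }
        lastDoc = doc;
    }

    /** True if any doc id arrived after a higher doc id. */
    boolean sawOutOfOrder() {
        return sawOutOfOrder;
    }
}
```

An application that depends on docid order could run such a check in its own tests before enabling the 1.4 mode.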




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489206 ] 

Yonik Seeley commented on LUCENE-730:
-------------------------------------

32 is the maximum number of required + prohibited clauses in the original BooleanScorer (because it uses an int as a bitfield for each document in the current doc id range being considered).
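
A minimal sketch of why an int bitfield caps the clause count at 32. ClauseBitfieldSketch and all of its members are hypothetical names chosen for illustration; this is not the BooleanScorer source, only the general technique of giving each required or prohibited clause one bit per document.

```java
/**
 * Illustrative only: one int per document in the current window, one bit
 * per required or prohibited clause. All names here are assumptions and
 * are not taken from Lucene's BooleanScorer.
 */
final class ClauseBitfieldSketch {
    /** An int has 32 bits, so at most 32 clauses can each own a bit. */
    static final int MAX_MASKED_CLAUSES = Integer.SIZE;

    private final int[] bits;       // one bitfield per document slot in the window
    private int requiredMask = 0;   // bits that must all be set for a match
    private int prohibitedMask = 0; // bits that must all be clear for a match
    private int nextBit = 0;

    ClauseBitfieldSketch(int windowSize) {
        bits = new int[windowSize];
    }

    /** Assigns the next free bit to a required or prohibited clause and returns its mask. */
    int addClause(boolean required) {
        if (nextBit >= MAX_MASKED_CLAUSES) {
            throw new IllegalStateException("more than 32 required + prohibited clauses");
        }
        int mask = 1 << nextBit++;
        if (required) {
            requiredMask |= mask;
        } else {
            prohibitedMask |= mask;
        }
        return mask;
    }

    /** Records that the clause owning clauseMask matched the document in this slot. */
    void markMatch(int slot, int clauseMask) {
        bits[slot] |= clauseMask;
    }

    /** Accepts a slot if every required bit is set and no prohibited bit is set. */
    boolean accepts(int slot) {
        int b = bits[slot];
        return (b & prohibitedMask) == 0 && (b & requiredMask) == requiredMask;
    }
}
```

In this sketch the 33rd call to addClause() fails because no bit is left to assign, which is the kind of limit described above.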



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489229 ] 

Paul Elschot commented on LUCENE-730:
-------------------------------------

Further to Yonik's answer, I have not done any tests with prohibited scorers comparing BooleanScorer and BooleanScorer2.

It is quite possible that using skipTo() on any prohibited scorer (via BooleanScorer2) is generally faster than using BooleanScorer. Prohibited clauses in queries are quite rare, so it is going to be difficult to find out whether a smaller value than 32 would be generally optimal.
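
To make the skipTo() idea concrete, here is a minimal sketch under stated assumptions: plain sorted int arrays stand in for scorers, and DocIterator and exclude() are hypothetical names. It is not the BooleanScorer2 code; it only shows how a prohibited iterator can be advanced just as far as the current candidate document.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: excluding prohibited docs by skipping, using sorted int arrays. */
final class ProhibitedSkipSketch {

    /** Minimal iterator over ascending doc ids, loosely modeled on a scorer with skipTo(). */
    static final class DocIterator {
        private final int[] docs; // ascending doc ids
        private int pos = 0;

        DocIterator(int[] sortedDocs) {
            docs = sortedDocs;
        }

        /** Advances to the first doc id >= target; returns false when exhausted. */
        boolean skipTo(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length;
        }

        /** Current doc id; only valid after skipTo() returned true. */
        int doc() {
            return docs[pos];
        }
    }

    /** Returns the doc ids in 'included' that do not appear in 'prohibited' (both ascending). */
    static List<Integer> exclude(int[] included, int[] prohibited) {
        DocIterator excl = new DocIterator(prohibited);
        boolean exclHasMore = true;
        List<Integer> result = new ArrayList<>();
        for (int doc : included) {
            // Advance the prohibited iterator only as far as the current candidate.
            if (exclHasMore) {
                exclHasMore = excl.skipTo(doc);
            }
            if (!exclHasMore || excl.doc() != doc) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

Because both lists are visited in ascending order, this approach also keeps the collected docs in docid order.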






[jira] Reopened: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reopened LUCENE-730:
----------------------------------

         Assignee: Michael Busch
    Lucene Fields:   (was: [Patch Available, New])

As discussed on java-dev, the default behavior of BooleanScorer should be to return the documents in order, because some people rely on that in their apps. Docs out of order should only be allowed if BooleanQuery.setUseScorer14(true) is set explicitly.



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499415 ] 

Michael Busch commented on LUCENE-730:
--------------------------------------

> A slight improvement for the javadocs of BooleanQuery.java.
> In the javadocs of the scorer() method it is indicated that a BooleanScorer2
> will always be used, so it is better to mention here that BooleanScorer2
> delegates to a 1.4 scorer in some cases:

Maybe we should just deprecate the useScorer14 methods and add new methods
allowDocsOutOfOrder. That should be easier to understand for the users. 
And probably most users don't know (or don't care about) the differences
between BooleanScorer and BooleanScorer2 anyway.
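
One way this deprecation could look, as a rough sketch only. DocsOrderSettingSketch is a hypothetical stand-in rather than BooleanQuery itself, and the exact names setAllowDocsOutOfOrder/getAllowDocsOutOfOrder are assumptions based on the allowDocsOutOfOrder name suggested here.

```java
/**
 * Illustrative stand-in, not Lucene's BooleanQuery: the useScorer14
 * accessors are kept as deprecated aliases of allowDocsOutOfOrder-style
 * methods. The new method names are assumptions made for this sketch.
 */
final class DocsOrderSettingSketch {
    // Static, hence system wide; docs in order is the default.
    private static boolean allowDocsOutOfOrder = false;

    /** Allows (or forbids) collecting docs out of docid order. */
    static void setAllowDocsOutOfOrder(boolean allow) {
        allowDocsOutOfOrder = allow;
    }

    /** Whether docs may be collected out of docid order. */
    static boolean getAllowDocsOutOfOrder() {
        return allowDocsOutOfOrder;
    }

    /** @deprecated use {@link #setAllowDocsOutOfOrder(boolean)} instead */
    @Deprecated
    static void setUseScorer14(boolean use14) {
        setAllowDocsOutOfOrder(use14);
    }

    /** @deprecated use {@link #getAllowDocsOutOfOrder()} instead */
    @Deprecated
    static boolean getUseScorer14() {
        return getAllowDocsOutOfOrder();
    }
}
```

Keeping the old methods as thin aliases would let existing callers continue to work while the new name describes the visible effect rather than the scorer version.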



[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LUCENE-730?page=all ]

Paul Elschot updated LUCENE-730:
--------------------------------

    Attachment: TopLevelDisjunction20061127.patch

This patches BooleanScorer2 to use BooleanScorer in the score(HitCollector) method.
This also patches BooleanScorer to accept a minimum number of optional matchers.

The patch also disables some test code: the use of checkSkipTo in QueryUtils
caused a test failure in TestBoolean2 with the above changes. I think this
could be expected because of the changed document scoring order
for top level disjunction queries.
At the moment I don't know how to resolve this.

With the complete patch, all tests pass here.




Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by Michael Busch <bu...@gmail.com>.
Chris Hostetter wrote:
> : With this committed it also makes sense to deprecate the setUseScorer14()
> : method and the corresponding get...() method. If you want a patch for that,
> : I'll gladly provide one.
>
> i haven't really been able to follow this issue as much as i would like,
> but docs now sometimes coming out of order and the need to hobble
> QueryUtils.check(Query,Searcher) because of this alarms me a bit ... i
> can't really think of all the cases this might cause problems for people,
> but i'm sure there may be some (i remember it was kind of a big deal when
> BooleanScorer2 came out and people could start relying on docs coming in
> order ... it's why ConstantScoreRangeQuery became a wrapper around
> ConstantScoreQuery -- the first version i wrote collected docs in the order
> they were found while walking the TermEnum/TermDocs and people didn't like
> that this meant they weren't in order)
>   

I agree. I also believe that a lot of people rely on docs in order (I do 
too in some cases). I think before we make a 2.2 release we should 
change this, so that the docs are still in order by default.

> perhaps this patch should be changed to only have this effect
> (BooleanScorer2 delegating to BooleanScorer) only if
> BooleanQuery.setUseScorer14(true) has been called -- and the existing use
> of BooleanQuery.getUseScorer14() to decide between BooleanWeight and
> BooleanWeight2 can be removed (and BooleanWeight can be deprecated)
>
> that way people who are okay with getting docs out of order can call
> BooleanQuery.setUseScorer14(true) and get the performance benefits when
> possible, but people who want to be sure they get documents in order have
> to accept that in some cases their queries aren't as fast as they could be.
>
>   
+1. I think this makes sense. We can implement it this way for now, and 
maybe in the future we can deprecate setUseScorer14() and add something 
like

void allowDocsOutOfOrder(boolean allow);

to Query as Doron suggested. I'm going to reopen LUCENE-730.

> ...i believe what i'm suggesting would keep the fundamental meaning of
> setUseScorer14(true), even if the value is now used in a slightly different
> place ... correct?
>
>
> -Hoss


Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by Chris Hostetter <ho...@fucit.org>.
: Also, it would be nice to make this a (non static) method of Query, and, I
: think, also of Weight and Scorer (Weight would "like-inherit" that property
: from its Query, and Scorer from its Weight). The default would be

as i recall, the staticness was so that people writing apps could guarantee
they would always get the behavior they wanted (ie: docs in order)
regardless of what code path created the BooleanQuery (QueryParser,
PrefixQuery.rewrite etc..)



-Hoss




Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by Doron Cohen <DO...@il.ibm.com>.
Chris Hostetter <ho...@fucit.org> wrote on 19/04/2007 16:01:53:

>
> : With this committed it also makes sense to deprecate the setUseScorer14()
> : method and the corresponding get...() method. If you want a patch for that,
> : I'll gladly provide one.
>
> i haven't really been able to follow this issue as much as i would like,
> but docs now sometimes coming out of order and the need to hobble
> QueryUtils.check(Query,Searcher) because of this alarms me a bit ... i
> can't really think of all the cases this might cause problems for people,
> but i'm sure there may be some (i remember it was kind of a big deal when
> BooleanScorer2 came out and people could start relying on docs coming in
> order ... it's why ConstantScoreRangeQuery became a wrapper around
> ConstantScoreQuery -- the first version i wrote collected docs in the order
> they were found while walking the TermEnum/TermDocs and people didn't like
> that this meant they weren't in order)
>
> perhaps this patch should be changed to only have this effect
> (BooleanScorer2 delegating to BooleanScorer) only if
> BooleanQuery.setUseScorer14(true) has been called -- and the existing use
> of BooleanQuery.getUseScorer14() to decide between BooleanWeight and
> BooleanWeight2 can be removed (and BooleanWeight can be deprecated)
>
> that way people who are okay with getting docs out of order can call
> BooleanQuery.setUseScorer14(true) and get the performance benefits when
> possible, but people who want to be sure they get documents in order have
> to accept that in some cases their queries aren't as fast as they could be.
>
> ...i believe what i'm suggesting would keep the fundamental meaning of
> setUseScorer14(true), even if the value is now used in a slightly different
> place ... correct?

Amazing timing! - just as I was working on 697 trying to un-cancel the
checking skipTo...

This suggestion would help in 697 of course, but I think it also makes
sense in general.

Could I modify it a little - to something more general than "use scorer
number J" - perhaps something like:
   void allowDocsOutOfOrder(boolean allow);
   boolean isAllowedDocsOutOfOrder();
Also, it would be nice to make this a (non static) method of Query, and, I
think, also of Weight and Scorer (Weight would "like-inherit" that property
from its Query, and Scorer from its Weight). The default would be
"disallowed", and the documentation would say that allowing may enable
further optimization, though in some cases (currently, non-boolean
scorers or boolean ones with more than 32 prohibited sub scorers) this
might have no effect at all.

Doron
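
A minimal sketch of the per-query setting proposed above, in which the value is handed from a query to its weight and from the weight to its scorer. SketchQuery, SketchWeight and SketchScorer are hypothetical stand-ins used only for illustration; they are not Lucene's Query, Weight and Scorer, and the real classes might carry the property differently.

```java
// Hypothetical stand-ins for illustration; not Lucene's real classes.
class SketchScorer {
    private final boolean allowDocsOutOfOrder;

    SketchScorer(boolean allowDocsOutOfOrder) {
        this.allowDocsOutOfOrder = allowDocsOutOfOrder;
    }

    boolean isAllowedDocsOutOfOrder() {
        return allowDocsOutOfOrder;
    }
}

class SketchWeight {
    private final boolean allowDocsOutOfOrder;

    SketchWeight(boolean allowDocsOutOfOrder) {
        // Snapshot of the query's setting at the time the weight is created.
        this.allowDocsOutOfOrder = allowDocsOutOfOrder;
    }

    boolean isAllowedDocsOutOfOrder() {
        return allowDocsOutOfOrder;
    }

    SketchScorer scorer() {
        // The scorer takes the property over from its weight.
        return new SketchScorer(allowDocsOutOfOrder);
    }
}

class SketchQuery {
    // Default is "disallowed", as suggested in the message above.
    private boolean allowDocsOutOfOrder = false;

    void allowDocsOutOfOrder(boolean allow) {
        this.allowDocsOutOfOrder = allow;
    }

    boolean isAllowedDocsOutOfOrder() {
        return allowDocsOutOfOrder;
    }

    SketchWeight createWeight() {
        // The weight takes the property over from its query.
        return new SketchWeight(allowDocsOutOfOrder);
    }
}
```

Because the value is copied when the weight is created, changing the query afterwards would not affect a weight or scorer that already exists; whether that is desirable is a design choice this sketch does not settle.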




Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by Chris Hostetter <ho...@fucit.org>.
: With this committed it also makes sense to deprecate the setUseScorer14()
: method and the corresponding get...() method. If you want a patch for that,
: I'll gladly provide one.

i haven't really been able to follow this issue as much as i would like,
but docs now sometimes coming out of order and the need to hobble
QueryUtils.check(Query,Searcher) because of this alarms me a bit ... i
can't really think of all the cases this might cause problems for people,
but i'm sure there may be some (i remember it was kind of a big deal when
BooleanScorer2 came out and people could start relying on docs coming in
order ... it's why ConstantScoreRangeQuery became a wrapper around
ConstantScoreQuery -- the first version i wrote collected docs in the order
they were found while walking the TermEnum/TermDocs and people didn't like
that this meant they weren't in order)

perhaps this patch should be changed to only have this effect
(BooleanScorer2 delegating to BooleanScorer) only if
BooleanQuery.setUseScorer14(true) has been called -- and the existing use
of BooleanQuery.getUseScorer14() to decide between BooleanWeight and
BooleanWeight2 can be removed (and BooleanWeight can be deprecated)

that way people who are okay with getting docs out of order can call
BooleanQuery.setUseScorer14(true) and get the performance benefits when
possible, but people who want to be sure they get documents in order have
to accept that in some cases their queries aren't as fast as they could be.

...i believe what i'm suggesting would keep the fundamental meaning of
setUseScorer14(true), even if the value is now used in a slightly different
place ... correct?


-Hoss
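
The gating suggested in this message could look roughly like the sketch below: the faster, out-of-order scorer is chosen only when the caller has opted in and the query is eligible. ScorerChoice and its names are hypothetical. The eligibility test (no required scorers, fewer than 32 prohibited scorers) follows the condition discussed in this thread, and the boolean flag stands in for whatever setting the final patch exposes.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class ScorerChoice {
    enum Kind { IN_ORDER, OUT_OF_ORDER }

    private ScorerChoice() {}

    static Kind choose(boolean outOfOrderAllowed, int requiredCount, int prohibitedCount) {
        // Eligibility as discussed in the thread: a pure disjunction
        // with fewer than 32 prohibited clauses.
        boolean eligible = requiredCount == 0 && prohibitedCount < 32;
        // Without the opt-in, always keep the in-order scorer.
        return (outOfOrderAllowed && eligible) ? Kind.OUT_OF_ORDER : Kind.IN_ORDER;
    }
}
```

With this shape, callers who never opt in keep docs in order, at the cost of slower top level disjunctions.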




Re: [jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by Paul Elschot <pa...@xs4all.nl>.
On Wednesday 18 April 2007 00:05, Otis Gospodnetic (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Otis Gospodnetic resolved LUCENE-730.
> -------------------------------------
>
>        Resolution: Fixed
>     Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
>
> I've committed this (changed a few minor things in the patch)...without
> benchmarking BS vs. BS2 with < 32 prohibited clauses.

With this committed it also makes sense to deprecate the setUseScorer14() 
method and the corresponding get...() method. If you want a patch for that, 
I'll gladly provide one.

Actually I prefer to have these methods removed altogether now, but that is 
probably not compatible with the release policy.

Regards,
Paul Elschot



[jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved LUCENE-730.
-------------------------------------

       Resolution: Fixed
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

I've committed this (changed a few minor things in the patch)...without benchmarking BS vs. BS2 with < 32 prohibited clauses.

Hm, if I exposed that 32 as a static setter method, then one could easily benchmark and compare BS vs. BS2 with Doron's contrib/benchmark.


> Restore top level disjunction performance
> -----------------------------------------
>
>                 Key: LUCENE-730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-730
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.



[jira] Updated: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-730:
---------------------------------

    Attachment: lucene-730.patch

New patch that deprecates the useScorer14 methods and adds new
methods:

  /**
   * Indicates whether hit docs may be collected out of docid
   * order. In other words, with this setting, 
   * {@link HitCollector#collect(int,float)} might be
   * invoked first for docid N and only later for docid N-1.
   * Being static, this setting is system wide.
   * If docs out of order are allowed scoring might be faster
   * for certain queries (disjunction queries with less than
   * 32 prohibited terms). This setting has no effect for 
   * other queries.
   */
  public static void setAllowDocsOutOfOrder(boolean allow);
  
  /**
   * Whether hit docs may be collected out of docid order.
   * @see #setAllowDocsOutOfOrder(boolean)
   */
  public static boolean getAllowDocsOutOfOrder();
  

I think this is easier to understand for the users because it 
tells them what they need to know (docs in or out of order) 
and hides technical details (BooleanScorer vs. BooleanScorer2).

All tests pass.
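
For callers wondering what out-of-order collection means in practice, here is a minimal sketch of a collector whose result does not depend on call order. DocCollector is a hypothetical stand-in for a collect(int,float) callback, not Lucene's HitCollector, and BitSetCollector is an illustrative name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a hit collector callback; not Lucene's HitCollector.
interface DocCollector {
    void collect(int doc, float score);
}

// Records hits in a BitSet, so the result is the same
// whether collect() sees docid N before or after docid N-1.
final class BitSetCollector implements DocCollector {
    private final BitSet hits = new BitSet();

    @Override
    public void collect(int doc, float score) {
        hits.set(doc);
    }

    // Returns the collected docids in increasing order.
    List<Integer> docsInOrder() {
        List<Integer> docs = new ArrayList<>();
        for (int d = hits.nextSetBit(0); d >= 0; d = hits.nextSetBit(d + 1)) {
            docs.add(d);
        }
        return docs;
    }
}
```

A collector that instead appends docids to a list and assumes the list is sorted would be the kind of code that breaks when docs out of order are allowed.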


> Restore top level disjunction performance
> -----------------------------------------
>
>                 Key: LUCENE-730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-730
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Paul Elschot
>         Assigned To: Michael Busch
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: lucene-730.patch, lucene-730.patch, lucene-730.patch, TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489199 ] 

Otis Gospodnetic commented on LUCENE-730:
-----------------------------------------

Paul, what is special about the number 32 here (BooleanScorer2):

+    if ((requiredScorers.size() == 0) &&
+        prohibitedScorers.size() < 32) {
+      // fall back to BooleanScorer, scores documents somewhat out of order
+      BooleanScorer bs = new BooleanScorer(getSimilarity(), minNrShouldMatch);

Why can we use BooleanScorer if there are less than 32 prohibited clauses, but not otherwise?  Thanks.
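
One possible explanation for the limit, stated here as an assumption and not as Paul's answer: if the older BooleanScorer records which clauses matched a document as bits in a single Java int, then only 32 clauses can be told apart. A minimal sketch of that idea, using a hypothetical class name:

```java
// Illustrative only; assumes one bit of a 32-bit int per prohibited clause.
final class ClauseMaskSketch {
    private int nextMask = 1;    // bit handed to the next registered clause
    private int prohibitedMask;  // union of all prohibited clause bits
    private int clauses;         // number of clauses registered so far

    // Registers a prohibited clause and returns the bit assigned to it.
    int addProhibitedClause() {
        if (clauses >= 32) {
            throw new IllegalStateException("no bits left in a 32-bit mask");
        }
        int mask = nextMask;
        prohibitedMask |= mask;
        nextMask <<= 1;
        clauses++;
        return mask;
    }

    // A document is excluded if any prohibited clause matched it.
    boolean isExcluded(int matchedBits) {
        return (matchedBits & prohibitedMask) != 0;
    }
}
```

Under this assumption the exclusion test is a single bitwise AND per document, which is cheap but caps the number of clauses that can be tracked.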


> Restore top level disjunction performance
> -----------------------------------------
>
>                 Key: LUCENE-730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-730
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.



[jira] Resolved: (LUCENE-730) Restore top level disjunction performance

Posted by "Michael Busch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch resolved LUCENE-730.
----------------------------------

    Resolution: Fixed

I just committed the latest patch. Thanks everyone!

> Restore top level disjunction performance
> -----------------------------------------
>
>                 Key: LUCENE-730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-730
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Paul Elschot
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: lucene-730.patch, lucene-730.patch, lucene-730.patch, TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.



[jira] Commented: (LUCENE-730) Restore top level disjunction performance

Posted by "Paul Elschot (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499344 ] 

Paul Elschot commented on LUCENE-730:
-------------------------------------

No objection, only some remarks.

One bigger issue:

The latest patch favors docs in order over performance by default,
but my personal preference is to have performance by default.

And some smaller ones:

One could still adapt QueryUtils to take the possibility
of docs out of order into account.

Some performance tests with prohibited scorers could still
be needed to find out which of the boolean scorers does better
on them.


> Restore top level disjunction performance
> -----------------------------------------
>
>                 Key: LUCENE-730
>                 URL: https://issues.apache.org/jira/browse/LUCENE-730
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Paul Elschot
>         Assigned To: Michael Busch
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: lucene-730.patch, TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.
