Posted to dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2009/07/15 22:00:00 UTC

latest lucene update

Running the Solr unit tests seems a fair bit slower now.  I think the root
cause may be this:
http://search.lucidimagination.com/search/document/a8bd12c3b87e98a3/speed_of_booleanqueries_on_2_9
That may be fixed, but I think we should implement the new methods anyway.

I'm also surprised that more changes weren't necessary to get the
latest Lucene to work... one thing in particular is docs out of order:
Solr currently needs docs to arrive in order so that it can create
DocSet instances correctly, and I'm not sure that is still guaranteed.
I'll look into it.

-Yonik
http://www.lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: latest lucene update

Posted by Uwe Schindler <us...@pangaea.de>.
Hi Yonik, 

Mike committed my patch, so the fix should be in trunk.

I tested Solr after that with a new Lucene JAR. Some tests run faster now!
But Solr should update its DocIdSetIterators soon...

-----
UWE SCHINDLER
Webserver/Middleware Development
PANGAEA - Publishing Network for Geoscientific and Environmental Data
MARUM - University of Bremen
Room 2500, Leobener Str., D-28359 Bremen
Tel.: +49 421 218 65595
Fax:  +49 421 218 65505
http://www.pangaea.de/
E-mail: uschindler@pangaea.de




RE: latest lucene update

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Yonik,

Could you check whether my patch in
https://issues.apache.org/jira/browse/LUCENE-1614 makes this better?

Currently I am not sure whether the first is enough or the second one is
needed. Maybe Mike and Shai can explain why advance() may be called with
NO_MORE_DOCS.
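
For readers following the NO_MORE_DOCS question, here is a minimal sketch of an iterator whose advance() tolerates the sentinel as a target. It is not Lucene code: SortedIntIterator and its fields are made up for illustration, and only the method names (docID, nextDoc, advance) and the NO_MORE_DOCS value of Integer.MAX_VALUE mirror the DocIdSetIterator API. It does not try to answer why a caller would pass the sentinel.

```java
import java.util.Arrays;

/** Hypothetical iterator over a sorted int[]; a stand-in, not Lucene code. */
final class SortedIntIterator {
    /** "Exhausted" sentinel; the same value Lucene's DocIdSetIterator uses. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // ascending doc IDs, no duplicates
    private int idx = -1;     // index of the current doc; -1 before the first call
    private int doc = -1;

    SortedIntIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        if (idx < docs.length) {
            idx++; // stops growing once it reaches docs.length
        }
        doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        return doc;
    }

    /** Moves to the first doc after the current one that is >= target. */
    int advance(int target) {
        int from = Math.min(idx + 1, docs.length);
        int pos = Arrays.binarySearch(docs, from, docs.length, target);
        if (pos < 0) {
            pos = -pos - 1; // not found: binarySearch encodes the insertion point
        }
        idx = pos;
        doc = idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        return doc;
    }

    public static void main(String[] args) {
        SortedIntIterator it = new SortedIntIterator(new int[] {2, 5, 9});
        System.out.println(it.advance(5));
        System.out.println(it.advance(NO_MORE_DOCS) == NO_MORE_DOCS);
    }
}
```

Because the sentinel is the largest int, a purely comparison-based advance() needs no special case for it; an implementation that computes something like target + 1 would overflow instead.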

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: latest lucene update

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Jul 16, 2009 at 10:06 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> OK. At least I have seen a speedup during my tests :). I have the logs
> somewhere. Which tests were negatively affected? Then I can look into the
> before/after logs.

Sorry, not sure - it wasn't the slower tests that tipped me off, but
the "speed of BooleanQueries on 2.9" thread that prompted me to look
at the default implementations of the new methods.
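
To illustrate why inherited default implementations can matter for speed, here is a small sketch. It is not the Lucene source and makes no claim about what the 2.9 defaults actually did; BaseIterator, AllDocsIterator and JumpingAllDocsIterator are hypothetical classes, and the nextDocCalls counter exists only to make the difference visible.

```java
final class DefaultAdvanceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical base class whose advance() falls back to repeated nextDoc() calls. */
    abstract static class BaseIterator {
        int nextDocCalls = 0; // instrumentation only

        abstract int nextDoc();

        /** Generic fallback: correct for any subclass, but linear in the distance skipped. */
        int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }
    }

    /** Matches every doc in [0, maxDoc) and relies on the inherited advance(). */
    static class AllDocsIterator extends BaseIterator {
        final int maxDoc;
        int doc = -1;

        AllDocsIterator(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        @Override
        int nextDoc() {
            nextDocCalls++;
            if (doc != NO_MORE_DOCS) {
                doc = doc + 1 < maxDoc ? doc + 1 : NO_MORE_DOCS;
            }
            return doc;
        }
    }

    /** Same doc set, but advance() jumps straight to the target. */
    static class JumpingAllDocsIterator extends AllDocsIterator {
        JumpingAllDocsIterator(int maxDoc) {
            super(maxDoc);
        }

        @Override
        int advance(int target) {
            if (doc == NO_MORE_DOCS) {
                return doc; // already exhausted
            }
            int next = Math.max(target, doc + 1);
            doc = next < maxDoc ? next : NO_MORE_DOCS;
            return doc;
        }
    }

    public static void main(String[] args) {
        AllDocsIterator slow = new AllDocsIterator(1000);
        JumpingAllDocsIterator fast = new JumpingAllDocsIterator(1000);
        slow.advance(500);
        fast.advance(500);
        System.out.println("inherited advance: " + slow.nextDocCalls + " nextDoc() calls");
        System.out.println("overridden advance: " + fast.nextDocCalls + " nextDoc() calls");
    }
}
```

Both iterators land on the same doc; the only difference is how many steps it takes to get there, which is the kind of cost that implementing the new methods directly avoids.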

-Yonik
http://www.lucidimagination.com



RE: latest lucene update

Posted by Uwe Schindler <uw...@thetaphi.de>.
OK. At least I have seen a speedup during my tests :). I have the logs
somewhere. Which tests were negatively affected? Then I can look into the
before/after logs.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




Re: latest lucene update

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Jul 16, 2009 at 2:11 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> Did you also test whether the speed went back to normal with the latest
> fix in trunk (without modifying Solr code)?

I didn't - I was already part way through implementing advance() in Solr.
I'm sure the advance() fix in Lucene would have worked too though.

-Yonik
http://www.lucidimagination.com



RE: latest lucene update

Posted by Uwe Schindler <uw...@thetaphi.de>.
Did you also test whether the speed went back to normal with the latest
fix in trunk (without modifying Solr code)?

I ran the Solr tests with the updated lucene-core-2.9.jar here, but I was
not able to find out which of the tests had the big slowdown. I only noticed
some speedup in some tests related to search.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



Re: latest lucene update

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Thanks guys, I had actually meant this message to go to solr-dev...
hence the "but I think we should implement the new methods anyway".
I've implemented them, and the performance has returned to normal.

-Yonik
http://www.lucidimagination.com





RE: latest lucene update

Posted by Uwe Schindler <uw...@thetaphi.de>.
> On Wed, Jul 15, 2009 at 4:12 PM, Uwe Schindler<uw...@thetaphi.de> wrote:
> 
> > As far as I know, Shalin implemented the Collectors in Solr with the
> > method allowDocsOutOfOrder() returning false. So the collectors should
> > create DocIdSet with correct order.
> 
> I think you meant "so the Scorers will be created so that they provide
> docIDs in order".

See the latest patch in https://issues.apache.org/jira/browse/SOLR-940: there
Shalin implemented allowDocsOutOfOrder() as returning false in some cases. From
the discussion in other issues, I think this was because building DocIdSets
out of the collected results required the doc IDs in order.

I have something similar in my code here, too. I use a Collector, but need
the doc ids in order.
 
> I.e., Lucene first asks the Collector whether it will accept docs out of
> order. If it returns false, Lucene then asks the Weight for a Scorer that
> always returns docs in order.
> 
> But in the case of BooleanQuery this is a sizable performance hit if
> it's a query (only OR'd terms and at most 32 MUST_NOT terms) that could
> otherwise use BooleanScorer instead of BooleanScorer2.

Sometimes ordered doc IDs are more important, because sorting them afterwards
(e.g. with a quicksort) would be more costly than running a slower query. That
is why it is good to have this method allowDocsOutOfOrder().
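
A rough sketch of that trade-off, with hypothetical classes rather than Solr's or Lucene's real collectors: SortedIntCollector and BitSetCollector are invented names, and allowDocsOutOfOrder() simply reuses the method name from this thread. The append-only collector is only correct when docs arrive in ascending order, while the bit set does not care about arrival order.

```java
import java.util.Arrays;
import java.util.BitSet;

final class DocOrderSketch {
    /** Append-only collector: cheap, but only valid when docs arrive in ascending order. */
    static final class SortedIntCollector {
        private int[] docs = new int[8];
        private int size = 0;

        boolean allowDocsOutOfOrder() {
            return false;
        }

        void collect(int doc) {
            if (size > 0 && doc <= docs[size - 1]) {
                throw new IllegalStateException("doc " + doc + " arrived out of order");
            }
            if (size == docs.length) {
                docs = Arrays.copyOf(docs, size * 2); // grow the backing array
            }
            docs[size++] = doc;
        }

        /** Returns the collected doc IDs; sorted because collect() enforces order. */
        int[] toArray() {
            return Arrays.copyOf(docs, size);
        }
    }

    /** Bit-set collector: one bit per doc, so arrival order is irrelevant. */
    static final class BitSetCollector {
        private final BitSet bits;

        BitSetCollector(int maxDoc) {
            bits = new BitSet(maxDoc);
        }

        boolean allowDocsOutOfOrder() {
            return true;
        }

        void collect(int doc) {
            bits.set(doc);
        }

        BitSet bits() {
            return bits;
        }
    }

    public static void main(String[] args) {
        SortedIntCollector sorted = new SortedIntCollector();
        for (int doc : new int[] {1, 4, 7}) {
            sorted.collect(doc);
        }
        System.out.println(Arrays.toString(sorted.toArray()));

        BitSetCollector any = new BitSetCollector(16);
        for (int doc : new int[] {7, 1, 4}) {
            any.collect(doc);
        }
        System.out.println(any.bits());
    }
}
```

If the append-only variant had to accept unordered input, it would need a sort pass over the whole array afterwards, which is the extra cost described above.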

Uwe




Re: latest lucene update

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Jul 15, 2009 at 4:12 PM, Uwe Schindler<uw...@thetaphi.de> wrote:

> As far as I know, Shalin implemented the Collectors in Solr with the method
> allowDocsOutOfOrder() returning false. So the collectors should create
> DocIdSet with correct order.

I think you meant "so the Scorers will be created so that they provide
docIDs in order".

I.e., Lucene first asks the Collector whether it will accept docs out of
order. If it returns false, Lucene then asks the Weight for a Scorer that
always returns docs in order.

But in the case of BooleanQuery this is a sizable performance hit if
it's a query (only OR'd terms and at most 32 MUST_NOT terms) that could
otherwise use BooleanScorer instead of BooleanScorer2.
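
The decision described above can be written down as a small sketch. This encodes only the rule as stated in this message, not the actual BooleanQuery source; ScorerChoiceSketch, OrderAwareCollector, ScorerKind and the choose() signature are all hypothetical, and allowDocsOutOfOrder() reuses the method name from this thread.

```java
final class ScorerChoiceSketch {
    /** The two scorers named in the thread, reduced to labels. */
    enum ScorerKind { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    /** Hypothetical collector contract, reduced to the one question being asked. */
    interface OrderAwareCollector {
        boolean allowDocsOutOfOrder();
    }

    /** Assumed limit taken from the message: "at most 32 MUST_NOT terms". */
    static final int MAX_PROHIBITED_FOR_OUT_OF_ORDER = 32;

    /**
     * Picks the out-of-order scorer only when the collector tolerates it and the
     * query has no required clauses and few enough prohibited ones.
     */
    static ScorerKind choose(OrderAwareCollector collector,
                             int requiredClauses,
                             int prohibitedClauses) {
        boolean queryQualifies = requiredClauses == 0
                && prohibitedClauses <= MAX_PROHIBITED_FOR_OUT_OF_ORDER;
        if (collector.allowDocsOutOfOrder() && queryQualifies) {
            return ScorerKind.BOOLEAN_SCORER;
        }
        return ScorerKind.BOOLEAN_SCORER2; // in-order fallback
    }

    public static void main(String[] args) {
        OrderAwareCollector needsOrder = () -> false;
        OrderAwareCollector anyOrder = () -> true;
        System.out.println(choose(needsOrder, 0, 3));
        System.out.println(choose(anyOrder, 0, 3));
    }
}
```

A collector that insists on ordered docs therefore always ends up on the in-order path, even for queries that would qualify for the faster scorer.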

Mike



RE: latest lucene update

Posted by Uwe Schindler <uw...@thetaphi.de>.
> I'm also surprised that more changes weren't necessary to get the
> latest Lucene to work... one thing in particular is docs out of order:
> Solr currently needs docs to arrive in order so that it can create
> DocSet instances correctly, and I'm not sure that is still guaranteed.
> I'll look into it.

As far as I know, Shalin implemented the Collectors in Solr with the method
allowDocsOutOfOrder() returning false. So the collectors should create
DocIdSet with correct order.

Uwe


