Posted to user@lucenenet.apache.org by Scott Baldwin <S....@qsr.com.au> on 2012/05/29 12:01:14 UTC

Lucene upgrade causes significant slow down

Hi all, I'm a bit of a newb when it comes to Lucene, but I am in need of some sage counsel.

I have an application that uses the Lucene engine to index various items. We upgraded to Lucene 2.9.2, and we now find that on a re-index of a significantly large project, our index size has doubled (from roughly 73MB to 137MB). This causes significant performance issues, especially as we have written a custom Directory that reads the index from a SQL database instead of a disk file or RAM index.

I was just wondering if anyone knows why there would be such a huge increase in index size, and whether there are any options or settings we could change to reduce it?

Thanks heaps.

Scott Baldwin  Technical Architect
QSR International Pty Ltd
2nd Floor, 651 Doncaster Road   |   Doncaster Victoria 3108 Australia
T  +61 3 9840 4934  F  +61 3 9840 1500
s.baldwin@qsrinternational.com   |   www.qsrinternational.com

Please consider the environment before printing this email.



________________________________

Disclaimer
This transmission may contain information which is confidential and privileged and intended only for the addressee. If you are not the addressee you may not use, disseminate or copy this information. If you have received this information in error please notify the sender immediately. Thank you.




RE: Lucene upgrade causes significant slow down

Posted by "Allan, Brad (Wokingham)" <Br...@Fiserv.com>.
Scott, forgive the impromptu nature of this, but couldn't help but notice that you guys have a Lucene Directory impl to a SQL Server db.

Would you mind commenting on this stackoverflow topic?
http://stackoverflow.com/questions/10815206/implement-lucene-on-existing-net-sql-server-stack-with-multiple-webservers

Brad Allan


CheckFree Solutions Limited (trading as Fiserv)
Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
Registered in England: No. 2694333

Re: Lucene upgrade causes significant slow down

Posted by Noel <ly...@hotmail.com>.
Hi Scott, I can only tell you from my perspective that our upgrade from 2.1
to 2.9.2 did not cause any major increase in index size.
We also moved some of our data types from strings to the decimal and
integer radix formats along the way, and that may have compensated for some
increases, if they occurred at all. In summary, we did not notice any
appreciable increase in index size. I would further suggest that you look
in the Java forums for such issues, as the index format is compatible
between .NET and Java. If there had been a dramatic increase in index size,
it would have been noted and explained there.

Kind Regards
Noel.


RE: Lucene upgrade causes significant slow down

Posted by Scott Baldwin <S....@qsr.com.au> (sent 31 May 2012 08:13).
Hi again all, can someone just confirm or deny the possibility that upgrading from 2.4.1 to 2.9.2 may cause a significant increase in the size of the index when re-indexed?
We are seeing significant index size increases for indexes in some cases (not all).

Thanks.

Re: Lucene upgrade causes significant slow down

Posted by Simon Svensson <si...@devhost.se> (sent 29 May 2012 22:49).
Hi,

Export your database-based directory into a normal filesystem (using 
Directory.Copy), and open it in Luke[1]. The Files tab will show what 
the different files are used for, and which ones belong to old commits 
and can be removed.
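Before exporting with Directory.Copy, a rough breakdown of which index files dominate can be read straight from the custom Directory, because ListAll and FileLength belong to the base Lucene.Net Directory class that a SQL-backed implementation also has to support. The following is a minimal sketch against the Lucene.Net 2.9 API; the class and method names (IndexFileReport, BytesByExtension) are hypothetical, and it has not been tried against the SQL Directory discussed here.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: total bytes per index file type, read directly from
// any Lucene.Net Directory (e.g. a SQL-backed one) without exporting it.
public static class IndexFileReport
{
    public static SortedDictionary<string, long> BytesByExtension(Lucene.Net.Store.Directory dir)
    {
        var totals = new SortedDictionary<string, long>();
        foreach (string name in dir.ListAll())
        {
            // ".fdt", ".tis", ".cfs", ...; segments_N files have no extension,
            // so they are grouped under a single "segments" key
            string ext = Path.GetExtension(name);
            if (ext.Length == 0)
                ext = name.StartsWith("segments") ? "segments" : name;

            long soFar;
            totals.TryGetValue(ext, out soFar);   // soFar is 0 if not seen yet
            totals[ext] = soFar + dir.FileLength(name);
        }
        return totals;
    }
}
```

Running it against an index built with 2.4.1 and one built with 2.9.2, then comparing totals such as .fdt (stored fields) and .prx (term positions), would show which part of the index doubled. If the index uses the compound file format, most of the data will be reported under .cfs instead, and Luke on an exported copy gives a finer view.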

Have you tried the latest version, 2.9.4?

// Simon

[1] https://code.google.com/p/luke/

RE: Lucene upgrade causes significant slow down

Posted by Scott Baldwin <S....@qsr.com.au>.
Thanks for your reply, Artem. The operation that suffers the performance degradation is what we call a "Word Frequency" query. It iterates through every term in the index and loads each document that contains that term:

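IndexReader reader = IndexReader.Open(Directory, true);  // read-only reader
TermEnum terms = reader.Terms();                          // every term in every field
while (terms.Next())
{
	Term term = terms.Term();
	TermDocs docs = reader.TermDocs(term);
	while (docs.Next())
	{
		// loads the stored fields of the matching document, once per posting
		Document doc = reader.Document(docs.Doc(), fieldSelector);
		// do stuff with doc
	}
	docs.Close();
}
terms.Close();
reader.Close();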

Our normal text search queries for individual terms don't seem to have suffered from the database size increase. I am open to the possibility of there being a better way to perform such a query, but as it stands, this query is up to 50 times slower than it was with our previous version of the software. 

Also it is important to note that I am performing an Optimize on the index before I try to produce a word frequency query.

Thanks

Re: Lucene upgrade causes significant slow down

Posted by Artem Chereisky <a....@gmail.com> (sent 29 May 2012 21:30).
Scott,

I don't believe the index size of 137MB is your issue. We have multiple
indexes close to 1GB and it's super fast. In fact we didn't notice any
performance degradation as the index grew from 100MB to almost a gig. Our
search engine performs thousands of searches per second on a quad core
server with 32GB of RAM... and the server is not even busy.

Do you retrieve stored field values by document ID? In my experience, that
is the most common cause of performance problems.
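Scott's word-frequency loop calls reader.Document once per posting, which is exactly the stored-field retrieval described here. If the "do stuff with doc" step only needs to know how often each word occurs, those counts are already in the postings, so no Document has to be loaded. A minimal sketch against the Lucene.Net 2.9 API follows; the class and method names (WordFrequency, Count) are hypothetical, and it assumes the words of interest live in a single indexed field.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;

// Hypothetical helper: total occurrences of each term in one field, summed
// from the postings (TermDocs.Freq) instead of loading a stored Document per hit.
public static class WordFrequency
{
    public static Dictionary<string, long> Count(IndexReader reader, string field)
    {
        var counts = new Dictionary<string, long>();
        // Terms(Term) is already positioned at the first term >= (field, "")
        TermEnum terms = reader.Terms(new Term(field, ""));
        TermDocs docs = reader.TermDocs();   // reused for every term via Seek
        try
        {
            do
            {
                Term term = terms.Term();
                // Stop when the enumeration is exhausted or has left the field
                if (term == null || term.Field() != field) break;

                docs.Seek(terms);
                long total = 0;
                while (docs.Next())
                    total += docs.Freq();    // occurrences within this document
                counts[term.Text()] = total;
            } while (terms.Next());
        }
        finally
        {
            docs.Close();
            terms.Close();
        }
        return counts;
    }
}
```

Unlike the original loop, this never touches the stored-field files, so a slow Directory is only hit for the term dictionary and postings. It does not reproduce anything else the original loop does with each Document.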

Thanks,
Art
