Posted to user@lucenenet.apache.org by Nitin Shiralkar <ni...@coreobjects.com> on 2008/12/27 05:41:27 UTC

Lucene Scalability Options

Hi All,

We are using the Lucene.NET v2.0 library in our project. Our index has grown to ~80 GB over the last year, and we expect it to grow beyond 100 GB in the next six months. A while back I read somewhere about Lucene performance issues once an index crosses the 100 GB mark.


- Are there any specific issues we might run into after 100 GB?

- Is there any known impact on search performance?

- Are there any scalability features we could consider implementing, such as clustering?

Any input would be valuable. I would also like to know the latest stable Lucene.NET release we could migrate to; a download link would be useful.


Thanks & regards,

Nitin Shiralkar

RE: Lucene Scalability Options

Posted by Nic Wise <Ni...@bbc.com>.
When we were doing this, optimize (which we called every night) didn't
fix it; our problem was mid-merge.

The problem was that we were mid-optimize and the process was killed
abruptly rather than shut down cleanly. So we had files left over with
no record of them in the segments file. We DID have valid entries in
segments, just too many segment files, so it tried to merge again.

Anyway, we changed some settings (after taking a while to work out what
it was) and it went away :) Moral of the story: don't have a watchdog
timer that kills the process while you are merging 2GB segments!

-----Original Message-----
From: George Aroush [mailto:george@aroush.net] 
Sent: 13 January 2009 13:24
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

There is.  Call the Optimize() function on the index.

You should never delete index files manually unless you know what you are
doing; otherwise you can corrupt or destroy your index.

-- George 

> -----Original Message-----
> From: Nic Wise [mailto:Nic.Wise@bbc.com] 
> Sent: Tuesday, January 13, 2009 6:36 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> I'm SURE there is a cleaner way, but in the past, we read the 
> segments file (manually :( ), and any file which wasn't 
> listed in there was considered to be a redundant file.
> 
> Worked for us. There may be a way to ask an IndexReader which 
> files it's using, and then extrapolate from there, but we 
> were using Lucene.net 1.something, which didn't.
> 
> I think that's what Luke does: it opens the index, asks Lucene
> what it's using, and kills everything else.
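Nic's approach (treat any file not listed in the segments file as redundant) amounts to a set difference between the directory listing and the segments the index currently references. Below is a minimal Python sketch of that idea, assuming the Lucene 2.0 file layout discussed in this thread (per-segment files named `<segment>.<ext>`, plus the index-wide `segments` and `deletable` files) and assuming the live segment names have already been read out of the segments file; parsing that binary file is not shown. `find_orphan_files` is a hypothetical helper, not a Lucene.NET API, and later Lucene releases use generation-numbered file names that it does not handle. It only reports candidates: as George points out, deleting index files by hand can corrupt the index, so review the list while no writer or reader has the index open.

```python
from typing import Iterable, List, Set

# Index-wide bookkeeping files that are never reported as orphans.
# Assumption: Lucene 2.0-era names, as mentioned in this thread.
PROTECTED = {"segments", "deletable"}


def find_orphan_files(dir_listing: Iterable[str], live_segments: Set[str]) -> List[str]:
    """Return files whose segment prefix is not listed in the segments file.

    dir_listing   -- file names in the index directory (e.g. from os.listdir)
    live_segments -- segment names such as "_2a", assumed to have been parsed
                     out of the segments file already (parsing is not shown)
    """
    orphans = []
    for name in dir_listing:
        if name in PROTECTED:
            continue
        # Per-segment files are named "<segment>.<ext>", e.g. "_2a.cfs", "_1z.fdt".
        segment = name.split(".", 1)[0]
        if segment not in live_segments:
            orphans.append(name)
    return sorted(orphans)


if __name__ == "__main__":
    listing = ["segments", "deletable", "_2a.cfs", "_2a.del", "_1z.fdt", "_1z.fdx"]
    print(find_orphan_files(listing, {"_2a"}))  # prints ['_1z.fdt', '_1z.fdx']
```

Returning a sorted list rather than deleting anything keeps the sketch read-only, which matches the caution in the thread about manual deletion.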
> 
> -----Original Message-----
> From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> Sent: 13 January 2009 11:26
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi All,
> 
> I started this thread about Lucene scalability. I have an
> 80 GB index. However, it looks like many of the segment files
> are either redundant or unused. Even if I delete them and
> retain just the CFS, segments and deletable files, the index
> seems to work fine.
> However, I would like a cleaner way to identify such
> redundant/unused files through the API. I can see these
> unused files in Luke as "Deletable", but I am not sure how
> Luke identifies them. I am using Lucene.NET 2.0.
> 
> Can you please suggest a way?
> 
> 
> 
> -----Original Message-----
> From: Granroth, Neal V. [mailto:neal.granroth@thermofisher.com]
> Sent: Tuesday, January 13, 2009 1:01 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> 
> Floyd, you will need to provide more details about the 
> specific problems you are encountering.
> 
> I made a quick check, and have no difficulty opening and 
> inspecting an index I created a few minutes ago with 
> Lucene.NET v2.3.1 using Luke v0.9.1.
> 
> -- Neal
> 
> 
> -----Original Message-----
> From: Floyd Wu [mailto:floyd.wu@gmail.com]
> Sent: Friday, January 09, 2009 8:18 PM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: Lucene Scalability Options
> 
> Hi all,
> It seems the new version of Luke is not compatible with
> Lucene.Net, and I've emailed the creator of Luke. Below is
> his feedback:
> 
> "Yes, there have been many changes,
> but Lucene 2.4 can still open indexes built with earlier 
> versions of Lucene/Java.
> This is the second report I've got about the possible
> incompatibility with Lucene.Net - I suggest raising this
> issue on the Lucene mailing list
> (java-dev@lucene.apache.org) and providing more details, e.g.
> Lucene.Net revision, stack trace, a small sample index if you can."
> 
> My original report is below:
> "The situation is that Luke-0.9 cannot open index files
> built by Lucene.Net-2.3.1.
> I tried older versions of Luke and confirmed that Luke-0.8 and
> Luke-0.8.1 can open and read the index files fine.
> I wonder whether anything changed between Java Lucene 2.3 and 2.4.
> Please help with this."
> 
> Floyd
> 
> 
> 
> 2009/1/9 George Aroush <ge...@aroush.net>
> 
> > Hi Nitin,
> >
> > Any optimization that Luke can do on an index is also doable by making
> > API calls from Lucene.Net.  If not, then there is either a bug in
> > Lucene.Net or in your use of the API.  Can you share with us your API
> > calls as well as the Lucene.Net version you are using?
> >
> > Thanks.
> >
> > -- George
> >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> > > Sent: Friday, January 09, 2009 6:27 AM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Thanks Hugh. Yes, I tried using Luke to optimize the index.
> > > Surprisingly, it brought the index size down to ~20 GB, with
> > > only one CFS file and the segments file left behind. I used the
> > > compound optimization option. I set the equivalent
> > > "SetUseCompoundFile" property on the "IndexModifier" object in
> > > my Lucene.NET code, but it has no effect on the size or the
> > > files after optimization. Any suggestions?
> > >
> > >
> > > -----Original Message-----
> > > From: Hugh Spiller [mailto:Hugh.Spiller@Renishaw.com]
> > > Sent: Friday, January 09, 2009 3:35 PM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Hi Nitin,
> > >
> > > I've found the easiest way to get rid of redundant files in an
> > > index is to use Luke. As soon as you use it to open the index,
> > > it tidies up all the cruft.
> > >
> > > It's at http://www.getopt.org/luke/ .
> > >
> > > ________________________________
> > >
> > > Hugh Spiller
> > >
> > >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> > > Sent: 09 January 2009 08:48
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > -- snip --
> > >
> > >
> > > Any input on junk/redundant files in the list above?
> > >
> > >
> > >
> > > --------------------------------------------------------------
> > > ------------------------------------
> > > This email and any attachments are confidential and are for the use
> > > of the addressee only. If you are not the addressee, you must not
> > > use or disclose the contents to any other person. Please immediately
> > > notify the sender and delete the email. Statements and opinions
> > > expressed here may not represent those of the company. Email
> > > correspondence is monitored by the company. This information may be
> > > subject to Export Control Regulation. You are obliged to comply with
> > > such Regulations
> > >
> > > The parent company of the Renishaw Group is Renishaw plc, registered
> > > in England no. 1106260. Registered Office: New Mills,
> > > Wotton-under-Edge, Gloucestershire, GL12 8JR, United Kingdom. Tel
> > > +44 (0) 1453 524524
> > > --------------------------------------------------------------
> > > ------------------------------------
> > >
> >
> > 
> This e-mail (and any attachments) is confidential and may 
> contain personal views which are not the views of the BBC 
> unless specifically stated. If you have received it in error, 
> please delete it from your system. Do not use, copy or 
> disclose the information in any way nor act in reliance on it 
> and notify the sender immediately.
>  
> Please note that the BBC monitors e-mails sent or received. 
> Further communication will signify your consent to this
> 
> This e-mail has been sent by one of the following 
> wholly-owned subsidiaries of the BBC:
>  
> BBC Worldwide Limited, Registration Number: 1420028 England,
> Registered Address: BBC Media Centre, 201 Wood Lane, London, W12 7TQ
> BBC World News Limited, Registration Number: 04514407 England,
> Registered Address: Woodlands, BBC Media Centre, 201 Wood Lane, London, W12 7TQ
> BBC World Distribution Limited, Registration Number: 04514408,
> Registered Address: Woodlands, BBC Media Centre, 201 Wood Lane, London, W12 7TQ
> 

RE: Lucene Scalability Options

Posted by George Aroush <ge...@aroush.net>.
What version of Lucene.Net are you observing this on?

I know I had difficulty doing a clean port in this area of the code for 2.0
and even 2.1 (it's still the same for 2.3.1).

-- George 

> -----Original Message-----
> From: Nitin Shiralkar [mailto:nitins@coreobjects.com] 
> Sent: Tuesday, January 13, 2009 8:46 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi George,
> 
> Thanks. But the root cause of the junk files is optimize itself.
> When you set the compound index flag to true to get a single
> segment file, Lucene tries to merge all segments and then
> deletes the older ones. However, if the older ones are being
> accessed in parallel, the delete operation fails. Lucene tracks
> such files in "deletable", and they should be cleaned up the
> next time the index is opened. However, in some cases the files
> remain unused and are no longer referenced by Lucene.
> 
> This is a rare scenario, and these files accumulated over a
> period of two years.

RE: Lucene Scalability Options

Posted by Nitin Shiralkar <ni...@coreobjects.com>.
Hi George,

Thanks. But the root cause of the junk files is optimize itself. When you set the compound index flag to true to get a single segment file, Lucene tries to merge all segments and then deletes the older ones. However, if the older ones are being accessed in parallel, the delete operation fails. Lucene tracks such files in "deletable", and they should be cleaned up the next time the index is opened. However, in some cases the files remain unused and are no longer referenced by Lucene.

This is a rare scenario, and these files accumulated over a period of two years.
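The failure mode Nitin describes (a delete fails because an old segment file is still open, the name is recorded in "deletable", and the delete is retried on the next open) can be sketched as a small defer-and-retry helper. This is a hypothetical Python illustration of the pattern, not Lucene.NET's actual code; `delete_or_defer` and `fake_delete` are made-up names, and persisting the pending list between runs is left to the caller.

```python
from typing import Callable, Iterable, List


def delete_or_defer(names: Iterable[str], delete: Callable[[str], None]) -> List[str]:
    """Try to delete each file and return the ones that could not be removed.

    The caller is expected to persist the returned names (the role the thread
    attributes to Lucene's "deletable" file) and retry them on the next open.
    """
    pending = []
    for name in names:
        try:
            delete(name)
        except OSError:
            # e.g. the file is still held open by a searcher in another process
            pending.append(name)
    return pending


if __name__ == "__main__":
    locked = {"_1z.cfs"}  # hypothetical: still open by a reader
    deleted = []

    def fake_delete(name: str) -> None:
        if name in locked:
            raise PermissionError(name)  # PermissionError is a subclass of OSError
        deleted.append(name)

    pending = delete_or_defer(["_1y.cfs", "_1z.cfs"], fake_delete)
    print(pending, deleted)  # prints ['_1z.cfs'] ['_1y.cfs']

    locked.clear()  # the reader has closed; retry on the "next open"
    print(delete_or_defer(pending, fake_delete))  # prints []
```

In real use `os.remove` would be passed as `delete`. The weak point of this pattern is the gap between dropping old files from the segments file and writing out the pending list: if the process dies in between (as in Nic's watchdog story above), nothing records those names, which is one plausible way unreferenced files pile up over time.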


RE: Lucene Scalability Options

Posted by George Aroush <ge...@aroush.net>.
There is.  Call the Optimize() function on the index.

You should never delete index files manually unless you know what you are
doing; otherwise you can corrupt or destroy your index.

-- George 

> -----Original Message-----
> From: Nic Wise [mailto:Nic.Wise@bbc.com] 
> Sent: Tuesday, January 13, 2009 6:36 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> I'm SURE there is a cleaner way, but in the past, we read the 
> segments file (manually :( ), and any file which wasn't 
> listed in there was considered to be a redundant file.
> 
> Worked for us. There may be a way to ask a IndexReader which 
> files it's using, and then extrapolate from there, but we 
> were using Lucene.net 1.something, which didn't.
> 
> I think that's what luke does. Opens the index, asks Lucene 
> whats it's using, kills everything else.
> 
> -----Original Message-----
> From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> Sent: 13 January 2009 11:26
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi All,
> 
> I have started this thread for Lucene scalability aspect. I 
> have an index with 80 GB size. However it looks like many of 
> the segment files are either redundant or unused. Even if I 
> delete them and just retain CFS, segments and deletable 
> files, the index seems to be working fine.
> However I want to know more cleaner approach to identify such 
> redundant/unused files through APIs. I am able to see these 
> unused files in Luke as "Deletable". However I am not sure 
> how Luke is able to identify unused files. I am using 
> Lucene.NET 2.0 version.
> 
> Can you please suggest some way?
> 
> 
> 
> -----Original Message-----
> From: Granroth, Neal V. [mailto:neal.granroth@thermofisher.com]
> Sent: Tuesday, January 13, 2009 1:01 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> 
> Floyd, you will need to provide more details about the 
> specific problems you are encountering.
> 
> I made a quick check, and have no difficulty opening and 
> inspecting an index I created a few minutes ago with 
> Lucene.NET v2.3.1 using Luke v0.9.1.
> 
> -- Neal
> 
> 
> -----Original Message-----
> From: Floyd Wu [mailto:floyd.wu@gmail.com]
> Sent: Friday, January 09, 2009 8:18 PM
> To: lucene-net-user@incubator.apache.org
> Subject: Re: Lucene Scalability Options
> 
> Hi all,
> It seems new version of Luke is not compitable with 
> Lucene.net and I've email to the creator of Luke. Below is 
> feedback from him
> 
> "Yes, there have been many changes,
> but Lucene 2.4 can still open indexes built with earlier 
> versions of Lucene/Java.
> This is the second report I've got about the possible 
> incompatibility with Lucene.Net - I suggest to raise up this 
> issue on the Lucene mailing list ( 
> java-dev@lucene.apache.org), and provide more details, eg. 
> Lucene.Net revision, stack trace, a small sample index if you can."
> 
> My original report as below
> "The situation is Luke-0.9 can not open the index files which 
> built by Lucene.Net-2.3.1.
> I tried to use older version of Luke and confirm Luke-0.8 and 
> Luke-0.8.1 can open and read index files fine.
>  I wonder if there is any change between java Lucene 2.3 and 2.4.
> Please help on this."
> 
> Floyd
> 
> 
> 
> 2009/1/9 George Aroush <ge...@aroush.net>
> 
> > Hi Nitin,
> >
> > Any optimization that Luke can do on an index is also 
> doable by making
> API
> > calls from Lucene.Net.  If not, then there is either a bug in
> Lucene.Net or
> > in your use of the API.  Can you share with us your API 
> calls as well
> as
> > the
> > Lucene.Net version you are using?
> >
> > Thanks.
> >
> > -- George
> >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> >  > Sent: Friday, January 09, 2009 6:27 AM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Thanks Hugh. Yes, I tried using Luke for index optimization.
> > > Surprisingly, it has brought down the index size to ~20 
> GB with only 
> > > one CFS and segment files left behind. I used compound 
> optimization 
> > > option. But I use the similar "SetUseCompoundFile" property on 
> > > "IndexModifier" object in my Lucene.NET code, but it has 
> no effect 
> > > on size or files after optimization. Any suggestions??
> > >
> > >
> > > -----Original Message-----
> > > From: Hugh Spiller [mailto:Hugh.Spiller@Renishaw.com]
> > > Sent: Friday, January 09, 2009 3:35 PM
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > Hi Nitin,
> > >
> > > I've found the easiest way to get rid of redundant files in an index
> > > is to use Luke. As soon as you use it to open the index, it tidies
> > > up all the cruft.
> > >
> > > It's at http://www.getopt.org/luke/ .
> > >
> > > ________________________________
> > >
> > > Hugh Spiller
> > >
> > >
> > > -----Original Message-----
> > > From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> > > Sent: 09 January 2009 08:48
> > > To: lucene-net-user@incubator.apache.org
> > > Subject: RE: Lucene Scalability Options
> > >
> > > -- snip --
> > >
> > >
> > > Any inputs on junk/redundant files in above list?
> > >
> > >
> > >
> > > --------------------------------------------------------------
> > > ------------------------------------
> > > This email and any attachments are confidential and are for the use
> > > of the addressee only. If you are not the addressee, you must not
> > > use or disclose the contents to any other person. Please immediately
> > > notify the sender and delete the email. Statements and opinions
> > > expressed here may not represent those of the company. Email
> > > correspondence is monitored by the company. This information may be
> > > subject to Export Control Regulation. You are obliged to comply with
> > > such Regulations
> > >
> > > The parent company of the Renishaw Group is Renishaw plc, registered
> > > in England no. 1106260. Registered Office: New Mills,
> > > Wotton-under-Edge, Gloucestershire, GL12 8JR, United Kingdom. Tel
> > > +44 (0) 1453 524524
> > > --------------------------------------------------------------
> > > ------------------------------------
> > >
> >
> > 
> This e-mail (and any attachments) is confidential and may 
> contain personal views which are not the views of the BBC 
> unless specifically stated. If you have received it in error, 
> please delete it from your system. Do not use, copy or 
> disclose the information in any way nor act in reliance on it 
> and notify the sender immediately.
>  
> Please note that the BBC monitors e-mails sent or received. 
> Further communication will signify your consent to this
> 
> This e-mail has been sent by one of the following 
> wholly-owned subsidiaries of the BBC:
>  
> BBC Worldwide Limited, Registration Number: 1420028 England, 
> Registered Address: BBC Media Centre, 201 Wood Lane, London, 
> W12 7TQ BBC World News Limited, Registration Number: 04514407 
> England, Registered Address: Woodlands, BBC Media Centre, 201 
> Wood Lane, London, W12 7TQ BBC World Distribution Limited, 
> Registration Number: 04514408, Registered Address: Woodlands, 
> BBC Media Centre, 201 Wood Lane, London, W12 7TQ
> 


RE: Lucene Scalability Options

Posted by Nic Wise <Ni...@bbc.com>.
I'm SURE there is a cleaner way, but in the past we read the segments
file (manually :( ), and any file that wasn't listed in there was
considered redundant.

That worked for us. There may be a way to ask an IndexReader which files
it's using, and then extrapolate from there, but we were using Lucene.Net
1.something, which didn't offer that.

I think that's what Luke does: it opens the index, asks Lucene what it's
using, and kills everything else.
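Nic's approach (treat anything the current segments file doesn't mention as redundant) boils down to a set comparison. The Python below is a minimal sketch of that idea, not Lucene or Luke code. It assumes you already have the names of the live segments, whether from Lucene itself or by parsing the segments file, which the sketch does not attempt. It also assumes the Lucene 2.0-era naming convention used in this thread, where every per-segment file is called "<segment>.<ext>" (for example "_3a.cfs"). That assumption does not hold for later formats in which one segment can use doc-store files named after another segment, so there you should rely on Lucene's own view of the index instead. Back up the index and make sure no writer is open before deleting anything it reports.

```python
import re

# Names that belong to the index as a whole rather than to one segment.
# "segments" and "deletable" are the Lucene 2.0-era names from this thread;
# "segments.gen" and "segments_N" are assumed to be the later equivalents.
INDEX_LEVEL = {"segments", "deletable", "segments.gen"}
GENERATION_FILE = re.compile(r"^segments_[0-9a-z]+$")


def owning_segment(filename):
    """Map a per-segment file to its segment name, e.g. "_3a.cfs" -> "_3a".

    Assumes segment names start with "_" and files are "<segment>.<ext>".
    Returns None for names that do not follow that pattern.
    """
    if not filename.startswith("_"):
        return None
    return filename.split(".", 1)[0]


def find_orphans(dir_listing, live_segments):
    """List files that are neither index-level nor owned by a live segment.

    live_segments must come from the *current* segments file; obtaining it
    is outside this sketch.
    """
    live = set(live_segments)
    orphans = []
    for name in sorted(dir_listing):
        if name in INDEX_LEVEL or GENERATION_FILE.match(name):
            continue  # index-level bookkeeping, never an orphan
        if name.endswith(".lock"):
            continue  # a writer may be holding this; leave it alone
        if owning_segment(name) in live:
            continue  # belongs to a segment the index still uses
        orphans.append(name)
    return orphans


if __name__ == "__main__":
    listing = ["segments", "deletable", "_a.cfs", "_b.cfs",
               "_7.fdt", "_7.fdx", "_7.tis", "write.lock"]
    print(find_orphans(listing, ["_a", "_b"]))
```

Here the "_7.*" files are reported because no live segment is named "_7". The sketch only lists candidates and deliberately deletes nothing, so the results can be checked (for example against what Luke shows as "Deletable") before acting on them.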


RE: Lucene Scalability Options

Posted by Nitin Shiralkar <ni...@coreobjects.com>.
Hi All,

I started this thread about Lucene scalability. I have an 80 GB index. However, it looks like many of the segment files are either redundant or unused: even if I delete them and keep only the CFS, segments and deletable files, the index seems to work fine. Still, I would like a cleaner way to identify such redundant/unused files through the API. Luke shows these unused files as "Deletable", but I am not sure how Luke identifies them. I am using Lucene.NET 2.0.

Can anyone suggest an approach?




RE: Lucene Scalability Options

Posted by George Aroush <ge...@aroush.net>.
What version of Lucene.Net did you use to create the index?  Was it created
with an earlier version of Lucene.Net and then opened with a newer version?
Is the index small enough that you can share it with us for debugging?  If
not, can you re-index the data (a totally new index) and try again?

-- George
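One way to answer the first question without Luke is to look at the format number at the very start of the segments file. This is a minimal sketch under stated assumptions: it assumes the Lucene 2.x file format, where that file begins with a big-endian 32-bit format number (negative in 2.x, with newer formats using more negative values), and where Lucene 2.0 writes a single "segments" file while 2.1 and later write "segments_N" with N in base 36. It only reports the raw number; the mapping from number to release is in the Lucene file-format documentation, and the function names are hypothetical, chosen for illustration.

```python
import os
import re
import struct

# Assumed Lucene 2.1+ naming: each commit is "segments_N", N in base 36.
GENERATION_FILE = re.compile(r"^segments_([0-9a-z]+)$")


def current_segments_file(dir_listing):
    """Return the segments file a reader would normally start from, or None.

    Assumes the highest generation is the newest commit, and falls back to
    the single "segments" file used by Lucene 2.0 and earlier.
    """
    best_name, best_gen = None, -1
    for name in dir_listing:
        match = GENERATION_FILE.match(name)
        if match:
            gen = int(match.group(1), 36)
            if gen > best_gen:
                best_name, best_gen = name, gen
    if best_name is not None:
        return best_name
    return "segments" if "segments" in dir_listing else None


def read_format_marker(header):
    """Decode the big-endian signed 32-bit integer at the start of the file."""
    if len(header) < 4:
        raise ValueError("segments file is shorter than 4 bytes")
    return struct.unpack(">i", header[:4])[0]


def format_marker_of(index_dir):
    """Return (segments file name, raw format marker) for an index directory."""
    name = current_segments_file(os.listdir(index_dir))
    if name is None:
        raise FileNotFoundError("no segments file found in " + index_dir)
    with open(os.path.join(index_dir, name), "rb") as handle:
        return name, read_format_marker(handle.read(4))
```

Running format_marker_of on an index that Luke 0.9.1 rejects and on one it opens would show whether the two were written in different formats, which is the kind of detail worth including in a bug report.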



Re: Lucene Scalability Options

Posted by Floyd Wu <fl...@gmail.com>.
Well, the situation is that Luke 0.9.1 shows the message "read past EOF" when
I open index files made by Lucene.Net 2.3.1, but Luke 0.8 opens them
smoothly.





RE: Lucene Scalability Options

Posted by "Granroth, Neal V." <ne...@thermofisher.com>.
Floyd, you will need to provide more details about the specific problems you are encountering.

I made a quick check and had no difficulty opening and inspecting, with Luke v0.9.1, an index I had created a few minutes earlier with Lucene.NET v2.3.1.

-- Neal



Re: Lucene Scalability Options

Posted by Floyd Wu <fl...@gmail.com>.
Hi all,
It seems the new version of Luke is not compatible with Lucene.Net, so I've
emailed the creator of Luke. Below is his feedback:

"Yes, there have been many changes, but Lucene 2.4 can still open indexes
built with earlier versions of Lucene/Java. This is the second report I've
got about the possible incompatibility with Lucene.Net - I suggest to raise
up this issue on the Lucene mailing list (java-dev@lucene.apache.org), and
provide more details, eg. Lucene.Net revision, stack trace, a small sample
index if you can."

My original report was as follows:
"The situation is that Luke 0.9 cannot open index files built by
Lucene.Net 2.3.1. I tried older versions of Luke and confirmed that Luke 0.8
and Luke 0.8.1 can open and read the index files fine. I wonder whether
anything changed between Java Lucene 2.3 and 2.4. Please help with this."

Floyd




RE: Lucene Scalability Options

Posted by George Aroush <ge...@aroush.net>.
Hi Nitin,

Any optimization that Luke can do on an index is also doable by making API
calls from Lucene.Net.  If not, then there is either a bug in Lucene.Net or
in your use of the API.  Can you share with us your API calls as well as the
Lucene.Net version you are using?

Thanks.

-- George

> -----Original Message-----
> From: Nitin Shiralkar [mailto:nitins@coreobjects.com] 
> Sent: Friday, January 09, 2009 6:27 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Thanks Hugh. Yes, I tried using Luke for index optimization.
> Surprisingly, it brought the index size down to ~20 GB, leaving
> only one CFS file and the segments file behind. I used the
> compound optimization option. However, I set the equivalent
> "SetUseCompoundFile" property on the "IndexModifier" object in my
> Lucene.NET code, and it has no effect on the size or the files
> after optimization. Any suggestions?
> 
> 
> -----Original Message-----
> From: Hugh Spiller [mailto:Hugh.Spiller@Renishaw.com]
> Sent: Friday, January 09, 2009 3:35 PM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi Nitin,
> 
> I've found the easiest way to get rid of redundant files in 
> an index is to use Luke. As soon as you use it to open the 
> index, it tidies up all the cruft.
> 
> It's at http://www.getopt.org/luke/ .
> 
> ________________________________
> 
> Hugh Spiller
> 
> 
> -----Original Message-----
> From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
> Sent: 09 January 2009 08:48
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> -- snip --
> 
> 
> Any input on junk/redundant files in the above list?
> 
> 
> 
> 


RE: Lucene Scalability Options

Posted by Nitin Shiralkar <ni...@coreobjects.com>.
Thanks Hugh. Yes, I tried using Luke for index optimization. Surprisingly, it brought the index size down to ~20 GB, leaving only one CFS file and the segments file behind. I used the compound optimization option. However, I set the equivalent "SetUseCompoundFile" property on the "IndexModifier" object in my Lucene.NET code, and it has no effect on the size or the files after optimization. Any suggestions?


-----Original Message-----
From: Hugh Spiller [mailto:Hugh.Spiller@Renishaw.com]
Sent: Friday, January 09, 2009 3:35 PM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi Nitin,

I've found the easiest way to get rid of redundant files in an index is
to use Luke. As soon as you use it to open the index, it tidies up all
the cruft.

It's at http://www.getopt.org/luke/ .

________________________________

Hugh Spiller


-----Original Message-----
From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
Sent: 09 January 2009 08:48
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

-- snip --


Any input on junk/redundant files in the above list?





RE: Lucene Scalability Options

Posted by Hugh Spiller <Hu...@Renishaw.com>.
Hi Nitin,

I've found the easiest way to get rid of redundant files in an index is
to use Luke. As soon as you use it to open the index, it tidies up all
the cruft. 

It's at http://www.getopt.org/luke/ . 

________________________________

Hugh Spiller 


-----Original Message-----
From: Nitin Shiralkar [mailto:nitins@coreobjects.com] 
Sent: 09 January 2009 08:48
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

-- snip --


Any input on junk/redundant files in the above list?





RE: Lucene Scalability Options

Posted by Nitin Shiralkar <ni...@coreobjects.com>.
Digy,

It will be difficult to split our index into a group of indexes because of the way we build and search it. We continually add new documents and update existing ones quite frequently. Also, our searches need to run against the entire set.

We are not facing any search performance problems as of now; I just wanted to check whether there are any known performance or scalability issues once an index crosses 100 GB. Another question on the same topic: I am not sure whether the 100 GB size of our index is genuine or the result of failures that have left redundant segments/files behind. I saw a few TMP files, which I have deleted, but apart from those I am not sure how to identify redundant or junk files in the Lucene index folder.

Following is the list of files in our Lucene index folder:

\\LuceneIndexTest\_d8by.prx
\\LuceneIndexTest\_d8by.tii
\\LuceneIndexTest\_d8by.tis
\\LuceneIndexTest\_d8c9.fdt
\\LuceneIndexTest\_d8c9.fdx
\\LuceneIndexTest\_d8c9.fnm
\\LuceneIndexTest\_d8ca.fdt
\\LuceneIndexTest\_d8ca.fdx
\\LuceneIndexTest\_d8ca.fnm
\\LuceneIndexTest\_dl4h.fdt
\\LuceneIndexTest\_dl4h.fdx
\\LuceneIndexTest\_dl4h.fnm
\\LuceneIndexTest\_dl48.fdt
\\LuceneIndexTest\_dl48.fdx
\\LuceneIndexTest\_dl48.fnm
\\LuceneIndexTest\_dl48.frq
\\LuceneIndexTest\_dl48.prx
\\LuceneIndexTest\_dl48.tii
\\LuceneIndexTest\_dl48.tis
\\LuceneIndexTest\_fdbs.fdt
\\LuceneIndexTest\_fdbs.fdx
\\LuceneIndexTest\_fdbs.fnm
\\LuceneIndexTest\_fdbs.frq
\\LuceneIndexTest\_fdbs.prx
\\LuceneIndexTest\_fdbs.tii
\\LuceneIndexTest\_fdbs.tis
\\LuceneIndexTest\_fhz5.fdt
\\LuceneIndexTest\_fhz5.fdx
\\LuceneIndexTest\_fhz5.fnm
\\LuceneIndexTest\_fhz5.frq
\\LuceneIndexTest\_fhz5.prx
\\LuceneIndexTest\_fhz5.tii
\\LuceneIndexTest\_fhz5.tis
\\LuceneIndexTest\_fkla.fdt
\\LuceneIndexTest\_fkla.fdx
\\LuceneIndexTest\_fkla.fnm
\\LuceneIndexTest\_fkla.frq
\\LuceneIndexTest\_fkla.prx
\\LuceneIndexTest\_fkla.tii
\\LuceneIndexTest\_fkla.tis
\\LuceneIndexTest\_fmo5.fdt
\\LuceneIndexTest\_fmo5.fdx
\\LuceneIndexTest\_fmo5.fnm
\\LuceneIndexTest\_fmo5.frq
\\LuceneIndexTest\_fmo5.prx
\\LuceneIndexTest\_fmo5.tii
\\LuceneIndexTest\_fmo5.tis
\\LuceneIndexTest\_fmo6.fdt
\\LuceneIndexTest\_fmo6.fdx
\\LuceneIndexTest\_fmo6.fnm
\\LuceneIndexTest\_fmo6.frq
\\LuceneIndexTest\_fmo6.prx
\\LuceneIndexTest\_fmo6.tii
\\LuceneIndexTest\_fmo6.tis
\\LuceneIndexTest\_fmo7.fdt
\\LuceneIndexTest\_fmo7.fdx
\\LuceneIndexTest\_fmo7.fnm
\\LuceneIndexTest\_fmo7.frq
\\LuceneIndexTest\_fmo7.prx
\\LuceneIndexTest\_fmo7.tii
\\LuceneIndexTest\_fmo7.tis
\\LuceneIndexTest\_fmo9.fdt
\\LuceneIndexTest\_fmo9.fdx
\\LuceneIndexTest\_fmo9.fnm
\\LuceneIndexTest\_fmoa.fdt
\\LuceneIndexTest\_fmoa.fdx
\\LuceneIndexTest\_fmoa.fnm
\\LuceneIndexTest\_fmod.fdt
\\LuceneIndexTest\_fmod.fdx
\\LuceneIndexTest\_fmod.fnm
\\LuceneIndexTest\_fmoe.fdt
\\LuceneIndexTest\_fmoe.fdx
\\LuceneIndexTest\_fmoe.fnm
\\LuceneIndexTest\_fmof.fdt
\\LuceneIndexTest\_fmof.fdx
\\LuceneIndexTest\_fmof.fnm
\\LuceneIndexTest\_fmog.fdt
\\LuceneIndexTest\_fmog.fdx
\\LuceneIndexTest\_fmog.fnm
\\LuceneIndexTest\_fmoh.fdt
\\LuceneIndexTest\_fmoh.fdx
\\LuceneIndexTest\_fmoh.fnm
\\LuceneIndexTest\_foq9.fdt
\\LuceneIndexTest\_foq9.fdx
\\LuceneIndexTest\_foq9.fnm
\\LuceneIndexTest\_foq9.frq
\\LuceneIndexTest\_foq9.prx
\\LuceneIndexTest\_foq9.tii
\\LuceneIndexTest\_foq9.tis
\\LuceneIndexTest\_fq23.fdt
\\LuceneIndexTest\_fq23.fdx
\\LuceneIndexTest\_fq23.fnm
\\LuceneIndexTest\_fq23.frq
\\LuceneIndexTest\_fq23.prx
\\LuceneIndexTest\_fq23.tii
\\LuceneIndexTest\_fq23.tis
\\LuceneIndexTest\_hr8w.fdt
\\LuceneIndexTest\_hr8w.fdx
\\LuceneIndexTest\_hr8w.fnm
\\LuceneIndexTest\_hr8x.fdt
\\LuceneIndexTest\_hr8x.fdx
\\LuceneIndexTest\_hr8x.fnm
\\LuceneIndexTest\_k6jf.cfs
\\LuceneIndexTest\_kwhl.cfs
\\LuceneIndexTest\deletable
\\LuceneIndexTest\segments
\\LuceneIndexTest\_d8by.fdt
\\LuceneIndexTest\_d8by.fdx
\\LuceneIndexTest\_d8by.fnm
\\LuceneIndexTest\_d8by.frq

Any input on junk/redundant files in the above list?
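As a rough aid for reading a listing like the one above, here is a minimal Python sketch (not a Lucene tool, and not part of the original thread) that groups file names by segment prefix and labels each segment by the file types present. It assumes the non-compound Lucene 2.0 layout (.fnm/.fdt/.fdx for stored fields, .frq/.prx/.tis/.tii for postings) and ignores norms and term-vector files. A "fields-only" segment may be debris from an interrupted merge, but file names alone cannot prove that: only the segments file records which segments are live, so this output should never be used as grounds for deleting files by hand.

```python
import re
from collections import defaultdict

# Assumed non-compound Lucene 2.0 per-segment extensions. Norms (.f0, .f1, ...)
# and term-vector files are deliberately ignored in this sketch.
STORED_FIELD_EXTS = {"fnm", "fdt", "fdx"}
POSTINGS_EXTS = {"frq", "prx", "tis", "tii"}

# Per-segment files look like "_<name>.<ext>"; "segments" and "deletable"
# are index-wide files and do not match.
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)\.([0-9a-z]+)$")


def classify_segments(paths):
    """Map each segment name to a label based only on which files exist.

    Labels: "compound" (has a .cfs), "complete" (stored fields + postings),
    "fields-only" (stored fields but no postings), "partial" (anything else).
    """
    exts = defaultdict(set)
    for path in paths:
        name = re.split(r"[\\/]", path)[-1]  # drop the directory part
        match = SEGMENT_FILE.match(name)
        if match:
            segment, ext = match.groups()
            exts[segment].add(ext)

    labels = {}
    for segment, found in exts.items():
        if "cfs" in found:
            labels[segment] = "compound"
        elif (STORED_FIELD_EXTS | POSTINGS_EXTS) <= found:
            labels[segment] = "complete"
        elif STORED_FIELD_EXTS <= found and not (found & POSTINGS_EXTS):
            labels[segment] = "fields-only"
        else:
            labels[segment] = "partial"
    return labels


if __name__ == "__main__":
    listing = [
        r"\\LuceneIndexTest\_d8c9.fdt",
        r"\\LuceneIndexTest\_d8c9.fdx",
        r"\\LuceneIndexTest\_d8c9.fnm",
        r"\\LuceneIndexTest\_k6jf.cfs",
        r"\\LuceneIndexTest\segments",
    ]
    for segment, label in sorted(classify_segments(listing).items()):
        print(segment, label)
```

Run against the full listing, this would flag segments such as _d8c9 and _fmo9, which have only .fnm/.fdt/.fdx files; whether they are actually unreferenced is something only the segments file (or a tool like Luke that reads it) can answer.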



-----Original Message-----
From: Digy [mailto:digydigy@gmail.com]
Sent: Tuesday, December 30, 2008 2:37 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi Nitin,

* I haven't heard about that 100 GB limit, but I once tried Lucene.Net with a
300 GB index. The first searches (with a fresh IndexSearcher) took ~20 sec
(due to cache warm-up), but subsequent searches performed quite well
(varying from ~50 msec to 3 sec).

* If you deal with such large indexes, it is better to group them according
to some criterion (e.g., an index for December, an index for November, etc.)
and to skip any index that is not needed for a given search. Of course,
keeping smaller indexes on multiple machines, searching them in parallel and
then merging the results would be a good solution too, but it would require
more complex coding.

You may also want to see some tricks about search speed optimizations (
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ) and the
project Solr ( http://lucene.apache.org/solr/features.html ).

* You can get the official releases of Lucene.Net from
https://svn.apache.org/repos/asf/incubator/lucene.net/site/download and the
current version from the svn trunk:
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/src/Lucene.Net/



DIGY.







-----Original Message-----
From: Nitin Shiralkar [mailto:nitins@coreobjects.com]
Sent: Saturday, December 27, 2008 6:41 AM
To: lucene-net-user@incubator.apache.org
Subject: Lucene Scalability Options

Hi All,

We are using the Lucene.NET v2.0 library in our project. Our index has grown
to ~80 GB in the last year, and we expect it to grow beyond 100 GB in the next
six months. I read somewhere a while back about Lucene performance issues
after crossing the 100 GB mark.

- Are there any specific issues that we might run into after 100 GB?

- Is there any known impact on search performance?

- Are there any scalability features, such as clustering, that we could
  consider implementing?

Any input would be valuable. I would also like to know the latest stable
Lucene.NET release that we can migrate to; a download link would be
useful.


Thanks & regards,

Nitin Shiralkar


RE: Lucene Scalability Options

Posted by Digy <di...@gmail.com>.
Hi Nitin,

* I haven't heard about that 100 GB limit, but I once tried Lucene.Net with a
300 GB index. The first searches (with a fresh IndexSearcher) took ~20 sec
(due to cache warm-up), but subsequent searches performed quite well
(varying from ~50 msec to 3 sec).

* If you deal with such large indexes, it is better to group them according
to some criterion (e.g., an index for December, an index for November, etc.)
and to skip any index that is not needed for a given search. Of course,
keeping smaller indexes on multiple machines, searching them in parallel and
then merging the results would be a good solution too, but it would require
more complex coding.

You may also want to see some tricks about search speed optimizations (
http://wiki.apache.org/jakarta-lucene/ImproveSearchingSpeed ) and the
project Solr ( http://lucene.apache.org/solr/features.html ).

* You can get the official releases of Lucene.Net from
https://svn.apache.org/repos/asf/incubator/lucene.net/site/download and the
current version from the svn trunk:
https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/C%23/src/Lucene.Net/
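The grouping-and-merging idea above can be sketched as follows. This is a minimal Python illustration, not Lucene code: `MonthShard` and its `search` callable are hypothetical stand-ins for per-month indexes (Lucene and Lucene.Net ship MultiSearcher and ParallelMultiSearcher for searching several indexes at once, which this sketch does not use). It skips shards outside the requested date range, queries the rest concurrently, and keeps the overall top hits by score. One caveat: scores computed independently in separate indexes are not guaranteed to be comparable, because each index has its own term statistics.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from datetime import date


class MonthShard:
    """A hypothetical smaller index holding one month of documents.

    `search` is a stand-in for a real per-index query (not a Lucene API):
    it takes a query string and returns a list of (score, doc_id) pairs.
    """

    def __init__(self, year, month, search):
        self.year = year
        self.month = month
        self.search = search


def search_shards(shards, query, start, end, top_k=10):
    """Search only the shards whose month falls in [start, end], then merge."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    # 1. Skip indexes that cannot contain matching documents.
    relevant = [s for s in shards if lo <= (s.year, s.month) <= hi]
    # 2. Query the remaining shards concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(relevant))) as pool:
        per_shard = list(pool.map(lambda s: s.search(query), relevant))
    # 3. Merge all hits and keep the overall top_k by score. This assumes the
    #    scores are comparable across shards, which separately built indexes
    #    do not guarantee.
    hits = (hit for shard_hits in per_shard for hit in shard_hits)
    return heapq.nlargest(top_k, hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    shards = [
        MonthShard(2008, 11, lambda q: [(0.9, "nov-1"), (0.2, "nov-2")]),
        MonthShard(2008, 12, lambda q: [(0.7, "dec-1")]),
        MonthShard(2008, 10, lambda q: [(1.0, "oct-1")]),  # outside the range
    ]
    # Prints [(0.9, 'nov-1'), (0.7, 'dec-1')]
    print(search_shards(shards, "lucene", date(2008, 11, 1), date(2008, 12, 31), top_k=2))
```

Running the shards on separate machines would replace the thread pool with remote calls, which is where the "more complex coding" mentioned above comes in.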



DIGY.







-----Original Message-----
From: Nitin Shiralkar [mailto:nitins@coreobjects.com] 
Sent: Saturday, December 27, 2008 6:41 AM
To: lucene-net-user@incubator.apache.org
Subject: Lucene Scalability Options

Hi All,

We are using Lucene.NET v2.0 library in our project. Our index has grown to
~80 GB in last one year. We expect our index to grow beyond 100 GB in next
six months. I have read somewhere long back about Lucene performance issues
after crossing 100 GB mark.


-          Is there any specific issues that we might run into after 100 GB?

-          Is there any known impact on search performance?

-          Do we have any scalability features that we can consider for
implementation? Clustering etc?

Any inputs would be valuable. Also I would like to know the latest stable
Lucene.NET release which we can migrate to, any download link would be
useful.


Thanks & regards,

Nitin Shiralkar