Posted to java-user@lucene.apache.org by Rob Jose <rj...@comcast.net> on 2004/08/18 22:44:37 UTC

Index Size

Hello
I have indexed several thousand (52 to be exact) text files and I keep running out of disk space to store the indexes.  The size of the documents I have indexed is around 2.5 GB.  The size of the Lucene indexes is around 287 GB.  Does this seem correct?  I am not storing the contents of the file, just indexing and tokenizing.  I am using Lucene 1.3 final.  Can you guys let me know what you are experiencing?  I don't want to go into production with something that I should be configuring better.  

I am not sure if this helps, but I have a temp index and a real index.  I index the file into the temp index, and then merge the temp index into the real index using the addIndexes method on the IndexWriter.  I have also set setUseCompoundFile to true on the production writer; I did not set this on the temp index.  The last thing that I do before closing the production writer is to call the optimize method.

I would really appreciate any ideas to get the index size smaller if it is at all possible.

Thanks
Rob

Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Stephane

Thanks for your response.  I have wondered the same thing myself.  In
fact, after I went home last night, that is exactly what I suspected I
was doing.  But I just used Luke to go through all of my documents, and
I don't see any duplicates.  I will check again just to make sure.

Rob
----- Original Message ----- 
From: "Stephane James Vaucher" <va...@cirano.qc.ca>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, August 19, 2004 9:34 AM
Subject: Re: Index Size


Stupid question:

Are you sure you have the right number of docs in your index? i.e. you're
not adding the same document twice into or via your tmp index.

sv



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




Re: Index Size

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
Stupid question:

Are you sure you have the right number of docs in your index? i.e. you're
not adding the same document twice into or via your tmp index.

sv





Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Paul
Thank you for your response.  I have appended to the bottom of this message
the field structure that I am using.  I hope that this helps.  I am using
the StandardAnalyzer.  I do not believe that I am changing any default
values, but I have also appended the code that adds the temp index to the
production index.

Thanks for your help
Rob

Here is the code that describes the field structure.
public static Document Document(String contents, String path, Date modified,
        String runDate, String totalpages, String pagecount,
        String countycode, String reportnum, String reportdescr)
{
    SimpleDateFormat showFormat =
        new SimpleDateFormat(TurbineResources.getString("date.default.format"));
    SimpleDateFormat searchFormat = new SimpleDateFormat("yyyyMMdd");

    Document doc = new Document();
    doc.add(Field.Keyword("path", path));
    doc.add(Field.Keyword("modified", showFormat.format(modified)));
    doc.add(Field.UnStored("searchDate", searchFormat.format(modified)));
    doc.add(Field.Keyword("runDate", runDate == null ? "" : runDate));
    doc.add(Field.UnStored("searchRunDate", runDate == null ? "" :
        runDate.substring(6) + runDate.substring(0, 2) + runDate.substring(3, 5)));
    doc.add(Field.Keyword("reportnum", reportnum));
    doc.add(Field.Text("reportdescr", reportdescr));
    doc.add(Field.UnStored("cntycode", countycode));
    doc.add(Field.Keyword("totalpages", totalpages));
    doc.add(Field.Keyword("page", pagecount));
    doc.add(Field.UnStored("contents", contents));
    return doc;
}



Here is the code that adds the temp index to the production index.

File tempFile = new File(sIndex + File.separatorChar + "temp" + sCntyCode);
tempReader = IndexReader.open(tempFile);

try
{
    boolean createIndex = false;
    File f = new File(sIndex + File.separatorChar + sCntyCode);
    if (!f.exists())
    {
        createIndex = true;
    }
    prodWriter = new IndexWriter(sIndex + File.separatorChar + sCntyCode,
            new StandardAnalyzer(), createIndex);
}
catch (Exception e)
{
    IndexReader.unlock(FSDirectory.getDirectory(sIndex + File.separatorChar + sCntyCode, false));
    CasesReports.log("Tried to Unlock " + sIndex);
    prodWriter = new IndexWriter(sIndex, new StandardAnalyzer(), false);
    CasesReports.log("Successfully Unlocked " + sIndex + File.separatorChar + sCntyCode);
}

prodWriter.setUseCompoundFile(true);
prodWriter.addIndexes(new IndexReader[] { tempReader });







Re: Index Size

Posted by Paul Elschot <pa...@xs4all.nl>.
On Wednesday 18 August 2004 22:44, Rob Jose wrote:
> Hello
> I have indexed several thousand (52 to be exact) text files and I keep
> running out of disk space to store the indexes.  The size of the documents
> I have indexed is around 2.5 GB.  The size of the Lucene indexes is around
> 287 GB.  Does this seem correct?  I am not storing the contents of the

As noted, one would expect the index size to be about 35%
of the original text, i.e. about 2.5 GB * 35% ≈ 875 MB.
That is two orders of magnitude off from what you have.
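As a quick sanity check of that rule of thumb (the 35% figure is an estimate, not a guarantee), the arithmetic looks like this:

```java
public class IndexSizeCheck {
    public static void main(String[] args) {
        double textGb = 2.5;                       // size of the indexed text
        double expectedGb = textGb * 0.35;         // rule-of-thumb index size: ~0.875 GB
        double actualGb = 287.0;                   // observed index size
        double factorOff = actualGb / expectedGb;  // ~328x, i.e. two orders of magnitude
        System.out.printf("expected ~%.0f MB, got %.0f GB (%.0fx too big)%n",
                expectedGb * 1024, actualGb, factorOff);
    }
}
```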

Could you provide some more information about the field structure,
i.e. how many fields, which fields are stored, which fields are indexed,
and any use of non-standard analyzers or non-standard
Lucene settings?

You might also try switching to the non-compound format to have a look
at the sizes of the individual index files; see the file formats page on
the Lucene web site.
You can then see the total disk size of, for example, the stored fields.

Regards,
Paul Elschot




Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
I did a little more research into my production indexes, and so far the
first index is the only one that has any files besides the CFS files.
The other indexes that I have seen have just the deletable and segments
files and a whole bunch of CFS files.  Very interesting.  Also worth noting
is that once in a while one of the production indexes will have a 0-length
FNM file.

Rob


Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Bernhard
Thanks for responding.  I do have an IndexReader open on the temp index; I
pass this IndexReader into the addIndexes method on the IndexWriter to add
these files.  I did notice that I have a ton of CFS files; I removed them
and was still able to read the indexes.  Are these the temporary segment
files you are talking about?  Here is my code that adds the temp index to
the prod index.
File tempFile = new File(sIndex + File.separatorChar + "temp" + sCntyCode);
tempReader = IndexReader.open(tempFile);

try
{
    boolean createIndex = false;
    File f = new File(sIndex + File.separatorChar + sCntyCode);
    if (!f.exists())
    {
        createIndex = true;
    }
    prodWriter = new IndexWriter(sIndex + File.separatorChar + sCntyCode,
            new StandardAnalyzer(), createIndex);
}
catch (Exception e)
{
    IndexReader.unlock(FSDirectory.getDirectory(sIndex + File.separatorChar + sCntyCode, false));
    CasesReports.log("Tried to Unlock " + sIndex);
    prodWriter = new IndexWriter(sIndex, new StandardAnalyzer(), false);
    CasesReports.log("Successfully Unlocked " + sIndex + File.separatorChar + sCntyCode);
}

prodWriter.setUseCompoundFile(true);
prodWriter.addIndexes(new IndexReader[] { tempReader });



Am I doing something wrong?  Any help would be extremely appreciated.



Thanks

Rob



Re: Index Size

Posted by Bernhard Messer <Be...@intrafind.de>.
Rob,

as Doug and Paul already mentioned, the index size is definitely too big :-(.

What could cause the problem, especially when running on a Windows
platform, is an IndexReader that is held open during the whole indexing
process. During indexing, the writer creates temporary segment files
which are merged into bigger segments. Once that is done, the old segment
files are deleted. If there is an open IndexReader, the files cannot be
deleted and they stay in the index directory. You can end up with an
index several times bigger than the dataset.

Can you check your code for any open IndexReaders during indexing, or
paste the relevant part to the list so we can have a look at it?
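Applied to the merge snippet posted earlier in this thread, a minimal sketch of that fix (untested, using the variable names from the snippet against the Lucene 1.3-era API) is to close the temp reader as soon as the merge completes:

```java
// Sketch only: close readers promptly so merged-away segment files can be
// deleted. tempFile and prodWriter are assumed set up as in the snippet above.
IndexReader tempReader = IndexReader.open(tempFile);
try {
    prodWriter.addIndexes(new IndexReader[] { tempReader });
} finally {
    tempReader.close();   // release the file handles before optimizing
}
prodWriter.optimize();
prodWriter.close();
```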

hope this helps
Bernhard






Re: Index Size

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
From: Doug Cutting
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg08757.html

> An index typically requires around 35% of the plain text size.

I think it's a little big.

sv





Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Otis
I upgraded to 1.4.1.  I deleted all of my old indexes and started from
scratch.  I indexed 2 MB worth of text files and my index size is 8 MB.
Would it be better if I stopped using the
IndexWriter.addIndexes(IndexReader) method and instead traversed the
IndexReader on the temp index and used the
IndexWriter.addDocument(Document) method?

Thanks again for your input, I appreciate it.

Rob
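For reference, the addDocument variant being asked about would look roughly like the sketch below (untested, against the Lucene 1.x API). One caveat worth flagging: reader.document(i) returns only the stored fields, so fields that were indexed but not stored, like the contents field here, would be lost by this approach.

```java
// Rough sketch (assumed Lucene 1.x API): copy documents one at a time.
// Caveat: tempReader.document(i) only returns STORED fields; un-stored
// fields such as "contents" cannot be recovered from the reader.
for (int i = 0; i < tempReader.maxDoc(); i++) {
    if (!tempReader.isDeleted(i)) {          // skip deleted document slots
        prodWriter.addDocument(tempReader.document(i));
    }
}
```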
----- Original Message ----- 
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, August 19, 2004 8:00 AM
Subject: Re: Index Size


Just go for 1.4.1 and look at the CHANGES.txt file to see if there were
any index format changes.  If there were, you'll need to re-index.

Otis

--- Rob Jose <rj...@comcast.net> wrote:

> Otis
> I am using Lucene 1.3 final.  Would it help if I move to Lucene 1.4
> final?
>
> Rob
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <ot...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 7:13 AM
> Subject: Re: Index Size
>
>
> I thought this was the case.  I believe there was a bug in one of the
> recent Lucene releases that caused old CFS files not to be removed when
> they should be removed.  This resulted in your index directory
> containing a bunch of old CFS files consuming your disk space.
>
> Try getting a recent nightly build and see if using that takes care of
> your problem.
>
> Otis
>
> --- Rob Jose <rj...@comcast.net> wrote:
>
> > Hey George
> > Thanks for responding.  I am using windows and I don't see any hidden
> > files.  I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2,
> > etc.) files.  I have two FDT files and two FDX files, and three FNM
> > files.  Add these files to the deletable and segments files and that
> > is all of the files that I have.  The CFS files are approximately
> > 11 MB each.  The totals I gave you before were for all of my indexes
> > together.  This particular index has a size of 21.6 GB.  The files
> > that it indexed have a size of 89 MB.
> >
> > OK - I just removed all of the CFS files from the directory and I can
> > still read my indexes.  So now I have to ask: what are these CFS
> > files?  Why are they created?  And how can I get rid of them if I
> > don't need them?  I will also take a look at the Lucene website to
> > see if I can find any information.
> >
> > Thanks
> > Rob
> >
> > ----- Original Message ----- 
> > From: "Honey George" <ho...@yahoo.com>
> > To: "Lucene Users List" <lu...@jakarta.apache.org>
> > Sent: Thursday, August 19, 2004 12:29 AM
> > Subject: Re: Index Size
> >
> >
> > Hi,
> >  Please check for hidden files in the index folder. If
> > you are using linux, do something like
> >
> > ls -al <index folder>
> >
> > I am also facing a similar problem where the index
> > size is greater than the data size. In my case there
> > were some hidden temporary files which Lucene
> > creates. That was taking half of the total size.
> >
> > My problem is that after deleting the temporary files,
> > the index size is the same as that of the data size. That
> > again seems to be a problem. I have yet to find out the
> > reason.
> >
> > Thanks,
> >    george


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Index Size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Just go for 1.4.1 and look at the CHANGES.txt file to see if there were
any index format changes.  If there were, you'll need to re-index.

Otis

--- Rob Jose <rj...@comcast.net> wrote:

> Otis
> I am using Lucene 1.3 final.  Would it help if I move to Lucene 1.4
> final?
> 
> Rob
> ----- Original Message ----- 
> From: "Otis Gospodnetic" <ot...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 7:13 AM
> Subject: Re: Index Size
> 
> 
> I thought this was the case.  I believe there was a bug in one of the
> recent Lucene releases that caused old CFS files not to be removed
> when
> they should be removed.  This resulted in your index directory
> containing a bunch of old CFS files consuming your disk space.
> 
> Try getting a recent nightly build and see if using that takes care
> of your problem.
> 
> Otis
> 
> --- Rob Jose <rj...@comcast.net> wrote:
> 
> > Hey George
> > Thanks for responding.  I am using Windows and I don't see any hidden
> > files.  I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2,
> > etc.) files.  I have two FDT files and two FDX files.  And three FNM
> > files.  Add these files to the deletable and segments file and that is
> > all of the files that I have.  The CFS files are approximately 11 MB
> > each.  The totals I gave you before were for all of my indexes
> > together.  This particular index has a size of 21.6 GB.  The files
> > that it indexed have a size of 89 MB.
> >
> > OK - I just removed all of the CFS files from the directory and I can
> > still read my indexes.  So now I have to ask: what are these CFS files?
> > Why are they created?  And how can I get rid of them if I don't need
> > them?  I will also take a look at the Lucene website to see if I can
> > find any information.
> >
> > Thanks
> > Rob
> > 
> > ----- Original Message ----- 
> > From: "Honey George" <ho...@yahoo.com>
> > To: "Lucene Users List" <lu...@jakarta.apache.org>
> > Sent: Thursday, August 19, 2004 12:29 AM
> > Subject: Re: Index Size
> > 
> > 
> > Hi,
> >  Please check for hidden files in the index folder. If
> > you are using Linux, do something like
> > 
> > ls -al <index folder>
> > 
> > I am also facing a similar problem where the index
> > size is greater than the data size. In my case there
> > were some hidden temporary files which Lucene
> > creates.
> > That was taking half of the total size.
> > 
> > My problem is that after deleting the temporary files,
> > the index size is same as that of the data size. That
> > again seems to be a problem. I am yet to find out the
> > reason..
> > 
> > Thanks,
> >    george
> > 
> > 
> >  --- Rob Jose <rj...@comcast.net> wrote:
> > > Hello
> > > I have indexed several thousand (52 to be exact)
> > > text files and I keep running out of disk space to
> > > store the indexes.  The size of the documents I have
> > > indexed is around 2.5 GB.  The size of the Lucene
> > > indexes is around 287 GB.  Does this seem correct?
> > > I am not storing the contents of the file, just
> > > indexing and tokenizing.  I am using Lucene 1.3
> > > final.  Can you guys let me know what you are
> > > experiencing?  I don't want to go into production
> > > with something that I should be configuring better.
> > >
> > >
> > > I am not sure if this helps, but I have a temp index
> > > and a real index.  I index the file into the temp
> > > index, and then merge the temp index into the real
> > > index using the addIndexes method on the
> > > IndexWriter.  I have also set the production writer
> > > setUseCompoundFile to true.  I did not set this on
> > > the temp index.  The last thing that I do before
> > > closing the production writer is to call the
> > > optimize method.
> > >
> > > I would really appreciate any ideas to get the index
> > > size smaller if it is at all possible.
> > >
> > > Thanks
> > > Rob
> > 
> > 
> 
> 




Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Otis
I am using Lucene 1.3 final.  Would it help if I move to Lucene 1.4 final?

Rob
----- Original Message ----- 
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, August 19, 2004 7:13 AM
Subject: Re: Index Size


I thought this was the case.  I believe there was a bug in one of the
recent Lucene releases that caused old CFS files not to be removed when
they should be removed.  This resulted in your index directory
containing a bunch of old CFS files consuming your disk space.

Try getting a recent nightly build and see if using that takes care of
your problem.

Otis

--- Rob Jose <rj...@comcast.net> wrote:

> Hey George
> Thanks for responding.  I am using windows and I don't see any hidden
> files.
> I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2, etc.)
> files.
> I have two FDT files and two FDX files. And three FNM files.  Add
> these
> files to the deletable and segments file and that is all of the files
> that I
> have.   The CFS files are approximately 11 MB each.  The totals I gave
> you
> before were for all of my indexes together.  This particular index
> has a
> size of 21.6 GB.  The files that it indexed have a size of 89 MB.
> 
> OK - I just removed all of the CFS files from the directory and I can
> still
> read my indexes.  So now I have to ask: what are these CFS files?
> Why are
> they created?  And how can I get rid of them if I don't need them.  I
> will
> also take a look at the Lucene website to see if I can find any
> information.
> 
> Thanks
> Rob
> 
> ----- Original Message ----- 
> From: "Honey George" <ho...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 12:29 AM
> Subject: Re: Index Size
> 
> 
> Hi,
>  Please check for hidden files in the index folder. If
> you are using Linux, do something like
> 
> ls -al <index folder>
> 
> I am also facing a similar problem where the index
> size is greater than the data size. In my case there
> were some hidden temporary files which Lucene
> creates.
> That was taking half of the total size.
> 
> My problem is that after deleting the temporary files,
> the index size is same as that of the data size. That
> again seems to be a problem. I am yet to find out the
> reason..
> 
> Thanks,
>    george
> 
> 
>  --- Rob Jose <rj...@comcast.net> wrote:
> > Hello
> > I have indexed several thousand (52 to be exact)
> > text files and I keep running out of disk space to
> > store the indexes.  The size of the documents I have
> > indexed is around 2.5 GB.  The size of the Lucene
> > indexes is around 287 GB.  Does this seem correct?
> > I am not storing the contents of the file, just
> > indexing and tokenizing.  I am using Lucene 1.3
> > final.  Can you guys let me know what you are
> > experiencing?  I don't want to go into production
> > with something that I should be configuring better.
> >
> >
> > I am not sure if this helps, but I have a temp index
> > and a real index.  I index the file into the temp
> > index, and then merge the temp index into the real
> > index using the addIndexes method on the
> > IndexWriter.  I have also set the production writer
> > setUseCompoundFile to true.  I did not set this on
> > the temp index.  The last thing that I do before
> > closing the production writer is to call the
> > optimize method.
> >
> > I would really appreciate any ideas to get the index
> > size smaller if it is at all possible.
> >
> > Thanks
> > Rob
> 
> 
> 
> 
> 
> 
> 




Re: Index Size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I thought this was the case.  I believe there was a bug in one of the
recent Lucene releases that caused old CFS files not to be removed when
they should be removed.  This resulted in your index directory
containing a bunch of old CFS files consuming your disk space.

Try getting a recent nightly build and see if using that takes care of
your problem.

Otis
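For what it's worth, Lucene decides what it may delete by comparing the index directory against the files the current segments still reference; the bug described above left old .cfs files out of that cleanup. A toy sketch of the idea (simplified, with made-up names; this is not Lucene's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StaleFileFinder {
    // Returns the entries of an index directory listing that the current
    // segments no longer reference. Anything returned here is a leftover
    // that a correct cleanup pass would delete.
    static List<String> findStale(String[] dirListing, Set<String> referenced) {
        List<String> stale = new ArrayList<String>();
        for (String name : dirListing) {
            if (!referenced.contains(name)) {
                stale.add(name);
            }
        }
        return stale;
    }
}
```

With the buggy release the old compound files never ended up in such a "stale" set, which would explain why deleting them by hand did not break the index.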

--- Rob Jose <rj...@comcast.net> wrote:

> Hey George
> Thanks for responding.  I am using windows and I don't see any hidden
> files.
> I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2, etc.)
> files.
> I have two FDT files and two FDX files. And three FNM files.  Add
> these
> files to the deletable and segments file and that is all of the files
> that I
> have.   The CFS files are approximately 11 MB each.  The totals I gave
> you
> before were for all of my indexes together.  This particular index
> has a
> size of 21.6 GB.  The files that it indexed have a size of 89 MB.
> 
> OK - I just removed all of the CFS files from the directory and I can
> still
> read my indexes.  So now I have to ask: what are these CFS files?
> Why are
> they created?  And how can I get rid of them if I don't need them.  I
> will
> also take a look at the Lucene website to see if I can find any
> information.
> 
> Thanks
> Rob
> 
> ----- Original Message ----- 
> From: "Honey George" <ho...@yahoo.com>
> To: "Lucene Users List" <lu...@jakarta.apache.org>
> Sent: Thursday, August 19, 2004 12:29 AM
> Subject: Re: Index Size
> 
> 
> Hi,
>  Please check for hidden files in the index folder. If
> you are using Linux, do something like
> 
> ls -al <index folder>
> 
> I am also facing a similar problem where the index
> size is greater than the data size. In my case there
> were some hidden temporary files which Lucene
> creates.
> That was taking half of the total size.
> 
> My problem is that after deleting the temporary files,
> the index size is same as that of the data size. That
> again seems to be a problem. I am yet to find out the
> reason..
> 
> Thanks,
>    george
> 
> 
>  --- Rob Jose <rj...@comcast.net> wrote:
> > Hello
> > I have indexed several thousand (52 to be exact)
> > text files and I keep running out of disk space to
> > store the indexes.  The size of the documents I have
> > indexed is around 2.5 GB.  The size of the Lucene
> > indexes is around 287 GB.  Does this seem correct?
> > I am not storing the contents of the file, just
> > indexing and tokenizing.  I am using Lucene 1.3
> > final.  Can you guys let me know what you are
> > experiencing?  I don't want to go into production
> > with something that I should be configuring better.
> >
> >
> > I am not sure if this helps, but I have a temp index
> > and a real index.  I index the file into the temp
> > index, and then merge the temp index into the real
> > index using the addIndexes method on the
> > IndexWriter.  I have also set the production writer
> > setUseCompoundFile to true.  I did not set this on
> > the temp index.  The last thing that I do before
> > closing the production writer is to call the
> > optimize method.
> >
> > I would really appreciate any ideas to get the index
> > size smaller if it is at all possible.
> >
> > Thanks
> > Rob
> 
> 
> 
> 
> 
> 
> 




Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Hey George
Thanks for responding.  I am using Windows and I don't see any hidden files.
I have a ton of CFS files (1366/1405).  I have 22 F# (F1, F2, etc.) files.
I have two FDT files and two FDX files. And three FNM files.  Add these
files to the deletable and segments file and that is all of the files that I
have.   The CFS files are approximately 11 MB each.  The totals I gave you
before were for all of my indexes together.  This particular index has a
size of 21.6 GB.  The files that it indexed have a size of 89 MB.

OK - I just removed all of the CFS files from the directory and I can still
read my indexes.  So now I have to ask: what are these CFS files?  Why are
they created?  And how can I get rid of them if I don't need them?  I will
also take a look at the Lucene website to see if I can find any information.

Thanks
Rob
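Before deleting index files by hand, it can help to see which extensions actually hold the space; in a compound-file index almost everything should sit in the .cfs files. A minimal stdlib sketch (the class and method names here are mine, not part of Lucene):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexSizeByExt {
    // Sums file sizes (in bytes) per extension so a bloated index
    // directory can be diagnosed at a glance. Extensionless files such
    // as "segments" and "deletable" are grouped under "(none)".
    static Map<String, Long> totalsByExtension(Map<String, Long> fileSizes) {
        Map<String, Long> totals = new LinkedHashMap<String, Long>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            int dot = e.getKey().lastIndexOf('.');
            String ext = (dot < 0) ? "(none)" : e.getKey().substring(dot + 1);
            Long sum = totals.get(ext);
            totals.put(ext, (sum == null ? 0L : sum) + e.getValue());
        }
        return totals;
    }
}
```

Feeding it the directory listing (file name to length) shows immediately whether the 21.6 GB is sitting in leftover .cfs files or in the live FDT/FDX data.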

----- Original Message ----- 
From: "Honey George" <ho...@yahoo.com>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, August 19, 2004 12:29 AM
Subject: Re: Index Size


Hi,
 Please check for hidden files in the index folder. If
you are using Linux, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
That was taking half of the total size.

My problem is that after deleting the temporary files,
the index size is same as that of the data size. That
again seems to be a problem. I am yet to find out the
reason..

Thanks,
   george


 --- Rob Jose <rj...@comcast.net> wrote:
> Hello
> I have indexed several thousand (52 to be exact)
> text files and I keep running out of disk space to
> store the indexes.  The size of the documents I have
> indexed is around 2.5 GB.  The size of the Lucene
> indexes is around 287 GB.  Does this seem correct?
> I am not storing the contents of the file, just
> indexing and tokenizing.  I am using Lucene 1.3
> final.  Can you guys let me know what you are
> experiencing?  I don't want to go into production
> with something that I should be configuring better.
>
>
> I am not sure if this helps, but I have a temp index
> and a real index.  I index the file into the temp
> index, and then merge the temp index into the real
> index using the addIndexes method on the
> IndexWriter.  I have also set the production writer
> setUseCompoundFile to true.  I did not set this on
> the temp index.  The last thing that I do before
> closing the production writer is to call the
> optimize method.
>
> I would really appreciate any ideas to get the index
> size smaller if it is at all possible.
>
> Thanks
> Rob







Re: Index Size

Posted by Rob Jose <rj...@comcast.net>.
Karthik
Thanks for responding.  Yes, I optimize right before I close the index
writer.  I added this a little while ago to try and get the size down.

Rob
----- Original Message ----- 
From: "Karthik N S" <ka...@controlnet.co.in>
To: "Lucene Users List" <lu...@jakarta.apache.org>
Sent: Thursday, August 19, 2004 12:59 AM
Subject: RE: Index Size


Guys

   Are you optimizing the index before closing it?

  If not, try using it...  :}



karthik




-----Original Message-----
From: Honey George [mailto:honey_george@yahoo.com]
Sent: Thursday, August 19, 2004 1:00 PM
To: Lucene Users List
Subject: Re: Index Size


Hi,
 Please check for hidden files in the index folder. If
you are using Linux, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
That was taking half of the total size.

My problem is that after deleting the temporary files,
the index size is same as that of the data size. That
again seems to be a problem. I am yet to find out the
reason..

Thanks,
   george


 --- Rob Jose <rj...@comcast.net> wrote:
> Hello
> I have indexed several thousand (52 to be exact)
> text files and I keep running out of disk space to
> store the indexes.  The size of the documents I have
> indexed is around 2.5 GB.  The size of the Lucene
> indexes is around 287 GB.  Does this seem correct?
> I am not storing the contents of the file, just
> indexing and tokenizing.  I am using Lucene 1.3
> final.  Can you guys let me know what you are
> experiencing?  I don't want to go into production
> with something that I should be configuring better.
>
>
> I am not sure if this helps, but I have a temp index
> and a real index.  I index the file into the temp
> index, and then merge the temp index into the real
> index using the addIndexes method on the
> IndexWriter.  I have also set the production writer
> setUseCompoundFile to true.  I did not set this on
> the temp index.  The last thing that I do before
> closing the production writer is to call the
> optimize method.
>
> I would really appreciate any ideas to get the index
> size smaller if it is at all possible.
>
> Thanks
> Rob







RE: Index Size

Posted by Karthik N S <ka...@controlnet.co.in>.
Guys

   Are you optimizing the index before closing it?

  If not, try using it...  :}



karthik
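Conceptually, optimize() collapses the many per-batch segments into a single segment, and the heart of that operation is merging sorted posting data. A toy illustration of the merge step (illustrative only, not Lucene's actual merge code):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentMergeSketch {
    // Merges two sorted lists of document IDs into one sorted list, the
    // basic operation behind collapsing two index segments into one.
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }
}
```

Fewer segments means fewer files on disk, which is part of why optimizing before close keeps the index directory smaller.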




-----Original Message-----
From: Honey George [mailto:honey_george@yahoo.com]
Sent: Thursday, August 19, 2004 1:00 PM
To: Lucene Users List
Subject: Re: Index Size


Hi,
 Please check for hidden files in the index folder. If
you are using Linux, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
That was taking half of the total size.

My problem is that after deleting the temporary files,
the index size is same as that of the data size. That
again seems to be a problem. I am yet to find out the
reason..

Thanks,
   george


 --- Rob Jose <rj...@comcast.net> wrote:
> Hello
> I have indexed several thousand (52 to be exact)
> text files and I keep running out of disk space to
> store the indexes.  The size of the documents I have
> indexed is around 2.5 GB.  The size of the Lucene
> indexes is around 287 GB.  Does this seem correct?
> I am not storing the contents of the file, just
> indexing and tokenizing.  I am using Lucene 1.3
> final.  Can you guys let me know what you are
> experiencing?  I don't want to go into production
> with something that I should be configuring better.
>
>
> I am not sure if this helps, but I have a temp index
> and a real index.  I index the file into the temp
> index, and then merge the temp index into the real
> index using the addIndexes method on the
> IndexWriter.  I have also set the production writer
> setUseCompoundFile to true.  I did not set this on
> the temp index.  The last thing that I do before
> closing the production writer is to call the
> optimize method.
>
> I would really appreciate any ideas to get the index
> size smaller if it is at all possible.
>
> Thanks
> Rob







Re: Index Size

Posted by Honey George <ho...@yahoo.com>.
Hi,
 Please check for hidden files in the index folder. If
you are using Linux, do something like

ls -al <index folder>

I am also facing a similar problem where the index
size is greater than the data size. In my case there
were some hidden temporary files which Lucene
creates.
That was taking half of the total size.

My problem is that after deleting the temporary files,
the index size is same as that of the data size. That
again seems to be a problem. I am yet to find out the
reason..

Thanks,
   george


 --- Rob Jose <rj...@comcast.net> wrote: 
> Hello
> I have indexed several thousand (52 to be exact)
> text files and I keep running out of disk space to
> store the indexes.  The size of the documents I have
> indexed is around 2.5 GB.  The size of the Lucene
> indexes is around 287 GB.  Does this seem correct? 
> I am not storing the contents of the file, just
> indexing and tokenizing.  I am using Lucene 1.3
> final.  Can you guys let me know what you are
> experiencing?  I don't want to go into production
> with something that I should be configuring better. 
> 
> 
> I am not sure if this helps, but I have a temp index
> and a real index.  I index the file into the temp
> index, and then merge the temp index into the real
> index using the addIndexes method on the
> IndexWriter.  I have also set the production writer
> setUseCompoundFile to true.  I did not set this on
> the temp index.  The last thing that I do before
> closing the production writer is to call the
> optimize method.  
> 
> I would really appreciate any ideas to get the index
> size smaller if it is at all possible.
> 
> Thanks
> Rob 

