Posted to java-user@lucene.apache.org by "Kevin A. Burton" <bu...@newsmonster.org> on 2005/01/22 01:17:19 UTC

Opening up one large index takes 940M of memory?

We have one large index right now... it's about 60G ... When I open it 
the Java VM uses 940M of memory.  The VM does nothing else besides open 
this index.

Here's the code:

        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        System.out.println( "opening..." );

        long before = System.currentTimeMillis();
        // false = open the existing index instead of creating a new one
        Directory dir = FSDirectory.getDirectory( "/var/ksa/index-1078106952160/", false );
        IndexReader ir = IndexReader.open( dir );
        System.out.println( ir.getClass() );
        long after = System.currentTimeMillis();
        System.out.println( "opening...done - duration: " + (after-before) );

        // heap in use is totalMemory minus freeMemory
        System.out.println( "totalMemory: " + Runtime.getRuntime().totalMemory() );
        System.out.println( "freeMemory: " + Runtime.getRuntime().freeMemory() );

Is there any way to reduce this footprint?  The index is fully 
optimized... I'm willing to take a performance hit if necessary.  Is 
this documented anywhere?

Kevin

-- 

Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an 
invite!  Also see irc.freenode.net #rojo if you want to chat.

Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html

If you're interested in RSS, Weblogs, Social Networking, etc... then you 
should work for Rojo!  If you recommend someone and we hire them you'll 
get a free iPod!
    
Kevin A. Burton, Location - San Francisco, CA
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Opening up one large index takes 940M of memory?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Chris Hostetter wrote:

>: We have one large index right now... its about 60G ... When I open it
>: the Java VM used 940M of memory.  The VM does nothing else besides open
>
>Just out of curiosity, have you tried turning on the verbose gc log, and
>putting in some thread sleeps after you open the reader, to see if the
>memory footprint "settles down" after a little while?  You're currently
>checking the memory usage immediately after opening the index, and some
>of that memory may be used holding transient data that will get freed up
>after some GC iterations.
>  
>
Actually I haven't, but to be honest the numbers seem dead on. The VM 
heap wouldn't reallocate if it didn't need that much memory, and this is 
almost exactly the behavior I'm seeing in production.

Though I guess it wouldn't hurt ;)

Kevin


Re: Opening up one large index takes 940M of memory?

Posted by Chris Hostetter <ho...@fucit.org>.

: We have one large index right now... its about 60G ... When I open it
: the Java VM used 940M of memory.  The VM does nothing else besides open

Just out of curiosity, have you tried turning on the verbose gc log, and
putting in some thread sleeps after you open the reader, to see if the
memory footprint "settles down" after a little while?  You're currently
checking the memory usage immediately after opening the index, and some
of that memory may be used holding transient data that will get freed up
after some GC iterations.


:         IndexReader ir = IndexReader.open( dir );
:         System.out.println( ir.getClass() );
:         long after = System.currentTimeMillis();
:         System.out.println( "opening...done - duration: " + (after-before) );
:
:         System.out.println( "totalMemory: " + Runtime.getRuntime().totalMemory() );
:         System.out.println( "freeMemory: " + Runtime.getRuntime().freeMemory() );
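
As a rough illustration of the suggestion above, here is a minimal sketch
that samples the used heap several times, requesting a GC and sleeping
between samples.  The class and method names (HeapSettle, usedBytes,
sampleAfterGc) are made up for this example and are not part of Lucene;
System.gc() is only a hint, so the numbers are indicative at best.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: watch whether used heap "settles down" after a few GC requests.
final class HeapSettle {

    /** Heap currently in use: what the JVM has reserved minus what is free inside it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a GC, sleeps, and records used heap, up to 'rounds' times. */
    static List<Long> sampleAfterGc(int rounds, long sleepMillis) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            System.gc();                                // a hint only; the JVM may ignore it
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // preserve the interrupt and stop sampling
                break;
            }
            samples.add(usedBytes());
        }
        return samples;
    }

    public static void main(String[] args) {
        // In the real test, open the IndexReader here and keep a reference to it.
        System.out.println("used right after open: " + usedBytes());
        for (long used : sampleAfterGc(5, 1000L)) {
            System.out.println("used after GC + sleep: " + used);
        }
    }
}
```

Running the JVM with -verbose:gc alongside this prints each collection as
it happens, which makes it easier to tell transient garbage from data the
reader really holds on to.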





-Hoss



Re: Opening up one large index takes 940M of memory?

Posted by petite_abeille <pe...@mac.com>.

On Jan 24, 2005, at 00:10, Vic wrote:

> (Is there a btree serialization impl in java?)

http://jdbm.sourceforge.net/

Cheers

--
PA
http://alt.textdrive.com/



Re: Opening up one large index takes 940M of memory?

Posted by Vic <vi...@friendvu.com>.

Sounds interesting. (Is there a btree serialization impl in java?)
.V

jian chen wrote:

>Hi,
>
>If it is really the case that every 128th term is loaded into memory,
>could you use a relational database or b-tree index to do the work
>of indexing the terms instead?
>
>Even if you create another level of indexing on top of the .tii file,
>it is just a hack and would not scale well.
>
>I would think a b/b+ tree based approach is the way to go for better
>memory utilization.


-- 
RiA-SoA w/JDNC <http://www.SandraSF.com> forums
- help develop a community
My blog <http://www.sandrasf.com/adminBlog>



Re: Opening up one large index takes 940M of memory?

Posted by jian chen <ch...@gmail.com>.

Hi,

If it is really the case that every 128th term is loaded into memory,
could you use a relational database or b-tree index to do the work
of indexing the terms instead?

Even if you create another level of indexing on top of the .tii file,
it is just a hack and would not scale well.

I would think a b/b+ tree based approach is the way to go for better
memory utilization.

Cheers,

Jian


On Sat, 22 Jan 2005 08:32:50 -0800 (PST), Otis Gospodnetic
<ot...@yahoo.com> wrote:
> There Kevin, that's what I was referring to, the .tii file.

Re: Opening up one large index takes 940M of memory?

Posted by petite_abeille <pe...@mac.com>.

On Jan 22, 2005, at 23:50, Kevin A. Burton wrote:

> The problem I think for everyone right now is that 32bits just doesn't 
> cut it in production systems...   2G of memory per process and you 
> really start to feel it.

Hmmm... no... no pain at all... or perhaps you are implying that your 
entire system is running on one puny JVM instance... in that case, this 
is perhaps more of a design problem than an implementation one... 
YMMV...

Cheers

--
PA
http://alt.textdrive.com/


Re: Opening up one large index takes 940M of memory?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Paul Elschot wrote:

>>This would be similar to the way the MySQL index cache works...
>>    
>>
>
>It would be possible to add another level of indexing to the terms.
>No one has done this yet, so I guess it's preferred to buy RAM instead...
>
>  
>
The problem I think for everyone right now is that 32bits just doesn't 
cut it in production systems...   2G of memory per process and you 
really start to feel it.

Kevin


Re: Opening up one large index takes 940M of memory?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

There Kevin, that's what I was referring to, the .tii file.

Otis

--- Paul Elschot <pa...@xs4all.nl> wrote:

> It's even documented. From:
> http://jakarta.apache.org/lucene/docs/fileformats.html :
> 
> >The term info index, or .tii file.
> >This contains every IndexIntervalth entry from the .tis file, along
> >with its location in the "tis" file. This is designed to be read
> >entirely into memory and used to provide random access to the "tis"
> >file.
> 
> My guess is that this is what you see happening.

Re: Opening up one large index takes 940M of memory?

Posted by Paul Elschot <pa...@xs4all.nl>.

On Saturday 22 January 2005 01:39, Kevin A. Burton wrote:
> Kevin A. Burton wrote:
> 
> > We have one large index right now... its about 60G ... When I open it 
> > the Java VM used 940M of memory.  The VM does nothing else besides 
> > open this index.
> 
> After thinking about it I guess memory equal to about 1.5% of the index 
> size really isn't THAT bad.  What would be nice is if there were a way 
> to do this from disk and then use a buffer (either via the filesystem 
> or in-vm memory) to access these variables.

It's even documented. From:
http://jakarta.apache.org/lucene/docs/fileformats.html :

>The term info index, or .tii file. 
>This contains every IndexIntervalth entry from the .tis file, along with its
>location in the "tis" file. This is designed to be read entirely into memory
>and used to provide random access to the "tis" file. 

My guess is that this is what you see happening.
To see the actual .tii file, you need the non-default (non-compound) file format.

Once searching starts you'll also see that the field norms are loaded;
these take one byte per searched field per document.
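
To make that figure concrete, here is a small arithmetic sketch.  The
class name NormsEstimate and the document and field counts are invented
for illustration; only the "one byte per searched field per document"
rule comes from the text above.

```java
// Sketch: estimate heap used by field norms at one byte per searched field per document.
final class NormsEstimate {

    /** One byte for every (document, searched field) pair. */
    static long normsBytes(long numDocs, int searchedFields) {
        return numDocs * searchedFields;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L;   // hypothetical document count, not from the thread
        int fields = 3;            // hypothetical number of searched fields
        long mib = normsBytes(docs, fields) / (1024 * 1024);
        System.out.println(mib + " MiB of norms");
    }
}
```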

> This would be similar to the way the MySQL index cache works...

It would be possible to add another level of indexing to the terms.
No one has done this yet, so I guess it's preferred to buy RAM instead...

Regards,
Paul Elschot


Re: Opening up one large index takes 940M of memory?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Kevin A. Burton wrote:

> We have one large index right now... its about 60G ... When I open it 
> the Java VM used 940M of memory.  The VM does nothing else besides 
> open this index.

After thinking about it I guess memory equal to about 1.5% of the index 
size really isn't THAT bad.  What would be nice is if there were a way 
to do this from disk and then use a buffer (either via the filesystem 
or in-vm memory) to access these variables.

This would be similar to the way the MySQL index cache works...
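
One way to picture the in-VM buffer idea is a bounded least-recently-used
cache sitting in front of the on-disk data.  The sketch below is only an
illustration built on java.util.LinkedHashMap; the LruCache class and the
"block" keys are hypothetical, and nothing like this exists in Lucene as
discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a fixed-capacity LRU cache, e.g. for blocks of term data read from disk.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);    // accessOrder = true: least recently used entry comes first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // drop the least recently used entry once over capacity
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "block-1");
        cache.put(2, "block-2");
        cache.get(1);              // touch block 1 so block 2 becomes the eldest
        cache.put(3, "block-3");   // over capacity: block 2 is evicted
        System.out.println(cache.keySet());
    }
}
```

The capacity then caps heap use regardless of index size, at the price of
a disk read whenever a requested block is not cached.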

Kevin


Re: Opening up one large index takes 940M of memory?

Posted by Doug Cutting <cu...@apache.org>.

Kevin A. Burton wrote:
> 1.  Do I have to do this with a NEW directory?  Our nightly index merger 
> uses an existing "target" index which I assume will re-use the same 
> settings as before?  I did this last night and it still seems to use the 
> same amount of memory.  Above you assert that I should use a new empty 
> directory and I'll try that tonight.

You need to re-write the entire index using a modified 
TermInfosWriter.java.  Optimize rewrites the entire index but is 
destructive.  Merging into a new empty directory is a non-destructive 
way to do this.

> 2. This isn't destructive is it?  I mean I'll be able to move BACK to a 
> TermInfosWriter.indexInterval of 128 right?

Yes, you can go back if you re-optimize or re-merge again.

Also, there's no need to CC my personal email address.

Doug


Re: Opening up one large index takes 940M of memory?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Doug Cutting wrote:

> Kevin A. Burton wrote:
>
>> Is there any way to reduce this footprint?  The index is fully 
>> optimized... I'm willing to take a performance hit if necessary.  Is 
>> this documented anywhere?
>
>
> You can increase TermInfosWriter.indexInterval.  You'll need to 
> re-write the .tii file for this to take effect.  The simplest way to 
> do this is to use IndexWriter.addIndexes(), adding your index to a 
> new, empty, directory.  This will of course take a while for a 60GB 
> index...
>
(Note... when this works I'll note my findings in a wiki page for future 
developers)

Two more questions:

1.  Do I have to do this with a NEW directory?  Our nightly index merger 
uses an existing "target" index which I assume will re-use the same 
settings as before?  I did this last night and it still seems to use the 
same amount of memory.  Above you assert that I should use a new empty 
directory and I'll try that tonight.

2. This isn't destructive is it?  I mean I'll be able to move BACK to a 
TermInfosWriter.indexInterval of 128 right?

Thanks!

Kevin


Re: Opening up one large index takes 940M of memory?

Posted by Doug Cutting <cu...@apache.org>.

Kevin A. Burton wrote:
> Is there any way to reduce this footprint?  The index is fully 
> optimized... I'm willing to take a performance hit if necessary.  Is 
> this documented anywhere?

You can increase TermInfosWriter.indexInterval.  You'll need to re-write 
the .tii file for this to take effect.  The simplest way to do this is 
to use IndexWriter.addIndexes(), adding your index to a new, empty, 
directory.  This will of course take a while for a 60GB index...

Doubling TermInfosWriter.indexInterval should halve the Term memory usage 
and double the time required to look up terms in the dictionary.  With 
an index this large the latter is probably not an issue, since 
processing term frequency and proximity data probably overwhelmingly 
dominates search performance.
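
The trade-off can be sketched with a simplified model: the number of
entries held in memory is the term count divided by the interval, and a
lookup ends with a sequential scan of at most one interval's worth of
terms.  The class IntervalTradeoff, its methods, and the term count are
assumptions made for this example; real memory use also depends on term
length and per-object overhead.

```java
// Sketch: rough effect of the index interval on in-memory entries and scan length.
final class IntervalTradeoff {

    /** Entries kept in memory when every 'interval'-th term is sampled (rounded up). */
    static long indexEntries(long totalTerms, int interval) {
        return (totalTerms + interval - 1) / interval;
    }

    /** Worst-case number of further terms scanned after locating the nearest sampled term. */
    static int worstCaseScan(int interval) {
        return interval - 1;
    }

    public static void main(String[] args) {
        long totalTerms = 1_000_000_000L;   // hypothetical term count, not from the thread
        for (int interval = 128; interval <= 1024; interval *= 2) {
            System.out.println("interval=" + interval
                    + " entries=" + indexEntries(totalTerms, interval)
                    + " worstCaseScan=" + worstCaseScan(interval));
        }
    }
}
```

Each doubling of the interval roughly halves the entry count while the
scan bound roughly doubles, which is the relationship described above.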

Perhaps we should make this public by adding an IndexWriter method?

Also, you can list the size of your .tii file by using the main() from 
CompoundFileReader.

Doug


Re: Opening up one large index takes 940M of memory?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yes, I remember your email about the large number of Terms.  If it can
be avoided and you figure out how to do it, I'd love to patch
something. :)

Otis

--- "Kevin A. Burton" <bu...@newsmonster.org> wrote:

> I loaded it into a profiler a long time ago. Most of the code was due
> to Term classes being loaded into memory.
> 
> I might try to get some time to load it into a profiler on monday...

Re: Opening up one large index takes 940M of memory?

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.

Otis Gospodnetic wrote:

>It would be interesting to know _what_exactly_ uses your memory. 
>Running under an optimizer should tell you that.
>
>The only thing that comes to mind is... can't remember the details now,
>but when the index is opened, I believe every 128th term is read into
>memory.  This, I believe, helps with index seeks at search time.  I
>wonder if this is what's using your memory.  The number '128' can't be
>modified just like that, but somebody (Julien?) has modified the code
>in the past to make this variable.  That's the only thing I can think
>of right now and it may or may not be an idea in the right direction.
>  
>
I loaded it into a profiler a long time ago. Most of the memory was due to 
Term instances being loaded into memory.

I might try to get some time to load it into a profiler on Monday...

Kevin


Re: Opening up one large index takes 940M of memory?

Posted by Otis Gospodnetic <ot...@yahoo.com>.

It would be interesting to know _what_exactly_ uses your memory. 
Running under an optimizer should tell you that.

The only thing that comes to mind is... can't remember the details now,
but when the index is opened, I believe every 128th term is read into
memory.  This, I believe, helps with index seeks at search time.  I
wonder if this is what's using your memory.  The number '128' can't be
modified just like that, but somebody (Julien?) has modified the code
in the past to make this variable.  That's the only thing I can think
of right now and it may or may not be an idea in the right direction.
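
A minimal sketch of that idea, assuming plain sorted strings in place of
Lucene terms: keep every Nth term in memory, binary-search that sample,
then scan forward in the full list.  The SparseTermIndex class is
hypothetical and only models the technique; it is not Lucene's code, and
the in-memory list standing in for the on-disk term file is a
simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: sample every interval-th term, binary-search the sample, then scan.
final class SparseTermIndex {
    private final List<String> allTerms;                      // stands in for the full on-disk term list
    private final List<String> sampled = new ArrayList<>();   // every interval-th term, held in memory
    private final int interval;

    SparseTermIndex(List<String> sortedTerms, int interval) {
        this.allTerms = sortedTerms;
        this.interval = interval;
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sampled.add(sortedTerms.get(i));
        }
    }

    int sampledCount() {
        return sampled.size();
    }

    /** Returns the position of 'term' in the full list, or -1 if it is absent. */
    int find(String term) {
        int idx = Collections.binarySearch(sampled, term);
        if (idx < 0) {
            idx = -idx - 2;          // index of the last sampled term that sorts before 'term'
        }
        if (idx < 0) {
            return -1;               // 'term' sorts before every term in the list
        }
        int start = idx * interval;
        int end = Math.min(start + interval, allTerms.size());
        for (int i = start; i < end; i++) {   // sequential scan of at most 'interval' terms
            if (allTerms.get(i).equals(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
                "apple", "banana", "cherry", "date", "fig", "grape", "kiwi");
        SparseTermIndex index = new SparseTermIndex(terms, 3);
        System.out.println(index.sampledCount() + " of " + terms.size() + " terms held in memory");
        System.out.println("position of fig: " + index.find("fig"));
    }
}
```

A larger interval shrinks the sampled list and lengthens the scan, so the
interval is the knob that trades memory for lookup time.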

Otis

