Posted to java-user@lucene.apache.org by Gaurav gupta <gu...@gmail.com> on 2014/09/26 21:11:32 UTC

Optimum Lucene’s MMapDirectory size on 64bit OS

Hi,

As per the post on "The Generics Policeman Blog"
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html>,
I am using MMapDirectory for faster access (search and update operations,
mainly search) to my Lucene 4.8.1 index files. I am trying to work out the
optimal maximum chunk size for my indexes. Is it the default, i.e. 1 GB
(1 << 30), or something higher?

I have 6 indexes ranging in size from 6 GB to 65 GB. Currently, I am using
1 GB as maxChunkSize, i.e. *MMapDirectory(file, null, 1<<30)*, for all indexes.
However, I am thinking of specifying a higher value (1 GB or more) for the
bigger index of 65 GB and a lower value (0.5 GB or less) for the smaller
index of 6 GB. Any suggestions or guidance?

Also, per the blog, the mmap size is not a physical memory allocation but just
address space used to map the index files. How can I allocate more RAM to the
index files for better performance? We have plenty of free RAM out of 64 GB.
Per the blog, one should simply use the mmap directory, e.g.
*MMapDirectory(file, null, 1<<30)*, and let the OS manage the physical memory
allocation for the index files. Is my understanding correct?

The Generics Policeman Blog
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> :-

   - *MMapDirectory does not consume additional memory and the size of
   mapped index files is not limited by the physical memory available on your
   server.* By mmap() files, we only reserve address space not memory!
   Remember, address space on 64bit platforms is for free!
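
As a rough illustration of the quoted point, the same mechanism can be tried with plain java.nio, without Lucene. This is only a sketch and not Lucene's code: the class MmapSketch and its methods are hypothetical names, and the tiny temp file stands in for an index file. The mapped buffer lives outside the Java heap, and the OS decides which pages of the file are actually held in physical memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
class MmapSketch {

    // Map a whole file read-only. FileChannel.map accepts at most
    // Integer.MAX_VALUE bytes per mapping, so this only works for
    // files smaller than 2 GiB.
    static MappedByteBuffer mapReadOnly(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write the given bytes to a temp file that stands in for an index file.
    static Path writeTempFile(byte[] contents) {
        try {
            Path tmp = Files.createTempFile("mmap-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, contents);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new byte[] {10, 20, 30, 40});
        MappedByteBuffer buf = mapReadOnly(file);
        // A mapped buffer is a direct buffer: its contents are not on the Java heap.
        System.out.println("direct (off-heap): " + buf.isDirect());
        // Reading touches the mapped region; the OS supplies the page.
        System.out.println("byte at offset 2: " + buf.get(2));
    }
}
```

Nothing in the sketch sets a memory budget for the mapping, which matches the blog's advice to leave physical memory management to the OS.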

Thanks
Gaurav

RE: Optimum Lucene’s MMapDirectory size on 64bit OS

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

In general, the chunk size should be as large as possible. It exists mainly for 32-bit environments, to work around the limited address space, where fragmentation causes problems sooner. On 64-bit operating systems, fragmentation of address space can also be an issue, but only if the total size of all your indexes is something like several terabytes. Keep in mind that smaller chunk sizes cause more work for Lucene: a random read is more likely to hit a different mapped region than the current one, so Lucene has to switch buffers (which is done through ByteBuffer's exception handling).

The maximum size of 1 GiB follows from the maximum size of a ByteBuffer in the JVM: ByteBuffers have only signed 32-bit offsets, so their maximum capacity is 2 GiB - 1 byte. Rounded down to the next power of 2, this is 1 GiB.
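
To make the arithmetic above concrete, here is a small sketch in plain Java with no Lucene dependency. The class ChunkMath and its methods are hypothetical names, and the sketch treats an index as one single file for simplicity. In practice each index file is mapped separately, so real chunk counts will differ.

```java
// Hypothetical helper class, for illustration only.
class ChunkMath {

    // Largest power of two that still fits into a signed 32-bit
    // ByteBuffer capacity (Integer.MAX_VALUE = 2 GiB - 1 byte).
    static int maxChunkSize() {
        return Integer.highestOneBit(Integer.MAX_VALUE); // equals 1 << 30
    }

    // Number of mapped regions needed to cover a file of the given
    // length (in bytes) with the given chunk size, rounding up.
    static long chunkCount(long fileLength, int chunkSize) {
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long sixtyFiveGiB = 65L << 30; // size of the largest index in this thread

        System.out.println("max chunk size: " + maxChunkSize() + " bytes");
        // With 1 GiB chunks the file is covered by 65 regions ...
        System.out.println("1 GiB chunks:   " + chunkCount(sixtyFiveGiB, 1 << 30));
        // ... with 256 MiB chunks by 260 regions, so a random read is
        // more likely to cross into another region.
        System.out.println("256 MiB chunks: " + chunkCount(sixtyFiveGiB, 1 << 28));
    }
}
```

A smaller chunk size does not reduce the address space needed to cover the file; it only splits the same range into more regions.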

We don't know anything about how the kernel of your OS assigns address space, but why do you think a "higher" address space is better for a larger index?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Optimum Lucene’s MMapDirectory size on 64bit OS

Posted by Gaurav gupta <gu...@gmail.com>.
Thanks, Uwe, for the insight!

Also, is it advisable to set a lower chunk size for the smaller indexes, as
shown below, or should I let Lucene and the OS manage it themselves? I am just
guessing that assigning a lower value to the smaller indexes will make sure
that the bigger indexes get more mmap address space.

Index Name      Total Records     Size (in GB)   What should be the max. or optimal chunk size?
Address index   106,192,963.00    65             1 GiB
Name index       97,924,594.00    44             1 GiB
GovtId index     81,178,958.00    11             512 MB
Phone index     169,691,376.00    14             512 MB
Email index      46,602,090.00     5             256 MB
Date index       77,243,714.00     6.5           256 MB

Thanks

RE: Optimum Lucene’s MMapDirectory size on 64bit OS

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

1 GiB is the maximum possible. The chunk size is only applicable for 32 bit JDKs because of limited address space.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

