Posted to java-user@lucene.apache.org by Gaurav gupta <gu...@gmail.com> on 2014/09/26 21:11:32 UTC
Optimum Lucene’s MMapDirectory size on 64bit OS
Hi,
Following the post "The Generics Policeman Blog
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html>",
I am using MMapDirectory for faster access (search and update operations,
mainly search) to Lucene 4.8.1 index files. I am trying to work out the
optimal maximum chunk size for my indexes. Should it be the default, i.e.
1 GB (1 << 30), or higher?
I have 6 indexes ranging in size from 65 GB down to 6 GB. Currently I use
1 GB as maxChunkSize, i.e. *MMapDirectory(file, null, 1<<30)*, for all
indexes. I am thinking of specifying a higher value (1 GB or more) for the
biggest index (65 GB) and a lower value (0.5 GB or less) for the smallest
index (6 GB). Any suggestions or guidance?
Also, per the blog, the mmap size is not a physical memory allocation but
just address space into which the index files are mapped. How can I
allocate more RAM to the index files for better performance? We have
plenty of free RAM out of 64 GB. Per the blog, one should simply use the
mmapped directory, e.g. *MMapDirectory(file, null, 1<<30)*, and let the OS
manage the physical memory allocation for the index files. Is my
understanding correct?
The Generics Policeman Blog
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> :-
- *MMapDirectory does not consume additional memory and the size of
mapped index files is not limited by the physical memory available on your
server.* By mmap() files, we only reserve address space not memory!
Remember, address space on 64bit platforms is for free!
Thanks
Gaurav
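The blog's point quoted above, that mapping a file reserves address space rather than Java heap, can be illustrated with plain java.nio, without Lucene. This is only a minimal sketch: the class name MmapSketch and the helper readMapped are hypothetical, and it is not how MMapDirectory itself is implemented.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
class MmapSketch {

    /** Maps the whole file read-only and returns the byte at the given position. */
    static byte readMapped(Path file, int position) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() reserves address space for the file; the OS loads pages
            // into physical memory only when they are actually touched.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.get(position);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();

        // Write a 1 MiB file with a single marker byte in it.
        byte[] data = new byte[1 << 20];
        data[123_456] = 42;
        Files.write(tmp, data);

        // Reading through the mapping goes via the OS page cache,
        // not via a copy of the file held on the Java heap.
        System.out.println("byte at 123456 = " + readMapped(tmp, 123_456));
    }
}
```

There is no knob here for "giving the mapping more RAM": how much of the file stays resident is decided by the operating system's page cache, which is the behaviour the blog describes.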
RE: Optimum Lucene’s MMapDirectory size on 64bit OS
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,
In general, the chunk size should be as large as possible. It exists mainly for 32-bit environments, to work around the limited address space, where fragmentation becomes a problem much sooner. On 64-bit operating systems, fragmentation of address space can also be an issue, but only once the total size of all indexes reaches something like several terabytes. Keep in mind that smaller chunk sizes cause more work for Lucene: a random read is more likely to land in a mapped region other than the current one, so Lucene has to switch buffers (which is done through ByteBuffer's exception handling). The 1 GiB maximum comes from the maximum size of ByteBuffers in the JVM: they only have signed 32-bit offsets, so their maximum capacity is 2 GiB - 1 byte; rounded down to the nearest power of 2, that is 1 GiB.
We don't know anything about how the kernel of your OS assigns address space, but why do you think a "higher" address space is better for a larger index?
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
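The arithmetic in the reply above can be sketched in a few lines of Java. This is an illustration under assumptions, not Lucene's source: the class ChunkMath and its methods are hypothetical names, and the shift-and-mask addressing only shows why a power-of-two chunk size is convenient.

```java
// Hypothetical illustration class; not part of Lucene.
class ChunkMath {

    /** Largest power of two that fits in a signed 32-bit int (ByteBuffer capacities are ints). */
    static int largestPowerOfTwoChunk() {
        return Integer.highestOneBit(Integer.MAX_VALUE);
    }

    /** Index of the mapped chunk that contains the absolute file position. */
    static int chunkIndex(long pos, int chunkSizePower) {
        return (int) (pos >>> chunkSizePower);
    }

    /** Offset of the absolute file position inside its chunk. */
    static int chunkOffset(long pos, int chunkSizePower) {
        long mask = (1L << chunkSizePower) - 1;
        return (int) (pos & mask);
    }

    public static void main(String[] args) {
        int maxChunk = largestPowerOfTwoChunk();
        System.out.println("max chunk bytes  = " + maxChunk);
        System.out.println("equals 1 << 30   = " + (maxChunk == (1 << 30)));

        // A read 5 bytes past the first 1 GiB boundary lands in the second chunk.
        long pos = (1L << 30) + 5;
        System.out.println("chunk index      = " + chunkIndex(pos, 30));
        System.out.println("offset in chunk  = " + chunkOffset(pos, 30));
    }
}
```

With a smaller chunkSizePower the same position maps to a higher chunk index, which is the "more buffer switches" cost mentioned above.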
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Optimum Lucene’s MMapDirectory size on 64bit OS
Posted by Gaurav gupta <gu...@gmail.com>.
Thanks, Uwe, for the insight!
Also, is it advisable to set a lower chunk size for smaller indexes, as
below, or should I let Lucene/the OS manage it? I am just guessing that
assigning a lower value to the smaller indexes will ensure that the bigger
indexes get more mmap address space.
Index name      Total records     Size (GB)   Max. or optimal chunk size?
Address index   106,192,963.00    65          1 GiB
Name index      97,924,594.00     44          1 GiB
GovtId index    81,178,958.00     11          512 MB
Phone index     169,691,376.00    14          512 MB
Email index     46,602,090.00     5           256 MB
Date index      77,243,714.00     6.5         256 MB
Thanks
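To see what the proposed per-index chunk sizes would mean in practice, the following sketch counts how many mapped regions each index would need. It rests on loud assumptions: each index is treated as a single file (a real index has many files, so these are rough lower bounds), "GB" and "MB" in the table are read as GiB and MiB, and the class ChunkCountSketch is a hypothetical name.

```java
// Hypothetical illustration class; not part of Lucene.
class ChunkCountSketch {
    static final long GIB = 1L << 30;
    static final long MIB = 1L << 20;

    /** Mapped regions needed if the whole index were one file of the given size. */
    static long regions(double sizeGiB, long chunkBytes) {
        long bytes = (long) Math.ceil(sizeGiB * GIB);
        return (bytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }

    public static void main(String[] args) {
        // Sizes and proposed chunk sizes taken from the table above.
        String[] names = {"Address", "Name", "GovtId", "Phone", "Email", "Date"};
        double[] sizesGiB = {65, 44, 11, 14, 5, 6.5};
        long[] proposed = {GIB, GIB, 512 * MIB, 512 * MIB, 256 * MIB, 256 * MIB};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-8s proposed chunk: %d regions, 1 GiB chunk: %d regions%n",
                    names[i], regions(sizesGiB[i], proposed[i]), regions(sizesGiB[i], GIB));
        }
    }
}
```

Under these assumptions a smaller chunk size only increases the number of regions for the smaller indexes; it does not hand any address space over to the larger ones.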
RE: Optimum Lucene’s MMapDirectory size on 64bit OS
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,
1 GiB is the maximum possible. The chunk size only matters for 32-bit JDKs, because of their limited address space.
Uwe