Posted to solr-user@lucene.apache.org by JOHN JAIRO GÓMEZ LAVERDE <jj...@hotmail.com> on 2011/04/15 22:31:09 UTC

QUESTION: SOLR INDEX BIG FILE SIZES

SOLR
USER SUPPORT TEAM

I have a question about the maximum file size of a Solr index,
when I have a lot of data in the index.

- How can I split the Solr index file into multiple files?

I ask because some file systems limit the maximum size of a single
file; for example, some UNIX file systems only support a maximum
of 2GB per file.

- What is the recommended storage strategy for big Solr index files?

Thanks for the reply.

JOHN JAIRO GÓMEZ LAVERDE
Bogotá - Colombia - South America

Re: QUESTION: SOLR INDEX BIG FILE SIZES

Posted by François Schiettecatte <fs...@gmail.com>.
Regarding file size specifically: all the file systems on current releases of Linux (and the other Unixes) support large files with 64-bit offsets, and the Java VM supports 64-bit file offsets as well, so there is no longer a 2GB file size limit.
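A quick sketch to illustrate the point (not from the original message): with 64-bit offsets you can seek well past the old 2GB boundary. The Python snippet below writes a single byte just beyond 2GB into a temporary file; on file systems with sparse-file support this consumes almost no actual disk space.

```python
import os
import tempfile

# Offset just past the old 2GB (2**31) signed 32-bit limit.
OFFSET = 2**31 + 10

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(OFFSET)   # seeking past 2GB works because offsets are 64-bit
    f.write(b"x")    # the file is sparse: only this one byte is allocated
    path = f.name

size = os.path.getsize(path)
print(size)          # apparent size is a little over 2GB
os.remove(path)
```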

François



Re: QUESTION: SOLR INDEX BIG FILE SIZES

Posted by Juan Grande <ju...@gmail.com>.
I'm sorry, you're right; I was thinking of the 2GB default value for
maxMergeMB.

*Juan*

On Mon, Apr 18, 2011 at 3:16 PM, Burton-West, Tom <tb...@umich.edu> wrote:

> Solr can easily create a file size over 2GB, it just depends on how much
> data you index and your particular Solr configuration, including your
> ramBufferSizeMB, your mergeFactor, and whether you optimize. [...]

RE: QUESTION: SOLR INDEX BIG FILE SIZES

Posted by "Burton-West, Tom" <tb...@umich.edu>.
>> As far as I know, Solr will never arrive to a segment file greater than 2GB,
>>so this shouldn't be a problem.

Solr can easily create a file over 2GB; it just depends on how much data you index and on your particular Solr configuration, including your ramBufferSizeMB, your mergeFactor, and whether you optimize.  For example, we index about a terabyte of full text and optimize our indexes, so we have a 300GB *.prx file.  If you really have a filesystem limit of 2GB, there is a parameter called maxMergeMB in Solr 3.1 that you can set.  Unfortunately, it is the maximum size of a segment that will be merged, rather than the maximum size of the resulting segment.  So if you have a mergeFactor of 10, you could probably set it somewhere around 2GB / 10 = 200MB.  Just to be cautious, you might want to set it to 100.

<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
	<double name="maxMergeMB">200</double>
</mergePolicy>

In the flexible indexing branch/trunk there is a new merge policy and parameter that allows you to set the maximum size of the merged segment: https://issues.apache.org/jira/browse/LUCENE-854. 
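As an illustrative sketch (not part of the original message, and the exact element names may differ between versions), a solrconfig.xml fragment using that newer merge policy could look like this, capping merged segments at roughly 2GB via maxMergedSegmentMB:

```xml
<!-- Sketch: cap the size of the *merged* segment with TieredMergePolicy -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
	<double name="maxMergedSegmentMB">2048.0</double>
</mergePolicy>
```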


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

-----Original Message-----
From: Juan Grande [mailto:juan.grande@gmail.com]
Sent: Friday, April 15, 2011 5:15 PM
To: solr-user@lucene.apache.org
Subject: Re: QUESTION: SOLR INDEX BIG FILE SIZES

Re: QUESTION: SOLR INDEX BIG FILE SIZES

Posted by Juan Grande <ju...@gmail.com>.
Hi John,

> How can I split the Solr index file into multiple files?

Actually, the index is organized as a set of files grouped into segments;
it's not just a single file, unless you tell Solr to use the compound file
format.
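As a sketch of what this looks like on disk (illustrative, with made-up filenames in the usual Lucene naming scheme): each segment `_N` contributes several files with different extensions, plus a `segments_N` bookkeeping file.

```python
from collections import defaultdict

# Hypothetical contents of a Lucene/Solr index directory.
index_files = [
    "_0.fdt", "_0.fdx", "_0.frq", "_0.prx", "_0.tis", "_0.tii",
    "_1.fdt", "_1.fdx", "_1.frq", "_1.prx", "_1.tis", "_1.tii",
    "segments_2", "segments.gen",
]

# Group the per-segment files by their "_N" prefix.
segments = defaultdict(list)
for name in index_files:
    if name.startswith("_"):
        segments[name.split(".")[0]].append(name)

print(sorted(segments))  # two segments, each made of several files
```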

> That's because some file systems limit the maximum size of a single
> file; for example, some UNIX file systems only support a maximum of
> 2GB per file.

As far as I know, Solr will never arrive to a segment file greater than 2GB,
so this shouldn't be a problem.

> What is the recommended storage strategy for big Solr index files?

I guess it depends on the indexing/querying performance you're getting,
the performance you want, and what exactly "big" means for you. If your
index is so big that individual queries take too long, sharding may be
what you're looking for.
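For example (a hypothetical sketch; the host names and core paths are made up), a distributed search in Solr is just a normal query with a shards parameter listing the shards to fan the query out to:

```python
from urllib.parse import urlencode

# Hypothetical shard hosts; each one serves a slice of the full index.
shards = ["solr1:8983/solr", "solr2:8983/solr"]

params = urlencode({
    "q": "title:lucene",
    "shards": ",".join(shards),  # Solr queries each shard and merges results
})
url = "http://solr1:8983/solr/select?" + params
print(url)
```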

To better understand the index format, you can see
http://lucene.apache.org/java/3_1_0/fileformats.html

Also, you can take a look at my blog (http://juanggrande.wordpress.com);
in my latest post I talk about segment merging.

Regards,

*Juan*

