Posted to solr-user@lucene.apache.org by Dennis Reichelt <de...@askvisual.de> on 2013/11/06 14:20:53 UTC

Question regarding index size with Spatial4j rectangles

Hi,

we are testing Solr and indexing a huge number of files. We integrated a
Spatial4j field that is only used to index rectangles, so we removed
the JTS dependency. However, we ran into two problems. First, Solr
threw a GC-related OutOfMemoryError, which seems to be fixed by giving
the server more memory (at least I hope so ;)). Second, the index grows
quite big...

It takes around 150 MB to index the rectangles for 30k documents, which is
roughly a factor of 8 more than we expected, and there is normally only one
rectangle per document. Though the functionality we get from this is pretty
cool, it could be a huge drawback, because the index size seems to scale
linearly and we are targeting around 500k documents. Are we doing something
wrong, or do we have to live with that?
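
As a rough check on the scaling concern, here is a back-of-the-envelope sketch in Python. It assumes the whole 150 MB is attributable to the rectangles (the post does not say how much of it comes from other fields) and extrapolates strictly linearly to 500k documents; the reply below notes that growth may not be quite linear, so treat the result as an upper-bound style estimate, not a prediction.

```python
# Figures from the post above: ~150 MB of index for ~30,000 documents
# (one rectangle each), with a target of ~500,000 documents.
# Assumption: the whole 150 MB is due to the rectangle field.
INDEX_MB = 150
DOCS_NOW = 30_000
DOCS_TARGET = 500_000


def per_doc_kb(index_mb: float, docs: int) -> float:
    """Average index size per document in KB (decimal units: 1 MB = 1000 KB)."""
    return index_mb * 1000 / docs


def linear_extrapolation_mb(index_mb: float, docs_now: int, docs_target: int) -> float:
    """Index size if it grew strictly linearly with the number of documents."""
    return index_mb * docs_target / docs_now


if __name__ == "__main__":
    print(f"{per_doc_kb(INDEX_MB, DOCS_NOW):.1f} KB per document")
    print(f"{linear_extrapolation_mb(INDEX_MB, DOCS_NOW, DOCS_TARGET):.0f} MB "
          f"at {DOCS_TARGET} documents")
```

Under those assumptions this works out to about 5 KB per single-rectangle document and roughly 2.5 GB at 500k documents.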

Re: Question regarding index size with Spatial4j rectangles

Posted by "Smiley, David W." <ds...@mitre.org>.
Hi Dennis,

I would not expect the index growth to be quite linear as the number of
shapes grows, but it may nonetheless be significant. Indexing non-point
shapes indexes more term data than it ideally should (see LUCENE-4942); I
need to find the time/priority to fix that, probably within the next couple
of months.
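
To illustrate why a rectangle costs far more indexed terms than a point, here is a simplified quadtree model in Python. The Rect class and count_terms function are hypothetical and are not Lucene's RecursivePrefixTreeStrategy code. In this model every grid cell that intersects the shape becomes one term; a cell fully inside the shape stops the recursion early, but cells straddling the shape's edge keep splitting down to the finest level, so a rectangle's term count follows its perimeter instead of staying at one cell per level as for a point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; a point is a rectangle with zero width and height."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, o: "Rect") -> bool:
        # Closed intervals: touching edges count as intersecting.
        return not (o.max_x < self.min_x or o.min_x > self.max_x or
                    o.max_y < self.min_y or o.min_y > self.max_y)

    def contains(self, o: "Rect") -> bool:
        return (self.min_x <= o.min_x and o.max_x <= self.max_x and
                self.min_y <= o.min_y and o.max_y <= self.max_y)


def count_terms(shape: Rect, cell: Rect, level: int, max_level: int) -> int:
    """Count cells (terms) a simplified quadtree would index for `shape`.

    Hypothetical model: each intersecting cell is one term. Recursion stops
    at cells fully inside the shape or at max_level; otherwise the cell is
    split into four equal children.
    """
    if not shape.intersects(cell):
        return 0
    if level == max_level or shape.contains(cell):
        return 1
    mx = (cell.min_x + cell.max_x) / 2
    my = (cell.min_y + cell.max_y) / 2
    children = [
        Rect(cell.min_x, cell.min_y, mx, my), Rect(mx, cell.min_y, cell.max_x, my),
        Rect(cell.min_x, my, mx, cell.max_y), Rect(mx, my, cell.max_x, cell.max_y),
    ]
    return 1 + sum(count_terms(shape, c, level + 1, max_level) for c in children)


if __name__ == "__main__":
    world = Rect(0.0, 0.0, 1.0, 1.0)
    point = Rect(0.3, 0.3, 0.3, 0.3)
    rect = Rect(0.3, 0.3, 0.6, 0.45)
    for max_level in (4, 6, 8):
        print(max_level,
              count_terms(point, world, 0, max_level),
              count_terms(rect, world, 0, max_level))
```

With these example values the point always yields max_level + 1 terms (one cell per level), while the rectangle's count is larger and keeps growing as the finest level gets deeper.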

In the meantime, you could perhaps make distErrPct on the field type
definition looser; what you can live with depends on your requirements.
The default is 0.025 (2.5% of the shape's approximate radius); maybe a
precision of 10% (0.10) or looser will satisfy you? Tweaking this number
trades precision for index size, and it can make a big difference.
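
A rough way to see why loosening distErrPct helps is the Python sketch below. It models the "approximate radius" as the distance from the shape's bounding-box center to a corner, multiplies it by distErrPct to get an allowed distance error, and then picks the first quadtree level whose cell side is no larger than that error. The function names, the flat 360-unit-wide world, the [0, 0.5] range check, and the level rule are simplifying assumptions for illustration, not Solr's actual code.

```python
import math


def dist_err(min_x: float, min_y: float, max_x: float, max_y: float,
             dist_err_pct: float) -> float:
    """Allowed distance error: distErrPct times the bounding box's
    center-to-corner distance (flat, Euclidean approximation)."""
    if not 0.0 <= dist_err_pct <= 0.5:
        raise ValueError("distErrPct must be between 0 and 0.5")
    half_diagonal = math.hypot((max_x - min_x) / 2, (max_y - min_y) / 2)
    return half_diagonal * dist_err_pct


def level_for_distance(world_size: float, err: float, max_levels: int = 30) -> int:
    """First quadtree level whose cell side is <= err (hypothetical rule)."""
    for level in range(1, max_levels + 1):
        if world_size / 2 ** level <= err:
            return level
    return max_levels


if __name__ == "__main__":
    # A 10 by 10 rectangle in a 360-unit-wide square world.
    for pct in (0.025, 0.10):
        err = dist_err(0, 0, 10, 10, pct)
        print(f"distErrPct={pct}: distErr={err:.4f}, "
              f"level={level_for_distance(360, err)}")
```

For that rectangle, 0.025 lands on level 11 while 0.10 lands on level 9. In a quadtree model like this one, each level dropped roughly halves the number of cells needed along the shape's edge, which is where the index-size saving would come from.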

~ David
