Posted to solr-user@lucene.apache.org by Dennis Reichelt <de...@askvisual.de> on 2013/11/06 14:20:53 UTC
Question regarding Indexsize with Spatial4j Rectangulars
Hi,
we are testing Solr and indexing a large number of files. We integrated a
Spatial4j field that is only used to index rectangles, so we removed
the JTS dependency. However, we ran into two problems. First,
Solr hit a GC OutOfMemory error, which seems to be fixed by giving
the server more memory (at least I hope so ;)). Second,
the index grows rather large.

Indexing rects for 30k documents takes around 150 MB, which is roughly a
factor of 8 more than without them, even though there is normally only one
rect per document. The functionality we get from this is pretty cool, but
the size could be a huge drawback because scaling seems to be linear and we
target around 500k documents. Are we doing something wrong, or do we have
to live with that?
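
For reference, the linear extrapolation behind this concern can be worked out as below. This is only a sketch: it assumes index size grows strictly in proportion to document count, which the reply below does not fully expect, and the function name is made up for illustration.

```python
def extrapolate_linear(size_mb, docs, target_docs):
    """Scale an observed index size to a target document count,
    assuming (hypothetically) strictly linear growth."""
    return size_mb * target_docs / docs

# Figures from the message: about 150 MB for 30k documents, target 500k.
estimate_mb = extrapolate_linear(150, 30_000, 500_000)
print(f"Estimated spatial index size: {estimate_mb:.0f} MB")
```

Under that assumption the spatial data alone would reach roughly 2.5 GB at the target size.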
Re: Question regarding Indexsize with Spatial4j Rectangulars
Posted by "Smiley, David W." <ds...@mitre.org>.
Hi Dennis,
I would not expect the index growth to be quite linear as the number of
shapes grows, but nonetheless it may be significant. Indexing non-point
shapes currently writes more term data than it ideally should; see
LUCENE-4942. I need to find the time/priority to fix it, probably within
the next couple of months.
In the meantime, you could perhaps loosen distErrPct on the field type
definition; what you can live with depends on your requirements. The
default is 0.025 (2.5% of the shape's approximate radius); maybe you'll be
satisfied with a precision of 10% or more? Tweaking this number trades off
precision for index size. It can make a big difference.
~ David
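
To make the trade-off concrete, the sketch below estimates the absolute error that a given distErrPct tolerates for one rectangle. It is an illustration, not Solr's or Lucene's actual code: it assumes the "approximate radius" is the distance from the rectangle's center to a corner (half the diagonal) in plain Euclidean coordinates, and the function names are hypothetical.

```python
import math

def approx_radius(min_x, min_y, max_x, max_y):
    """Center-to-corner distance of a rectangle (half its diagonal).
    Assumption: this stands in for the 'approximate radius' above."""
    return math.hypot(max_x - min_x, max_y - min_y) / 2.0

def tolerated_error(rect, dist_err_pct):
    """Absolute error allowed for a shape at a given distErrPct."""
    return approx_radius(*rect) * dist_err_pct

# A 6 x 8 rectangle has a diagonal of 10, so an approximate radius of 5.
rect = (0.0, 0.0, 6.0, 8.0)
for pct in (0.025, 0.10):
    print(f"distErrPct={pct}: tolerated error ~ {tolerated_error(rect, pct):.3f}")
```

A larger tolerated error means each shape can be covered by fewer, coarser grid cells, which is where the index size saving comes from.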