Posted to solr-user@lucene.apache.org by Jens Viebig <je...@vitec.com> on 2018/05/25 10:59:11 UTC

Impact/Performance of maxDistErr

Hello,

We are indexing a polygon with 4 points (non-rectangular, the field of view
of a camera) in a RptWithGeometrySpatialField alongside some other
fields, to perform searches that check whether a point is within this polygon.

We started with the default configuration found in several examples online:

<fieldType name="location_grpt" class="solr.RptWithGeometrySpatialField"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           geo="true" distErrPct="0.15" maxDistErr="0.001"
           distanceUnits="kilometers" />

We discovered that with this setting the indexing (soft commit) speed is
very slow.
For 10000 documents it takes several minutes to finish the commit.

If we disable this field, indexing plus soft commit takes only 3 seconds for
10000 docs.
If we set maxDistErr to 1, indexing takes around 5 seconds, so a
huge performance gain compared with the several minutes we had before.

I tried to find out from the documentation what the impact of
"maxDistErr" is on search results, but didn't find an in-depth explanation.
In the tests we did, the search results still seem to be very
accurate even if the space covered by the polygon is less than 1km, and
search speed did not suffer.

So I would love to learn more about the differences between
maxDistErr="0.001" and maxDistErr="1" on a RptWithGeometrySpatialField,
and what problems we could run into with the bigger value.

Thanks
Jens

Jens Viebig
Software Development
MAM Products

T. +49-(0)4307-8358-0
E. jens.viebig@vitec.com
http://www.vitec.com

-- 
VITEC GmbH, 24223 Schwentinental
Geschäftsführer/Managing Director: Philippe Wetzel
HRB Plön 1584 / Tax number: 1929705211 / VAT number: DE134878603


Re: Impact/Performance of maxDistErr

Posted by David Smiley <da...@gmail.com>.
I suggest using the "Intersects" spatial predicate when either the data is
all points or if the query is a point.  It's semantically equivalent and
the algorithm is much faster.
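
For a point query against indexed polygons, the suggestion above amounts to swapping the predicate name in the filter query. The following is a minimal sketch, not taken from the thread: the field name `camera_fov` and the coordinates are hypothetical, and the helper `point_filter` is only an illustration of how such a filter string could be assembled. It assumes the usual `field:"Predicate(WKT)"` filter form and that WKT points are written with a space between x (longitude) and y (latitude).

```python
from urllib.parse import urlencode


def point_filter(field, lon, lat, predicate="Intersects"):
    """Build a Solr filter that tests a WKT point against an indexed shape field."""
    # WKT separates coordinates with a space: POINT(x y), x = longitude, y = latitude.
    return f'{field}:"{predicate}(POINT({lon} {lat}))"'


# Hypothetical field name and coordinates, for illustration only.
fq = point_filter("camera_fov", 10.25, 54.3)
params = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(params)
```

Passing `predicate="Contains"` would reproduce the kind of query Jens mentions; per the advice above, the two should match the same documents when the query shape is a point.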

On Wed, May 30, 2018 at 3:25 AM Jens Viebig <je...@vitec.com> wrote:

> Thanks for the detailed answer David, that helps a lot to understand!
> Best Regards
>
> Jens
>
> P.S. Currently the only search we are doing on the polygon is
> Contains(POINT(x,y))
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com

Re: Impact/Performance of maxDistErr

Posted by Jens Viebig <je...@vitec.com>.
Thanks for the detailed answer David, that helps a lot to understand!

Best Regards
Jens

P.S. Currently the only search we are doing on the polygon is 
Contains(POINT(x,y))

On 29.05.2018 at 13:30, David Smiley wrote:
> Hello Jens,
> With solr.RptWithGeometrySpatialField, you always get an accurate
> result thanks to the "WithGeometry" part.  The "Rpt" part is a grid
> index, and most of the parameters pertain to that.  maxDistErr
> controls the highest resolution grid.  No shape will be indexed to
> higher resolutions than this, though it may be indexed at coarser
> resolutions depending on distErrPct.  The configuration you chose
> initially (that turned out to be slow for you) was a meter, and then
> you changed it to a kilometer and got fast indexing results.  I figure
> the size of your indexed shapes is on average a kilometer (give or
> take an order of magnitude).  It's hard to guess how your query shapes
> compare to your indexed shapes as there are multiple possibilities
> that could yield similar query performance when changing maxDistErr
> so much.
>
> The bottom line is that you should dial up maxDistErr as much as you
> can get away with -- which is as long as query performance is good.
> So you did the right thing :-).  That number will probably be a
> distance somewhat less than the average indexed shape diameter, or
> average query shape diameter, whichever is greater.  Perhaps 1/10th
> smaller, if I had to pick.  The default setting, I think a meter, is
> probably not a good default for this field type.
>
> Note you could also try increasing distErrPct some, maybe to as much
> as .25, though I wouldn't go much higher, as it may yield gridded
> shapes that are so coarse as to not have interior cells.  Depending on
> what your query shapes and indexed shapes typically look like relative
> to each other, that may or may not be significant.  If the indexed
> shapes are often much larger than your query shape then it's significant.
>
> ~ David



Re: Impact/Performance of maxDistErr

Posted by David Smiley <da...@gmail.com>.
Hello Jens,
With solr.RptWithGeometrySpatialField, you always get an accurate result
thanks to the "WithGeometry" part.  The "Rpt" part is a grid index, and
most of the parameters pertain to that.  maxDistErr controls the highest
resolution grid.  No shape will be indexed to higher resolutions than this,
though it may be indexed at coarser resolutions depending on distErrPct.  The
configuration you chose initially (that turned out to be slow for you) was
a meter, and then you changed it to a kilometer and got fast indexing
results.  I figure the size of your indexed shapes is on average a
kilometer (give or take an order of magnitude).  It's hard to guess
how your query shapes compare to your indexed shapes as there are multiple
possibilities that could yield similar query performance when changing
maxDistErr so much.

The bottom line is that you should dial up maxDistErr as much as you can
get away with -- which is as long as query performance is good.  So you
did the right thing :-).  That number will probably be a distance somewhat
less than the average indexed shape diameter, or average query shape
diameter, whichever is greater.  Perhaps 1/10th smaller, if I had to pick.
The default setting, I think a meter, is probably not a good default for
this field type.

Note you could also try increasing distErrPct some, maybe to as much as
.25, though I wouldn't go much higher, as it may yield gridded shapes that
are so coarse as to not have interior cells.  Depending on what your query
shapes and indexed shapes typically look like relative to each other, that
may or may not be significant.  If the indexed shapes are often much
larger than your query shape then it's significant.

~ David
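
The interplay between maxDistErr and distErrPct described above can be put into rough numbers. The sketch below is a simplified model and not Solr's actual code: it assumes that the resolution used for a shape is the coarser of maxDistErr and distErrPct times the shape's approximate radius, and that the number of grid cells grows with the square of the shape's width divided by that resolution. The function names and the example radii are hypothetical, and the output is not meant to reproduce real cell counts or the indexing times reported in this thread.

```python
import math


def effective_resolution_km(shape_radius_km, dist_err_pct, max_dist_err_km):
    """Return the coarser of the two tolerances (assumed model, not Solr code)."""
    # distErrPct is treated here as a fraction of the shape's approximate radius.
    shape_relative_km = dist_err_pct * shape_radius_km
    # maxDistErr is the finest resolution the grid can offer at all.
    return max(shape_relative_km, max_dist_err_km)


def approx_cell_count(shape_radius_km, resolution_km):
    """Rough count of cells of the given size covering the shape's bounding square."""
    cells_per_side = max(1, math.ceil(2 * shape_radius_km / resolution_km))
    return cells_per_side ** 2


# Hypothetical shape radii in kilometers; 0.15 is the distErrPct from the thread.
for max_dist_err_km in (0.001, 1.0):
    for radius_km in (0.005, 0.5, 5.0):
        res = effective_resolution_km(radius_km, 0.15, max_dist_err_km)
        cells = approx_cell_count(radius_km, res)
        print(f"maxDistErr={max_dist_err_km} radius={radius_km} km "
              f"-> resolution={res:.4f} km, ~{cells} cells")
```

In this model maxDistErr only takes effect for shapes small enough that distErrPct times their radius falls below it; larger shapes are governed by distErrPct.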
