Posted to dev@jena.apache.org by "Lorenz Bühmann (Jira)" <ji...@apache.org> on 2022/03/14 08:41:00 UTC
[jira] [Created] (JENA-2311) query rewrite index does too expensive caching on geo literals
Lorenz Bühmann created JENA-2311:
------------------------------------
Summary: query rewrite index does too expensive caching on geo literals
Key: JENA-2311
URL: https://issues.apache.org/jira/browse/JENA-2311
Project: Apache Jena
Issue Type: Improvement
Components: GeoSPARQL
Affects Versions: Jena 4.4.0
Reporter: Lorenz Bühmann
Using a GeoSPARQL query with a geospatial property function, e.g.
{code:java}
SELECT * {
:x geo:hasGeometry ?geo1 .
?s2 geo:hasGeometry ?geo2 .
?geo1 geo:sfContains ?geo2
}
{code}
leads to heavy memory consumption on larger datasets, and we are not talking about big data at all. Imagine a single polygon being checked for containment against millions of geometries.
In the {{QueryRewriteIndex}} class, a key is generated for caching, but this is very expensive: the string representation of the geometries is requested millions of times, which creates millions of byte arrays and can end in an OOM exception. We hit it with 8GB assigned.
The key generation for reference:
{code:java}
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
{code}
My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}) Guava cache and use the integer (or long) values to generate the cache key instead. Any other more efficient data structure would also do; I am not even sure a String is necessary.
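A minimal sketch of the suggestion above, not the actual Jena implementation. It assumes nodes can serve as map keys through their {{equals}}/{{hashCode}}, and it uses a plain {{ConcurrentHashMap}} in place of the Guava cache so that the example has no dependencies. The names {{NodeIdKeys}}, {{Key}}, {{idOf}} and {{keyFor}} are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not part of Jena): assigns each distinct node a long id
// so that cache keys no longer concatenate lexical forms.
// N stands in for the node type; it must implement equals/hashCode.
final class NodeIdKeys<N> {

    // Compact key: three longs instead of two geometry strings plus a URI.
    static final class Key {
        final long subjectId;
        final long predicateId;
        final long objectId;

        Key(long subjectId, long predicateId, long objectId) {
            this.subjectId = subjectId;
            this.predicateId = predicateId;
            this.objectId = objectId;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Key)) {
                return false;
            }
            Key k = (Key) other;
            return subjectId == k.subjectId
                    && predicateId == k.predicateId
                    && objectId == k.objectId;
        }

        @Override
        public int hashCode() {
            int h = Long.hashCode(subjectId);
            h = 31 * h + Long.hashCode(predicateId);
            h = 31 * h + Long.hashCode(objectId);
            return h;
        }
    }

    // Unbounded here for brevity; a size-limited cache would bound memory use.
    private final ConcurrentHashMap<N, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Returns the id already assigned to this node, or assigns the next free one.
    long idOf(N node) {
        return ids.computeIfAbsent(node, n -> nextId.getAndIncrement());
    }

    // Builds the cache key from ids only; no string is created per lookup.
    Key keyFor(N subject, N predicate, N object) {
        return new Key(idOf(subject), idOf(predicate), idOf(object));
    }
}
```

With something like this, the key for a (subject geometry, predicate, object geometry) triple would be {{keyFor(subject, predicate, object)}} instead of the concatenated string. The id map still keeps a reference to every node it has seen, so in practice it would need eviction, which is where a Guava cache as proposed would come in.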
--
This message was sent by Atlassian Jira
(v8.20.1#820001)