
Posted to dev@jena.apache.org by "Lorenz Bühmann (Jira)" <ji...@apache.org> on 2022/03/14 08:41:00 UTC

[jira] [Created] (JENA-2311) query rewrite index does too expensive caching on geo literals

Lorenz Bühmann created JENA-2311:
------------------------------------

             Summary: query rewrite index does too expensive caching on geo literals
                 Key: JENA-2311
                 URL: https://issues.apache.org/jira/browse/JENA-2311
             Project: Apache Jena
          Issue Type: Improvement
          Components: GeoSPARQL
    Affects Versions: Jena 4.4.0
            Reporter: Lorenz Bühmann


Using a GeoSPARQL query with a geospatial property function, e.g.


{code:java}
SELECT * {
:x geo:hasGeometry ?geo1 .
?s2 geo:hasGeometry ?geo2 .
?geo1 geo:sfContains ?geo2
}
{code}


leads to heavy memory consumption on larger datasets, and we're not talking about big data at all. Imagine a single polygon being checked for containment against millions of geometries.

For caching, the {{QueryRewriteIndex}} class generates a key, but this is horribly expensive: the string representation of the geometries is requested millions of times, so millions of byte arrays are created, which can lead to an OOM exception - we got one with 8GB assigned.
The key generation for reference:

{code:java}
String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
{code}

My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}) Guava cache and use those numeric values to generate the cache key instead. Or any other more efficient data structure; I'm not even sure a String is necessary.
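
A rough sketch of that idea, to make the suggestion concrete. It is not the actual Jena code: {{NodeIdRegistry}}, {{RewriteKey}} and {{RewriteKeyDemo}} are hypothetical names, a plain {{ConcurrentHashMap}} stands in for the Guava cache so that the snippet has no dependencies, and the type parameter {{N}} stands in for Jena's {{Node}}. Each distinct node is assigned a numeric id once, and the cache key becomes three {{long}} values instead of a concatenation of two lexical forms:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: hands out one stable numeric id per distinct node. */
final class NodeIdRegistry<N> {
    // Assumption: a plain map stands in for the suggested Guava cache.
    private final ConcurrentHashMap<N, Long> ids = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    long idOf(N node) {
        // computeIfAbsent runs the mapping function at most once per key.
        return ids.computeIfAbsent(node, n -> nextId.getAndIncrement());
    }
}

/** Hypothetical cache key: three ids instead of a concatenated String. */
final class RewriteKey {
    private final long subjectId;
    private final long predicateId;
    private final long objectId;

    RewriteKey(long subjectId, long predicateId, long objectId) {
        this.subjectId = subjectId;
        this.predicateId = predicateId;
        this.objectId = objectId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RewriteKey)) {
            return false;
        }
        RewriteKey k = (RewriteKey) other;
        return subjectId == k.subjectId
                && predicateId == k.predicateId
                && objectId == k.objectId;
    }

    @Override
    public int hashCode() {
        int h = Long.hashCode(subjectId);
        h = 31 * h + Long.hashCode(predicateId);
        return 31 * h + Long.hashCode(objectId);
    }
}

final class RewriteKeyDemo {
    public static void main(String[] args) {
        // Strings stand in for geometry literal nodes and the predicate node.
        NodeIdRegistry<String> registry = new NodeIdRegistry<>();
        long polygon = registry.idOf("POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))");
        long point = registry.idOf("POINT(1 1)");
        long predicate = registry.idOf("http://www.opengis.net/ont/geosparql#sfContains");

        RewriteKey first = new RewriteKey(polygon, predicate, point);
        RewriteKey again = new RewriteKey(polygon, predicate, point);
        System.out.println(first.equals(again));
    }
}
```

With this shape the per-lookup key is a small fixed-size object regardless of how large the WKT or GML literal is. The registry itself still holds a reference to every node it has seen, so a bounded cache with eviction (as in the Guava suggestion) would probably be needed in practice.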
