Posted to jira@jena.apache.org by "Andy Seaborne (Jira)" <ji...@apache.org> on 2022/07/26 09:37:00 UTC

[jira] [Resolved] (JENA-2311) query rewrite index does too expensive caching on geo literals

     [ https://issues.apache.org/jira/browse/JENA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Seaborne resolved JENA-2311.
---------------------------------
    Fix Version/s: Jena 4.6.0
         Assignee: Andy Seaborne
       Resolution: Done

> query rewrite index does too expensive caching on geo literals
> --------------------------------------------------------------
>
>                 Key: JENA-2311
>                 URL: https://issues.apache.org/jira/browse/JENA-2311
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: GeoSPARQL
>    Affects Versions: Jena 4.4.0
>            Reporter: Lorenz Bühmann
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.6.0
>
>
> Using a GeoSPARQL query with a geospatial property function, e.g.
> {code:java}
> SELECT * {
> :x geo:hasGeometry ?geo1 .
> ?s2 geo:hasGeometry ?geo2 .
> ?geo1 geo:sfContains ?geo2
> }
> {code}
> leads to heavy memory consumption on larger datasets, and we are not talking about big data at all. Imagine taking a single polygon and checking millions of geometries for containment in it.
> In the {{QueryRewriteIndex}} class, a key is generated for caching, but this is very expensive: the string representation of the geometries is requested millions of times, which creates millions of byte arrays and can end in an OOM exception. We hit it with 8 GB assigned.
> The key generation for reference:
> {code:java}
> String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
> {code}
> My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}?) Guava cache and to build the cache key from those numeric IDs instead. Any other more efficient data structure would do as well; I am not even sure a String is necessary.
> We tried a fix that works for us and keeps memory consumption stable:
> {code:java}
>  // Maps each Node to a small integer ID used in place of its lexical form
>  private LoadingCache<Node, Integer> nodeIDCache;
>  private AtomicInteger cacheCounter;
> ...
>         cacheCounter = new AtomicInteger(0);
>         CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
>         if (maxSize > 0) {
>             builder = builder.maximumSize(maxSize);
>         }
>         if (expiryInterval > 0) {
>             builder = builder.expireAfterWrite(expiryInterval, TimeUnit.MILLISECONDS);
>         }
>         nodeIDCache = builder.build(
>                         new CacheLoader<Node, Integer>() {
>                             // Assign the next free ID to a Node seen for the first time
>                             @Override
>                             public Integer load(Node key) {
>                                 return cacheCounter.incrementAndGet();
>                             }
>                         });
> {code}
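The snippet above only shows how IDs are assigned, not how they end up in the cache key. A minimal sketch of that step follows, under stated assumptions: the class {{CompactKeySketch}} and its methods are hypothetical and not part of Jena, a plain {{ConcurrentHashMap}} stands in for the bounded Guava {{LoadingCache}}, {{Object}} stands in for Jena's {{Node}}, and the separator value is invented because the issue does not show it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Jena code: builds short cache keys from integer IDs.
final class CompactKeySketch {
    // Assumed separator; the real KEY_SEPARATOR value is not shown in the issue.
    private static final String KEY_SEPARATOR = "@";

    // Stand-in for the Guava LoadingCache<Node, Integer>: unbounded, no expiry.
    private final Map<Object, Integer> nodeIds = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger(0);

    // Returns the ID already assigned to this node, or assigns the next free one.
    int idOf(Object node) {
        return nodeIds.computeIfAbsent(node, n -> counter.incrementAndGet());
    }

    // The key holds three small integers instead of two full lexical forms.
    String key(Object subject, Object predicate, Object object) {
        return idOf(subject) + KEY_SEPARATOR + idOf(predicate) + KEY_SEPARATOR + idOf(object);
    }

    public static void main(String[] args) {
        CompactKeySketch sketch = new CompactKeySketch();
        String polygon = "POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))";
        String predicate = "http://www.opengis.net/ont/geosparql#sfContains";
        System.out.println(sketch.key(polygon, predicate, "POINT(1 1)")); // prints 1@2@3
        System.out.println(sketch.key(polygon, predicate, "POINT(2 2)")); // prints 1@2@4
    }
}
```

With this shape the key length no longer grows with the size of the geometry. The ID lookup still hashes and compares the node itself, so the saving depends on that lookup not rebuilding the lexical form each time. If the ID cache is bounded, as in the reported fix, an evicted node simply gets a fresh ID later, which costs a cache miss but not correctness.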


