Posted to jira@jena.apache.org by "Greg Albiston (Jira)" <ji...@apache.org> on 2022/04/27 15:27:00 UTC
[jira] [Commented] (JENA-2311) query rewrite index does too expensive caching on geo literals

    [ https://issues.apache.org/jira/browse/JENA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528857#comment-17528857 ] 

Greg Albiston commented on JENA-2311:
-------------------------------------

Apologies for the slow reply. On both points, using a better key and optimising the query rewrite, it came down to the time available: the aim was an adequate implementation rather than an optimal one. The string key was something I intended to revisit when necessary. Following your suggestion, the PR now uses the Triple as the index key, but unit tests still need to be added.
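
To make the difference between the two keying approaches concrete, here is a minimal sketch of an index keyed on a composite (subject, predicate, object) object rather than on a concatenated string. It is not the code in the PR: `TripleKeyIndexSketch` and `TripleKey` are hypothetical names, plain Java objects stand in for Jena's Node and Triple, and it assumes the terms already provide suitable `equals()` and `hashCode()` implementations.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a rewrite index; not Jena's QueryRewriteIndex.
final class TripleKeyIndexSketch {

    // Composite key that keeps references to the existing terms, so no new
    // string is built from their (potentially very long) lexical forms.
    static final class TripleKey {
        private final Object subject;
        private final Object predicate;
        private final Object object;

        TripleKey(Object subject, Object predicate, Object object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof TripleKey)) return false;
            TripleKey that = (TripleKey) other;
            return subject.equals(that.subject)
                    && predicate.equals(that.predicate)
                    && object.equals(that.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }
    }

    private final Map<TripleKey, Boolean> index = new ConcurrentHashMap<>();

    // Returns the cached result for the triple, evaluating it on first use only.
    boolean test(Object subject, Object predicate, Object object, BooleanSupplier evaluate) {
        return index.computeIfAbsent(
                new TripleKey(subject, predicate, object),
                key -> evaluate.getAsBoolean());
    }

    int size() {
        return index.size();
    }

    public static void main(String[] args) {
        TripleKeyIndexSketch index = new TripleKeyIndexSketch();
        AtomicInteger evaluations = new AtomicInteger();
        String polygon = "POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))";
        String point = "POINT(5 5)";
        String predicate = "http://www.opengis.net/ont/geosparql#sfContains";

        // The same triple is tested three times; the supplier runs only once.
        for (int i = 0; i < 3; i++) {
            index.test(polygon, predicate, point, () -> {
                evaluations.incrementAndGet();
                return true;
            });
        }
        // prints "evaluations: 1, entries: 1"
        System.out.println("evaluations: " + evaluations.get() + ", entries: " + index.size());
    }
}
```

The only allocation per lookup is the small key object. Whether that is cheap enough in practice depends on how the real terms compute equality and hash codes, which this sketch does not model.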

As for the `graph.find(...)`, it is called when the subject and/or object is unbound, in order to find candidate results. The property functions are implemented through PFuncSimple and registered with the ARQ engine. I'm not familiar with optimising the query algebra or with how to do that in this context.

Do you have suggestions or an alternative implementation?

For reference, this is the PFuncSimple method that the property functions implement:

```
public QueryIterator execEvaluated(Binding binding, Node subject, Node predicate, Node object, ExecutionContext execCxt)
```

> query rewrite index does too expensive caching on geo literals
> --------------------------------------------------------------
>
>                 Key: JENA-2311
>                 URL: https://issues.apache.org/jira/browse/JENA-2311
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: GeoSPARQL
>    Affects Versions: Jena 4.4.0
>            Reporter: Lorenz Bühmann
>            Priority: Major
>
> Using a GeoSPARQL query with a geospatial property function, e.g.
> {code:java}
> SELECT * {
> :x geo:hasGeometry ?geo1 .
> ?s2 geo:hasGeometry ?geo2 .
> ?geo1 geo:sfContains ?geo2
> }
> {code}
> leads to heavy memory consumption on larger datasets, and we are not talking about big data at all. Imagine taking a single polygon and checking millions of geometries for containment in it.
> The {{QueryRewriteIndex}} class generates a key for caching, but this is very expensive: the string representation of the geometries is requested millions of times, which creates millions of byte arrays and can end in an OOM error. We hit one with 8GB assigned.
> The key generation for reference:
> {code:java}
> String key = subjectGeometryLiteral.getLiteralLexicalForm() + KEY_SEPARATOR + predicate.getURI() + KEY_SEPARATOR + objectGeometryLiteral.getLiteralLexicalForm();
> {code}
> My suggestion is to use a separate {{Node -> Integer}} (or {{Long}}?) Guava cache and build the cache key from those numeric IDs instead. Any other more efficient data structure would also do; I am not even sure a String is necessary.
> We tried a fix that works for us and keeps memory consumption stable:
> {code:java}
>  private LoadingCache<Node, Integer> nodeIDCache;
>  private AtomicInteger cacheCounter;
> ...
>         cacheCounter = new AtomicInteger(0);
>         CacheBuilder<Object, Object> builder = CacheBuilder.newBuilder();
>         if (maxSize > 0) {
>             builder = builder.maximumSize(maxSize);
>         }
>         if (expiryInterval > 0) {
>             builder = builder.expireAfterWrite(expiryInterval, TimeUnit.MILLISECONDS);
>         }
>         // Assign the next integer ID to each Node not currently in the cache.
>         nodeIDCache = builder.build(
>                         new CacheLoader<>() {
>                             @Override
>                             public Integer load(Node key) {
>                                 return cacheCounter.incrementAndGet();
>                             }
>                         });
> {code}


