Posted to dev@jena.apache.org by "Claus Stadler (Jira)" <ji...@apache.org> on 2020/03/13 06:38:00 UTC
[jira] [Updated] (JENA-1858) SERVICE in SPARQL blocks after a while

     [ https://issues.apache.org/jira/browse/JENA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claus Stadler updated JENA-1858:
--------------------------------
    Description: 
Hi once again :)
I wanted to set up a quick RDF/SPARQL-based monitoring system that reports whether services are online or offline, as follows:

* A list of endpoints in [this dataset|https://github.com/SmartDataAnalytics/lodservatory/blob/master/sparql-endpoints-from-andre.ttl]
* Have a CI process run this SPARQL query and publish/commit the results to a file

{code}
PREFIX eg: <http://www.example.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

CONSTRUCT {
    ?s eg:serviceStatus ?status
}
{
  ?s dcat:endpointURL ?e .

  # Here we rely on jena's substitution mechanism in QueryIterService.java - which is sufficient for my use case
  SERVICE SILENT ?e { 
    # If the request fails, we get a single binding without any variables bound
    { SELECT ?t { ?x a ?t } LIMIT 1 }
  }

  BIND(IF(BOUND(?t), "online", "offline") AS ?status)
}
{code}

However, the query blocks after a while because it exhausts the HTTP connection pool.
I have not yet identified all causes, but one that I could spot is this:

* The InputStream opened at [Service.java#L172|https://github.com/apache/jena/blob/64253b9de5924006cdd46f1e3492a92031842d3b/jena-arq/src/main/java/org/apache/jena/sparql/engine/http/Service.java#L172] is not guarded by a try block (try-finally or try-with-resources), so if the subsequent XML parsing fails, the stream is never closed.
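
To illustrate the suspected leak pattern, here is a minimal, self-contained sketch. It is not Jena's code: the Pool class, the parse method, and all other names are hypothetical stand-ins, and a Semaphore merely simulates an HTTP connection pool. One deliberate simplification: a real pool would typically block when exhausted, whereas this sketch returns null so that the demo terminates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Semaphore;

public class StreamLeakSketch {

    /** Hypothetical stand-in for an HTTP connection pool: one permit per connection. */
    static final class Pool {
        private final Semaphore permits;

        Pool(int size) { permits = new Semaphore(size); }

        /** Leases a connection, or returns null if the pool is exhausted. */
        InputStream open(String body) {
            if (!permits.tryAcquire()) return null;
            return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)) {
                private boolean closed = false;

                @Override public void close() throws IOException {
                    // Closing the stream is what hands the connection back to the pool.
                    if (!closed) { closed = true; permits.release(); }
                    super.close();
                }
            };
        }

        int available() { return permits.availablePermits(); }
    }

    /** Hypothetical parser: rejects any response that does not start with '<'. */
    static void parse(InputStream in) throws IOException {
        if (in.read() != '<') throw new IOException("response is not XML");
    }

    /** Leaky variant: close() is reached only when parsing succeeds. */
    static boolean leaky(Pool pool, String body) {
        InputStream in = pool.open(body);
        if (in == null) return false;   // pool exhausted
        try {
            parse(in);
            in.close();
            return true;
        } catch (IOException e) {
            return false;               // the stream stays open, the permit is lost
        }
    }

    /** Safe variant: try-with-resources closes the stream on every path. */
    static boolean safe(Pool pool, String body) {
        InputStream lease = pool.open(body);
        if (lease == null) return false; // pool exhausted
        try (InputStream in = lease) {
            parse(in);
            return true;
        } catch (IOException e) {
            return false;                // the stream was already closed here
        }
    }

    public static void main(String[] args) {
        Pool leakyPool = new Pool(2);
        Pool safePool = new Pool(2);
        for (int i = 0; i < 3; i++) {
            leaky(leakyPool, "not xml");
            safe(safePool, "not xml");
        }
        System.out.println("leaky: " + leakyPool.available() + " of 2 connections left");
        System.out.println("safe: " + safePool.available() + " of 2 connections left");
    }
}
```

In this sketch, two failed parses are enough to drain the leaky pool, while the safe variant returns every connection. Whether a fix in Service.java should use try-with-resources or an explicit finally block is a separate design question.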

Maybe this gives someone ideas about other places where the same thing could happen. I have a local Jena checkout and will try to find out whether there are any other leaks. My goal is for the query to complete on the whole endpoint list, even though many of the URLs refer to services that are broken by now.


I am aware of the context settings described at https://jena.apache.org/documentation/query/service.html, but I have not changed any of them, in particular the timeouts, because so far the real issue is the exhaustion of the connection pool.


  was:
Hi once again :)
I wanted to set up a quick RDF/SPARQL-based monitoring system that reports whether services are online or offline, as follows:

* A list of endpoints in [this dataset|https://github.com/SmartDataAnalytics/Meta-LOD/blob/master/sparql-endpoints.ttl]
* Have a CI process run this SPARQL query and publish/commit the results to a file

{code}
PREFIX eg: <http://www.example.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

CONSTRUCT {
    ?s eg:serviceStatus ?status
}
{
  ?s dcat:endpointURL ?e .

  # Here we rely on jena's substitution mechanism in QueryIterService.java - which is sufficient for my use case
  SERVICE SILENT ?e { 
    # If the request fails, we get a single binding without any variables bound
    { SELECT ?t { ?x a ?t } LIMIT 1 }
  }

  BIND(IF(BOUND(?t), "online", "offline") AS ?status)
}
{code}

However, the query blocks after a while because it exhausts the HTTP connection pool.
I have not yet identified all causes, but one that I could spot is this:

* The InputStream opened at [Service.java#L172|https://github.com/apache/jena/blob/64253b9de5924006cdd46f1e3492a92031842d3b/jena-arq/src/main/java/org/apache/jena/sparql/engine/http/Service.java#L172] is not guarded by a try block (try-finally or try-with-resources), so if the subsequent XML parsing fails, the stream is never closed.

Maybe this gives someone ideas about other places where the same thing could happen. I have a local Jena checkout and will try to find out whether there are any other leaks. My goal is for the query to complete on the whole endpoint list, even though many of the URLs refer to services that are broken by now.


I am aware of the context settings described at https://jena.apache.org/documentation/query/service.html, but I have not changed any of them, in particular the timeouts, because so far the real issue is the exhaustion of the connection pool.



> SERVICE in SPARQL blocks after a while
> --------------------------------------
>
>                 Key: JENA-1858
>                 URL: https://issues.apache.org/jira/browse/JENA-1858
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 3.14.0
>            Reporter: Claus Stadler
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.3.4#803005)