Posted to dev@jena.apache.org by "Claus Stadler (Jira)" <ji...@apache.org> on 2020/03/13 06:38:00 UTC
[jira] [Updated] (JENA-1858) SERVICE in SPARQL blocks after a while
[ https://issues.apache.org/jira/browse/JENA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Claus Stadler updated JENA-1858:
--------------------------------
Description:
Hi once again :)
I wanted to create a quick RDF/SPARQL-based service online/offline monitoring system just like this:
* A list of endpoints in [this dataset|https://github.com/SmartDataAnalytics/lodservatory/blob/master/sparql-endpoints-from-andre.ttl]
* Have a CI process run this SPARQL query and publish/commit the results to a file
{code}
PREFIX eg: <http://www.example.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
CONSTRUCT {
  ?s eg:serviceStatus ?status
}
{
  ?s dcat:endpointURL ?e .
  # Here we rely on jena's substitution mechanism in QueryIterService.java - which is sufficient for my use case
  SERVICE SILENT ?e {
    # If the request fails, we get a single binding without any variables bound
    { SELECT ?t { ?x a ?t } LIMIT 1 }
  }
  BIND(IF(BOUND(?t), "online", "offline") AS ?status)
}
{code}
However, the query blocks after a while because it exhausts the HTTP connection pool.
I have not yet identified all the causes, but one I could spot is this:
* The InputStream opened at [Service.java#L172|https://github.com/apache/jena/blob/64253b9de5924006cdd46f1e3492a92031842d3b/jena-arq/src/main/java/org/apache/jena/sparql/engine/http/Service.java#L172] is not wrapped in a try-finally or try-with-resources block, so if the subsequent XML parsing fails, it is never closed.
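To illustrate the kind of fix meant here, the following is a minimal sketch using only the JDK, not the actual code in Service.java. The class StreamCloseSketch, the helper parseAndClose and the TrackingStream stand-in for an HTTP response body are all hypothetical names introduced for this example. The assumption is that, as with typical pooled HTTP clients, closing the response stream is what hands the connection back to the pool.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class StreamCloseSketch {

    /** Hypothetical stand-in for an HTTP response body; records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(String body) {
            super(body.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /**
     * Parses the stream as XML and returns false if parsing fails.
     * try-with-resources closes the stream on both the success and the failure path.
     */
    static boolean parseAndClose(InputStream body) {
        try (InputStream in = body) {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Parsing failed, but the stream has already been closed at this point.
            return false;
        }
    }

    public static void main(String[] args) {
        // Malformed XML: <results> is never closed before </sparql>.
        TrackingStream broken = new TrackingStream("<sparql><results></sparql>");
        boolean ok = parseAndClose(broken);
        System.out.println("parsed=" + ok + " closed=" + broken.closed);
    }
}
```

With this pattern a broken endpoint response costs one failed parse but should not hold on to a pooled connection, which is the behaviour the monitoring query needs.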
Maybe this points to other potential spots. I have a local Jena checkout and will try to find out whether there are any other leaks. My goal is to have the query complete over the whole endpoint list, even though many of the URLs now refer to broken services.
I am aware of the context settings in https://jena.apache.org/documentation/query/service.html, but I have not changed them, timeouts in particular, because so far the issue is really the exhaustion of the connection pool.
was:
Hi once again :)
I wanted to create a quick RDF/SPARQL-based service online/offline monitoring system just like this:
* A list of endpoints in [this dataset|https://github.com/SmartDataAnalytics/Meta-LOD/blob/master/sparql-endpoints.ttl]
* Have a CI process run this SPARQL query and publish/commit the results to a file
{code}
PREFIX eg: <http://www.example.org/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
CONSTRUCT {
  ?s eg:serviceStatus ?status
}
{
  ?s dcat:endpointURL ?e .
  # Here we rely on jena's substitution mechanism in QueryIterService.java - which is sufficient for my use case
  SERVICE SILENT ?e {
    # If the request fails, we get a single binding without any variables bound
    { SELECT ?t { ?x a ?t } LIMIT 1 }
  }
  BIND(IF(BOUND(?t), "online", "offline") AS ?status)
}
{code}
However, the query blocks after a while because it exhausts the HTTP connection pool.
I have not yet identified all the causes, but one I could spot is this:
* The InputStream opened at [Service.java#L172|https://github.com/apache/jena/blob/64253b9de5924006cdd46f1e3492a92031842d3b/jena-arq/src/main/java/org/apache/jena/sparql/engine/http/Service.java#L172] is not wrapped in a try-finally or try-with-resources block, so if the subsequent XML parsing fails, it is never closed.
Maybe this points to other potential spots. I have a local Jena checkout and will try to find out whether there are any other leaks. My goal is to have the query complete over the whole endpoint list, even though many of the URLs now refer to broken services.
I am aware of the context settings in https://jena.apache.org/documentation/query/service.html, but I have not changed them, timeouts in particular, because so far the issue is really the exhaustion of the connection pool.
> SERVICE in SPARQL blocks after a while
> --------------------------------------
>
> Key: JENA-1858
> URL: https://issues.apache.org/jira/browse/JENA-1858
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.14.0
> Reporter: Claus Stadler
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)