Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2013/01/02 11:24:12 UTC

[jira] [Commented] (JENA-228) Limiting query output centrally

    [ https://issues.apache.org/jira/browse/JENA-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542073#comment-13542073 ] 

Rob Vesse commented on JENA-228:
--------------------------------

Regardless of implementation, this feature essentially boils down to a security feature: you are looking to secure your SPARQL engine (however it may be exposed) against intentional or inadvertent denial of service attacks.  These might be queries where calculating large numbers of results is costly, or queries which are simple but produce vast amounts of data - think SELECT * { ?s ?p ?o } on a very large dataset.

FWIW this was something we addressed at YarcData in our product by doing interception in two ways:

1 - At the SPARQL endpoint layer

We have a defined (and configurable) limit on results, and when a query comes in we inspect the existing LIMIT (if any) and add our own LIMIT as appropriate.  Where a pre-existing limit exists we apply the lesser of the existing and system limits, e.g. if the system limit is 100 and the query has LIMIT 1 we leave it as is.
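To make the lesser-of rule concrete, here is a minimal standalone sketch (not our actual product code - the class and method names are made up for illustration, and -1 stands in for "no LIMIT clause"):

```java
public class LimitEnforcer {
    // Hypothetical sentinel meaning "the query has no LIMIT clause"
    public static final long NOLIMIT = -1;

    // Returns the limit to actually apply: the lesser of the query's own
    // LIMIT and the system-wide limit, treating NOLIMIT as unbounded.
    public static long effectiveLimit(long queryLimit, long systemLimit) {
        if (systemLimit == NOLIMIT) return queryLimit;
        if (queryLimit == NOLIMIT) return systemLimit;
        return Math.min(queryLimit, systemLimit);
    }
}
```

In ARQ the same rule would be applied to the parsed Query object (which carries the query's LIMIT) rather than to raw query strings, so the rewrite happens once, before execution.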

2 - At the Query Iterator layer

Since we farm out the entire query to an external query engine and then translate the internal format of results back into Jena classes via a custom QueryIterator implementation, we also apply a limit at this point.  Applying a limit at this level is more useful when you want to make the query engine do all the work but don't need all the results sent directly to the client.  This is useful for us because we transmit large results over architectural boundaries using files on disk, so we can provide previews of results to clients with the full results available on disk for later consumption.
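At heart the iterator-level cut-off is just a wrapping iterator that yields at most N elements from whatever it wraps.  A minimal generic sketch (illustrative only, not our actual QueryIterator code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Wraps any iterator and yields at most 'limit' elements, leaving the
// underlying source (e.g. full results spooled to disk) untouched beyond that.
public class LimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private long remaining;

    public LimitedIterator(Iterator<T> inner, long limit) {
        this.inner = inner;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        return remaining > 0 && inner.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        remaining--;
        return inner.next();
    }
}
```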

The latter approach is probably less applicable to ARQ because a query is answered by some combination of iterators, not a single iterator as in our case.  Personally I prefer the former approach because it sits nice and early in the pipeline.  However, a general solution probably needs to live somewhere more in the middle to account for queries coming in both via SPARQL endpoints and via the API.
                
> Limiting query output centrally
> -------------------------------
>
>                 Key: JENA-228
>                 URL: https://issues.apache.org/jira/browse/JENA-228
>             Project: Apache Jena
>          Issue Type: New Feature
>          Components: ARQ, Fuseki
>    Affects Versions: ARQ 2.9.0, Fuseki 0.2.1
>            Reporter: Giuseppe Sollazzo
>
> I was wondering whether there will be some way of limiting output in fuseki. Basically, I'd like to be able to enforce limits on the number of results returned by the system.
> As an example, think about a "numrows" in sql.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira