You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by "Paolo Castagna (JIRA)" <ji...@apache.org> on 2011/08/08 10:40:27 UTC
[jira] [Updated] (JENA-89) Avoid a total sort for ORDER BY + LIMIT
queries
[ https://issues.apache.org/jira/browse/JENA-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paolo Castagna updated JENA-89:
-------------------------------
Description:
In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result set and then produces the first N, according to the specified LIMIT.
As an alternative, we can use a PriorityQueue [1] (which is based on a priority heap) to avoid a sort operation.
ARQ's algebra package contains already a OpTopN [2] operator. The OpExecutor [3] will need to use a new QueryIterTopN instead of QueryIterSort + QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also necessary.
ORDER BY + LIMIT queries are typically used to construct the first page when results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. (Often users stop at the first page). Ideally, we could cache the ORDER BY and implement the OFFSET|LIMIT using results from the cache. However, the improvement described by this issue is limited to the ORDER BY + LIMIT case for which a priority heap is a good enough solution.
Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in case of small values specified on the LIMIT.
[1] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
[2] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
[3] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java
was:
In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result set and then produces the first N, according to the specified LIMIT.
As an alternative, we can use a [PriorityQueue|http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html] (which is based on a priority heap) to avoid a sort operation.
ARQ's algebra package contains already a [OpTopN|https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java] operator. The [OpExecutor|https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java] will need to use a new QueryIterTopN instead of QueryIterSort + QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also necessary.
ORDER BY + LIMIT queries are typically used to construct the first page when results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. (Often users stop at the first page). Ideally, we could cache the ORDER BY and implement the OFFSET|LIMIT using results from the cache. However, the improvement described by this issue is limited to the ORDER BY + LIMIT case for which a priority heap is a good enough solution.
Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in case of small values specified on the LIMIT.
> Avoid a total sort for ORDER BY + LIMIT queries
> -----------------------------------------------
>
> Key: JENA-89
> URL: https://issues.apache.org/jira/browse/JENA-89
> Project: Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Paolo Castagna
> Assignee: Paolo Castagna
> Priority: Minor
> Labels: arq, optimization, scalability, sparql
>
> In case of SPARQL queries with ORDER BY + LIMIT, ARQ sorts the entire result set and then produces the first N, according to the specified LIMIT.
> As an alternative, we can use a PriorityQueue [1] (which is based on a priority heap) to avoid a sort operation.
> ARQ's algebra package contains already a OpTopN [2] operator. The OpExecutor [3] will need to use a new QueryIterTopN instead of QueryIterSort + QueryIterSlice. A new TransformOrderByLimit to be used by Optimize is also necessary.
> ORDER BY + LIMIT queries are typically used to construct the first page when results are paginated. Then the following query is ORDER BY + OFFSET + LIMIT. (Often users stop at the first page). Ideally, we could cache the ORDER BY and implement the OFFSET|LIMIT using results from the cache. However, the improvement described by this issue is limited to the ORDER BY + LIMIT case for which a priority heap is a good enough solution.
> Hopefully, this would improve the scalability of ORDER BY + LIMIT queries in case of small values specified on the LIMIT.
> [1] http://download.oracle.com/javase/6/docs/api/java/util/PriorityQueue.html
> [2] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/algebra/op/OpTopN.java
> [3] https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/trunk/src/com/hp/hpl/jena/sparql/engine/main/OpExecutor.java
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira