Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2013/12/24 17:25:57 UTC

[jira] [Commented] (JENA-615) Possible optimisation for FILTER(?var != <constant>)

    [ https://issues.apache.org/jira/browse/JENA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856380#comment-13856380 ] 

Andy Seaborne commented on JENA-615:
------------------------------------

It would be good if this pattern could be optimized.

Since 2.11.0 was released, filter placement has been improved, which helps this sort of pattern when {{# Some patterns}} is complex.

Example:
{noformat}
PREFIX : <http://example/>
SELECT * {
  ?var :p ?o .
  OPTIONAL {?var :q ?v }
  FILTER(?var != <http://constant>)
}
{noformat}

used to give (2.11.0):
{noformat}
(prefix ((: <http://example/>))
  (filter (!= ?var <http://constant>)
    (conditional
      (bgp (triple ?var :p ?o))
      (bgp (triple ?var :q ?v)))))
{noformat}
and now gives (2.11.1 development):
{noformat}
(prefix ((: <http://example/>))
  (conditional
    (filter (!= ?var <http://constant>)
      (bgp (triple ?var :p ?o)))
    (bgp (triple ?var :q ?v))))
{noformat}
so it's putting the removal of solutions with {{?var = <http://constant>}} as early as possible, before the {{OPTIONAL}} part is matched.
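A minimal sketch of why this move is safe, in Python, using plain dicts for solutions rather than ARQ's classes (the data and the names {{match}}, {{left_join}} and {{not_equal_filter}} are made up for illustration). The filter only mentions {{?var}}, which the left pattern always binds and the left join never changes, so filtering before or after the join should give the same solutions:

```python
# Sketch only: solutions are dicts, triples are tuples; this is not ARQ's API.
CONST = "http://constant"

# Hypothetical data as (subject, predicate, object) triples.
DATA = [
    ("http://constant", ":p", "o1"),
    ("http://a", ":p", "o2"),
    ("http://b", ":p", "o3"),
    ("http://a", ":q", "v1"),
    ("http://constant", ":q", "v2"),
]


def match(pred, obj_var):
    """Solutions of the triple pattern (?var pred ?obj_var)."""
    return [{"var": s, obj_var: o} for (s, p, o) in DATA if p == pred]


def compatible(a, b):
    """Two solutions agree on every variable they share."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(left, right):
    """OPTIONAL: extend each left solution, or keep it unchanged."""
    out = []
    for row in left:
        joined = [{**row, **r} for r in right if compatible(row, r)]
        out.extend(joined if joined else [row])
    return out


def not_equal_filter(rows, var, const):
    """FILTER(?var != const); an unbound ?var is an error, so the row goes."""
    return [r for r in rows if var in r and r[var] != const]


left, right = match(":p", "o"), match(":q", "v")
# 2.11.0 shape: filter applied on top of the whole conditional.
late = not_equal_filter(left_join(left, right), "var", CONST)
# 2.11.1 shape: filter applied to the left side only.
early = left_join(not_equal_filter(left, "var", CONST), right)
print(early)
```

The same argument would not hold for a filter on {{?v}}, which is only bound when the OPTIONAL part matches, so such a filter cannot be pushed into the left side.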

For some engines, like TDB, where the lowest level works not with the Nodes themselves but with some internal id, a naive {{FILTER(?var != <http://constant>)}} needs to get the string form of {{?var}} for every solution. It could work in reverse: find the internal id for {{<http://constant>}} once, then filter on the internal ids, which is much more efficient (no need to touch the node table). However, as it's a != test, most values of {{?var}} pass and might well be returned anyway, so getting their string form early may not matter too much.
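A minimal sketch of the two approaches, assuming a TDB-like layout where solutions hold integer ids and a separate node table maps ids to nodes. {{NodeTable}}, {{filter_naive}} and {{filter_by_id}} are hypothetical names, not TDB's API. Comparing ids is only equivalent to != here because the constant is an IRI, where the test is plain term inequality; literals could need value-based comparison.

```python
# Sketch assuming a TDB-like layout: solutions hold integer ids and a node
# table maps ids <-> nodes. NodeTable and the filter functions are
# hypothetical names, not TDB's API. Comparing ids is only valid here
# because the constant is an IRI (plain term inequality).


class NodeTable:
    def __init__(self):
        self._ids = {}
        self._nodes = []
        self.decodes = 0  # number of id -> node lookups performed

    def intern(self, node):
        if node not in self._ids:
            self._ids[node] = len(self._nodes)
            self._nodes.append(node)
        return self._ids[node]

    def id_of(self, node):
        return self._ids.get(node)  # None if the node is not in the data

    def node_of(self, node_id):
        self.decodes += 1
        return self._nodes[node_id]


def filter_naive(rows, var, const, table):
    """Decode each binding from the node table, then compare nodes."""
    return [r for r in rows if var in r and table.node_of(r[var]) != const]


def filter_by_id(rows, var, const, table):
    """Look up the constant's id once, then compare integers only."""
    const_id = table.id_of(const)  # None if absent: every bound row passes
    return [r for r in rows if var in r and r[var] != const_id]


table = NodeTable()
rows = [{"var": table.intern(n)}
        for n in ("http://a", "http://constant", "http://b")]
naive = filter_naive(rows, "var", "http://constant", table)
naive_decodes = table.decodes
table.decodes = 0
fast = filter_by_id(rows, "var", "http://constant", table)
print(naive_decodes, table.decodes)
```

Both filters keep the same rows, but only the naive one touches the node table (three decodes versus none).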

A special case for optimization is when {{<http://constant>}} is not in the data at all: it has no internal id, so no value of {{?var}} matched from the data can be equal to it.
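A sketch of how a planner might use this, under two assumptions: {{node_ids}} (a hypothetical dict standing in for the node table) holds every node in the data, and every value of {{?var}} comes from matching stored triples, so no BIND or VALUES introduces new nodes. If the constant has no id, the filter can only remove solutions where {{?var}} is unbound, and can be dropped entirely when {{?var}} is guaranteed to be bound:

```python
# Sketch of choosing a strategy for FILTER(?var != const). Assumptions:
# node_ids (hypothetical) holds every node in the data, and every value of
# ?var comes from matching stored triples, so it always has an id.


def plan_not_equal(const, node_ids, var_always_bound):
    if const in node_ids:
        # Normal case: compare ids against the constant's single id.
        return ("compare-ids", node_ids[const])
    if var_always_bound:
        # No stored node equals the constant, so the test is always true.
        return ("drop-filter", None)
    # != on an unbound ?var is an error, so those rows must still go.
    return ("bound-check", None)


def apply_plan(plan, rows, var):
    kind, const_id = plan
    if kind == "compare-ids":
        return [r for r in rows if var in r and r[var] != const_id]
    if kind == "bound-check":
        return [r for r in rows if var in r]
    return list(rows)  # drop-filter: keep every row


node_ids = {"http://a": 0, "http://b": 1}
rows = [{"var": 0}, {"var": 1}, {}]  # the last row leaves ?var unbound
plan = plan_not_equal("http://constant", node_ids, var_always_bound=False)
print(plan, apply_plan(plan, rows, "var"))
```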

As ever, it comes down to timing different designs to see which trade-offs work best.

> Possible optimisation for FILTER(?var != <constant>)
> ----------------------------------------------------
>
>                 Key: JENA-615
>                 URL: https://issues.apache.org/jira/browse/JENA-615
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>            Priority: Minor
>              Labels: algebra, optimization, sparql
>
> I have an idea for a possible optimisation for queries of the following general form:
> {noformat}
> SELECT *
> WHERE
> {
>   # Some Patterns
>   FILTER(?var != <http://constant>)
> } 
> {noformat}
> This pattern crops up surprisingly often in real SPARQL workloads since it is used either to restrict a variable by excluding certain values or to avoid self-referential links in the data.
> In some cases it seems like this could be safely rewritten as follows:
> {noformat}
> SELECT *
> WHERE
> {
>   # Some Patterns
>   MINUS { BIND(<http://constant> AS ?var) }
> }
> {noformat}
> Or perhaps in a more generalised form like so:
> {noformat}
> SELECT * WHERE
> {
>   # Some patterns
>   MINUS { VALUES ?var { <http://constant/1> <http://constant/2> } }
> }
> {noformat}
> This would nicely deal with the case of stating that a variable is not equal to any of several constant values.
> As I pointed out earlier, this would not apply in every case; specifically, I think at least the following must be true:
> - The variable must be guaranteed to be bound (similar to existing filter equality and implicit join optimisations)
> There is also the potential to spot cases where the variable will always be unbound, so the expression is always an error, and to replace the entire sub-tree with {{table empty}}, as we already do for equality and implicit join filters.
> I plan to look at implementing this in the new year. If anyone has any thoughts on this (especially regarding restrictions on when the optimisation is considered safe), then please comment.
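A minimal sketch of the proposed rewrite's semantics over solutions represented as dicts, following the SPARQL 1.1 definitions: MINUS removes a left solution only if it is compatible with some right solution and shares at least one variable with it, while a FILTER whose expression raises an error (as != does on an unbound variable) removes the solution. The multi-constant filter is taken here as {{FILTER(?var NOT IN (...))}}, and the function names are hypothetical, not ARQ code. It suggests why the variable must be guaranteed to be bound, and why an always-unbound variable makes the filtered result empty:

```python
# Sketch of FILTER vs MINUS semantics over dict solutions, following the
# SPARQL 1.1 definitions. Function names are hypothetical, not ARQ code.


def filter_not_in(rows, var, consts):
    """FILTER(?var NOT IN (...)): an unbound ?var is an error, row removed."""
    return [r for r in rows if var in r and r[var] not in consts]


def values_table(var, consts):
    """VALUES ?var { c1 c2 ... }; one constant is like BIND(c AS ?var)."""
    return [{var: c} for c in consts]


def minus(left, right):
    """Drop left rows compatible with, and sharing a variable with, a right row."""
    def removes(lrow, rrow):
        shared = lrow.keys() & rrow.keys()
        return bool(shared) and all(lrow[k] == rrow[k] for k in shared)
    return [lrow for lrow in left if not any(removes(lrow, r) for r in right)]


C = ["http://constant/1", "http://constant/2"]
bound = [{"var": "http://a"}, {"var": C[0]}, {"var": C[1], "o": "x"}]
unbound = [{"o": "y"}]  # ?var never bound

# ?var always bound: both forms agree.
print(filter_not_in(bound, "var", C), minus(bound, values_table("var", C)))
# ?var unbound: FILTER drops the row (an empty result), MINUS keeps it.
print(filter_not_in(unbound, "var", C), minus(unbound, values_table("var", C)))
```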



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)