You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2016/11/07 18:32:04 UTC
TransformFilterImplicitJoin
Rob,
I ran into some issue with TransformFilterImplicitJoin
1/ As noted in the javadoc, the join condition can not be between
literals for the FILTER(?x = ?y) variant.
While the javadoc for the control symbol says "This optimization is
conservative - it does not take place if there is a potential risk of
changing query semantics." there is no test that ?x or ?y come from a
position that ensures one or other is not a literal.
This optimization is on by default. Is this a good idea?
2/ It eliminates dead code - except it also eliminates not-so-dead code
in the case of EXISTS because the scoping is more complicated.
.. pattern involving ?z ...
FILTER EXIST{ ... FILTER(?x = ?z) }
where one comes from outside from the current row being filtered.
This is as much to do with the scoping engine; what caught me is the
fact that "implicit join" was doing code elimination.
Andy
Re: TransformFilterImplicitJoin
Posted by Rob Vesse <rv...@dotnetrdf.org>.
Andy
Comments inline:
On 07/11/2016 18:32, "Andy Seaborne" <an...@apache.org> wrote:
Rob,
I ran into some issue with TransformFilterImplicitJoin
1/ As noted in the javadoc, the join condition can not be between
literals for the FILTER(?x = ?y) variant.
While the javadoc for the control symbol says "This optimization is
conservative - it does not take place if there is a potential risk of
changing query semantics." there is no test that ?x or ?y come from a
position that ensures one or other is not a literal.
There is, it is done in the preprocess() method which calls isSafeEquals(). isSafeEquals() considers the variables and their positions found by calling OpVars.mentionedVarsByPosition() and checking that ?x and ?y don’t appear in any unsafe position without also appearing in a safe position.
preprocess() is called by preprocessFilterImplicitJoin() which is pretty much the first call in the transform. If it doesn’t find any eligible ?x = ?y expressions it returns null and the transform does not proceed.
This optimization is on by default. Is this a good idea?
Yes it has huge performance benefits as it often eliminates potential cross products
2/ It eliminates dead code - except it also eliminates not-so-dead code
in the case of EXISTS because the scoping is more complicated.
.. pattern involving ?z ...
FILTER EXIST{ ... FILTER(?x = ?z) }
where one comes from outside from the current row being filtered.
This is as much to do with the scoping engine; what caught me is the
fact that "implicit join" was doing code elimination.
The transform was heavily based upon TransformFilterEquality. In fact, I’m pretty sure I copy pasted that and then modify as necessary so any issue that exists also exist there.
(Aside - Quite frankly EXISTS/NOT EXISTS is a terrible feature that never should’ve made it into the language because it screws with scope so damn much)
There may also be a similar problem with TransformImplicitLeftJoin which implements much the same logic.
Andy