Posted to dev@jena.apache.org by Andy Seaborne <an...@apache.org> on 2016/11/07 18:32:04 UTC

TransformFilterImplicitJoin

Rob,

I ran into some issues with TransformFilterImplicitJoin.

1/ As noted in the javadoc, the join condition can not be between 
literals for the FILTER(?x = ?y) variant.

While the javadoc for the control symbol says "This optimization is 
conservative - it does not take place if there is a potential risk of 
changing query semantics." there is no test that ?x or ?y come from a 
position that ensures one or other is not a literal.

This optimization is on by default. Is this a good idea?

2/ It eliminates dead code - except it also eliminates not-so-dead code 
in the case of EXISTS because the scoping is more complicated.

.. pattern involving ?z ...
FILTER EXISTS { ... FILTER(?x = ?z) }

where one of the variables comes from outside, from the current row being filtered.

This is as much to do with the scoping engine; what caught me is the 
fact that "implicit join" was doing code elimination.

     Andy


Re: TransformFilterImplicitJoin

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Andy

Comments inline:

On 07/11/2016 18:32, "Andy Seaborne" <an...@apache.org> wrote:

    Rob,
    
    I ran into some issues with TransformFilterImplicitJoin.
    
    1/ As noted in the javadoc, the join condition can not be between 
    literals for the FILTER(?x = ?y) variant.
    
    While the javadoc for the control symbol says "This optimization is 
    conservative - it does not take place if there is a potential risk of 
    changing query semantics." there is no test that ?x or ?y come from a 
    position that ensures one or other is not a literal.

There is, it is done in the preprocess() method which calls isSafeEquals().  isSafeEquals() considers the variables and their positions found by calling OpVars.mentionedVarsByPosition() and checking that ?x and ?y don’t appear in any unsafe position without also appearing in a safe position.

preprocess() is called by preprocessFilterImplicitJoin() which is pretty much the first call in the transform.  If it doesn’t find any eligible ?x = ?y expressions it returns null and the transform does not proceed.
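
To make the check concrete, here is a minimal plain-Java sketch of the safe-position test described above. It is not the Jena source: the class name SafeEqualsSketch, the use of strings for variable names, and the five-set layout (graph, subject, predicate, object, unknown) are assumptions made for illustration. It also assumes the equality is eligible when at least one of the two variables is safe, i.e. guaranteed not to be a literal.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SafeEqualsSketch {

    /**
     * Hypothetical stand-in for the check described in the thread.
     * varsByPosition is assumed to hold five sets of variable names, in the
     * order: graph, subject, predicate, object, unknown.
     */
    public static boolean isSafeEquals(List<Set<String>> varsByPosition,
                                       String left, String right) {
        if (varsByPosition.size() != 5) {
            return false; // unexpected shape, so be conservative
        }

        // Graph, subject and predicate positions can never hold a literal.
        Set<String> safeVars = new HashSet<>();
        safeVars.addAll(varsByPosition.get(0));
        safeVars.addAll(varsByPosition.get(1));
        safeVars.addAll(varsByPosition.get(2));

        // Object and unknown positions may hold a literal.
        Set<String> unsafeVars = new HashSet<>();
        unsafeVars.addAll(varsByPosition.get(3));
        unsafeVars.addAll(varsByPosition.get(4));

        // A variable is treated as unsafe only if it was seen in an unsafe
        // position and never in a safe one.
        boolean leftSafe = !unsafeVars.contains(left) || safeVars.contains(left);
        boolean rightSafe = !unsafeVars.contains(right) || safeVars.contains(right);

        // Assumption: one safe side is enough to rule out literal = literal.
        return leftSafe || rightSafe;
    }
}
```

In this sketch, ?x used as both a subject and an object counts as safe, whereas two variables that only ever appear in object position make the equality ineligible.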
    
    This optimization is on by default. Is this a good idea?

Yes, it has huge performance benefits as it often eliminates potential cross products.
    
    2/ It eliminates dead code - except it also eliminates not-so-dead code 
    in the case of EXISTS because the scoping is more complicated.
    
    .. pattern involving ?z ...
    FILTER EXISTS { ... FILTER(?x = ?z) }
    
    where one of the variables comes from outside, from the current row being filtered.
    
    This is as much to do with the scoping engine; what caught me is the 
    fact that "implicit join" was doing code elimination.

The transform was heavily based upon TransformFilterEquality. In fact, I’m pretty sure I copy-pasted that and then modified it as necessary, so any issue that exists here also exists there.

 (Aside - Quite frankly EXISTS/NOT EXISTS is a terrible feature that never should’ve made it into the language because it screws with scope so damn much)

There may also be a similar problem with TransformImplicitLeftJoin, which implements much the same logic.
    
         Andy