Posted to dev@jena.apache.org by Rob Vesse <rv...@yarcdata.com> on 2013/07/09 00:42:55 UTC

OpQuadPattern limitations?

I've been looking at doing various query optimizations lately and one which we would like to do involves combining adjacent quad patterns together.  ARQ will already combine adjacent BGPs but does not do this for quad patterns.

Part of the issue in doing this ourselves seems to be the fact that ARQ treats an OpQuadPattern as a wrapper around a graph Node and a BasicPattern, and produces the QuadPattern on the fly by using the graph node to form quads.  This means that a merger can only be made where the adjacent quad patterns have the same graph node, as otherwise we lose part of the graph information.  It would be useful to us if OpQuadPattern instead just held a QuadPattern and did not have a fixed graph node associated with it.  However, I suspect this would have a lot of knock-on effects on other implementations, so this is not an implementation detail which I would lightly change.
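
To make the limitation concrete, here is a minimal sketch of the "one graph node plus a list of triples" shape.  It is a stdlib-only toy model, not Jena's actual classes: GraphScopedPattern, its fields and merge() are hypothetical names, and graph nodes and triple patterns are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified stand-in for the idea, NOT Jena's real classes: a quad pattern
// stored as one graph name plus a list of triple patterns.
final class GraphScopedPattern {
    final String graph;          // hypothetical: graph node as a plain string
    final List<String> triples;  // hypothetical: each triple pattern as text

    GraphScopedPattern(String graph, List<String> triples) {
        this.graph = graph;
        this.triples = new ArrayList<>(triples);
    }

    // Merging is only possible when both patterns name the same graph;
    // otherwise the single graph field cannot describe both inputs.
    static Optional<GraphScopedPattern> merge(GraphScopedPattern a, GraphScopedPattern b) {
        if (!a.graph.equals(b.graph)) {
            return Optional.empty();
        }
        List<String> all = new ArrayList<>(a.triples);
        all.addAll(b.triples);
        return Optional.of(new GraphScopedPattern(a.graph, all));
    }
}
```

In this model, two patterns over the same graph merge into one, while merging patterns over different graphs yields an empty Optional because there is nowhere to record the second graph.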

Is there value in making this change longer term, and what would the knock-on effects be?

Or is it better to introduce a new operator which is a true wrapper around a QuadPattern and allows for different graph nodes on different quads within the pattern?  This way we don't propagate the change to implementations where it would not make any sense or would create unnecessary work.
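
As a rough illustration of what such an operator would carry, the sketch below keeps a flat list of quads where each quad has its own graph.  Again this is a stdlib-only toy model with hypothetical names (SimpleQuad, QuadBlockSketch, fromGraphAndTriples, concat), not an ARQ class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a quad: a graph plus a triple pattern kept as text.
final class SimpleQuad {
    final String graph;
    final String triple;

    SimpleQuad(String graph, String triple) {
        this.graph = graph;
        this.triple = triple;
    }

    @Override
    public String toString() {
        return graph + " { " + triple + " }";
    }
}

// Hypothetical operator payload: a flat list of quads with no shared graph.
final class QuadBlockSketch {
    final List<SimpleQuad> quads = new ArrayList<>();

    // Expand a "graph node + triples" pattern into explicit quads.
    static QuadBlockSketch fromGraphAndTriples(String graph, List<String> triples) {
        QuadBlockSketch block = new QuadBlockSketch();
        for (String t : triples) {
            block.quads.add(new SimpleQuad(graph, t));
        }
        return block;
    }

    // Adjacent blocks can always be concatenated, since each quad keeps
    // its own graph and nothing is lost.
    QuadBlockSketch concat(QuadBlockSketch other) {
        QuadBlockSketch merged = new QuadBlockSketch();
        merged.quads.addAll(this.quads);
        merged.quads.addAll(other.quads);
        return merged;
    }
}
```

With this shape, concatenating a block over one graph with a block over another simply produces a longer list of quads, which is the merge that the fixed graph node prevents.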

If the latter is preferable, we can probably do this completely in our code base by subclassing OpExt and not affect ARQ itself, but I thought I'd throw the idea out there to see if there was any value in making the change in ARQ.

Rob

Re: OpQuadPattern limitations?

Posted by Rob Vesse <rv...@yarcdata.com>.
Comments inline:


On 7/9/13 7:13 AM, "Andy Seaborne" <an...@apache.org> wrote:

>
>I agree that long term, a different OpQuadPattern would be good.  It's
>the getting to there from here that matters.
>
>And, yes, changing too quickly would have knock-on effects, not
>just on Jena but maybe (=probably) on extensions.  I looked at calls to
>getGraphNode() and getBasicPattern() and getPattern() and there are
>enough to see it's not a simple switch, but it's not huge either.

Yes, I figured as much, and it sounds like it would create too many issues
to be worth doing.

Plus the caveats you point out would render it a bad design choice for
many systems.

>
>A couple of caveats:
>
>* Some boundaries are special, like default union graph.
>
>* Entailment works on graphs: keeping boundaries can matter to some
>systems.  I don't know if such systems do quad-things.
>
>* OpAsQuery may be affected.  Should not be too bad - it can regroup
>quads.
>
>* Odd corner cases like crossing storage boundaries as the graph
>boundary changes.  Probably shouldn't happen.

Thankfully for us none of these apply but I can see they would create
problems in many systems.

>
>
>Thought: What about going the other way??
>Convert to joins of OpTriple/OpQuad everywhere and mark trees of pure
>joins of triples/quads.
>
>See TransformPattern2Join

Interesting idea, though it won't work in our architecture, which relies
on being able to do large blocks of work in parallel.  Breaking down the
algebra like this would perform poorly or require severe refactoring of
our internals to recognize these special tree structures.

>
>
>
>If you want to do that under OpExt, that's a "no barrier" route.  I
>would be interested in comments on how well that works - extensible Ops
>is nice but it interacts with the visitor/transform pattern.  I don't
>know of a better way but I could well be missing a design pattern.

Yes, I already got this working the OpExt way yesterday.  I'm doing this as
the very last step in our optimization process, i.e. after applying the ARQ
standard optimizations and our own internal ones.  That works because the
final algebra tree need only be visited by our own internal visitors, which
can use the visit(OpExt) method to dispatch the call to the appropriate
custom visitor method.
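
The dispatch described above can be sketched roughly as follows.  This is a stdlib-only miniature of the pattern, not ARQ code: MiniOp, MiniVisitor, MiniBgp, MiniExt, MiniQuadBlock and InternalVisitor are all hypothetical names, and the single visitExt() entry point merely plays the role that visit(OpExt) plays for us.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the visitor pattern with one extension hook.
interface MiniOp {
    void visit(MiniVisitor v);
}

interface MiniVisitor {
    void visitBgp(MiniBgp op);
    void visitExt(MiniExt op);  // single entry point shared by all extension ops
}

final class MiniBgp implements MiniOp {
    public void visit(MiniVisitor v) { v.visitBgp(this); }
}

// Base class for extension ops: the generic visitor only ever sees MiniExt.
abstract class MiniExt implements MiniOp {
    public final void visit(MiniVisitor v) { v.visitExt(this); }
}

// The custom operator added outside the core algebra.
final class MiniQuadBlock extends MiniExt { }

// Internal visitor that knows the custom op and re-dispatches to it.
class InternalVisitor implements MiniVisitor {
    final List<String> log = new ArrayList<>();

    public void visitBgp(MiniBgp op) { log.add("bgp"); }

    public void visitExt(MiniExt op) {
        if (op instanceof MiniQuadBlock) {
            visitQuadBlock((MiniQuadBlock) op);
        } else {
            log.add("unknown-ext");
        }
    }

    // The "actual" visitor method for the custom operator.
    void visitQuadBlock(MiniQuadBlock op) { log.add("quad-block"); }
}
```

The cost of this approach is the instanceof check inside visitExt(): any visitor that does not know about the custom operator only ever sees an opaque extension op, which is why we apply it as the last step.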


>
>I'm also happy to (myself) add OpQuadBlock soon and wire it in properly
>as a first class Op - it does not take too long and it would force me to
>look at the code.

A first class operator would make my life easier but is not essential.

Rob

>
>	Andy


Re: OpQuadPattern limitations?

Posted by Andy Seaborne <an...@apache.org>.

I agree that long term, a different OpQuadPattern would be good.  It's 
the getting to there from here that matters.

And, yes, changing too quickly would have knock-on effects, not
just on Jena but maybe (=probably) on extensions.  I looked at calls to
getGraphNode() and getBasicPattern() and getPattern() and there are
enough to see it's not a simple switch, but it's not huge either.

A couple of caveats:

* Some boundaries are special, like default union graph.

* Entailment works on graphs: keeping boundaries can matter to some 
systems.  I don't know if such systems do quad-things.

* OpAsQuery may be affected.  Should not be too bad - it can regroup quads.

* Odd corner cases like crossing storage boundaries as the graph 
boundary changes.  Probably shouldn't happen.


Thought: What about going the other way??
Convert to joins of OpTriple/OpQuad everywhere and mark trees of pure 
joins of triples/quads.

See TransformPattern2Join
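
Going "the other way" might look roughly like the sketch below: expand a pattern into a left-deep tree of joins over single quads, then recognise subtrees that consist purely of such joins.  This is a stdlib-only toy model and not the code in TransformPattern2Join; AlgNode, QuadLeaf, JoinNode, FilterNode and JoinExpander are hypothetical names, quads are plain strings, and rejecting an empty pattern is a simplification made for this sketch only.

```java
import java.util.List;

// Toy algebra nodes; the names are hypothetical and only echo ARQ's operators.
abstract class AlgNode {
    // True when the subtree contains nothing but joins over single quads.
    abstract boolean isPureJoinTree();
}

final class QuadLeaf extends AlgNode {
    final String quad;
    QuadLeaf(String quad) { this.quad = quad; }
    @Override boolean isPureJoinTree() { return true; }
    @Override public String toString() { return quad; }
}

final class JoinNode extends AlgNode {
    final AlgNode left;
    final AlgNode right;
    JoinNode(AlgNode left, AlgNode right) { this.left = left; this.right = right; }
    @Override boolean isPureJoinTree() {
        return left.isPureJoinTree() && right.isPureJoinTree();
    }
    @Override public String toString() { return "(join " + left + " " + right + ")"; }
}

// Any non-join operator breaks the "pure join" property.
final class FilterNode extends AlgNode {
    final AlgNode child;
    FilterNode(AlgNode child) { this.child = child; }
    @Override boolean isPureJoinTree() { return false; }
    @Override public String toString() { return "(filter " + child + ")"; }
}

final class JoinExpander {
    // Turn a list of quads into a left-deep join tree of single-quad leaves.
    static AlgNode toJoins(List<String> quads) {
        if (quads.isEmpty()) {
            throw new IllegalArgumentException("empty pattern not handled in this sketch");
        }
        AlgNode tree = new QuadLeaf(quads.get(0));
        for (int i = 1; i < quads.size(); i++) {
            tree = new JoinNode(tree, new QuadLeaf(quads.get(i)));
        }
        return tree;
    }
}
```

A consumer that wants to work on whole blocks would then look for maximal subtrees where isPureJoinTree() holds and treat each one as a unit.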



If you want to do that under OpExt, that's a "no barrier" route.  I 
would be interested in comments on how well that works - extensible Ops 
is nice but it interacts with the visitor/transform pattern.  I don't 
know of a better way but I could well be missing a design pattern.

I'm also happy to (myself) add OpQuadBlock soon and wire it in properly 
as a first class Op - it does not take too long and it would force me to 
look at the code.

	Andy