Posted to users@jena.apache.org by Diogo FC Patrao <dj...@gmail.com> on 2013/09/02 20:03:45 UTC

ARQ Service join strategy

Hi

I'm running a query that joins results from two endpoints:

SELECT * {
  SERVICE <s1> { ?a a :Class1 } # q1
  SERVICE <s2> { ?a a :Class2 } # q2
}

I noticed (by running ARQ 2.8.8 with -v) that it first dispatches query q1
on s1 and then one query q2 on s2 *for each result* in the previous query.

Is there any other way of doing it? I'm getting about 1M results
from q1, so maybe it would be better to get all results from q2 and then join
them in memory, or pass more than one value at a time to q2 (the VALUES
clause allows that).

Cheers!


--
diogo patrão

Re: ARQ Service join strategy

Posted by Diogo FC Patrao <dj...@gmail.com>.

Thanks for your help, Andy!

dfcp

--
diogo patrão




Re: ARQ Service join strategy

Posted by Andy Seaborne <an...@apache.org>.

On 04/09/13 11:32, Diogo FC Patrao wrote:
> Hi Andy
>
> Thanks for answering! I'll look into OpExecutor to see if I get somewhere.
>
> Couldn't find the webpage to this quack library, is it published yet?

Not yet; it's in my GitHub area (user 'afs').

There is code there that might help you (that said, hash join is
actually one of the simpler join algorithms to implement if you want
in-memory join data structures).
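
As a rough illustration of the in-memory hash join mentioned above, here is a
minimal sketch in Python. It is not code from quack or ARQ. The function name
hash_join and the representation of results as lists of dicts (variable name
to value) are assumptions made for the example, and it assumes every row on a
side binds the same variables.

```python
from collections import defaultdict


def hash_join(left, right):
    """Join two lists of solution bindings (dicts: variable name -> value).

    Hypothetical helper for illustration; assumes each side's rows all
    bind the same set of variables.
    """
    left, right = list(left), list(right)
    if not left or not right:
        return []
    # Join on the variables both sides share (just "a" in the thread's query).
    shared = sorted(set(left[0]) & set(right[0]))
    # Build phase: index the smaller input by its join-key values.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    index = defaultdict(list)
    for row in build:
        index[tuple(row[v] for v in shared)].append(row)
    # Probe phase: look up each row of the other input and merge the matches.
    joined = []
    for row in probe:
        for match in index.get(tuple(row[v] for v in shared), []):
            joined.append({**match, **row})
    return joined


# Example: results of q1 and q2, each fetched once from its endpoint.
q1_rows = [{"a": "x1"}, {"a": "x2"}, {"a": "x3"}]
q2_rows = [{"a": "x2"}, {"a": "x3"}, {"a": "x4"}]
common = hash_join(q1_rows, q2_rows)
```

The point of the design is that each endpoint is queried once and the join
cost is roughly linear in the two result sizes, at the price of holding the
smaller result set in memory.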

	Andy

Re: ARQ Service join strategy

Posted by Diogo FC Patrao <dj...@gmail.com>.

Hi Andy

Thanks for answering! I'll look into OpExecutor to see if I get somewhere.

Couldn't find the webpage to this quack library, is it published yet?

Cheers,

Dfcp

--
diogo patrão

Re: ARQ Service join strategy

Posted by Andy Seaborne <an...@apache.org>.

On 02/09/13 19:03, Diogo FC Patrao wrote:
> Hi
>
> I'm running a query that joins results from two endpoints:
>
> SELECT * {
>    SERVICE <s1> { ?a a :Class1 } # q1
>    SERVICE <s2> { ?a a :Class2 } # q2
> }
>
> I noticed (by running ARQ 2.8.8 with -v) that it first dispatches query q1
> on s1 and then one query q2 on s2 *for each result* in the previous query.
>
> Is there any other way of doing it? I'm getting about 1M results
> from q1, so maybe it would be better to get all results from q2 and then join
> them in memory, or pass more than one value at a time to q2 (the VALUES
> clause allows that).

Without programming, there is no way to control the execution. You
can implement an OpExecutor and provide your own implementation of join.

There are many issues, such as how to know the result sizes from a remote
service and how to send data from s1 to s2. VALUES assumes looping back
through the query engine, but does s2 accept huge SPARQL queries? See
bind joins from the IBM Garlic papers.
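
To make the batching idea concrete, here is a minimal sketch in Python of how
bindings for ?a coming back from s1 might be grouped into VALUES blocks, so
that s2 receives one query per batch rather than one per result. It is not
what ARQ does. The names batched and values_queries and the default batch
size are assumptions for illustration, the ?a values are assumed to be IRIs,
and the ':' prefix is left undeclared as in the original query.

```python
def batched(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def values_queries(iris, service="s2", batch_size=1000):
    """Build one q2 query string per batch of ?a bindings from q1.

    Hypothetical helper: `iris` are the ?a values returned by s1,
    assumed to be IRIs. The batch size caps how large each query sent
    to the remote service can get.
    """
    for batch in batched(list(iris), batch_size):
        terms = " ".join(f"<{iri}>" for iri in batch)
        yield (
            "SELECT * {\n"
            f"  SERVICE <{service}> {{\n"
            f"    VALUES ?a {{ {terms} }}\n"
            "    ?a a :Class2\n"
            "  }\n"
            "}"
        )


# Example: three bindings from q1, sent to s2 two at a time.
queries = list(values_queries(
    ["http://example.org/1", "http://example.org/2", "http://example.org/3"],
    batch_size=2,
))
```

With 1M results from q1 and a batch size of 1000, this would mean about a
thousand requests to s2 instead of a million. Whether s2 accepts queries of
that size is exactly the open question raised above.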

ARQ provides the basics - remote execution - but isn't a federated query 
optimizer.

There is a not-yet-ready query engine, called 'quack', that provides 
merge and hash joins - only on BGPs currently but the code, in 
principle, is general.

	Andy
