Posted to users@jena.apache.org by Diogo FC Patrao <dj...@gmail.com> on 2013/09/02 20:03:45 UTC
ARQ Service join strategy
Hi
I'm running a query that joins results from two endpoints:
SELECT * {
SERVICE <s1> { ?a a :Class1 } # q1
SERVICE <s2> { ?a a :Class2 } # q2
}
I noticed (by running ARQ 2.8.8 with -v) that it first dispatches query q1
on s1 and then one query q2 on s2 *for each result* in the previous query.
Is there any other way of doing it? I'm getting about 1M results
from q1, so maybe it would be better to get all results from q2 and then join
them in memory, or to pass more than one value at a time to q2 (the VALUES
clause allows that).
Cheers!
--
diogo patrão
Re: ARQ Service join strategy
Posted by Diogo FC Patrao <dj...@gmail.com>.
Thanks for your help, Andy!
dfcp
--
diogo patrão
On Thu, Sep 5, 2013 at 8:09 AM, Andy Seaborne <an...@apache.org> wrote:
> On 04/09/13 11:32, Diogo FC Patrao wrote:
>
>> Hi Andy
>>
>> Thanks for answering! I'll look into OpExecutor to see if I get somewhere.
>>
>> I couldn't find a web page for this quack library; is it published yet?
>>
>
> Not yet; it's in my GitHub area (user 'afs').
>
> There is code there that might help you (that said, hash join is actually
> one of the simpler join algorithms to implement if you want in-memory join
> data structures).
>
> Andy
Re: ARQ Service join strategy
Posted by Andy Seaborne <an...@apache.org>.
On 04/09/13 11:32, Diogo FC Patrao wrote:
> Hi Andy
>
> Thanks for answering! I'll look into OpExecutor to see if I get somewhere.
>
> I couldn't find a web page for this quack library; is it published yet?
Not yet; it's in my GitHub area (user 'afs').
There is code there that might help you (that said, hash join is
actually one of the simpler join algorithms to implement if you want
in-memory join data structures).
Andy
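To illustrate the remark that a hash join is among the simpler join algorithms, here is a minimal in-memory sketch in Python. It is not code from ARQ or quack. The function name hash_join, the representation of solutions as dicts, and the example.org IRIs are assumptions made for illustration, and the sketch assumes every row binds all of the join variables.

```python
from collections import defaultdict


def hash_join(left, right, join_vars):
    """Join two lists of solution rows (dicts) on the given variables.

    Hypothetical helper for illustration; assumes every row binds all join_vars.
    """
    # Build phase: index the smaller input by its join-key values.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    index = defaultdict(list)
    for row in build:
        index[tuple(row[v] for v in join_vars)].append(row)

    # Probe phase: look up each row of the other input and merge the matches.
    results = []
    for row in probe:
        key = tuple(row[v] for v in join_vars)
        for match in index.get(key, []):
            results.append({**match, **row})
    return results


# Stand-ins for the results of q1 (from s1) and q2 (from s2).
q1 = [{"a": "http://example.org/x"}, {"a": "http://example.org/y"}]
q2 = [{"a": "http://example.org/y"}, {"a": "http://example.org/z"}]
joined = hash_join(q1, q2, ["a"])
```

Each service would be queried once, and the join cost is roughly linear in the two result sizes. The trade-off is that the indexed input has to fit in memory.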
Re: ARQ Service join strategy
Posted by Diogo FC Patrao <dj...@gmail.com>.
Hi Andy
Thanks for answering! I'll look into OpExecutor to see if I get somewhere.
I couldn't find a web page for this quack library; is it published yet?
Cheers,
Dfcp
On Tuesday, September 3, 2013, Andy Seaborne wrote:
> Without programming, there is not a way to control the execution. You can
> implement an OpExecutor and provide your own implementation of join.
>
> There are many issues, such as how to know the sizes from a remote service
> and how to send data from s1 to the remote service s2. VALUES assumes
> looping back through the query engine, but does s2 accept huge SPARQL
> queries? See bind joins from the IBM Garlic papers.
>
> ARQ provides the basics (remote execution) but isn't a federated query
> optimizer.
>
> There is a not-yet-ready query engine, called 'quack', that provides merge
> and hash joins. It works only on BGPs currently, but the code is, in
> principle, general.
>
> Andy
--
diogo patrão
Re: ARQ Service join strategy
Posted by Andy Seaborne <an...@apache.org>.
On 02/09/13 19:03, Diogo FC Patrao wrote:
> Hi
>
> I'm running a query with join results from two endpoints:
>
> SELECT * {
> SERVICE <s1> { ?a a :Class1 } # q1
> SERVICE <s2> { ?a a :Class2 } # q2
> }
>
> I noticed (by running ARQ 2.8.8 with -v) that it first dispatches query q1
> on s1 and then one query q2 on s2 *for each result* in the previous query.
>
> Is there any other way of doing it? I'm getting about 1M results
> from q1, so maybe it would be better to get all results from q2 and then join
> them in memory, or to pass more than one value at a time to q2 (the VALUES
> clause allows that).
Without programming, there is not a way to control the execution. You
can implement an OpExecutor and provide your own implementation of join.
There are many issues, such as how to know the sizes from a remote
service and how to send data from s1 to the remote service s2. VALUES
assumes looping back through the query engine, but does s2 accept huge
SPARQL queries? See bind joins from the IBM Garlic papers.
ARQ provides the basics (remote execution) but isn't a federated query
optimizer.
There is a not-yet-ready query engine, called 'quack', that provides
merge and hash joins. It works only on BGPs currently, but the code is,
in principle, general.
Andy
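The VALUES idea discussed in this reply can be sketched as a batched bind join: instead of one q2 request per q1 result, send one request per batch of q1 results. The Python below only builds the query strings; it is a rough sketch, not something ARQ does. The function name values_queries, the batch size, and the example.org IRIs are assumptions, and it assumes the bindings are IRIs that need no escaping.

```python
def values_queries(bindings, pattern, var="a", batch_size=100):
    """Yield one SPARQL query per batch of IRIs, each carrying a VALUES block.

    Hypothetical helper for illustration; bindings must be plain IRI strings.
    """
    for start in range(0, len(bindings), batch_size):
        batch = bindings[start:start + batch_size]
        values = " ".join(f"<{iri}>" for iri in batch)
        yield f"SELECT * {{ VALUES ?{var} {{ {values} }} {pattern} }}"


# Stand-in for the ?a values returned by q1 on s1.
iris = [f"http://example.org/item{i}" for i in range(5)]
queries = list(
    values_queries(iris, "?a a <http://example.org/Class2>", batch_size=2)
)
```

With 1M results from q1 and a batch size of 100, this would mean about 10,000 requests to s2 rather than 1M. How large a batch s2 accepts is exactly the open question raised above, so the batch size would need tuning per endpoint.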