Posted to dev@rya.apache.org by Brian McBride <br...@epimorphics.com> on 2016/03/24 09:56:34 UTC

locality groups

Does Rya automatically create a locality group for each graph?

I'm assuming it does, but I haven't found any confirmation anywhere and
haven't yet got my head around the code.

Brian

-- 
Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)


Re: locality groups

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
No... it does not.

We've run into extreme use cases where a graph is created for each
statement (as a means of providing properties for a statement).  In that
case, Accumulo would not be happy if Rya created a locality group for each
graph.
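Accumulo configures locality groups per table through `TableOperations.setLocalityGroups(String tableName, Map<String, Set<Text>> groups)`. The sketch below is hypothetical and is not Rya code. It only builds the map that a "one locality group per graph" policy would hand to that call, and it assumes the graph is stored in the column family, as described elsewhere in this thread. Plain `String` stands in for Hadoop's `Text` so it runs without Accumulo. The class and method names are illustrative. The point is that the number of groups grows with the number of distinct graphs, so a graph per statement means a group per statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical policy sketch, NOT Rya code: what "one locality group per graph"
// would ask Accumulo to track. Plain String stands in for org.apache.hadoop.io.Text.
public class LocalityGroupPerGraph {

    // A quad: subject, predicate, object and graph (context).
    record Quad(String s, String p, String o, String g) {}

    // Builds the group-name -> column-families map such a policy would pass to
    // TableOperations.setLocalityGroups; assumes the graph is the column family.
    static Map<String, Set<String>> groupsPerGraph(List<Quad> quads) {
        Map<String, Set<String>> groups = new TreeMap<>();
        for (Quad q : quads) {
            groups.computeIfAbsent(q.g(), k -> new TreeSet<>()).add(q.g());
        }
        return groups;
    }

    public static void main(String[] args) {
        // The "extreme" case from the thread: each statement gets its own graph
        // so that properties can be attached to it.
        List<Quad> quads = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            quads.add(new Quad("urn:s" + i, "urn:p", "urn:o" + i, "urn:stmt" + i));
        }
        // One locality group per statement: 1000 groups for 1000 quads.
        System.out.println(groupsPerGraph(quads).size());
    }
}
```

Running `main` prints `1000`. Every one of those groups would be table configuration that Accumulo has to carry, which is the situation described above.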

--Aaron

On Thu, Mar 24, 2016 at 9:29 AM Brian McBride <br...@epimorphics.com> wrote:

Re: locality groups

Posted by Brian McBride <br...@epimorphics.com>.
Hi Aaron,

On 28/03/16 17:21, Aaron D. Mihalik wrote:
> Two things:
>
> 1. Rya will reject a blanket ?s ?p ?o query in order to avoid a full table
> scan (even if the G is set).
That's what I thought.

Thanks again.
Brian


Re: locality groups

Posted by "Aaron D. Mihalik" <aa...@gmail.com>.
Two things:

1. Rya will reject a blanket ?s ?p ?o query in order to avoid a full table
scan (even if the G is set).

2. Here's the code where the specific graph is set [1].  Rya sets the
"ColumnFamily" value on the Accumulo scanner to the graph, so this
filtering/seeking is all performed on the Accumulo tablet server.
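A toy in-memory model may help connect the two points above to Brian's `?s a ?c` example in this thread. It is a sketch under stated assumptions, not Rya's or Accumulo's implementation. The row format "predicate NUL object NUL subject", the predicate-only `scan` API and the names `put`/`scan` are all hypothetical. The graph is modelled as the column family, per Aaron's description; in real Accumulo that restriction is requested with the scanner's `fetchColumnFamily`. Whether Accumulo's own seeking skips more entries than this depends on the table layout and is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model, NOT Rya's or Accumulo's implementation.
// Assumptions for illustration: rows of a predicate-first index are
// "predicate NUL object NUL subject", and the graph is the column family.
public class GraphScanModel {

    // Sorted key: row first, then column family, like an Accumulo key prefix.
    record Key(String row, String columnFamily) implements Comparable<Key> {
        @Override
        public int compareTo(Key other) {
            int c = row.compareTo(other.row);
            return c != 0 ? c : columnFamily.compareTo(other.columnFamily);
        }
    }

    // Rows returned to the client, and how many entries the "tablet server" examined.
    record ScanResult(List<String> rows, int examined) {}

    private final NavigableMap<Key, String> table = new TreeMap<>();

    void put(String s, String p, String o, String graph) {
        table.put(new Key(p + "\u0000" + o + "\u0000" + s, graph), "");
    }

    // Scans one predicate's row range; graph == null means "any graph".
    ScanResult scan(String predicate, String graph) {
        if (predicate == null) {
            // Point 1: with nothing bound this would be a full table scan,
            // so refuse it even when a graph is given.
            throw new IllegalArgumentException("refusing unbound ?s ?p ?o pattern");
        }
        Key start = new Key(predicate + "\u0000", "");
        Key end = new Key(predicate + "\u0001", ""); // sorts after every row with this prefix
        List<String> rows = new ArrayList<>();
        int examined = 0;
        for (Map.Entry<Key, String> e : table.subMap(start, true, end, false).entrySet()) {
            examined++; // every entry in the predicate's range is visited...
            if (graph == null || graph.equals(e.getKey().columnFamily())) {
                rows.add(e.getKey().row()); // ...but only the requested graph is returned
            }
        }
        return new ScanResult(rows, examined);
    }

    public static void main(String[] args) {
        GraphScanModel m = new GraphScanModel();
        String type = "rdf:type", foo = "http://example.org/foo", bar = "http://example.org/bar";
        m.put("ex:a", type, "ex:C1", foo);
        m.put("ex:b", type, "ex:C2", foo);
        m.put("ex:c", type, "ex:C1", foo);
        m.put("ex:d", type, "ex:C3", bar);
        m.put("ex:e", type, "ex:C3", bar);
        m.put("ex:a", "ex:name", "\"A\"", foo);
        // Models: SELECT DISTINCT ?c { graph <http://example.org/foo> { ?s a ?c } }
        ScanResult r = m.scan(type, foo);
        System.out.println(r.examined() + " examined, " + r.rows().size() + " returned");
    }
}
```

Running `main` prints `5 examined, 3 returned`. In this model the scan visits every `rdf:type` entry, and the graph restriction only decides which entries are sent back. The filtering happens on the server side of the model rather than in the client.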

--Aaron


[1]
https://github.com/apache/incubator-rya/blob/develop/dao/accumulo.rya/src/main/java/mvm/rya/accumulo/query/AccumuloRyaQueryEngine.java#L370

On Sat, Mar 26, 2016 at 8:19 AM Brian McBride <br...@epimorphics.com> wrote:

Re: locality groups

Posted by Brian McBride <br...@epimorphics.com>.
Aaron replied:
[[

No... it does not.

We've run into extreme use cases where a graph is created for each
statement (as a means of providing properties for a statement).  In that
case, Accumulo would not be happy if Rya created a locality group for each
graph.

--Aaron

]]

Again, thanks Aaron.  I can see that Accumulo would not be happy in 
those circumstances.

I've been trying to understand how Rya handles named graphs.  To me it
looks like it treats them as a filter on the results of matching S P O
patterns.  Since there is no index that includes the G component of a
quad, and locality groups are not used, I don't see how Rya can limit
its scans to a specific graph.  To scan a specific graph, it would have
to scan all the matching triples and filter out the ones that are not in
that graph.  Is that right?  Or is there some Accumulo magic I've
missed?

Queries like

SELECT * { graph <http://example.org/foo> { ?s ?p ?o } }

fail with an exception.

A query like

SELECT DISTINCT ?c { graph <http://example.org/foo> { ?s a ?c } }

will scan all the rdf:type quads in the store, not just those in the 
specified graph.  (Is that right?)

Is this by design, or does it just reflect the current state of development?

Brian

