Posted to dev@jena.apache.org by "A. Soroka" <aj...@virginia.edu> on 2016/11/01 16:00:54 UTC
Re: Graph on Cassandra

Sounds in some ways like two different efforts. Jena brings a lot of assumptions and machinery that aren't present and won't ever be present in Commons. I could see a SPARQL-less Commons impl over Cassandra being useful for the LDP-style use case, but that doesn't sound like what Claude is trying to get done.

---
A. Soroka
The University of Virginia Library

> On Oct 31, 2016, at 10:27 AM, Claude Warren <cl...@xenei.com> wrote:
> 
> Well, I started the process at work with Apache Jena as the target. If I
> change target I have to start the process over.  Unless there is a very
> strong reason to move to Commons RDF I would prefer to stay with Jena.
> 
> Given that we want to run SPARQL queries over the data I think we want to
> stay with Jena.
> 
> Claude
> 
> On Mon, Oct 31, 2016 at 2:23 PM, Stian Soiland-Reyes <st...@apache.org>
> wrote:
> 
>> Do you think it would make sense to do a Cassandra Commons RDF API binding
>> for Graph or Dataset..? Or would that be too high level?
>> 
>> The streaming part would fit well there I think.
>> 
>> Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
>> factory interface.
>> 
>> https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html
>> 
>> But it could make more sense as a Jena DatasetGraph so it can be used by
>> sparql queries etc. (And then exposed as Commons RDF Jena bindings if one
>> so wanted)
>> 
>> On 31 Oct 2016 1:41 pm, "Claude Warren" <cl...@xenei.com> wrote:
>> 
>>> Andy,
>>> 
>>> This seems like a good approach but does not appear to be in the Jena code
>>> base, which I suppose is your comment about an approach to developing work.
>>> 
>>> Does it make sense to create git clones that contain the new work?  Or
>>> perhaps branches?
>>> 
>>> Do you have a suggestion or direction you would like to see this go?
>>> 
>>> Claude
>>> 
>>> 
>>> 
>>> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>>> Claude,
>>>> 
>>>> These may help:
>>>> 
>>>> I have been thinking about an interface that is more oriented to the
>>>> storage than the full DatasetGraph.
>>>> 
>>>> StorageRDF breaks down all the operations into those on the default graph
>>>> and those on named graphs.  For just a graph, simply ignore the named graph
>>>> operations.
>>>> 
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>>> 
>>>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>>>> SPARQL):
>>>> 
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>>> 
>>>> If you want to only use existing classes, DatasetGraphTriplesQuads is the
>>>> place to start - used by TIM and TDB - you can implement it without needing
>>>> quads/named graphs. Again, simply ignore the named graph calls (throw
>>>> UnsupportedOperationException for them).
>>>> 
>>>> Going the graph route could lead to rework later on for any kind of
>>>> performance issues because find(S,P,O) is so narrow and precludes union
>>>> default graph except by brute force.  DatasetGraph works with the SPARQL
>>>> execution engine.
>>>> 
>>>> We still need to discuss how best to approach developing work - it should
>>>> not get sucked up by the release cycle.
>>>> 
>>>>        Andy
>>>> 
>>>> 
>>>> On 26/10/16 19:21, Claude Warren wrote:
>>>> 
>>>>> My plan is to start with a Graph implementation.  We expect to write 3
>>>>> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
>>>>> handle find(ANY, ANY, ANY) so I suspect we will just start with permitting
>>>>> a column scan on Cassandra.
>>>>> 
>>>>> I have not looked at DynamoDB but as I recall there are significant
>>>>> differences under the hood.
>>>>> 
>>>>> I expect that we will move on to a custom model or query engine to get the
>>>>> best performance but that is not what we are planning for the first cut.
>>>>> 
>>>>> I am still waiting for management approval to do this at work ....
>>>>> sometimes it takes longer to get the paperwork done than it does to design
>>>>> the thing.
>>>>> 
>>>>> 
>>>>> Claude
>>>>> 
>>>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.houle@ontology2.com>
>>>>> wrote:
>>>>> 
>>>>>> I like DynamoDB as a target for this sort of thing.  There are many
>>>>>> tasks which are small-scale yet critical where it would otherwise be
>>>>>> hard to provide a distributed and reliable database.  Put that together
>>>>>> with Lambda, which does the same for computation, and you are cooking
>>>>>> with gas.
>>>>>> 
>>>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>>>> throughout an application; the code is DynamoDB idiomatic in every way,
>>>>>> just the application reads and writes (a constrained set of) RDF
>>>>>> documents.
>>>>>> 
>>>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>>>> store when I want a panoptic view, but a distributed graph like
>>>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>>>> directly.
>>>>>> 
>>>>>> There are many products in the same family as Cassandra and DynamoDB and
>>>>>> it would be good to think through the math so we can approach them all
>>>>>> in a similar way.
>>>>>> 
>>>>>> --
>>>>>>  Paul Houle
>>>>>>  paul.houle@ontology2.com
>>>>>> 
>>>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>>> 
>>>>>>> Yep,
>>>>>>> 
>>>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>>> 
>>>>>>> indicates that they are indexing by subject. As someone who has
>>>>>>> implemented LDP, that is definitely the approach that makes sense there.
>>>>>>> 
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>> 
>>>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model to
>>>>>>>> Rya.  Better for LDP (??).
>>>>>>>> 
>>>>>>>>    Andy
>>>>>>>> 
>>>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>>> 
>>>>>>>>> There's also:
>>>>>>>>> 
>>>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>>> 
>>>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>>>> particular uses it expects to support.
>>>>>>>>> 
>>>>>>>>> ---
>>>>>>>>> A. Soroka
>>>>>>>>> The University of Virginia Library
>>>>>>>>> 
>>>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Claude,
>>>>>>>>>> 
>>>>>>>>>> There is certainly interest from me.
>>>>>>>>>> 
>>>>>>>>>> What the best thing to do is depends on various factors.  By putting
>>>>>>>>>> it in extras I presume you mean it gets added to the release?  That
>>>>>>>>>> is not the only way forward.
>>>>>>>>>> 
>>>>>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>>>>>> be a community around this code?  Is that community the same as, or
>>>>>>>>>> does it significantly overlap with, the Jena community?
>>>>>>>>>> 
>>>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>>>> which use cases are the most important for this work?
>>>>>>>>>> 
>>>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>>>> Rya (incubating) uses Accumulo tables as indexes, and a partial scan
>>>>>>>>>> of a table is streaming.  Other systems try to use the columns for
>>>>>>>>>> properties, possibly more useful for LDP style than SPARQL.
>>>>>>>>>> 
>>>>>>>>>>  Andy
>>>>>>>>>> 
>>>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>>> 
>>>>>>>>>>> Howdy,
>>>>>>>>>>> 
>>>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>>>> Cassandra.  I am wondering if there is enough interest here to
>>>>>>>>>>> accept it as a contribution.  I was thinking that it might fit in
>>>>>>>>>>> the Extras category.
>>>>>>>>>>> 
>>>>>>>>>>> I cannot promise release of the code yet as I have to present it
>>>>>>>>>>> to our internal Intellectual Property group first.
>>>>>>>>>>> 
>>>>>>>>>>> Thoughts?
>>>>>>>>>>> 
>>>>>>>>>>> Claude
>>>>>>>>>>> 
>>> 
>>> 
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>> 
>> 
> 
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
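
For reference, the three-table layout mentioned in the quoted thread (SPO, POS, OPS) implies choosing one table per find() pattern. The sketch below only illustrates that idea and is not code from Jena or from Claude's project: choose_index and INDEXES are hypothetical names, and None stands in for a wildcard such as Jena's Node.ANY. It picks the table whose key order has the longest run of bound leading terms. A prefix length of 0 corresponds to the find(ANY, ANY, ANY) case, which would need a full scan.

```python
from typing import Optional, Tuple

# Hypothetical table names, taken from the thread's "SPO, POS, OPS".
# Each name spells out the key order of that table.
INDEXES = ("SPO", "POS", "OPS")


def choose_index(s: Optional[str], p: Optional[str], o: Optional[str]) -> Tuple[str, int]:
    """Pick the table whose key order has the longest fully bound prefix.

    None means "wildcard". Returns (table_name, prefix_length).
    A prefix_length of 0 means no table helps and a full scan is needed.
    """
    bound = {"S": s is not None, "P": p is not None, "O": o is not None}
    best, best_len = INDEXES[0], 0
    for name in INDEXES:
        # Count how many leading key columns of this table are bound.
        n = 0
        for slot in name:
            if not bound[slot]:
                break
            n += 1
        # Strictly greater: ties keep the earlier table in INDEXES.
        if n > best_len:
            best, best_len = name, n
    return best, best_len


if __name__ == "__main__":
    for pattern in [("s", None, None), (None, "p", "o"), (None, None, "o"),
                    ("s", None, "o"), (None, None, None)]:
        print(pattern, "->", choose_index(*pattern))
```

Under these assumptions, a pattern with S and O bound but P unbound gets only a one-column prefix from either SPO or OPS, so the remaining term has to be filtered after the scan.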