Posted to dev@jena.apache.org by Claude Warren <cl...@xenei.com> on 2016/10/15 17:38:14 UTC

Graph on Cassandra

Howdy,

We have a project at work that is implementing Jena Graph on Cassandra.  I
am wondering if there is enough interest here to accept it as a
contribution.  I was thinking that it might fit in the Extras category.

I cannot promise release of the code yet as I have to present it to our
internal Intellectual Property group first.

Thoughts?

Claude
-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Graph on Cassandra

Posted by Stian Soiland-Reyes <st...@apache.org>.
While technically a GitHub repository would be the easiest way to
start, it will be harder to build a community around it. I guess
legally/politically it can also be harder to convince organizations to
contribute outside ASF.

The Apache Incubator is one way, but it comes with a lot of overhead -
I think your project is smaller than a typical Incubator proposal and
would do better to aim at joining the Cassandra or Jena PMC.

In Apache Commons we have a "sandbox" -
https://commons.apache.org/sandbox.html - any Apache committer can ask
to make something there. You can think of it like a PMC-hosted
incubator.


I don't see a big problem with trying out a "sandbox"-like
jena-cassandra git repository under the Apache Jena PMC, if you think
that your other contributors might also potentially be helping Jena Core -
or that Jena folks would be lured into Jena-Cassandra. We already have
Fuseki and Elephas, which at first might have seemed more 'exotic'
but now are in the Jena family.


I do however see a worry about how an ever-growing single-git-repo model
relates to releases (and build time :)), and about the drag
between stable development of Jena core (including TDB and friends
here) and more rapid development in new stuff. For instance, you may
want to have an early release of Jena-Cassandra without having to sync
up with a 6-monthly Jena release cycle. (Moving to a 3-month cycle would
avoid that problem, though.)

On the other hand, many small git repos are at risk of not being
updated or released at all. But perhaps that's OK, not all experiments
work out!


BTW - perhaps check out why https://github.com/ruby-rdf/rdf-cassandra
didn't fly.

On 31 October 2016 at 20:26, Claude Warren <cl...@xenei.com> wrote:
> We don't have code at the moment.  We (the team I am on at work) are
> planning on implementing on Cassandra.  That would mean that we would have
> a couple of developers watching and at least one working on the code until
> it was stable.
>
> I was hoping that we would be able to contribute this to the jena project
> as a complete module.   I understand not wanting to put it in as part of
> the project at the beginning,  but that was my goal.
>
> I don't have a release schedule in mind as the in house project is still
> fluid.  It might make sense to put it on github to start, but I would like
> to see it in a Jena based repo in order to make it more visible to the
> development community.
>
> As I keep saying, I need to get final approval from legal before
> proceeding.  I expect to hear something later this week.
>
> Claude
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren



-- 
Stian Soiland-Reyes
http://orcid.org/0000-0001-9842-9718

Re: Graph on Cassandra

Posted by Claude Warren <cl...@xenei.com>.
We don't have code at the moment.  We (the team I am on at work) are
planning on implementing on Cassandra.  That would mean that we would have
a couple of developers watching and at least one working on the code until
it was stable.

I was hoping that we would be able to contribute this to the jena project
as a complete module.   I understand not wanting to put it in as part of
the project at the beginning,  but that was my goal.

I don't have a release schedule in mind as the in-house project is still
fluid.  It might make sense to put it on GitHub to start, but I would like
to see it in a Jena-based repo in order to make it more visible to the
development community.

As I keep saying, I need to get final approval from legal before
proceeding.  I expect to hear something later this week.

Claude


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Graph on Cassandra

Posted by Andy Seaborne <an...@apache.org>.

On 31/10/16 13:41, Claude Warren wrote:
> Andy,
>
> This seems like a good approach but does not appear to be in the Jena code
> base, which I suppose is your comment about an approach to developing work.
>
> Does it make sense to create git clones that contain the new work?  Or
> perhaps branches?
>
> Do you have a suggestion or direction you would like to see this go?

That's the discussion to have.  The first item is "Community".  This is 
all new code? Who is involved? Just you so far?

A storage layer is not trivial - this is not an "extra" thing.  It is a
module of its own, and if the community is significantly different,
maybe a different mailing list (e.g. Solr within the Lucene
project), maybe even a different project; it can be "straight to
TLP" or "incubated" - that depends on who is involved.  There is a wide
set of possibilities.

If it is starting off, then the Jena git repo isn't a good place to have 
the code.  The lifecycles don't line up.

A branch that is completely separate is really a separate repo.  Jena can
get another git repo.

What would be the release cycle?
The real issue is the work needed by the PMC for releases.

To get all options mentioned:

If this is a one-person effort for now, then starting a github repo and 
creating the initial sketch/framework is an option.  More focused. More 
freedom to try things out and change directions.

	Andy

>
> Claude
>
>
>
> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> Claude,
>>
>> These may help:
>>
>> I have been thinking about an interface that is more oriented to the
>> storage than the full DatasetGraph.
>>
>> StorageRDF breaks down all the operations into those on the default graph
>> and those on named graphs.  For just a graph, simply ignore the named graph
>> operations.
>>
>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>
>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>> SPARQL):
>>
>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>
>> If you want to only use existing classes, DatasetGraphTriplesQuads is the
>> place to start - used by TIM and TDB - you can implement without needing
>> quads/named graphs. Again, simply ignore (throw
>> UnsupportedOperationException for) the named graph calls.
>>
>> Going the graph route could lead to rework later on for any kind of
>> performance issues because find(S,P,O) is so narrow and precludes union
>> default graph except by brute force.  DatasetGraph works with the SPARQL
>> execution engine.
>>
>> We still need to discuss how best to approach developing work - it should
>> not get sucked up by the release cycle.
>>
>>         Andy
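
A minimal sketch of the "implement the default graph, throw for the named graph calls" suggestion quoted above. The interface `TripleQuadStore` and all of its method names are hypothetical stand-ins made up for illustration; they are not the actual StorageRDF or DatasetGraphTriplesQuads signatures, and a plain in-memory list stands in for Cassandra.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storage-style interface (an assumption for illustration only;
// not the real StorageRDF or DatasetGraphTriplesQuads API).
interface TripleQuadStore {
    void addTriple(String s, String p, String o);
    void addQuad(String g, String s, String p, String o);
    List<String[]> findTriples(String s, String p, String o);
    List<String[]> findQuads(String g, String s, String p, String o);
}

// Default-graph-only store: triple operations work, named graph operations throw.
class DefaultGraphOnlyStore implements TripleQuadStore {
    // In-memory stand-in for the backing tables.
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void addTriple(String s, String p, String o) {
        triples.add(new String[] { s, p, o });
    }

    @Override
    public void addQuad(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs are not supported");
    }

    // A null argument acts as a wildcard in the find(S,P,O) pattern.
    @Override
    public List<String[]> findTriples(String s, String p, String o) {
        List<String[]> result = new ArrayList<>();
        for (String[] t : triples) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                result.add(t);
            }
        }
        return result;
    }

    @Override
    public List<String[]> findQuads(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs are not supported");
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

The point of the shape is that a graph-only backend can sit behind a dataset-style interface from day one, so adding named graphs later replaces the two throwing methods rather than the whole class.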
>>
>>
>> On 26/10/16 19:21, Claude Warren wrote:
>>
>>> My plan is to start with a Graph implementation.  We expect to write 3
>>> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
>>> handle find( ANY, ANY, ANY) so I suspect we will just start with
>>> permitting a column scan on Cassandra.
>>>
>>> I have not looked at DynamoDB but as I recall there are significant
>>> differences under the hood.
>>>
>>> I expect that we will move on to a custom model or query engine to get the
>>> best performance but that is not what we are planning for the first cut.
>>>
>>> I am still waiting for management approval to do this at work ....
>>> sometimes it takes longer to get the paperwork done than it does to design
>>> the thing.
>>>
>>>
>>> Claude
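
A rough sketch of how a find(S,P,O) pattern could be routed to one of the three tables named above (SPO, POS, OPS). The class and method names are hypothetical, and the idea that each table can be read by a leading-column prefix is an assumption about the eventual Cassandra schema, not something stated in the thread.

```java
// Picks which index table to read for a find(S,P,O) pattern and how many
// leading key columns of that table are bound. All names are illustrative.
class TripleIndexChooser {

    static final class Choice {
        final String table;      // "SPO", "POS" or "OPS"
        final int prefixLength;  // number of leading columns that are bound

        Choice(String table, int prefixLength) {
            this.table = table;
            this.prefixLength = prefixLength;
        }
    }

    static Choice choose(boolean sBound, boolean pBound, boolean oBound) {
        if (sBound) {
            // S, S+P and S+P+O are prefixes of SPO. For S+O (P unbound) only
            // the S prefix is usable; O has to be filtered afterwards.
            int prefix = 1;
            if (pBound) {
                prefix = oBound ? 3 : 2;
            }
            return new Choice("SPO", prefix);
        }
        if (pBound) {
            // P and P+O are prefixes of POS.
            return new Choice("POS", oBound ? 2 : 1);
        }
        if (oBound) {
            // O alone is a prefix of OPS.
            return new Choice("OPS", 1);
        }
        // find(ANY, ANY, ANY): nothing is bound, so this is a full scan.
        return new Choice("SPO", 0);
    }
}
```

With this table set every pattern except find(ANY, ANY, ANY) gets at least a one-column prefix; the all-wildcard case is the one that falls back to a scan, which matches the limitation described in the message.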
>>>
>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <pa...@ontology2.com>
>>> wrote:
>>>
>>>> I like DynamoDB as a target for this sort of thing.  There are many
>>>> tasks which are small-scale yet critical where it would otherwise be
>>>> hard to provide a distributed and reliable database.  Put that together
>>>> with Lambda,  which does the same for computation,  and you are cooking
>>>> with gas.
>>>>
>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>> throughout an application;  the code is DynamoDB idiomatic in every way,
>>>>  just the application reads and writes (a constrained set of) RDF
>>>> documents.
>>>>
>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>> store when I want a panoptic view,  but a distributed graph like
>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>> directly.
>>>>
>>>> There are many products in the same family as Cassandra and DynamoDB and
>>>> it would be good to think through the math so we can approach them all
>>>> in a similar way.
>>>>
>>>> --
>>>>   Paul Houle
>>>>   paul.houle@ontology2.com
>>>>
>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>
>>>>> Yep,
>>>>>
>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>
>>>>> indicates that they are indexing by subject. As someone who has
>>>>> implemented LDP, that is definitely the approach that makes sense there.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
>>>>>>
>>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model to
>>>>>> Rya.  Better for LDP (??).
>>>>>>
>>>>>>     Andy
>>>>>>
>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>
>>>>>>> There's also:
>>>>>>>
>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>
>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>> particular uses it expects to support.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Hi Claude,
>>>>>>>>
>>>>>>>> There is certainly interest from me.
>>>>>>>>
>>>>>>>> What the best thing to do depends on various factors.  By putting it
>>>>>>>> in extras I presume you mean it gets added to the release?  That is
>>>>>>>> not the only way forward.
>>>>>>>>
>>>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>>>> be a community around this code?  Is that community the same, or
>>>>>>>> significant overlap, as the Jena community?
>>>>>>>>
>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>> which use cases are the most important for this work?
>>>>>>>>
>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>> Rya(incubating) uses Accumulo tables as indexes, and partial scans of
>>>>>>>> the table is streaming.  Other systems try to use the columns for
>>>>>>>> properties, possibly more useful for LDP style than SPARQL.
>>>>>>>>
>>>>>>>>   Andy
>>>>>>>>
>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>
>>>>>>>>> Howdy,
>>>>>>>>>
>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>> Cassandra.  I am wondering if there is enough interest here to
>>>>>>>>> accept it as a contribution.  I was thinking that it might fit in
>>>>>>>>> the Extras category.
>>>>>>>>>
>>>>>>>>> I can not promise release of the code yet as I have to present it
>>>>>>>>> to our internal Intellectual Property group first.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>> Claude
>>>>>>>>>
>
>

Re: Graph on Cassandra

Posted by "A. Soroka" <aj...@virginia.edu>.
Sounds in some ways like two different efforts. Jena brings a lot of assumptions and machinery that aren't present and won't ever be present in Commons. I could see a SPARQL-less Commons impl over Cassandra being useful for the LDP-style use case, but that doesn't sound like what Claude is trying to get done.

---
A. Soroka
The University of Virginia Library

> On Oct 31, 2016, at 10:27 AM, Claude Warren <cl...@xenei.com> wrote:
> 
> Well, I started the process at work with Apache Jena as the target. If I
> change target I have to start the process over.  Unless there is a very
> strong reason to move to Commons RDF I would prefer to stay with Jena.
> 
> Given that we want to run SPARQL queries over the data I think we want to
> stay with Jena.
> 
> Claude
> 
> On Mon, Oct 31, 2016 at 2:23 PM, Stian Soiland-Reyes <st...@apache.org>
> wrote:
> 
>> Do you think it would make sense to do a Cassandra Commons RDF API binding
>> for Graph or Dataset? Or would that be too high level?
>> 
>> The streaming part would fit well there I think.
>> 
>> Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
>> factory interface.
>> 
>> https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html
>> 
>> But it could make more sense as a Jena DatasetGraph so it can be used by
>> sparql queries etc. (And then exposed as Commons RDF Jena bindings if one
>> so wanted)
>> 
>> On 31 Oct 2016 1:41 pm, "Claude Warren" <cl...@xenei.com> wrote:
>> 
>>> Andy,
>>> 
>>> This seems like a good approach but does not appear to be in the Jena code
>>> base, which I suppose is your comment about an approach to developing work.
>>> 
>>> Does it make sense to create git clones that contain the new work?  Or
>>> perhaps branches?
>>> 
>>> Do you have a suggestion or direction you would like to see this go?
>>> 
>>> Claude
>>> 
>>> 
>>> 
>>> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>>> Claude,
>>>> 
>>>> These may help:
>>>> 
>>>> I have been thinking about an interface that is more oriented to the
>>>> storage than the full DatasetGraph.
>>>> 
>>>> StorageRDF breaks down all the operations into those on the default graph
>>>> and those on named graphs.  For just a graph, simply ignore the named graph
>>>> operations.
>>>> 
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>>> 
>>>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>>>> SPARQL):
>>>> 
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>>> 
>>>> If you want to only use existing classes, DatasetGraphTriplesQuads is the
>>>> place to start - used by TIM and TDB - you can implement without needing
>>>> quads/named graphs. Again, simply ignore (throw
>>>> UnsupportedOperationException for) the named graph calls.
>>>> 
>>>> Going the graph route could lead to rework later on for any kind of
>>>> performance issues because find(S,P,O) is so narrow and precludes union
>>>> default graph except by brute force.  DatasetGraph works with the SPARQL
>>>> execution engine.
>>>> 
>>>> We still need to discuss how best to approach developing work - it should
>>>> not get sucked up by the release cycle.
>>>> 
>>>>        Andy
>>>> 
>>>> 
>>>> On 26/10/16 19:21, Claude Warren wrote:
>>>> 
>>>>> My plan is to start with a Graph implementation.  We expect to write 3
>>>>> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
>>>>> handle find( ANY, ANY, ANY) so I suspect we will just start with
>>>>> permitting a column scan on Cassandra.
>>>>> 
>>>>> I have not looked at DynamoDB but as I recall there are significant
>>>>> differences under the hood.
>>>>> 
>>>>> I expect that we will move on to a custom model or query engine to get the
>>>>> best performance but that is not what we are planning for the first cut.
>>>>> 
>>>>> I am still waiting for management approval to do this at work ....
>>>>> sometimes it takes longer to get the paperwork done than it does to design
>>>>> the thing.
>>>>> 
>>>>> 
>>>>> Claude
>>>>> 
>>>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.houle@ontology2.com>
>>>>> wrote:
>>>>> 
>>>>> I like DynamoDB as a target for this sort of thing.  There are many
>>>>>> tasks which are small-scale yet critical where it would otherwise be
>>>>>> hard to provide a distributed and reliable database.  Put that
>> together
>>>>>> with Lambda,  which does the same for computation,  and you are
>> cooking
>>>>>> with gas.
>>>>>> 
>>>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>>>> throughout an application;  the code is DynamoDB idiomatic in every
>>> way,
>>>>>> just the application reads and writes (a constrained set of) RDF
>>>>>> documents.
>>>>>> 
>>>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>>>> store when I want a panoptic view,  but with a distributed graph like
>>>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>>>> directly.
>>>>>> 
>>>>>> There are many products in the same family as Cassandra and DynamoDB
>>> and
>>>>>> it would be good to think through the math so we can approach them
>> all
>>>>>> in a similar way.
>>>>>> 
>>>>>> --
>>>>>>  Paul Houle
>>>>>>  paul.houle@ontology2.com
>>>>>> 
>>>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>>> 
>>>>>>> Yep,
>>>>>>> 
>>>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/
>>>>>>> 
>>>>>> Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>> 
>>>>>>> 
>>>>>>> indicates that they are indexing by subject. As someone who has
>>>>>>> implemented LDP, that is definitely the approach that makes sense
>>> there.
>>>>>>> 
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>> 
>>>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org>
>> wrote:
>>>>>>>> 
>>>>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model
>> to
>>>>>>>> 
>>>>>>> Rya.  Better for LDP (??).
>>>>>> 
>>>>>>> 
>>>>>>>>    Andy
>>>>>>>> 
>>>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>>> 
>>>>>>>>> There's also:
>>>>>>>>> 
>>>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>>> 
>>>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>>>> 
>>>>>>>> particular uses it expects to support.
>>>>>> 
>>>>>>> 
>>>>>>>>> ---
>>>>>>>>> A. Soroka
>>>>>>>>> The University of Virginia Library
>>>>>>>>> 
>>>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org>
>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Claude,
>>>>>>>>>> 
>>>>>>>>>> There is certainly interest from me.
>>>>>>>>>> 
>>>>>>>>>> What the best thing to do depends on various factors.  By putting
>>> it
>>>>>>>>>> 
>>>>>>>>> in extras I presume you mean it gets added to the release?  That
>> is
>>>>>> not the
>>>>>> only way forward.
>>>>>> 
>>>>>>> 
>>>>>>>>>> An important aspect of Apache is "Community over code" - will
>> there
>>>>>>>>>> 
>>>>>>>>> be a community around this code?  Is that community the same, or
>>>>>> significant overlap, as the Jena community?
>>>>>> 
>>>>>>> 
>>>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>>>> 
>>>>>>>>> which use cases are the most important for this work?
>>>>>> 
>>>>>>> 
>>>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>>>> 
>>>>>>>>> Rya(incubating) uses Accumulo tables as indexes, and partial scans
>>> of
>>>>>> the
>>>>>> table is streaming.  Other systems try to use the columns for
>>> properties,
>>>>>> possibly more useful for LDP style than SPARQL.
>>>>>> 
>>>>>>> 
>>>>>>>>>>  Andy
>>>>>>>>>> 
>>>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>>> 
>>>>>>>>>>> Howdy,
>>>>>>>>>>> 
>>>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>>>> 
>>>>>>>>>> Cassandra.  I
>>>>>> 
>>>>>>> am wondering if there is enough interest here to accept it as a
>>>>>>>>>>> contribution.  I was thinking that it might fit in the Extras
>>>>>>>>>>> 
>>>>>>>>>> category.
>>>>>> 
>>>>>>> 
>>>>>>>>>>> I can not promise release of the code yet as I have to present
>> it
>>>>>>>>>>> 
>>>>>>>>>> to our
>>>>>> 
>>>>>>> internal Intellectual Property group first.
>>>>>>>>>>> 
>>>>>>>>>>> Thoughts?
>>>>>>>>>>> 
>>>>>>>>>>> Claude
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>> 
>> 
> 
> 
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren


Re: Graph on Cassandra

Posted by Claude Warren <cl...@xenei.com>.
Well, I started the process at work with Apache Jena as the target.  If I
change the target I have to start the process over.  Unless there is a very
strong reason to move to Commons RDF, I would prefer to stay with Jena.

Given that we want to run SPARQL queries over the data, I think we want to
stay with Jena.

Claude

On Mon, Oct 31, 2016 at 2:23 PM, Stian Soiland-Reyes <st...@apache.org>
wrote:

> Do you think it would make sense to do a Cassandra Commons RDF API binding
> for Graph or Dataset? Or would that be too high level?
>
> The streaming part would fit well there, I think.
>
> Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
> factory interface.
>
> https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html
>
> But it could make more sense as a Jena DatasetGraph so it can be used by
> SPARQL queries etc. (And then exposed as Commons RDF Jena bindings if one
> so wanted.)



-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Graph on Cassandra

Posted by Stian Soiland-Reyes <st...@apache.org>.
Do you think it would make sense to do a Cassandra Commons RDF API binding
for Graph or Dataset? Or would that be too high level?

The streaming part would fit well there, I think.

Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
factory interface.

https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html

But it could make more sense as a Jena DatasetGraph so it can be used by
SPARQL queries etc. (And then exposed as Commons RDF Jena bindings if one
so wanted.)

On 31 Oct 2016 1:41 pm, "Claude Warren" <cl...@xenei.com> wrote:

> Andy,
>
> This seems like a good approach but does not appear to be in the Jena code
> base, which I suppose is your comment about an approach to developing work.
>
> Does it make sense to create git clones that contain the new work?  Or
> perhaps branches?
>
> Do you have a suggestion or direction you would like to see this go?
>
> Claude

Re: Graph on Cassandra

Posted by Claude Warren <cl...@xenei.com>.
Andy,

This seems like a good approach, but it does not appear to be in the Jena
code base, which I suppose is the point of your comment about an approach to
developing work.

Does it make sense to create git clones that contain the new work?  Or
perhaps branches?

Do you have a suggestion or a direction you would like to see this go in?

Claude



On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <an...@apache.org> wrote:

> Claude,
>
> These may help:
>
> I have been thinking about an interface that is more oriented to the
> storage than the full DatasetGraph.
>
> StorageRDF breaks down all the operations into those on the default graph
> and those on named graphs.  For just a graph, simply ignore the named graph
> operations.
>
> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>
> There is an adapter to the DatasetGraph hierarchy (which is needed for
> SPARQL):
>
> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>
> If you want to use only existing classes, DatasetGraphTriplesQuads is the
> place to start - used by TIM and TDB - you can implement it without needing
> quads/named graphs. Again, simply ignore the named graph calls (throw
> UnsupportedOperationException for them).
>
> Going the graph route could lead to rework later on for any kind of
> performance issues because find(S,P,O) is so narrow and precludes union
> default graph except by brute force.  DatasetGraph works with the SPARQL
> execution engine.
>
> We still need to discuss how best to approach developing work - it should
> not get sucked up by the release cycle.
>
>         Andy


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: Graph on Cassandra

Posted by Andy Seaborne <an...@apache.org>.
Claude,

These may help:

I have been thinking about an interface that is more oriented to the 
storage than the full DatasetGraph.

StorageRDF breaks down all the operations into those on the default 
graph and those on named graphs.  For just a graph, simply ignore the 
named graph operations.

https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java

There is an adapter to the DatasetGraph hierarchy (which is needed for 
SPARQL):

https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java

If you want to use only existing classes, DatasetGraphTriplesQuads is
the place to start - used by TIM and TDB - you can implement it without
needing quads/named graphs. Again, simply ignore the named graph calls
(throw UnsupportedOperationException for them).
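
To illustrate ignoring the named graph calls, here is a minimal Java
sketch. It deliberately does not use Jena: SimpleStorage and
DefaultGraphOnlyStorage are hypothetical names invented for this example,
terms are plain strings, and null stands in for a wildcard. The real
StorageRDF and DatasetGraphTriplesQuads signatures differ, so treat this
only as the shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified storage interface (NOT Jena's StorageRDF):
// operations are split into default-graph and named-graph calls.
interface SimpleStorage {
    void addToDefaultGraph(String s, String p, String o);
    void addToNamedGraph(String g, String s, String p, String o);
    List<String[]> findInDefaultGraph(String s, String p, String o);
    List<String[]> findInNamedGraph(String g, String s, String p, String o);
}

// A graph-only backend: default-graph calls work, named-graph calls are rejected.
class DefaultGraphOnlyStorage implements SimpleStorage {
    private final List<String[]> triples = new ArrayList<>();

    @Override
    public void addToDefaultGraph(String s, String p, String o) {
        triples.add(new String[] {s, p, o});
    }

    @Override
    public List<String[]> findInDefaultGraph(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : triples) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                out.add(t);
            }
        }
        return out;
    }

    @Override
    public void addToNamedGraph(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs are not supported");
    }

    @Override
    public List<String[]> findInNamedGraph(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs are not supported");
    }

    // A null pattern term is treated as a wildcard.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}
```

A caller that only ever touches the default graph never reaches the
rejected calls, which is why a first cut can leave them unimplemented.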

Going the graph route could lead to rework later on for any kind of
performance issues because find(S,P,O) is so narrow and precludes union
default graph except by brute force.  DatasetGraph works with the SPARQL
execution engine.
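
One reading of the brute-force point, sketched in Java under stated
assumptions: if each graph exposes only a find(S,P,O)-style lookup, a
union view has to query every named graph in turn and drop duplicates.
TripleGraph, ListGraph and findInUnion are hypothetical names for this
example and are not Jena classes; null stands in for a wildcard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class UnionByBruteForce {
    // Hypothetical per-graph view offering only a find(S,P,O)-style lookup.
    interface TripleGraph {
        List<List<String>> find(String s, String p, String o);
    }

    // Minimal in-memory graph, present only to exercise the sketch.
    static final class ListGraph implements TripleGraph {
        private final List<List<String>> triples = new ArrayList<>();

        ListGraph add(String s, String p, String o) {
            triples.add(Arrays.asList(s, p, o));
            return this;
        }

        @Override
        public List<List<String>> find(String s, String p, String o) {
            List<List<String>> out = new ArrayList<>();
            for (List<String> t : triples) {
                if (matches(s, t.get(0)) && matches(p, t.get(1)) && matches(o, t.get(2))) {
                    out.add(t);
                }
            }
            return out;
        }

        private static boolean matches(String pattern, String value) {
            return pattern == null || pattern.equals(value);
        }
    }

    // Brute force: one lookup per named graph, then de-duplicate, because the
    // same triple may be present in several graphs.
    static Set<List<String>> findInUnion(Iterable<? extends TripleGraph> namedGraphs,
                                         String s, String p, String o) {
        Set<List<String>> seen = new LinkedHashSet<>();
        for (TripleGraph g : namedGraphs) {
            seen.addAll(g.find(s, p, o));
        }
        return seen;
    }
}
```

The cost grows with the number of named graphs, whereas a store that
sees quads directly could answer the same pattern with a single scan
that ignores the graph term.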

We still need to discuss how best to approach developing work - it 
should not get sucked up by the release cycle.

	Andy

On 26/10/16 19:21, Claude Warren wrote:
> My plan is to start with a Graph implementation.  We expect to write 3
> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
> handle find( ANY, ANY, ANY) so I suspect we will just start with permitting
> a column scan on Cassandra.
>
> I have not looked at DynamoDB but as I recall there are significant
> differences under the hood.
>
> I expect that we will move on to a custom model or query engine to get the
> best performance but that is not what we are planning for the first cut.
>
> I am still waiting for management approval to do this at work ....
> sometimes it takes longer to get the paperwork done than it does to design
> the thing.
>
>
> Claude
>
> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <pa...@ontology2.com>
> wrote:
>
>> I like DynamoDB as a target for this sort of thing.  There are many
>> tasks which are small-scale yet critical where it would otherwise be
>> hard to provide a distributed and reliable database.  Put that together
>> with Lambda,  which does the same for computation,  and you are cooking
>> with gas.
>>
>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>> throughout an application;  the code is DynamoDB idiomatic in every way,
>>  just the application reads and writes (a constrained set of) RDF
>> documents.
>>
>> Right now I dump the documents from the DynamoDB system into a triple
>> store when I want a panoptic view,  but with a distributed graph like
>> that would mean being able to run SPARQL queries against DynamoDB
>> directly.
>>
>> There are many products in the same family as Cassandra and DynamoDB and
>> it would be good to think through the math so we can approach them all
>> in a similar way.
>>
>> --
>>   Paul Houle
>>   paul.houle@ontology2.com
>>
>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>> Yep,
>>>
>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>
>>> indicates that they are indexing by subject. As someone who has
>>> implemented LDP, that is definitely the approach that makes sense there.
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
>>>>
>>>> IIRC It stores CBDs indexed by subject so it is the "other" model to
>>>> Rya.  Better for LDP (??).
>>>>
>>>>     Andy
>>>>
>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>> There's also:
>>>>>
>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>
>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>> particular uses it expects to support.
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> The University of Virginia Library
>>>>>
>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
>>>>>>
>>>>>> Hi Claude,
>>>>>>
>>>>>> There is certainly interest from me.
>>>>>>
>>>>>> What the best thing to do depends on various factors.  By putting it
>>>>>> in extras I presume you mean it gets added to the release?  That is not
>>>>>> the only way forward.
>>>>>>
>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>> be a community around this code?  Is that community the same, or
>>>>>> significant overlap, as the Jena community?
>>>>>>
>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>> which use cases are the most important for this work?
>>>>>>
>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>> Rya(incubating) uses Accumulo tables as indexes, and partial scans of the
>>>>>> table is streaming.  Other systems try to use the columns for properties,
>>>>>> possibly more useful for LDP style than SPARQL.
>>>>>>
>>>>>>   Andy
>>>>>>
>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>> Howdy,
>>>>>>>
>>>>>>> We have a project at work that is implementing Jena Graph on Cassandra.  I
>>>>>>> am wondering if there is enough interest here to accept it as a
>>>>>>> contribution.  I was thinking that it might fit in the Extras category.
>>>>>>>
>>>>>>> I can not promise release of the code yet as I have to present it to our
>>>>>>> internal Intellectual Property group first.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Claude
>>>>>>>
>>>>>
>>>
>>
>
>
>

Re: Graph on Cassandra

Posted by Claude Warren <cl...@xenei.com>.
My plan is to start with a Graph implementation.  We expect to write 3
tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
handle find( ANY, ANY, ANY) so I suspect we will just start with permitting
a column scan on Cassandra.
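
As a rough illustration of how a find() pattern might be routed to one of
those three tables, here is a minimal sketch. The class and enum names are
hypothetical, null stands in for ANY, and the rule used is an assumption: a
table is chosen when the bound terms start its key order. With SPO, POS and
OPS no table covers subject plus object together, so that case falls back
to SPO with the object filtered afterwards, and find(ANY, ANY, ANY) is left
as a full scan.

```java
/** Hypothetical helper: picks which index table could answer a triple pattern. */
final class IndexChooser {

    /** The three tables named above, plus a fallback for the all-wildcard case. */
    enum Index { SPO, POS, OPS, FULL_SCAN }

    /**
     * Chooses a table for a pattern; a null argument means ANY.
     * Assumption: each table is keyed in the order its name spells out.
     */
    static Index choose(String s, String p, String o) {
        if (s != null) {
            return Index.SPO;       // S??, SP?, SPO, and S?O (object filtered later)
        }
        if (p != null) {
            return Index.POS;       // ?P? and ?PO
        }
        if (o != null) {
            return Index.OPS;       // ??O
        }
        return Index.FULL_SCAN;     // ??? has no usable key prefix
    }

    public static void main(String[] args) {
        System.out.println(choose("s", null, null));
        System.out.println(choose(null, "p", "o"));
        System.out.println(choose(null, null, null));
    }
}
```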

I have not looked at DynamoDB but as I recall there are significant
differences under the hood.

I expect that we will move on to a custom model or query engine to get the
best performance but that is not what we are planning for the first cut.

I am still waiting for management approval to do this at work ....
sometimes it takes longer to get the paperwork done than it does to design
the thing.


Claude

On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <pa...@ontology2.com>
wrote:

> I like DynamoDB as a target for this sort of thing.  There are many
> tasks which are small-scale yet critical where it would otherwise be
> hard to provide a distributed and reliable database.  Put that together
> with Lambda,  which does the same for computation,  and you are cooking
> with gas.
>
> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
> throughout an application;  the code is DynamoDB idiomatic in every way,
>  just the application reads and writes (a constrained set of) RDF
> documents.
>
> Right now I dump the documents from the DynamoDB system into a triple
> store when I want a panoptic view,  but with a distributed graph like
> that would mean being able to run SPARQL queries against DynamoDB
> directly.
>
> There are many products in the same family as Cassandra and DynamoDB and
> it would be good to think through the math so we can approach them all
> in a similar way.
>
> --
>   Paul Houle
>   paul.houle@ontology2.com
>
> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
> > Yep,
> >
> > http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
> >
> > indicates that they are indexing by subject. As someone who has
> > implemented LDP, that is definitely the approach that makes sense there.
> >
> > ---
> > A. Soroka
> > The University of Virginia Library
> >
> > > On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
> > >
> > > IIRC It stores CBDs indexed by subject so it is the "other" model to Rya.  Better for LDP (??).
> > >
> > >     Andy
> > >
> > > On 17/10/16 15:41, A. Soroka wrote:
> > >> There's also:
> > >>
> > >> https://github.com/cumulusrdf/cumulusrdf
> > >>
> > >> in a similar vein (RDF over Cassandra). Not sure what kind of particular uses it expects to support.
> > >>
> > >> ---
> > >> A. Soroka
> > >> The University of Virginia Library
> > >>
> > >>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
> > >>>
> > >>> Hi Claude,
> > >>>
> > >>> There is certainly interest from me.
> > >>>
> > >>> What the best thing to do depends on various factors.  By putting it in extras I presume you mean it gets added to the release?  That is not the only way forward.
> > >>>
> > >>> An important aspect of Apache is "Community over code" - will there be a community around this code?  Is that community the same, or significant overlap, as the Jena community?
> > >>>
> > >>> There are various reasons for wanting RDF over a column store - which use cases are the most important for this work?
> > >>>
> > >>> They lead to different ways of using Cassandra. For example, Rya(incubating) uses Accumulo tables as indexes, and partial scans of the table is streaming.  Other systems try to use the columns for properties, possibly more useful for LDP style than SPARQL.
> > >>>
> > >>>   Andy
> > >>>
> > >>> On 15/10/16 18:38, Claude Warren wrote:
> > >>>> Howdy,
> > >>>>
> > >>>> We have a project at work that is implementing Jena Graph on Cassandra.  I
> > >>>> am wondering if there is enough interest here to accept it as a
> > >>>> contribution.  I was thinking that it might fit in the Extras category.
> > >>>>
> > >>>> I can not promise release of the code yet as I have to present it to our
> > >>>> internal Intellectual Property group first.
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>> Claude
> > >>>>
> > >>
> >
>




Re: Graph on Cassandra

Posted by Paul Houle <pa...@ontology2.com>.
I like DynamoDB as a target for this sort of thing.  There are many
tasks which are small-scale yet critical where it would otherwise be
hard to provide a distributed and reliable database.  Put that together
with Lambda,  which does the same for computation,  and you are cooking
with gas.

I wrote a 1-1 translation of DynamoDB documents to RDF that I use
throughout an application; the code is idiomatic DynamoDB in every way,
except that the application reads and writes (a constrained set of) RDF
documents.

Right now I dump the documents from the DynamoDB system into a triple
store when I want a panoptic view, but a distributed graph like that
would mean being able to run SPARQL queries against DynamoDB directly.

There are many products in the same family as Cassandra and DynamoDB and
it would be good to think through the math so we can approach them all
in a similar way.

-- 
  Paul Houle
  paul.houle@ontology2.com

On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
> Yep,
> 
> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
> 
> indicates that they are indexing by subject. As someone who has
> implemented LDP, that is definitely the approach that makes sense there.
> 
> ---
> A. Soroka
> The University of Virginia Library
> 
> > On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
> > 
> > IIRC It stores CBDs indexed by subject so it is the "other" model to Rya.  Better for LDP (??).
> > 
> > 	Andy
> > 
> > On 17/10/16 15:41, A. Soroka wrote:
> >> There's also:
> >> 
> >> https://github.com/cumulusrdf/cumulusrdf
> >> 
> >> in a similar vein (RDF over Cassandra). Not sure what kind of particular uses it expects to support.
> >> 
> >> ---
> >> A. Soroka
> >> The University of Virginia Library
> >> 
> >>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
> >>> 
> >>> Hi Claude,
> >>> 
> >>> There is certainly interest from me.
> >>> 
> >>> What the best thing to do depends on various factors.  By putting it in extras I presume you mean it gets added to the release?  That is not the only way forward.
> >>> 
> >>> An important aspect of Apache is "Community over code" - will there be a community around this code?  Is that community the same, or significant overlap, as the Jena community?
> >>> 
> >>> There are various reasons for wanting RDF over a column store - which use cases are the most important for this work?
> >>> 
> >>> They lead to different ways of using Cassandra. For example, Rya(incubating) uses Accumulo tables as indexes, and partial scans of the table is streaming.  Other systems try to use the columns for properties, possibly more useful for LDP style than SPARQL.
> >>> 
> >>> 	Andy
> >>> 
> >>> On 15/10/16 18:38, Claude Warren wrote:
> >>>> Howdy,
> >>>> 
> >>>> We have a project at work that is implementing Jena Graph on Cassandra.  I
> >>>> am wondering if there is enough interest here to accept it as a
> >>>> contribution.  I was thinking that it might fit in the Extras category.
> >>>> 
> >>>> I can not promise release of the code yet as I have to present it to our
> >>>> internal Intellectual Property group first.
> >>>> 
> >>>> Thoughts?
> >>>> 
> >>>> Claude
> >>>> 
> >> 
> 

Re: Graph on Cassandra

Posted by "A. Soroka" <aj...@virginia.edu>.
Yep,

http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf

indicates that they are indexing by subject. As someone who has implemented LDP, that is definitely the approach that makes sense there.

---
A. Soroka
The University of Virginia Library

> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> IIRC It stores CBDs indexed by subject so it is the "other" model to Rya.  Better for LDP (??).
> 
> 	Andy
> 
> On 17/10/16 15:41, A. Soroka wrote:
>> There's also:
>> 
>> https://github.com/cumulusrdf/cumulusrdf
>> 
>> in a similar vein (RDF over Cassandra). Not sure what kind of particular uses it expects to support.
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>> Hi Claude,
>>> 
>>> There is certainly interest from me.
>>> 
>>> What the best thing to do depends on various factors.  By putting it in extras I presume you mean it gets added to the release?  That is not the only way forward.
>>> 
>>> An important aspect of Apache is "Community over code" - will there be a community around this code?  Is that community the same, or significant overlap, as the Jena community?
>>> 
>>> There are various reasons for wanting RDF over a column store - which use cases are the most important for this work?
>>> 
>>> They lead to different ways of using Cassandra. For example, Rya(incubating) uses Accumulo tables as indexes, and partial scans of the table is streaming.  Other systems try to use the columns for properties, possibly more useful for LDP style than SPARQL.
>>> 
>>> 	Andy
>>> 
>>> On 15/10/16 18:38, Claude Warren wrote:
>>>> Howdy,
>>>> 
>>>> We have a project at work that is implementing Jena Graph on Cassandra.  I
>>>> am wondering if there is enough interest here to accept it as a
>>>> contribution.  I was thinking that it might fit in the Extras category.
>>>> 
>>>> I can not promise release of the code yet as I have to present it to our
>>>> internal Intellectual Property group first.
>>>> 
>>>> Thoughts?
>>>> 
>>>> Claude
>>>> 
>> 


Re: Graph on Cassandra

Posted by Andy Seaborne <an...@apache.org>.
IIRC It stores CBDs indexed by subject so it is the "other" model to 
Rya.  Better for LDP (??).
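
To make the "indexed by subject" layout concrete, here is a minimal sketch
that groups triples into one entry per subject. It is an illustration only:
the class names are hypothetical, plain strings stand in for RDF nodes, and
it ignores the blank-node closure that a full Concise Bounded Description
would include. It is not taken from the CumulusRDF code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one stored entry per subject, holding that subject's triples. */
final class SubjectStore {

    /** Plain-string triple; a stand-in for real RDF node types. */
    static final class Triple {
        final String s;
        final String p;
        final String o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /** Groups triples so that each subject maps to the list of its own triples. */
    static Map<String, List<Triple>> bySubject(List<Triple> triples) {
        Map<String, List<Triple>> rows = new LinkedHashMap<>();
        for (Triple t : triples) {
            rows.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);
        }
        return rows;
    }
}
```

Fetching everything known about one resource is then a single lookup by
subject, which is the access pattern LDP tends to need; patterns with an
unbound subject would have to touch every entry.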

	Andy

On 17/10/16 15:41, A. Soroka wrote:
> There's also:
>
> https://github.com/cumulusrdf/cumulusrdf
>
> in a similar vein (RDF over Cassandra). Not sure what kind of particular uses it expects to support.
>
> ---
> A. Soroka
> The University of Virginia Library
>
>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>> Hi Claude,
>>
>> There is certainly interest from me.
>>
>> What the best thing to do depends on various factors.  By putting it in extras I presume you mean it gets added to the release?  That is not the only way forward.
>>
>> An important aspect of Apache is "Community over code" - will there be a community around this code?  Is that community the same, or significant overlap, as the Jena community?
>>
>> There are various reasons for wanting RDF over a column store - which use cases are the most important for this work?
>>
>> They lead to different ways of using Cassandra. For example, Rya(incubating) uses Accumulo tables as indexes, and partial scans of the table is streaming.  Other systems try to use the columns for properties, possibly more useful for LDP style than SPARQL.
>>
>> 	Andy
>>
>> On 15/10/16 18:38, Claude Warren wrote:
>>> Howdy,
>>>
>>> We have a project at work that is implementing Jena Graph on Cassandra.  I
>>> am wondering if there is enough interest here to accept it as a
>>> contribution.  I was thinking that it might fit in the Extras category.
>>>
>>> I can not promise release of the code yet as I have to present it to our
>>> internal Intellectual Property group first.
>>>
>>> Thoughts?
>>>
>>> Claude
>>>
>

Re: Graph on Cassandra

Posted by "A. Soroka" <aj...@virginia.edu>.
There's also:

https://github.com/cumulusrdf/cumulusrdf

in a similar vein (RDF over Cassandra). Not sure what kind of particular uses it expects to support.

---
A. Soroka
The University of Virginia Library

> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> Hi Claude,
> 
> There is certainly interest from me.
> 
> What the best thing to do depends on various factors.  By putting it in extras I presume you mean it gets added to the release?  That is not the only way forward.
> 
> An important aspect of Apache is "Community over code" - will there be a community around this code?  Is that community the same, or significant overlap, as the Jena community?
> 
> There are various reasons for wanting RDF over a column store - which use cases are the most important for this work?
> 
> They lead to different ways of using Cassandra. For example, Rya(incubating) uses Accumulo tables as indexes, and partial scans of the table is streaming.  Other systems try to use the columns for properties, possibly more useful for LDP style than SPARQL.
> 
> 	Andy
> 
> On 15/10/16 18:38, Claude Warren wrote:
>> Howdy,
>> 
>> We have a project at work that is implementing Jena Graph on Cassandra.  I
>> am wondering if there is enough interest here to accept it as a
>> contribution.  I was thinking that it might fit in the Extras category.
>> 
>> I can not promise release of the code yet as I have to present it to our
>> internal Intellectual Property group first.
>> 
>> Thoughts?
>> 
>> Claude
>> 


Re: Graph on Cassandra

Posted by Andy Seaborne <an...@apache.org>.
Hi Claude,

There is certainly interest from me.

The best thing to do depends on various factors.  By putting it in 
extras I presume you mean it gets added to the release?  That is not the 
only way forward.

An important aspect of Apache is "Community over code" - will there be a 
community around this code?  Is that community the same as, or does it 
overlap significantly with, the Jena community?

There are various reasons for wanting RDF over a column store - which 
use cases are the most important for this work?

They lead to different ways of using Cassandra. For example, 
Rya (incubating) uses Accumulo tables as indexes, and partial scans of 
the table are streaming.  Other systems try to use the columns for 
properties, possibly more useful for LDP style than SPARQL.
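
The "table as index, partial scan" idea can be sketched with an in-memory
sorted set standing in for the table. This is an assumption-laden
illustration, not Rya's or Accumulo's API: the class name, the NUL
separator and the string row keys are all hypothetical. Each triple becomes
one row key in predicate-object-subject order, and a pattern with a known
predicate and object is answered by scanning only the rows that share that
key prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: a sorted set of row keys used as a POS index. */
final class PosIndexScan {

    /** Separator between key parts; assumed never to occur inside a term. */
    private static final char SEP = '\0';

    /** Sorted row keys, standing in for a sorted table. */
    private final TreeSet<String> rows = new TreeSet<>();

    /** Stores the triple as a single key in predicate, object, subject order. */
    void add(String s, String p, String o) {
        rows.add(p + SEP + o + SEP + s);
    }

    /** Returns the subjects for a predicate and object by scanning one key range. */
    List<String> subjects(String p, String o) {
        String prefix = p + SEP + o + SEP;
        // Upper bound: the prefix followed by the largest char value.
        String end = prefix + Character.MAX_VALUE;
        List<String> out = new ArrayList<>();
        for (String row : rows.subSet(prefix, end)) {
            out.add(row.substring(prefix.length()));
        }
        return out;
    }
}
```

Because the rows are sorted, the matching keys are contiguous, so results
can be handed on as they are read rather than collected first.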

	Andy

On 15/10/16 18:38, Claude Warren wrote:
> Howdy,
>
> We have a project at work that is implementing Jena Graph on Cassandra.  I
> am wondering if there is enough interest here to accept it as a
> contribution.  I was thinking that it might fit in the Extras category.
>
> I can not promise release of the code yet as I have to present it to our
> internal Intellectual Property group first.
>
> Thoughts?
>
> Claude
>