
Posted to dev@clerezza.apache.org by Reto Bachmann-Gmür <re...@apache.org> on 2012/08/24 14:25:44 UTC

CLEREZZA-715 resolution

Hello

I doubt that CLEREZZA-715 and its resolution are correct. IIUC, the
default graph is the union of all the other triple collections. As
some of these are mutable, the content of the default graph doesn't
stay constant, as is required for Graph instances (for their
hashCode and equals methods to be useful). For this reason I think it
should be a (read-only) MGraph rather than a Graph.
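A minimal Java sketch can show why a union of mutable collections cannot keep a stable equals/hashCode. It assumes a hypothetical `UnionView` class with triples as plain N-Triples strings. It is not Clerezza's TripleCollection API, only an illustration of a live, read-only union.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Clerezza API: a live, read-only union of several
// triple collections. Triples are plain N-Triples strings for brevity.
final class UnionView extends AbstractSet<String> {
    private final List<Set<String>> members;

    @SafeVarargs
    UnionView(Set<String>... members) {
        this.members = Arrays.asList(members);
    }

    // Collects the current content of all members. It is called on every
    // read, so the view always reflects later changes to any member.
    private Set<String> current() {
        Set<String> all = new HashSet<>();
        for (Set<String> m : members) {
            all.addAll(m);
        }
        return Collections.unmodifiableSet(all);
    }

    @Override
    public boolean contains(Object o) {
        for (Set<String> m : members) {
            if (m.contains(o)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public Iterator<String> iterator() {
        return current().iterator(); // its remove() throws, so the view is read-only
    }

    @Override
    public int size() {
        return current().size();
    }

    // add() is inherited from AbstractCollection and throws
    // UnsupportedOperationException.
}
```

Because size(), equals() and hashCode() are all derived from the members' current content, adding a triple to any member changes them. This is the instability that makes such a union a better fit for a read-only MGraph than for a Graph.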

Cheers,
Reto

Re: CLEREZZA-715 resolution

Posted by Reto Bachmann-Gmür <re...@apache.org>.

--- the previous message was sent slightly unfinished by accident ---

Hi Rupert,

Good to have this discussion.

> I always had the impression that Graphs are read-only
> TripleCollections and MGraphs are read- and writeable
> TripleCollections. I had never thought about immutable Graphs,
> read-only MGraphs and read- and writeable MGraphs.

The thing to use in most situations is indeed the MGraph; this is what
corresponds to a Graph and a Model in Jena and to a Context in Sesame.
Another name for it could be GraphChangingOverTime.

The RDF Semantics specification has another notion of Graph: two graphs
are identical iff they are isomorphic. Clearly two Jena models are not
the same just because they happen to be isomorphic at some point in time.
That a triple collection is an MGraph (unintuitively) doesn't mean that
you can actually modify it: on the one hand you might lack the necessary
permissions, on the other hand the graph might mutate for reasons other
than a direct modification of it. An MGraph might describe the value of
a stock: it's not a fixed Graph but something that changes over time,
yet adding and removing triples is not supported.

Graphs are typically useful for small triple collections: self-contained
molecules of information that can, for example, be signed and
added to (Hash)Sets.
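Here is a minimal sketch of a Graph as an immutable value object that can be put into a HashSet and hashed for signing. It assumes a hypothetical `GroundGraph` class holding N-Triples strings with no blank nodes, so isomorphism reduces to plain set equality. Graphs with bnodes would need a bnode mapping for equals and a canonicalization step before hashing; neither is covered here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: an immutable, ground (bnode-free) graph whose
// equals/hashCode depend only on its content, which never changes.
final class GroundGraph {
    private final Set<String> triples;

    GroundGraph(Set<String> triples) {
        // Defensive copy: later changes to the caller's set cannot leak in.
        this.triples = Collections.unmodifiableSet(new HashSet<>(triples));
    }

    @Override
    public boolean equals(Object o) {
        // For ground graphs, isomorphism is plain set equality.
        return o instanceof GroundGraph && triples.equals(((GroundGraph) o).triples);
    }

    @Override
    public int hashCode() {
        return triples.hashCode(); // stable because the content is immutable
    }

    // SHA-256 over the sorted triples, one per line; usable e.g. as an E-Tag
    // or as the input to a signature.
    String digest() {
        List<String> sorted = new ArrayList<>(triples);
        Collections.sort(sorted);
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String t : sorted) {
                md.update(t.getBytes(StandardCharsets.UTF_8));
                md.update((byte) '\n');
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must support SHA-256
        }
    }
}
```

Two GroundGraph instances built from the same triples in any insertion order are equal, land in the same HashSet bucket, and yield the same digest.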


> What are the use cases for immutable graphs in Clerezza? Is it really
> important to have immutable Graphs?

I think for an RDF API it makes sense to have this basic element of the
RDF specifications available as a core class. Even if there are more
evident use cases for MGraph, there are some situations where Graphs
have practical value:
- Doing RDF synchronization (RDFSync): what you sync is a set of MSGs
(minimum self-contained graphs), which are graphs
- Similarly for diffs and versioning: the units to deal with are not
triples (when there are bnodes) but small subgraphs
- Computing an E-Tag in HTTP: the hash of the Graph can be used for it
- Digital signing: signing a mutable graph is not what you want
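The MSGs used by RDFSync, and the bnode-aware units for diffs, can be computed by grouping triples that share blank nodes. Here is a minimal union-find sketch under these assumptions: `MsgSplitter` is a hypothetical name, triples are `String[]{subject, predicate, object}`, and any term starting with `_:` is treated as a blank node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: partition triples into minimum self-contained graphs.
// Triples sharing a blank node (directly or transitively) form one MSG;
// every ground triple is an MSG on its own.
final class MsgSplitter {

    static List<List<String[]>> split(List<String[]> triples) {
        int n = triples.size();
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // Link each triple to the first triple that mentioned the same bnode.
        Map<String, Integer> firstTripleWithBnode = new HashMap<>();
        for (int i = 0; i < n; i++) {
            for (String term : triples.get(i)) {
                if (!term.startsWith("_:")) {
                    continue;
                }
                Integer j = firstTripleWithBnode.putIfAbsent(term, i);
                if (j != null) {
                    union(parent, i, j);
                }
            }
        }
        // Group triples by the root of their union-find set, in input order.
        Map<Integer, List<String[]>> groups = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            groups.computeIfAbsent(find(parent, i), k -> new ArrayList<>()).add(triples.get(i));
        }
        return new ArrayList<>(groups.values());
    }

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    private static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }
}
```

Each resulting group is small and never changes once computed, which is why it is naturally modelled as a Graph rather than an MGraph.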

> Because creating those is really
> expensive (look at the MGraph#getGraph() implementations that create
> an in-memory copy of the MGraph in a SimpleGraph instance). I have
> already written about that on the list [1], but at that time I was not
> aware of the reason for that, and the follow-up discussion also did
> not come up with it.

Missed that thread.

> I imagine that a lot of users do call MGraph#getGraph() without
> realizing that this would clone all the data in the MGraph.

Implementations can be smarter than that and clone the data only if
the mutable graph is modified after getGraph() has been called. This
means that one can use an MGraph to add the triples and return
mGraph.getGraph() without the triples being duplicated.
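Here is a minimal copy-on-write sketch of that idea. It assumes a hypothetical `CowMGraph` class with string triples, not Clerezza's MGraph. getGraph() hands out a read-only view without copying, and the data is duplicated only on the first modification after a view has been handed out.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Clerezza's MGraph: copy-on-write snapshots.
final class CowMGraph {
    private Set<String> triples = new HashSet<>();
    private boolean shared = false; // has the current set been handed out?
    private int copies = 0;         // number of duplications, for illustration

    synchronized Set<String> getGraph() {
        shared = true;
        return Collections.unmodifiableSet(triples); // no copy here
    }

    synchronized boolean add(String triple) {
        if (triples.contains(triple)) {
            return false; // nothing changes, so nothing needs copying
        }
        copyIfShared();
        return triples.add(triple);
    }

    synchronized boolean remove(String triple) {
        if (!triples.contains(triple)) {
            return false;
        }
        copyIfShared();
        return triples.remove(triple);
    }

    private void copyIfShared() {
        if (shared) {
            // Handed-out views keep pointing at the old set, which is never
            // modified again; this graph continues on a private copy.
            triples = new HashSet<>(triples);
            shared = false;
            copies++;
        }
    }

    synchronized int copies() {
        return copies;
    }
}
```

Building a graph with add() and then returning getGraph() causes no copy at all. Only a later modification pays for one duplication, and subsequent modifications do not copy again until a new view is requested.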

A really clever implementation keeps weak references to the Graphs
returned since the last change and only duplicates the data on a
modification when one of those Graphs is still referenced. With such
an implementation the following would never (see limitation below)
cause the data to be duplicated:

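                // Holding the read lock prevents triples from being added or
                // removed while the snapshot is taken and compared.
                Lock readLock = mGraph.getLock().readLock();
                readLock.lock();
                try {
                        if (mGraph.getGraph().equals(referenceGraph)) {
                                alert("you did it!"); // placeholder for any notification
                        }
                } finally {
                        readLock.unlock();
                }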

As checking isomorphism is an expensive operation, it might be better
to duplicate the graph than to keep code that wants to add a triple
waiting. Without holding the lock, the following would duplicate the
graph only if a triple is added while isomorphism is being computed
(see limitation below):

                // No lock held: a concurrent add() is not blocked.
                if (mGraph.getGraph().equals(referenceGraph)) {
                        alert("you did it!"); // placeholder for any notification
                }

Limitation: Looking at the Javadoc, I realize that as long as no garbage
collection happens it does not seem to be possible to find out that an
instance is no longer referenced. "An entry in a WeakHashMap will
automatically be removed when its key is no longer in ordinary use"
sounds good; however, the further details of the WeakHashMap and
WeakReference APIs indicate that a weak reference is cleared (and
enqueued) only once the garbage collector has determined that its
referent is weakly reachable, which is probably not the very instant
at which the object stops being used.
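Here is a minimal sketch of the weak-reference refinement, including the limitation just described. It assumes a hypothetical `WeakCowMGraph` class with string triples. A modification copies the data only while the last handed-out view is still weakly reachable through its `WeakReference`. Because the reference is cleared only after the garbage collector notices the view is unused, a view that has been dropped but not yet collected still triggers a copy.

```java
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Clerezza's MGraph: copy on modification only if
// the last handed-out view may still be referenced. Until the garbage
// collector clears the WeakReference, an unused view still causes a copy.
final class WeakCowMGraph {
    private Set<String> triples = new HashSet<>();
    private WeakReference<Set<String>> lastView = null;
    private int copies = 0; // number of duplications, for illustration

    synchronized Set<String> getGraph() {
        Set<String> view = (lastView == null) ? null : lastView.get();
        if (view == null) {
            view = Collections.unmodifiableSet(triples);
            lastView = new WeakReference<>(view);
        }
        return view; // repeated calls without changes share one view
    }

    synchronized boolean add(String triple) {
        if (triples.contains(triple)) {
            return false;
        }
        if (lastView != null && lastView.get() != null) {
            triples = new HashSet<>(triples); // the old view keeps the old set
            copies++;
        }
        lastView = null; // later views wrap the (possibly new) current set
        return triples.add(triple);
    }

    synchronized int copies() {
        return copies;
    }
}
```

When no view is alive at modification time, the set is mutated in place and nothing is copied. Whether that happens right after the caller drops its view depends entirely on when the garbage collector runs.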

>
> Changing the SingleTdbDatasetTcProvider so that the union graph is
> exposed as a read-only MGraph is really not a big deal. I am just
> wondering if I am the only one who uses the Graph interface differently
> than the Javadoc says. In any case I would add a big WARNING to the
> MGraph#getGraph() method saying that calling this method will create a
> copy (and not a read-only wrapper) of the MGraph.

I think the warning should say that the data will typically be
duplicated (unless the backend supports versioning) as soon as a
triple is added to or removed from the MGraph.

Cheers,
Reto
>
> best
> Rupert
>
> [1] http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201203.mbox/%3C1F7ADF98-D5F7-47F2-BE72-FC248B9219AB@gmail.com%3E
>
>> Cheers,
>> Reto
>
>
>
> --
> | Rupert Westenthaler             rupert.westenthaler@gmail.com
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen

Re: CLEREZZA-715 resolution

Posted by Rupert Westenthaler <ru...@gmail.com>.

On Fri, Aug 24, 2012 at 2:25 PM, Reto Bachmann-Gmür <re...@apache.org> wrote:
> Hello
>
> I doubt that CLEREZZA-715 and its resolution are correct. IIUC, the
> default graph is the union of all the other triple collections. As
> some of these are mutable, the content of the default graph doesn't
> stay constant, as is required for Graph instances (for their
> hashCode and equals methods to be useful). For this reason I think it
> should be a (read-only) MGraph rather than a Graph.
>

I always had the impression that Graphs are read-only
TripleCollections and MGraphs are read- and writeable
TripleCollections. I had never thought about immutable Graphs,
read-only MGraphs and read- and writeable MGraphs.

What are the use cases for immutable graphs in Clerezza? Is it really
important to have immutable Graphs? Because creating those is really
expensive (look at the MGraph#getGraph() implementations that create
an in-memory copy of the MGraph in a SimpleGraph instance). I have
already written about that on the list [1], but at that time I was not
aware of the reason for that, and the follow-up discussion also did
not come up with it.

I imagine that a lot of users do call MGraph#getGraph() without
realizing that this would clone all the data in the MGraph.

Changing the SingleTdbDatasetTcProvider so that the union graph is
exposed as a read-only MGraph is really not a big deal. I am just
wondering if I am the only one who uses the Graph interface differently
than the Javadoc says. In any case I would add a big WARNING to the
MGraph#getGraph() method saying that calling this method will create a
copy (and not a read-only wrapper) of the MGraph.

best
Rupert

[1] http://mail-archives.apache.org/mod_mbox/incubator-clerezza-dev/201203.mbox/%3C1F7ADF98-D5F7-47F2-BE72-FC248B9219AB@gmail.com%3E

> Cheers,
> Reto

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen