Posted to dev@jena.apache.org by Andy Seaborne <an...@epimorphics.com> on 2014/06/06 11:13:39 UTC

Sketching for jena3

Just for discussion, here is a somewhat idealised form of Node:

https://svn.apache.org/repos/asf/jena/Experimental/jena3-sketch/

As before there is one "Node" for any RDF term + extras (variables, 
graphs as nodes of a graph, "extension") because triples and quads are 
Node,Node,Node ... this layer does not reflect the current RDF 
restrictions of literals to objects or graph names.

Feel free to mess with the code, or put a different design alongside, 
or sketch ideas for another area of Jena.  There is no sense of this 
being "the design".

	Andy

And from a while ago:
http://mail-archives.apache.org/mod_mbox/jena-dev/201211.mbox/%3C50AE3EF1.2090005@apache.org%3E

Re: Sketching for jena3

Posted by Andy Seaborne <an...@apache.org>.

Claude,

The general idea of support for serialization makes a lot of sense, 
e.g. for rdf-hadoop and DataBags.

The specifics of java serialization - not necessarily so. We might be 
forced into that for Java RMI if that's a goal but there are other RPC 
mechanisms including multi-language ones (Thrift).

When reading/writing to storage, multi-language makes a lot of sense.

It might be better to have an SNode class that implements 
java.io.Serializable and simply wraps a single Node, and have that class 
provide all the writeObject/readObject.  This isolates RMI, at the cost 
of an extra indirection.
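The wrapper idea above could look roughly like the following. This is a minimal sketch under assumptions, not code from the jena3-sketch repository: SimpleNode is a hypothetical stand-in for Node (just a kind plus a lexical form), and the toBytes/fromBytes helpers exist only to show the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for Node; deliberately NOT Serializable. */
final class SimpleNode {
    final String kind;     // assumed kinds: "uri", "literal", "bnode", "var"
    final String lexical;  // assumed single string form of the term

    SimpleNode(String kind, String lexical) {
        this.kind = kind;
        this.lexical = lexical;
    }
}

/** Wrapper that keeps all Java serialization logic away from the node class. */
final class SNode implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default serialization skips it; we write it by hand below.
    private transient SimpleNode node;

    SNode(SimpleNode node) { this.node = node; }

    SimpleNode getNode() { return node; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // writeUTF is limited to 65535 encoded bytes; fine for a sketch.
        out.writeUTF(node.kind);
        out.writeUTF(node.lexical);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        String kind = in.readUTF();
        String lexical = in.readUTF();
        node = new SimpleNode(kind, lexical);
    }

    /** Illustration helper: serialize a wrapper to a byte array. */
    static byte[] toBytes(SNode s) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Illustration helper: read a wrapper back from a byte array. */
    static SNode fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Callers that need RMI would pass new SNode(node) and unwrap with getNode() on the other side, so the node class itself never has to implement Serializable.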

Serializable for Graph is a whole different discussion! Exchange may be 
structures like tuples.  At least with Graphs there are no cycle issues 
but graphs can be big.  So I guess for me it's understanding what sort 
of RMI operations are the design target.

Serialization does require stable bnode labels - hence using 2 longs for 
a global ID (the label is more for the convenience of assigning the 
label for small scale and debugging uses).
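One way to picture a two-long global ID is sketched below. The class name BNodeId, the use of java.util.UUID as the source of the two longs, and the 16-byte layout are all assumptions made for illustration; the message only says that the ID is two longs and that the label is a convenience.

```java
import java.nio.ByteBuffer;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical blank node identifier: identity is the pair of longs. */
final class BNodeId {
    final long hi;
    final long lo;

    BNodeId(long hi, long lo) {
        this.hi = hi;
        this.lo = lo;
    }

    /** Assumption: draw the two longs from a random UUID. */
    static BNodeId create() {
        UUID u = UUID.randomUUID();
        return new BNodeId(u.getMostSignificantBits(), u.getLeastSignificantBits());
    }

    /** Human-readable label derived from the ID, for debugging only. */
    String label() {
        return new UUID(hi, lo).toString();
    }

    /** Fixed 16-byte form: hi then lo, big-endian (ByteBuffer default). */
    byte[] toBytes() {
        return ByteBuffer.allocate(16).putLong(hi).putLong(lo).array();
    }

    static BNodeId fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long hi = buf.getLong();
        long lo = buf.getLong();
        return new BNodeId(hi, lo);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof BNodeId)) return false;
        BNodeId that = (BNodeId) other;
        return hi == that.hi && lo == that.lo;
    }

    @Override
    public int hashCode() {
        return Objects.hash(hi, lo);
    }
}
```

Because equality depends only on the two longs, a blank node written on one machine and read on another compares equal to the original, which is the stability that serialization needs.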

On 07/06/14 14:13, Claude Warren wrote:
> I would still like to see Node as a serializable object, or some standard
> mechanism to get a serialized version of the node.  Any thoughts along this
> path would be appreciated.
>
> I had thought about something along the lines of a type byte and raw data
> as a serialized form.  But that would mean that each type would have to
> "register" so we could keep them from stepping on each other.  This I
> realize is wholly unworkable.

I'm using protocol buffers [*] in Lizard.  I will be doing an RDF.proto 
(unless I find one; it's not hard) as I have to transmit Nodes although 
at the moment, for expedience, I'm sending strings around.

Side effect - having protocol buffer encoding for a TDB node table.

> So I am back to thinking we should make the Node Serializable.
>
> Basically, I want to be able to serialize the node out so I can store it
> and deserialize it on demand, without having to worry about new and
> strange Node types.  This will make a remote client (using connections
> other than SPARQL, à la RMI) easier and will make the implementation of the
> Graph SPI easier for some types of storage (e.g. Hadoop).

We can concentrate on the core node types from RDF 1.1 + named variables 
(not NodeExt or NodeSymbol or NodeGraph).

(
PS I'm having trouble seeing why interfaces for Triple and Quad make any 
sense.  Thoughts?  I'm already wondering what debugging will feel like 
and whether to do it completely the other way round - one single über 
Node class for the usual suspects.
)

>
> Claude

[*] Why protocol buffers and not Thrift/Avro/a.n.other?

Protocol buffers are integrated into netty so I don't have to do that 
integration.  I'm using netty 5.0.0-alpha; netty+thrift is out of date.

So the choice is netty+PB vs thrift. The service layer in Thrift seems 
too RPC-ish - Lizard needs streams.  I don't know enough to decide for 
sure. netty documentation is better.  Switching between the two should 
be possible as the details of PB+M aren't exposed.  I'd like to do both.

> On Fri, Jun 6, 2014 at 10:13 AM, Andy Seaborne <
> andy.seaborne@epimorphics.com> wrote:
>
>> Just for discussion, here is a somewhat idealised form of Node:
>>
>> https://svn.apache.org/repos/asf/jena/Experimental/jena3-sketch/
>>
>> As before there is one "Node" for any RDF term + extras (variables, graphs
>> as nodes of a graph, "extension") because triples and quads are
>> Node,Node,Node ... this layer does not reflect the current RDF restrictions
>> of literals to objects or graph names.
>>
>> Feel free to mess with the code, or put a different design alongside, or
>> sketch ideas for another area of Jena.  No sense of being "the design".
>>
>>          Andy
>>
>> And from a while ago:
>> http://mail-archives.apache.org/mod_mbox/jena-dev/201211.mbox/%3C50AE3EF1.2090005@apache.org%3E
>>

Re: Sketching for jena3

Posted by Claude Warren <cl...@xenei.com>.

I would still like to see Node as a serializable object, or some standard
mechanism to get a serialized version of the node.  Any thoughts along this
path would be appreciated.

I had thought about something along the lines of a type byte and raw data
as a serialized form.  But that would mean that each type would have to
"register" so we could keep them from stepping on each other.  This I
realize is wholly unworkable.
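For concreteness, the type-byte-plus-raw-data form could be sketched as follows. Everything here is hypothetical (the TaggedNodeCodec name, the string type names, UTF-8 as the raw data); it is only meant to show where the registration problem bites: two node types that pick the same tag byte collide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical codec: one tag byte followed by the raw UTF-8 data. */
final class TaggedNodeCodec {
    // tag byte -> name of the node type that claimed it
    private final Map<Byte, String> registered = new HashMap<>();

    /** Every node type has to claim a tag; a clash is rejected. */
    void register(byte tag, String typeName) {
        String existing = registered.putIfAbsent(tag, typeName);
        if (existing != null) {
            throw new IllegalStateException(
                "tag " + tag + " already taken by " + existing);
        }
    }

    /** Serialized form: [tag][raw bytes]. */
    byte[] encode(byte tag, String lexical) {
        if (!registered.containsKey(tag)) {
            throw new IllegalArgumentException("unregistered tag " + tag);
        }
        byte[] raw = lexical.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + raw.length).put(tag).put(raw).array();
    }

    /** Name of the type that owns the leading tag byte, or null if unknown. */
    String typeOf(byte[] encoded) {
        return registered.get(encoded[0]);
    }

    /** The raw data after the tag byte, decoded as UTF-8. */
    String dataOf(byte[] encoded) {
        return new String(encoded, 1, encoded.length - 1, StandardCharsets.UTF_8);
    }
}
```

The sketch works for a closed set of types, but any independently developed Node type would need a coordinated tag assignment, which is the part that does not scale.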

So I am back to thinking we should make the Node Serializable.

Basically, I want to be able to serialize the node out so I can store it
and deserialize it on demand, without having to worry about new and
strange Node types.  This will make a remote client (using connections
other than SPARQL, à la RMI) easier and will make the implementation of the
Graph SPI easier for some types of storage (e.g. Hadoop).

Claude

On Fri, Jun 6, 2014 at 10:13 AM, Andy Seaborne <
andy.seaborne@epimorphics.com> wrote:

> Just for discussion, here is a somewhat idealised form of Node:
>
> https://svn.apache.org/repos/asf/jena/Experimental/jena3-sketch/
>
> As before there is one "Node" for any RDF term + extras (variables, graphs
> as nodes of a graph, "extension") because triples and quads are
> Node,Node,Node ... this layer does not reflect the current RDF restrictions
> of literals to objects or graph names.
>
> Feel free to mess with the code, or put a different design alongside, or
> sketch ideas for another area of Jena.  No sense of being "the design".
>
>         Andy
>
> And from a while ago:
> http://mail-archives.apache.org/mod_mbox/jena-dev/201211.mbox/%3C50AE3EF1.2090005@apache.org%3E
>
