
Posted to users@jena.apache.org by Patrick Hoeffel <pa...@issinc.com> on 2015/08/27 17:11:19 UTC

Time Series Data Modeling in RDF

All,

I know this is not a new topic, so hopefully there is a reference to the de-facto standard answer on this (that I have not been able to find on my own so far).

When I have a Subject that I want to store additional information about, I can just add triples using the same Subject. Easy.

When I want to say things about a Statement (such as the date range within which the triple Statement is valid, or the strength or source of the relationship), the answer is more ambiguous. Reification is the standard answer, but it is also heavy. I've read about using Quads, Named Graphs, N-ary Relationships, etc.

What is the current state of the art or best practice in this regard?

Thanks very much,

Patrick Hoeffel
Software Engineer
Intelligent Software Solutions (www.issinc.com)

Re: Time Series Data Modeling in RDF

Posted by Claude Warren <cl...@xenei.com>.

The RDF Data Cube Vocabulary (http://www.w3.org/TR/vocab-data-cube/) states
that it can be used for time series data.




RE: Time Series Data Modeling in RDF

Posted by "Bohms, H.M. (Michel)" <mi...@tno.nl>.

Very useful, thanks!

(We are also having many discussions about this in the Netherlands: should we use quads, or are their semantics still too unclear (see ref.)? And if we use reification, how do we keep all the redundancy in sync?)

Ref: http://www.w3.org/TR/rdf11-datasets/

Dr. ir. H.M. (Michel) Bohms
Sr. Research Scientist
Structural Reliability
T +31 (0)88 866 31 07
M +31 (0)63 038 12 20
E michel.bohms@tno.nl



Re: Time Series Data Modeling in RDF

Posted by Daniel Hernández <da...@degu.cl>.

Hey Patrick, you might be interested in some experimental work of ours
that tests the performance of several of these reification approaches
with Wikidata. Our work will be presented at the Workshop on Scalable
Semantic Web Knowledge Base Systems. You can read our results here:

http://users.dcc.uchile.cl/~dhernand/research/ssws-2015-reifying.pdf

Cheers,
Daniel Hernández


RE: Time Series Data Modeling in RDF

Posted by "Bohms, H.M. (Michel)" <mi...@tno.nl>.

Right, adding information about a subject is easy (as Patrick indicated).
But the question was really about qualifying a statement as a whole, which involves approaches like reification, quads, etc., each with quite some pros and cons.

Regards, Michel



Re: Time Series Data Modeling in RDF

Posted by David Moss <ad...@gmail.com>.

Why is it necessary to store the additional information in the same triple
as the data?
RDF lets you store as much information as you like about a subject. Just use
more triples.

<thing> a <datapoint>
<thing> <experiment> <experiment20936>
<thing> <datapoint> 123
<thing> <timestamp> 1440709931
<thing> <moonphase> <lastquarter>
<thing> <observer> <DavidMoss>





Re: Time Series Data Modeling in RDF

Posted by Dave Reynolds <da...@gmail.com>.

There is no one-size-fits-all answer. In particular, it depends on whether
you really want such qualifiers to apply to any triple-level statement or
only to particular cases, and on whether your data follows particular
patterns.

My personal heuristics are something like ...

If you have some graph of relationships where you might want to qualify
relationships by strength or validity, but where the entities have other
unqualified attributes, then I would go with an n-ary relation type
approach [1].
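
As a rough illustration of the n-ary relation pattern (not part of the original post), the sketch below models triples as plain Python tuples. The `http://example.org/` namespace and every property name (`hasRelationship`, `relatedTo`, `strength`, `validFrom`, `validTo`) are made up for the example; the point is only that the qualified relationship becomes a node of its own, while unqualified attributes stay ordinary triples.

```python
# Sketch of the n-ary relation pattern using (subject, predicate, object) tuples.
# EX and all property names are hypothetical, chosen only for illustration.
EX = "http://example.org/"


def nary_relation(rel_id, subject, target, strength, valid_from, valid_to):
    """Model one qualified relationship as a node of its own."""
    rel = EX + rel_id
    return [
        (subject, EX + "hasRelationship", rel),  # entity -> relation node
        (rel, EX + "relatedTo", target),         # relation node -> other entity
        (rel, EX + "strength", strength),        # qualifiers attach to the relation node
        (rel, EX + "validFrom", valid_from),
        (rel, EX + "validTo", valid_to),
    ]


triples = nary_relation(
    "rel1", EX + "alice", EX + "acme", 0.8, "2015-01-01", "2015-08-27"
)
# An unqualified attribute of the same entity remains a plain triple.
triples.append((EX + "alice", EX + "name", "Alice"))

for t in triples:
    print(t)
```

Only the relationships that need qualifying pay the cost of the extra node; everything else keeps its simple subject/predicate/object shape.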

If your data is more regular and can be thought of as similar to
measurements, observations, or derivations therefrom, then I would go
with RDF Data Cube [2], which allows you to include time ranges as one of
the dimensions and strength as a qualifying attribute.
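
A minimal sketch of what a single Data Cube observation might look like, again with triples as Python tuples (this example is an addition, not from the original post). `qb:Observation` and `qb:dataSet` come from the Data Cube vocabulary; the `ex:` dimension, measure, and attribute properties are hypothetical, and the data structure definition that a real cube would use to declare them is omitted here.

```python
# Sketch of one RDF Data Cube observation as (subject, predicate, object) tuples.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for illustration


def observation(obs_id, dataset, period, value, strength):
    """Build the triples for a single observation in a data set."""
    obs = EX + obs_id
    return [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", dataset),
        (obs, EX + "refPeriod", period),   # time range, used as a dimension (assumed name)
        (obs, EX + "value", value),        # the measured value (assumed name)
        (obs, EX + "strength", strength),  # qualifying attribute (assumed name)
    ]


obs_triples = observation(
    "obs1", EX + "dataset1", "2015-01-01/2015-08-27", 123, 0.8
)
for t in obs_triples:
    print(t)
```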

If your data comes in batches with different time validity for the 
batches then I'd consider named graphs (aka quads, same difference).
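
To make the named-graph option concrete, here is a small sketch (an addition, with invented names) that treats quads as 4-tuples. The validity period is stated once per graph, as triples about the graph name held in the default graph (represented by `None`), and is then used to select the triples that were valid on a given day. The `validFrom`/`validTo` properties and the batch names are assumptions for the example.

```python
# Sketch of batch-level validity with named graphs, using
# (subject, predicate, object, graph) tuples. All names are hypothetical.
from datetime import date

EX = "http://example.org/"
DEFAULT = None  # stands in for the default graph

quads = [
    (EX + "sensor1", EX + "reading", 20.5, EX + "batch1"),
    (EX + "sensor2", EX + "reading", 19.0, EX + "batch1"),
    (EX + "sensor1", EX + "reading", 22.1, EX + "batch2"),
    # Validity is stated once per graph, in the default graph.
    (EX + "batch1", EX + "validFrom", date(2015, 8, 1), DEFAULT),
    (EX + "batch1", EX + "validTo", date(2015, 8, 15), DEFAULT),
    (EX + "batch2", EX + "validFrom", date(2015, 8, 16), DEFAULT),
    (EX + "batch2", EX + "validTo", date(2015, 8, 31), DEFAULT),
]


def graph_value(graph, prop):
    """Look up one metadata value recorded about a named graph."""
    return next(
        o for (s, p, o, g) in quads if g is DEFAULT and s == graph and p == prop
    )


def triples_valid_on(day):
    """Return the triples from every named graph whose validity covers `day`."""
    result = []
    for (s, p, o, g) in quads:
        if g is DEFAULT:
            continue  # skip the metadata itself
        start = graph_value(g, EX + "validFrom")
        end = graph_value(g, EX + "validTo")
        if start <= day <= end:
            result.append((s, p, o))
    return result


valid = triples_valid_on(date(2015, 8, 20))
print(valid)
```

The data triples themselves stay untouched; the qualification cost is two metadata triples per batch rather than per statement.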

If your data is fundamentally arbitrary RDF statements and anything in 
there might be qualified and annotated with different values then I'd be 
inclined to use reification.
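
For comparison, a sketch of standard RDF reification with triples as Python tuples (an addition to the thread; the `ex:` names and annotation properties are invented). It uses the `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms from the RDF vocabulary, and shows why the approach is often called heavy: each annotated statement costs four bookkeeping triples before any annotation is added.

```python
# Sketch of RDF reification using (subject, predicate, object) tuples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for illustration


def reify(stmt_id, triple, annotations):
    """Describe `triple` as an rdf:Statement node and attach annotations to it."""
    s, p, o = triple
    stmt = EX + stmt_id
    out = [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
    ]
    out += [(stmt, prop, value) for prop, value in annotations.items()]
    return out


fact = (EX + "alice", EX + "worksFor", EX + "acme")
reified = reify(
    "stmt1",
    fact,
    {EX + "validFrom": "2015-01-01", EX + "source": EX + "hrSystem"},
)

# Reifying a statement does not assert it, so the fact is stored as well.
graph = [fact] + reified
for t in graph:
    print(t)
```

Keeping the plain triple and its reified description in sync is left to the application, which is the redundancy concern raised earlier in the thread.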

Dave

[1] http://www.w3.org/TR/swbp-n-aryRelations/

[2] http://www.w3.org/TR/vocab-data-cube/