Posted to users@jena.apache.org by Laurent Pellegrino <la...@gmail.com> on 2011/10/25 21:38:35 UTC

Last N quadruples inserted into a TDB repository

Hi all,

Is it possible to retrieve the last N quadruples which have been
inserted into a Jena TDB datastore when the inserted quadruples do not
contain any information about their publication time (e.g. with a Jena
built-in function that orders results by the internal identifiers used
by the repository)?

Kind Regards,
Laurent

Re: Last N quadruples inserted into a TDB repository

Posted by Stephen Allen <sa...@apache.org>.
Parliament could theoretically provide this information, as it maintains
statements in a monotonically increasing statement table (it doesn't as of
yet reuse the space freed when deleting statements).  Practically, however,
it would probably require a good amount of work to expose it.

Also, although I don't know that any implementations exist, a store with
triple/quad level MVCC could provide revision history (including the
statements added in the last transaction).

-Stephen


On Fri, Oct 28, 2011 at 8:27 AM, Laurent Pellegrino <
laurent.pellegrino@gmail.com> wrote:

> Thanks to all and especially Andy for pointing me to this paper :)
>
> Laurent
>
> On Fri, Oct 28, 2011 at 1:31 PM, Andy Seaborne <an...@apache.org> wrote:
> > On 26/10/11 08:39, Dave Reynolds wrote:
> >>
> >> On Tue, 2011-10-25 at 21:38 +0200, Laurent Pellegrino wrote:
> >>>
> >>> Hi all,
> >>>
> >>> Is it possible to retrieve the last N quadruples which have been
> >>> inserted into a Jena TDB datastore when the quadruples which have been
> >>> inserted do not contain any information about their publication time
> >>> (e.g. a jena built-in function to use in order to order by using the
> >>> internal identifiers used by the repository)?
> >>
> >> I don't believe there is such an insert timestamp available.
> >
> > There is no such timestamp.
> >
> > There was a paper at ISWC2011 that suggested using timestamps on something
> > (e.g. B+Tree blocks) to guide caching. [1]
> >
> > But as Greg (the presenter) said, "it cuts through every layer of
> > abstraction in a system to provide that information," and various
> > implementers in the audience smirked.
> >
> > So a system derived from TDB that adds it may be built by someone who's
> > interested.  Also, for most users, it would be a noticeable cost.
> >
> >        Andy
> >
> >
> > [1]
> > Enabling fine-grained HTTP caching of SPARQL query results
> > Gregory Todd Williams and Jesse Weaver
> >
> > http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Research_Paper/03/70310752.pdf
> >
> >>
> >> Dave
> >>
> >>
> >
> >
>

Re: Last N quadruples inserted into a TDB repository

Posted by Laurent Pellegrino <la...@gmail.com>.
Thanks to all and especially Andy for pointing me to this paper :)

Laurent

On Fri, Oct 28, 2011 at 1:31 PM, Andy Seaborne <an...@apache.org> wrote:
> On 26/10/11 08:39, Dave Reynolds wrote:
>>
>> On Tue, 2011-10-25 at 21:38 +0200, Laurent Pellegrino wrote:
>>>
>>> Hi all,
>>>
>>> Is it possible to retrieve the last N quadruples which have been
>>> inserted into a Jena TDB datastore when the quadruples which have been
>>> inserted do not contain any information about their publication time
>>> (e.g. a jena built-in function to use in order to order by using the
>>> internal identifiers used by the repository)?
>>
>> I don't believe there is such an insert timestamp available.
>
> There is no such timestamp.
>
> There was a paper at ISWC2011 that suggested using timestamps on something
> (e.g. B+Tree blocks) to guide caching. [1]
>
> But as Greg (the presenter) said, "it cuts through every layer of
> abstraction in a system to provide that information," and various
> implementers in the audience smirked.
>
> So a system derived from TDB that adds it may be built by someone who's
> interested.  Also, for most users, it would be a noticeable cost.
>
>        Andy
>
>
> [1]
> Enabling fine-grained HTTP caching of SPARQL query results
> Gregory Todd Williams and Jesse Weaver
>
> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Research_Paper/03/70310752.pdf
>
>>
>> Dave
>>
>>
>
>

Re: Last N quadruples inserted into a TDB repository

Posted by Andy Seaborne <an...@apache.org>.
On 26/10/11 08:39, Dave Reynolds wrote:
> On Tue, 2011-10-25 at 21:38 +0200, Laurent Pellegrino wrote:
>> Hi all,
>>
>> Is it possible to retrieve the last N quadruples which have been
>> inserted into a Jena TDB datastore when the quadruples which have been
>> inserted do not contain any information about their publication time
>> (e.g. a jena built-in function to use in order to order by using the
>> internal identifiers used by the repository)?
>
> I don't believe there is such an insert timestamp available.

There is no such timestamp.

There was a paper at ISWC2011 that suggested using timestamps on 
something (e.g. B+Tree blocks) to guide caching. [1]

But as Greg (the presenter) said, "it cuts through every layer of
abstraction in a system to provide that information," and various
implementers in the audience smirked.

So a system derived from TDB that adds it may be built by someone who's
interested.  Also, for most users, it would be a noticeable cost.

	Andy


[1]
Enabling fine-grained HTTP caching of SPARQL query results
Gregory Todd Williams and Jesse Weaver

http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Research_Paper/03/70310752.pdf

>
> Dave
>
>


Re: Last N quadruples inserted into a TDB repository

Posted by Dave Reynolds <da...@gmail.com>.
On Tue, 2011-10-25 at 21:38 +0200, Laurent Pellegrino wrote: 
> Hi all,
> 
> Is it possible to retrieve the last N quadruples which have been
> inserted into a Jena TDB datastore when the quadruples which have been
> inserted do not contain any information about their publication time
> (e.g. a jena built-in function to use in order to order by using the
> internal identifiers used by the repository)?

I don't believe there is such an insert timestamp available.

Dave



Re: Last N quadruples inserted into a TDB repository

Posted by Paolo Castagna <ca...@googlemail.com>.
Laurent Pellegrino wrote:
> Hi all,
> 
> Is it possible to retrieve the last N quadruples which have been
> inserted into a Jena TDB datastore when the quadruples which have been
> inserted do not contain any information about their publication time
> (e.g. a jena built-in function to use in order to order by using the
> internal identifiers used by the repository)?

Hi Laurent,
just my humble opinions, you have been warned!

I found myself in the past in a situation where I wanted to know who
stated what and when. But there isn't a solution for that out-of-the-box,
not at the granularity of a single triple or quad. Some people use named
graphs, but that is a different level of granularity and I do not think
named graphs of just one triple are a good idea. ;-)

If you have a control point, external to your TDB, which sees all your
updates, you can save them (in a key-value store, for example) and
timestamp them.
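
A minimal sketch of such an external control point, written in Python
only for brevity. Everything in it is an assumption for illustration:
the names TimestampedUpdateLog, LoggedQuad, add and last_n are
hypothetical, store_insert stands in for whatever call actually writes
to the store, and the in-memory buffer stands in for the key-value
store. Nothing here is part of Jena or TDB.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple

# A quad as plain strings: (graph, subject, predicate, object).
Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class LoggedQuad:
    timestamp: float  # value of the clock when the insert was seen
    quad: Quad


class TimestampedUpdateLog:
    """Hypothetical control point: every insert must go through add()."""

    def __init__(self, store_insert: Callable[[Quad], None],
                 capacity: int = 1000,
                 clock: Callable[[], float] = time.time) -> None:
        self._store_insert = store_insert  # stand-in for the real write
        self._clock = clock
        # Bounded buffer: once full, the oldest entry is dropped on append.
        self._recent: Deque[LoggedQuad] = deque(maxlen=capacity)

    def add(self, quad: Quad) -> None:
        self._store_insert(quad)  # write to the store first
        self._recent.append(LoggedQuad(self._clock(), quad))

    def last_n(self, n: int) -> List[LoggedQuad]:
        """Return up to n of the most recently logged quads, newest first."""
        if n <= 0:
            return []
        return list(self._recent)[-n:][::-1]
```

This only sees inserts that pass through add(), so any writer that
bypasses the control point is invisible to it. A persistent log would
replace the in-memory buffer if the history has to survive a restart.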

Your idea of using NodeIds to identify the latest triples/quads added is
only going to work if the RDF nodes are not already in your TDB store.
For URIs or literals which are already in the store, the same NodeId
will be used (it's like a dictionary).
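
To illustrate the point, here is a toy model of a dictionary that hands
out ids on first sight. It is not TDB's node table: NodeTable, node_id
and encode are hypothetical names, and the sequential integer ids are an
assumption made only to show the effect.

```python
from typing import Dict, Tuple


class NodeTable:
    """Toy dictionary: each distinct RDF term gets one id, on first sight."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def node_id(self, term: str) -> int:
        # A term that is already known keeps its existing id;
        # a new term gets the next unused integer.
        return self._ids.setdefault(term, len(self._ids))


def encode(table: NodeTable, quad: Tuple[str, str, str, str]) -> Tuple[int, ...]:
    """Replace each term of the quad by its id, in g, s, p, o order."""
    return tuple(table.node_id(term) for term in quad)


table = NodeTable()
first = encode(table, ("g", "a", "knows", "b"))   # all four terms are new
second = encode(table, ("g", "c", "knows", "d"))  # "c" and "d" are new
third = encode(table, ("g", "b", "knows", "a"))   # no new term at all
```

In this model the third quad is the most recent insert, yet every id in
it was assigned while encoding the first quad, so sorting by ids would
rank the second quad as newer.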

The next release of TDB will also have a Journal and support for
transactions, which could be an interesting extension point for those
who need to intercept or timestamp changes.

Having said all this, TDB is open source and, although it is not a
trivial change, you could try to explore what is involved in adding an
extra column to the indexes. I would expect a general and broad decrease
in performance, and doing it externally seems a much better (and easier)
approach to me.

I am interested in finding the best way to allow other software to be
notified of changes to a TDB dataset. Finding last quads added to a
TDB dataset and when the last update was is a use case for this.
Another use case is keeping an external index up-to-date with the
TDB indexes.
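
One possible shape for such a notification mechanism, sketched as a
plain observer in Python. This is an assumption about how it could look,
not an existing TDB API: NotifyingStore, subscribe and SubjectIndex are
hypothetical names, and the in-memory set stands in for the dataset.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set, Tuple

Quad = Tuple[str, str, str, str]        # (graph, subject, predicate, object)
Listener = Callable[[str, Quad], None]  # called with ("add" | "delete", quad)


class NotifyingStore:
    """Hypothetical store wrapper that reports every effective change."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def add(self, quad: Quad) -> bool:
        if quad in self._quads:
            return False                # no change, so no notification
        self._quads.add(quad)
        for listener in self._listeners:
            listener("add", quad)
        return True

    def delete(self, quad: Quad) -> bool:
        if quad not in self._quads:
            return False
        self._quads.remove(quad)
        for listener in self._listeners:
            listener("delete", quad)
        return True


class SubjectIndex:
    """Example listener: an external index of quads keyed by subject."""

    def __init__(self) -> None:
        self.by_subject: DefaultDict[str, Set[Quad]] = defaultdict(set)

    def __call__(self, action: str, quad: Quad) -> None:
        subject = quad[1]
        if action == "add":
            self.by_subject[subject].add(quad)
        elif action == "delete":
            self.by_subject[subject].discard(quad)
            if not self.by_subject[subject]:
                del self.by_subject[subject]  # drop empty entries
```

A listener that records a timestamp per "add" event would cover the
"last quads added" use case with the same hook.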

Cheers,
Paolo

> 
> Kind Regards,
> Laurent