Posted to users@jena.apache.org by Aayush Yadav <aa...@gmail.com> on 2022/01/13 06:33:04 UTC
Handling Deleted nodes in TDB2 storage
Hi,
I have a question about how Jena handles deleted nodes in TDB2. I read
in the documentation, and also saw while implementing, that deleted nodes
are not removed from storage until compact is run. So how are
these nodes handled exactly? By "handled" I mean: how does compact know which
nodes to delete from storage, and how does running a select-all query skip
these triples?
Any insights on this, or a pointer to which file in the Jena
code to look into, would help.
Thanks,
Aayush.
Re: Handling Deleted nodes in TDB2 storage
Posted by brain <br...@analyticservice.net>.
Hi Aayush,
Here are some code snippets that may help.
org.apache.jena.tdb2.store.StorageTDB
/** Delete a tuple */
public void delete( Tuple<NodeId> t ) {
    if ( tupleLen != t.len() )
        throw new TDBException(format("Mismatch: deleting tuple of length %d from a table of tuples of length %d", t.len(), tupleLen));
    for ( TupleIndex index : indexes ) {
        if ( index == null )
            continue;
        index.delete( t );
    }
}
org.apache.jena.tdb2.store.tupletable.TupleTable
/** Delete a tuple */
public void delete( Tuple<NodeId> t ) {
    if ( tupleLen != t.len() )
        throw new TDBException(format("Mismatch: deleting tuple of length %d from a table of tuples of length %d", t.len(), tupleLen));
    for ( TupleIndex index : indexes ) {
        if ( index == null )
            continue;
        index.delete( t );
    }
}
org.apache.jena.tdb2.store.tupletable.TupleIndexRecord
@Override
protected void performDelete(Tuple<NodeId> tuple) {
    Record r = TupleLib.record(factory, tuple, tupleMap);
    index.delete(r);
}
org.apache.jena.dboe.trans.bplustree.BPlusTree
public boolean delete(Record record) {
    return this.deleteAndReturnOld(record) != null;
}
public Record deleteAndReturnOld(Record record) {
    this.startUpdateBlkMgr();
    BPTreeNode root = this.getRootWrite();
    Record r = BPTreeNode.delete(root, record);
    this.releaseRootWrite(root);
    this.finishUpdateBlkMgr();
    return r;
}
Re: Handling Deleted nodes in TDB2 storage
Posted by Aayush Yadav <aa...@gmail.com>.
Hi,
Thanks Brain and Andy.
Got it.
Regards,
Aayush.
On Thu, Jan 13, 2022, 3:34 PM Andy Seaborne <an...@apache.org> wrote:
> Triples/quads become unreachable from the current roots of the indexes.
>
> The only nodes to keep are those accessible from triples that are
> reachable from the current roots.
>
> Compact is performed by copying the current view of the database and
> (optionally) throwing away the old one.
>
> The copy is of the current state of the database. If a node isn't
> reached when copying, it isn't in the new node table. (Same for
> triples/quads.)
>
> There is no reference counting of nodes - too expensive and not simple,
> because transactions have different views of the database and may
> abort rather than commit.
>
> In TDB2, there are subdirectories "Data-0001" etc. The highest-numbered
> Data* subdirectory is the active one. The rest are no longer used - you
> can zip and move them elsewhere as a record of the database at a point in
> time, or just delete them.
>
> Andy
Re: Handling Deleted nodes in TDB2 storage
Posted by Andy Seaborne <an...@apache.org>.
Triples/quads become unreachable from the current roots of the indexes.
The only nodes to keep are those accessible from triples that are
reachable from the current roots.
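
To make "unreachable from the current roots" concrete, here is a minimal sketch of a copy-on-write index. It is not the TDB2 B+tree: it uses a plain binary search tree, and every name in it (RootViewSketch, Node, insert, delete, scan) is hypothetical. The point it illustrates is that a delete builds a new root that no longer leads to the deleted key, while the old root and its nodes are left in place. A scan that starts from the current root therefore never sees the deleted entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Jena classes.
class RootViewSketch {

    /** Immutable tree node: an update copies the path instead of changing nodes. */
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    /** Returns a new root that contains key; the old root is left untouched. */
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key, null, null);
        if (key < n.key) return new Node(n.key, insert(n.left, key), n.right);
        if (key > n.key) return new Node(n.key, n.left, insert(n.right, key));
        return n; // already present
    }

    /** Returns a new root without key; the old root still reaches the old nodes. */
    static Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) return new Node(n.key, delete(n.left, key), n.right);
        if (key > n.key) return new Node(n.key, n.left, delete(n.right, key));
        if (n.left == null) return n.right;
        if (n.right == null) return n.left;
        Node min = n.right;
        while (min.left != null) min = min.left; // smallest key in the right subtree
        return new Node(min.key, n.left, delete(n.right, min.key));
    }

    /** In-order scan of everything reachable from the given root. */
    static List<Integer> scan(Node root) {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Integer> out) {
        if (n == null) return;
        collect(n.left, out);
        out.add(n.key);
        collect(n.right, out);
    }

    public static void main(String[] args) {
        Node before = insert(insert(insert(null, 10), 5), 20);
        Node after = delete(before, 5);
        System.out.println(scan(before)); // the old view still contains 5
        System.out.println(scan(after));  // the current view does not
    }
}
```

In this sketch the node holding 5 still exists after the delete, but only the old root leads to it, which is the situation a "select all" on the current view is in.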
Compact is performed by copying the current view of the database and
(optionally) throwing away the old one.
The copy is of the current state of the database. If a node isn't
reached when copying, it isn't in the new node table. (Same for
triples/quads.)
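
As a rough model of that copy, the sketch below walks the live triples and copies across only the nodes they mention. It is a simplification under stated assumptions: the names (CompactSketch, Triple, Database, compact) are hypothetical, node ids are kept unchanged for readability, and the real compaction works on the TDB2 indexes and node table rather than in-memory collections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these names are Jena classes.
class CompactSketch {

    /** A triple stored as three node ids, as an index would hold it. */
    static final class Triple {
        final long s, p, o;
        Triple(long s, long p, long o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Result of the copy: the new triples and the new node table. */
    static final class Database {
        final List<Triple> triples = new ArrayList<>();
        final Map<Long, String> nodeTable = new LinkedHashMap<>();
    }

    /**
     * Build a fresh database from the triples reachable in the current view.
     * Nodes that no live triple mentions are never visited, so they are
     * simply absent from the new node table. Nothing is counted or marked.
     */
    static Database compact(List<Triple> liveTriples, Map<Long, String> oldNodeTable) {
        Database fresh = new Database();
        for (Triple t : liveTriples) {
            fresh.triples.add(t);
            for (long id : new long[] {t.s, t.p, t.o}) {
                fresh.nodeTable.putIfAbsent(id, oldNodeTable.get(id));
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<Long, String> oldNodes = new LinkedHashMap<>();
        oldNodes.put(1L, ":alice");
        oldNodes.put(2L, ":knows");
        oldNodes.put(3L, ":bob");
        oldNodes.put(4L, ":carol"); // only used by a triple that was deleted

        List<Triple> live = new ArrayList<>();
        live.add(new Triple(1L, 2L, 3L));

        Database fresh = compact(live, oldNodes);
        System.out.println(fresh.nodeTable.keySet()); // node 4 is not copied
    }
}
```

Note that compact never needs a list of deleted nodes: it only ever asks what is live.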
There is no reference counting of nodes - too expensive and not simple,
because transactions have different views of the database and may
abort rather than commit.
In TDB2, there are subdirectories "Data-0001" etc. The highest-numbered
Data* subdirectory is the active one. The rest are no longer used - you
can zip and move them elsewhere as a record of the database at a point in
time, or just delete them.
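
If you want to script that housekeeping, a small helper along these lines could pick out the active directory from a listing of names. This is only a sketch: the class and method names are hypothetical, and the "Data-" plus digits pattern is inferred from the "Data-0001" example above rather than taken from the Jena code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Jena.
class ActiveDataDir {

    // Assumption: generation directories are named "Data-" followed by digits.
    private static final Pattern DATA_DIR = Pattern.compile("Data-(\\d+)");

    /** Returns the name with the highest generation number, if any name matches. */
    static Optional<String> active(List<String> names) {
        String best = null;
        int bestNumber = -1;
        for (String name : names) {
            Matcher m = DATA_DIR.matcher(name);
            if (m.matches()) {
                int number = Integer.parseInt(m.group(1));
                if (number > bestNumber) {
                    bestNumber = number;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Example listing of a database directory (made-up names).
        List<String> names = Arrays.asList("Data-0001", "Data-0003", "Data-0002", "notes.txt");
        System.out.println(active(names).orElse("(none)"));
    }
}
```

Everything other than the returned name would be a candidate for archiving or deletion. In practice the names could come from java.nio.file.Files.list on the database directory.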
Andy