Posted to users@jena.apache.org by Aayush Yadav <aa...@gmail.com> on 2022/01/13 06:33:04 UTC

Handling Deleted nodes in TDB2 storage

Hi,

I have a question about how Jena handles deleted nodes in TDB2. I read in
the documentation, and also saw while implementing, that deleted nodes are
not removed from storage until compact is run. So how exactly are these
nodes handled? By "handled" I mean: how does compact know which nodes to
delete from storage, and how does running a select-all query skip these
triples?

Any insights on this, or a pointer to which file in the Jena code to look
into, would help.

Thanks,
Aayush.

Re: Handling Deleted nodes in TDB2 storage

Posted by brain <br...@analyticservice.net>.
Hi Aayush,

Here are some code snippets that may help.

org.apache.jena.tdb2.store.tupletable.TupleTable


/** Delete a tuple */
public void delete( Tuple<NodeId> t ) {
    if ( tupleLen != t.len() )
        throw new TDBException(format("Mismatch: deleting tuple of length %d from a table of tuples of length %d", t.len(), tupleLen));

    // Each index (e.g. SPO, POS, OSP for triples) holds the same tuples in a
    // different column order, so the tuple is removed from every one of them.
    for ( TupleIndex index : indexes ) {
        if ( index == null )
            continue;
        index.delete( t );
    }
}

org.apache.jena.tdb2.store.tupletable.TupleIndexRecord

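@Override
protected void performDelete(Tuple<NodeId> tuple) {
    // Build a record with the NodeIds permuted into this index's column
    // order (tupleMap), then delete that record from the underlying index.
    Record r = TupleLib.record(factory, tuple, tupleMap);
    index.delete(r);
}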

org.apache.jena.dboe.trans.bplustree.BPlusTree

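public boolean delete(Record record) {
    // true if a matching record was found and removed
    return this.deleteAndReturnOld(record) != null;
}

public Record deleteAndReturnOld(Record record) {
    this.startUpdateBlkMgr();                    // begin an update on the block managers
    BPTreeNode root = this.getRootWrite();       // get the root node for writing
    Record r = BPTreeNode.delete(root, record);  // removed record, or null if absent
    this.releaseRootWrite(root);
    this.finishUpdateBlkMgr();
    return r;
}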
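To make the call chain above concrete, here is a minimal toy model in plain Java; it does not use Jena. It assumes three triple indexes in SPO, POS and OSP order, each modelled as a sorted set of permuted NodeId keys standing in for a B+Tree. `ToyTupleTable`, `ToyIndex` and the `long` NodeIds are hypothetical names for illustration. As in `TupleTable.delete`, the deletion is applied to every index, and each index first reorders the tuple into its own column order, roughly what `TupleIndexRecord` does with its `tupleMap`. Note that no node table is touched: the NodeIds stay allocated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for one TupleIndex; the sorted set plays the B+Tree.
class ToyIndex {
    final String name;
    final int[] order; // column permutation, e.g. {1, 2, 0} for POS
    final TreeSet<long[]> records = new TreeSet<long[]>((a, b) -> Arrays.compare(a, b));

    ToyIndex(String name, int[] order) {
        this.name = name;
        this.order = order;
    }

    // Reorder the tuple's columns into this index's order.
    long[] toRecord(long[] tuple) {
        long[] r = new long[tuple.length];
        for (int i = 0; i < order.length; i++)
            r[i] = tuple[order[i]];
        return r;
    }

    void add(long[] tuple)       { records.add(toRecord(tuple)); }
    boolean delete(long[] tuple) { return records.remove(toRecord(tuple)); }
}

// Hypothetical stand-in for TupleTable: one logical tuple, several indexes.
class ToyTupleTable {
    final List<ToyIndex> indexes = new ArrayList<>();

    ToyTupleTable() {
        indexes.add(new ToyIndex("SPO", new int[] {0, 1, 2}));
        indexes.add(new ToyIndex("POS", new int[] {1, 2, 0}));
        indexes.add(new ToyIndex("OSP", new int[] {2, 0, 1}));
    }

    void add(long[] t) {
        for (ToyIndex index : indexes) index.add(t);
    }

    // Like TupleTable.delete: remove the tuple from every index.
    void delete(long[] t) {
        for (ToyIndex index : indexes) index.delete(t);
    }

    int size(String indexName) {
        for (ToyIndex index : indexes)
            if (index.name.equals(indexName)) return index.records.size();
        throw new IllegalArgumentException(indexName);
    }
}
```

After adding (1,2,3) and (1,2,4) and deleting (1,2,3), each index holds one record; the POS index stores it as (2,4,1).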



Re: Handling Deleted nodes in TDB2 storage

Posted by Aayush Yadav <aa...@gmail.com>.
Hi,

Thanks Brain and Andy.
Got it.

Regards,
Aayush.

On Thu, Jan 13, 2022, 3:34 PM Andy Seaborne <an...@apache.org> wrote:

Re: Handling Deleted nodes in TDB2 storage

Posted by Andy Seaborne <an...@apache.org>.


Deleted triples/quads become unreachable from the current roots of the indexes.

The only nodes to keep are those accessible from triples that are
reachable from the current roots.
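One rough way to picture "unreachable from the current roots" is the toy sketch below. It is plain Java, not Jena's B+Tree, and `ToyVersionedStore` and its methods are hypothetical. It assumes one writer at a time: each commit publishes a new immutable root, and a deleted triple is simply absent from it, so any read that starts afterwards (for example a select-all scan) never sees it. A read that started earlier keeps its own root and still sees the triple, and the node table is never shrunk by the delete.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy store (not Jena code). Assumes one writer at a time.
class ToyVersionedStore {
    // Node table: nodes are added on first use and never removed by a delete.
    final Map<String, Long> nodeTable = new HashMap<>();
    // Current "root": an immutable set of NodeId triples.
    final AtomicReference<Set<List<Long>>> root =
        new AtomicReference<Set<List<Long>>>(Set.of());

    long nodeId(String node) {
        return nodeTable.computeIfAbsent(node, n -> (long) nodeTable.size());
    }

    List<Long> triple(String s, String p, String o) {
        return List.of(nodeId(s), nodeId(p), nodeId(o));
    }

    // A read sees whatever root was current when it began.
    Set<List<Long>> beginRead() {
        return root.get();
    }

    // A write copies the current root, changes the copy, and publishes it on commit.
    private void commit(List<Long> t, boolean add) {
        Set<List<Long>> next = new HashSet<>(root.get());
        if (add) next.add(t); else next.remove(t);
        root.set(Set.copyOf(next));
    }

    void commitAdd(List<Long> t)    { commit(t, true); }
    void commitDelete(List<Long> t) { commit(t, false); }
}
```

Here a snapshot taken before `commitDelete` still contains the triple while a new read does not, which also hints at why per-node reference counts are awkward: they would have to account for every live view, and for writes that abort.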

Compaction is performed by copying the current view of the database and
(optionally) throwing away the old one.

The copy is of the current state of the database. If a node isn't 
reached when copying, it isn't in the new node table. (Same for 
triples/quads.)
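Compaction by copying, as described above, can be sketched in miniature: walk the live triples, re-intern their nodes into a fresh node table, and write them into a fresh triple set. Anything not reached, including nodes used only by deleted triples, is simply not copied. This is a plain-Java toy with hypothetical names (`ToyDatabase`, string nodes, `List<Long>` triples); the real TDB2 compaction works on the on-disk indexes and node table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy database: a node table plus a set of NodeId triples.
class ToyDatabase {
    final Map<String, Long> nodeToId = new HashMap<>();
    final List<String> idToNode = new ArrayList<>();
    final Set<List<Long>> triples = new LinkedHashSet<>();

    long intern(String node) {
        Long id = nodeToId.get(node);
        if (id == null) {
            id = (long) idToNode.size();
            nodeToId.put(node, id);
            idToNode.add(node);
        }
        return id;
    }

    void add(String s, String p, String o) {
        triples.add(List.of(intern(s), intern(p), intern(o)));
    }

    // Delete only unlinks the triple; its nodes stay in the node table.
    void delete(String s, String p, String o) {
        Long si = nodeToId.get(s), pi = nodeToId.get(p), oi = nodeToId.get(o);
        if (si != null && pi != null && oi != null)
            triples.remove(List.of(si, pi, oi));
    }

    // Compaction by copying: only nodes reached from live triples are copied.
    ToyDatabase compact() {
        ToyDatabase fresh = new ToyDatabase();
        for (List<Long> t : triples)
            fresh.add(idToNode.get(t.get(0).intValue()),
                      idToNode.get(t.get(1).intValue()),
                      idToNode.get(t.get(2).intValue()));
        return fresh;
    }
}
```

With two triples added and one deleted, the original node table still holds all six nodes, while the compacted copy holds only the three used by the surviving triple.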

There is no reference counting of nodes: it would be too expensive, and it
is not simple because transactions have different views of the database
and may abort rather than commit.

In TDB2, there are subdirectories "Data-0001", etc. The highest-numbered
Data-* subdirectory is the active one. The rest are no longer used; you
can zip them and move them elsewhere as a record of the database at a
point in time, or just delete them.
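Going by the naming described above, a small helper could tell the active directory apart from the old ones. This is a hedged sketch using only `java.nio.file`. It assumes the subdirectories are named `Data-` followed by a decimal number and compares those numbers numerically, so it does not depend on zero padding. `ToyDataDirs`, `activeDataDir` and `oldDataDirs` are hypothetical names, not Jena API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToyDataDirs {
    // Assumed naming scheme: "Data-" followed by digits, e.g. Data-0001.
    private static final Pattern DATA_DIR = Pattern.compile("Data-(\\d+)");

    private static long number(Path p) {
        Matcher m = DATA_DIR.matcher(p.getFileName().toString());
        if (!m.matches()) throw new IllegalArgumentException(p.toString());
        return Long.parseLong(m.group(1));
    }

    // All Data-NNNN subdirectories, sorted by number (ascending).
    static List<Path> dataDirs(Path databaseDir) throws IOException {
        try (Stream<Path> entries = Files.list(databaseDir)) {
            return entries
                .filter(Files::isDirectory)
                .filter(p -> DATA_DIR.matcher(p.getFileName().toString()).matches())
                .sorted(Comparator.comparingLong(ToyDataDirs::number))
                .collect(Collectors.toList());
        }
    }

    // The highest-numbered subdirectory is the active one.
    static Optional<Path> activeDataDir(Path databaseDir) throws IOException {
        List<Path> dirs = dataDirs(databaseDir);
        return dirs.isEmpty() ? Optional.empty() : Optional.of(dirs.get(dirs.size() - 1));
    }

    // The rest are no longer used: candidates to archive or delete.
    static List<Path> oldDataDirs(Path databaseDir) throws IOException {
        List<Path> dirs = dataDirs(databaseDir);
        return dirs.isEmpty() ? dirs : dirs.subList(0, dirs.size() - 1);
    }
}
```

Since this only lists directories, it is safe to run as a read-only check; actually moving or deleting the old directories is best left until the database is not in use.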

     Andy