Posted to users@jena.apache.org by Hugh Cayless <ph...@gmail.com> on 2014/09/05 13:58:38 UTC

DROP ALL behavior

Hello all,

I've used Jena-Fuseki previously, but when I needed to reload all my data
(I'm using TDB), I've generally erased the contents of my data directory
and recreated it because it's faster than dropping the graph. I'm noticing
now though that if I issue a SPARQL DROP ALL update, the graph does indeed
get dropped, but if I check the size of my data directory, it's the same as
it was. When my data gets added back, the data directory gets that much
larger, eventually causing me to run out of free space on the volume.

Is there some sort of vacuum procedure I need to run to clear the stale
data? Or a reset command that will restore the contents of the data
directory to its default, empty state? It would be nice to be able to do
this without stopping Fuseki, as it will be serving other databases besides
the one I'm currently messing with.

Thanks,
Hugh

Re: DROP ALL behavior

Posted by Hugh Cayless <ph...@gmail.com>.
Thanks for the response, Andy.  My RDF doesn't contain any BNodes, so I
think there must be a problem with reusing NodeIds.

Fortunately I'm not using Windows, so the database deletion option will be
welcome!

Thanks for the info!

Hugh


Re: DROP ALL behavior

Posted by Andy Seaborne <an...@apache.org>.
Hugh,

Space is not recycled back to the OS so files do not get smaller.  Space 
is partially reused but it could be better.

The node table is not cleared up - NodeIds are reused should RDF data be 
added again with the same URIs or literals. BNodes will likely be fresh 
ones so they do waste space in the node tables.  The cost of reference 
counting node usage would be very high.

In indexes, space should be reused, but it isn't reused as well as it
should be, and it's only reused within the same JVM run.  A restart loses
the chance to reuse the space.

I'm afraid the only reset is to stop the server and delete the files.

Fuseki2 will add the option of deleting a database.  However, on MS
Windows, the well-known Java bug that memory-mapped files can't be
deleted until the JVM exits blocks even this.

	Andy
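
As a rough illustration of the reset described above (stop the server, then
delete the files), a minimal Python sketch might look like the following. It
is not part of Jena or Fuseki: the function names and the directory used in
the demo are hypothetical, and it assumes the server using the data directory
has already been stopped.

```python
import shutil
import tempfile
from pathlib import Path


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def reset_data_directory(path):
    """Delete everything in path and leave an empty directory behind.

    Assumption: no running process (e.g. the server) has these files open.
    """
    data_dir = Path(path)
    if data_dir.exists():
        shutil.rmtree(data_dir)
    data_dir.mkdir(parents=True)


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for a real data directory.
    demo_dir = Path(tempfile.mkdtemp()) / "DB"
    demo_dir.mkdir()
    (demo_dir / "example.dat").write_bytes(b"\0" * 4096)

    print("before reset:", directory_size(demo_dir), "bytes")
    reset_data_directory(demo_dir)
    print("after reset:", directory_size(demo_dir), "bytes")

    shutil.rmtree(demo_dir.parent)
```

Comparing directory_size() before and after a DROP ALL is one way to observe
that the files do not shrink; only deleting the files returns the space to
the OS.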