Posted to users@jena.apache.org by "Jeffrey C. Witt" <je...@gmail.com> on 2021/12/14 12:14:31 UTC

Question about disk space usage when upgrading from Fuseki 2 to 4

Dear List,

I've been way behind on versions (happily using Fuseki 2 for a long time).
But with the log4j vulnerability, I'm trying to switch over to the most
current version of Fuseki (4.3.1).

Initially, I wanted to just use the TDB build that I had ingested using
Fuseki 2 and load it with Fuseki 4, but that didn't seem to work.

So I decided to re-ingest everything into a new TDB build using Fuseki
4.3.1. This seemed to work better. However, I am noticing a dramatic
difference in space usage.

The size of my old TDB build (created using Fuseki 2) was 51 GB.
The size of the new TDB build (created using Fuseki 4) was *152 GB*.
There are about 30,892,337 triples in each build.
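As a rough sanity check on these two totals, the arithmetic below works out the growth factor and the approximate on-disk bytes per triple for each build. It is a minimal sketch that assumes the reported sizes are binary gigabytes (GiB, as printed by `ls -h`) and that the 30,892,337 figure counts triples.

```python
# Rough per-triple footprint of the two TDB builds.
# Assumptions: sizes are binary gigabytes (GiB, as `ls -h` reports them) and
# the 30,892,337 figure is the triple count, identical in both builds.
GIB = 1024 ** 3

TRIPLES = 30_892_337
old_bytes = 51 * GIB   # TDB build created with Fuseki 2
new_bytes = 152 * GIB  # TDB build created with Fuseki 4.3.1

growth = new_bytes / old_bytes          # how many times larger the new build is
old_per_triple = old_bytes / TRIPLES    # average bytes on disk per triple (old)
new_per_triple = new_bytes / TRIPLES    # average bytes on disk per triple (new)

print(f"growth factor: {growth:.2f}x")
print(f"old: {old_per_triple:,.0f} bytes/triple")
print(f"new: {new_per_triple:,.0f} bytes/triple")
```

On these figures the new build is roughly three times the size of the old one for the same data.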

Does this large jump in disk size make sense as part of the move from Fuseki
2 to Fuseki 4.3.1? For example, is 4.3.1 creating new indices or other
kinds of ancillary data that could account for the large jump in disk space
used?

On the surface it doesn't make sense to me, and it suggests I have a problem
elsewhere. But before I start over and look for other problems, I wanted to
make sure this jump in disk space usage wasn't something I should expect
when moving from Fuseki 2 to Fuseki 4.3.1.

Here are breakdowns of some individual file sizes in the two builds.

In the previous build (~50 GB, from Fuseki 2):

792M Aug  3 18:51 node2id.dat
112M Aug  3 17:20 node2id.idn
724M Aug  3 18:51 nodes.dat
0B Aug  3 18:51 nodes.dat-jrnl

In the new build (~150 GB, from Fuseki 4):

16B Dec 14 06:07 nodes-data.bdf
650M Dec 14 04:19 nodes-data.obj
24B Dec 14 06:07 nodes.bpt
69G Dec 14 04:19 nodes.dat
10G Dec 14 04:19 nodes.idn

The 79G in nodes.dat and nodes.idn (compared with about 1.6G for the old
node files) could almost entirely explain my increase of nearly 100 GB.
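To make the node-table comparison explicit, the sketch below sums the node files from the two listings and reports the difference. It assumes the sizes are `ls -h` strings with binary suffixes; `to_bytes` is a hypothetical helper written for this note, not part of Jena.

```python
# Compare the node-table files from the two directory listings.
# Assumption: sizes are `ls -h` strings with binary suffixes (B, K, M, G).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(size: str) -> float:
    """Hypothetical helper (not a Jena API): convert e.g. '792M' or '6.5G' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]

# Node-related files in the Fuseki 2 build.
old_nodes = {"node2id.dat": "792M", "node2id.idn": "112M",
             "nodes.dat": "724M", "nodes.dat-jrnl": "0B"}
# Node-related files in the Fuseki 4 build.
new_nodes = {"nodes-data.bdf": "16B", "nodes-data.obj": "650M",
             "nodes.bpt": "24B", "nodes.dat": "69G", "nodes.idn": "10G"}

old_total = sum(to_bytes(s) for s in old_nodes.values())
new_total = sum(to_bytes(s) for s in new_nodes.values())
increase_gib = (new_total - old_total) / UNITS["G"]

print(f"node table increase: {increase_gib:.1f} GiB")
```

The old node files come to about 1.6 GiB, so the growth in the node table alone is roughly 78 GiB.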

Another large increase is in the POSG files.

In the ~50 GB build these were:

-rw-r--r--   1 jcwitt  staff   6.5G Aug  3 18:52 POSG.dat
-rw-r--r--   1 jcwitt  staff   456M Aug  3 18:51 POSG.idn

and in the new build they are:

-rw-r--r--   1 jcwitt  staff    24B Dec 14 06:07 POSG.bpt
-rw-r--r--   1 jcwitt  staff    23G Dec 14 04:19 POSG.dat
-rw-r--r--   1 jcwitt  staff    13G Dec 14 04:19 POSG.idn

That is an increase of roughly 29 GB (about 36G versus about 7G).

Combined with the roughly 78 GB increase in the node files above, this would
seem to explain it.
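The sketch below checks that reasoning by tallying the POSG and node-file increases and comparing their sum with the overall growth (152 minus 51). It again assumes `ls -h` binary units; `total_gib` is a hypothetical helper for this note, not a Jena API.

```python
# Tally the POSG and node-file growth and compare it with the overall growth.
# Assumption: sizes are `ls -h` strings with binary suffixes (B, K, M, G).
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
GIB = UNITS["G"]

def total_gib(sizes):
    """Hypothetical helper (not a Jena API): sum `ls -h` sizes, return GiB."""
    return sum(float(s[:-1]) * UNITS[s[-1]] for s in sizes) / GIB

# POSG.bpt, POSG.dat, POSG.idn (new) minus POSG.dat, POSG.idn (old).
posg_increase = total_gib(["24B", "23G", "13G"]) - total_gib(["6.5G", "456M"])
# Node files, new build minus old build.
nodes_increase = (total_gib(["16B", "650M", "24B", "69G", "10G"])
                  - total_gib(["792M", "112M", "724M", "0B"]))
overall_increase = 152 - 51  # reported totals of the two builds

print(f"POSG increase:  {posg_increase:.1f} GiB")
print(f"nodes increase: {nodes_increase:.1f} GiB")
print(f"sum of both:    {posg_increase + nodes_increase:.1f} GiB "
      f"vs. overall {overall_increase} GiB")
```

On these rounded figures the two increases together (about 107 GiB) slightly exceed the overall growth of 101, so the remaining index files need not have grown at all. Because `ls -h` rounds each size, treat these totals as approximate.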

Any ideas why this is happening?

Many thanks for any generous assistance you can provide.