Posted to users@jena.apache.org by Mikhail Sogrin <so...@gmail.com> on 2011/09/12 19:13:39 UTC
TDB initial size with mapped files
Hi,
With memory mapped TDB storage (the default with a 64-bit JVM), the initial size
of a TDB store without any data at all is 200 MB, because most of the index files
are 8 MB, and there's quite a number of them.
It may be a good number when loading big data sets, but is absolutely huge
if a user expects to load only a bit of data.
In comparison, the direct file method (with a 32-bit JVM) makes only 8 KB index
files, resulting in only 200 KB usage for an empty database.
Is there a way to configure the initial size of index files?
The only method I could think of was to set 'direct' method, create dataset,
close it, set method to 'mapped' and open dataset again. But it prints a
warning "System file mode already determined - setting it has no effect",
and yes, the second setting does not seem to have any effect.
Kind regards,
Mikhail
Re: TDB initial size with mapped files
Posted by Andy Seaborne <an...@epimorphics.com>.
On 15/09/11 15:03, Mikhail Sogrin wrote:
> Hi Andy,
>
>> While the files show as 200MB, they are sparse files. Linux will show 8M
>> files with "ls -l" but the directory, to "du -sh", is 208K. Sparse files
>> don't allocate all their space. OS/X seems to be different - "du -sh"
>> reports the sum of the file sizes, but they are still sparse files and don't
>> consume all their disk space.
>>
> Good to know that they are sparse files and not taking much space on disk.
> But they still require space when backing up, zipping, copying to other
> machines, etc.
>
>
>> In theory, the index segment size is configurable (see
>> SystemTDB.SegmentSize) but it isn't tested for in the test suite.
>>
> That would require recompilation, as the parameter is final and cannot be
> changed with original jars.
Yes - it needs a rebuild. It must not change during a run - complete
and utter chaos will result!
Different runs over the same data will work -- the Block size is the
unit of access, and segments are only the grouping used for mapping. This is
why direct mode and mapped mode can use the same database (at different times).
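
To illustrate the point about blocks and segments, here is a minimal Python sketch (not TDB's code). It assumes an 8 KB block and an 8 MB segment, the figures mentioned in this thread, and the function names are hypothetical. One function writes a block with plain seek/write, the other reads the same block back through a memory-mapped segment. The mapped reader has to grow the file to a whole segment first, which is consistent with mapped index files starting out at 8 MB.

```python
import mmap
import os

BLOCK_SIZE = 8 * 1024            # assumed block size (8 KB, per the thread)
SEGMENT_SIZE = 8 * 1024 * 1024   # assumed mapping segment size (8 MB)


def write_block_direct(path, block_id, payload):
    """Write one block with ordinary file I/O (the 'direct' style)."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(block_id * BLOCK_SIZE)
        # Pad to a full block so the file always holds whole blocks.
        f.write(payload.ljust(BLOCK_SIZE, b"\0"))


def read_block_mapped(path, block_id):
    """Read one block through a memory-mapped segment (the 'mapped' style)."""
    byte_offset = block_id * BLOCK_SIZE
    segment, offset_in_segment = divmod(byte_offset, SEGMENT_SIZE)
    with open(path, "r+b") as f:
        # A mapping cannot extend past the end of the file, so grow the
        # file to a whole number of segments before mapping it.
        needed = (segment + 1) * SEGMENT_SIZE
        if os.fstat(f.fileno()).st_size < needed:
            f.truncate(needed)
        with mmap.mmap(f.fileno(), SEGMENT_SIZE,
                       offset=segment * SEGMENT_SIZE) as m:
            return bytes(m[offset_in_segment:offset_in_segment + BLOCK_SIZE])
```

In this sketch the block numbering is identical in both functions. Only the mechanism used to reach the bytes differs, so a file written one way can be read the other way in a later run.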
Andy
>
> Kind regards,
> Mikhail
>
Re: TDB initial size with mapped files
Posted by Mikhail Sogrin <so...@gmail.com>.
Hi Andy,
> While the files show as 200MB, they are sparse files. Linux will show 8M
> files with "ls -l" but the directory, to "du -sh", is 208K. Sparse files
> don't allocate all their space. OS/X seems to be different - "du -sh"
> reports the sum of the file sizes, but they are still sparse files and don't
> consume all their disk space.
>
Good to know that they are sparse files and not taking much space on disk.
But they still require space when backing up, zipping, copying to other
machines, etc.
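
One narrow data point on the zipping case, as a hedged Python sketch: reading the unallocated part of a sparse file returns zero bytes, so an untouched 8 MB index file looks like 8 MB of zeros to a compressor. The 8 MB figure is taken from this thread and the function name is hypothetical. The sketch only measures how well such content compresses with zlib. It says nothing about plain copies or backup tools that are not sparse-aware, which may still write out the full apparent size.

```python
import zlib

SEGMENT_SIZE = 8 * 1024 * 1024  # assumed size of an empty mapped index file


def compressed_size_of_zeros(size=SEGMENT_SIZE):
    """Return the zlib-compressed length of `size` zero bytes."""
    return len(zlib.compress(bytes(size)))


if __name__ == "__main__":
    print(SEGMENT_SIZE, "bytes of zeros compress to",
          compressed_size_of_zeros(), "bytes")
```

The compressor still has to read the full 8 MB per file, so the cost shows up as time rather than as archive size.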
> In theory, the index segment size is configurable (see
> SystemTDB.SegmentSize) but it isn't tested for in the test suite.
>
That would require recompilation, as the parameter is final and cannot be
changed with original jars.
Kind regards,
Mikhail
Re: TDB initial size with mapped files
Posted by Andy Seaborne <an...@epimorphics.com>.
Hi Mikhail,
You're right - the use case of many smaller datasets isn't best served by
memory mapped mode.
The mode of operation currently has to be set very early, on a per JVM
basis, and ideally passed to the JVM itself: -Dtdb:fileMode=direct . This is
because TDB reads the setting rather early - there is no fundamental
reason for this and it could be done on a per dataset basis, it just isn't.
While the files show as 200MB, they are sparse files. Linux will show 8M
files with "ls -l" but the directory, to "du -sh", is 208K. Sparse files
don't allocate all their space. OS/X seems to be different - "du -sh"
reports the sum of the file sizes, but they are still sparse files and
don't consume all their disk space.
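
The difference between what "ls -l" and "du" report can be reproduced with a small Python sketch. It assumes a POSIX system where os.stat exposes st_blocks in 512-byte units (true on Linux and OS/X, not available on Windows) and a filesystem that supports sparse files. The function names are hypothetical and this is not how TDB creates its files.

```python
import os

SEGMENT_SIZE = 8 * 1024 * 1024  # 8 MB, the index file size seen in mapped mode


def make_sparse_file(path, size=SEGMENT_SIZE):
    """Create a file of `size` bytes without writing any data to it."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; no data blocks are written


def sizes(path):
    """Return (apparent size, allocated size) in bytes."""
    st = os.stat(path)
    # st_size is what "ls -l" shows; st_blocks (assumed to be in 512-byte
    # units) is the space actually allocated on disk.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    make_sparse_file("example-index.dat")
    apparent, allocated = sizes("example-index.dat")
    print("apparent:", apparent, "allocated:", allocated)
```

On a filesystem without sparse file support, the allocated figure would be expected to match the apparent size instead.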
In theory, the index segment size is configurable (see
SystemTDB.SegmentSize) but it isn't tested for in the test suite.
Andy
On 12/09/11 18:13, Mikhail Sogrin wrote:
> Hi,
>
> With memory mapped TDB storage (default with 64-bit JVM), the initial size
> of TDB store without any data at all is 200 MB, because most of index files
> are 8 MB, and there's quite a number of them.
> It may be a good number when loading big data sets, but is absolutely huge
> if a user expects to load only a bit of data.
>
> In comparison, direct file method (with 32-bit JVM) makes only 8 KB index
> files resulting in only 200 KB usage for an empty database.
>
> Is there a way to configure initial size of index files?
> The only method I could think of was to set 'direct' method, create dataset,
> close it, set method to 'mapped' and open dataset again. But it prints a
> warning "System file mode already determined - setting it has no effect",
> and yes, the second setting does not seem to have any effect.
>
> Kind regards,
> Mikhail
>