Posted to users@jena.apache.org by Mikhail Sogrin <so...@gmail.com> on 2011/09/12 19:13:39 UTC

TDB initial size with mapped files

Hi,

With memory mapped TDB storage (the default with a 64-bit JVM), the initial
size of a TDB store without any data at all is 200 MB, because most of the
index files are 8 MB each, and there's quite a number of them.
That may be a good size when loading big data sets, but it is absolutely huge
if a user expects to load only a bit of data.

In comparison, direct file method (with 32-bit JVM) makes only 8 KB index
files resulting in only 200 KB usage for an empty database.

Is there a way to configure initial size of index files?
The only method I could think of was to set 'direct' method, create dataset,
close it, set method to 'mapped' and open dataset again. But it prints a
warning "System file mode already determined - setting it has no effect",
and yes, the second setting does not seem to have any effect.

Kind regards,
Mikhail

Re: TDB initial size with mapped files

Posted by Andy Seaborne <an...@epimorphics.com>.

On 15/09/11 15:03, Mikhail Sogrin wrote:
> Hi Andy,
>
>> While the files show as 200MB, they are sparse files.  Linux will show 8M
>> files with "ls -l" but the directory, according to "du -sh", is 208K.  Sparse
>> files don't allocate all their space.  OS/X seems to be different - "du -sh"
>> reports the sum of the file sizes, but they are still sparse files and don't
>> consume all their disk space.
>>
> Good to know that they are sparse files and not taking much space on disk.
> But they still require space when backing up, zipping, copying to other
> machines, etc.
>
>
>> In theory, the index segment size is configurable (see
>> SystemTDB.SegmentSize) but it isn't tested for in the test suite.
>>
>   That would require recompilation, as the parameter is final and cannot be
> changed with original jars.

Yes - it needs a rebuild.  It must not change during a run - complete 
and utter chaos will result!

Different runs over the same data will work -- the block size is the
unit of access, and segments are only the grouping used for mapping.  This
is why direct mode and mapped mode can use the same database (at different
times).
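
As a rough illustration of how segments and blocks relate, here is the arithmetic using only the figures quoted in this thread (8 MB index files in mapped mode, 8 KB index files in direct mode, about 200 MB for an empty mapped store). The file count is an estimate derived from those numbers, not a value read from TDB, and the variable names are made up for the example.

```shell
# Figures taken from this thread; file_count is an estimate, not a TDB constant.
segment_bytes=$((8 * 1024 * 1024))   # initial size of a mapped index file
block_bytes=$((8 * 1024))            # initial size of a direct-mode index file
store_bytes=$((200 * 1024 * 1024))   # reported size of an empty mapped store

blocks_per_segment=$((segment_bytes / block_bytes))
file_count=$((store_bytes / segment_bytes))
direct_kb=$((file_count * block_bytes / 1024))

echo "blocks per segment:   $blocks_per_segment"
echo "approx. index files:  $file_count"
echo "direct-mode estimate: $direct_kb KB"
```

The last figure lines up with the roughly 200 KB reported for an empty database in direct mode.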

	Andy

>
> Kind regards,
> Mikhail
>

Re: TDB initial size with mapped files

Posted by Mikhail Sogrin <so...@gmail.com>.
Hi Andy,

> While the files show as 200MB, they are sparse files.  Linux will show 8M
> files with "ls -l" but the directory, according to "du -sh", is 208K.  Sparse
> files don't allocate all their space.  OS/X seems to be different - "du -sh"
> reports the sum of the file sizes, but they are still sparse files and don't
> consume all their disk space.
>
Good to know that they are sparse files and not taking much space on disk.
But they still require space when backing up, zipping, copying to other
machines, etc.
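
For the copying and archiving case, some tools can be asked to keep holes as holes. The following is a minimal sketch assuming GNU coreutils and GNU tar on Linux; the file names are hypothetical stand-ins for a TDB index file, and other platforms or tools may behave differently.

```shell
# Create a stand-in for one empty index file: 8 MB apparent size, no data written.
workdir=$(mktemp -d)
truncate -s 8M "$workdir/index.dat"

# GNU cp can recreate holes in the copy instead of writing out zero bytes.
cp --sparse=always "$workdir/index.dat" "$workdir/index.copy"

# GNU tar can record the holes rather than storing them as zero bytes.
tar --sparse -cf "$workdir/store.tar" -C "$workdir" index.dat

# Compare the archive's size with the apparent size of the file it contains.
ls -l "$workdir/index.dat" "$workdir/store.tar"
```

Tools that are not sparse-aware will still read and write the full apparent size.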


> In theory, the index segment size is configurable (see
> SystemTDB.SegmentSize) but it isn't tested for in the test suite.
>
That would require recompilation, as the parameter is final and cannot be
changed with original jars.

Kind regards,
Mikhail

Re: TDB initial size with mapped files

Posted by Andy Seaborne <an...@epimorphics.com>.
Hi Mikhail,

You're right - the use case of many smaller datasets isn't best served by
memory mapped mode.

The mode of operation currently has to be set very early, on a per-JVM
basis, and ideally on the JVM itself: -Dtdb:fileMode=direct .  This is
because TDB reads the setting rather early - there is no fundamental
reason for this and it could be done on a per-dataset basis, it just isn't.

While the files show as 200MB, they are sparse files.  Linux will show 8M
files with "ls -l" but the directory, according to "du -sh", is 208K.  Sparse
files don't allocate all their space.  OS/X seems to be different - "du -sh"
reports the sum of the file sizes, but they are still sparse files and
don't consume all their disk space.
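
The difference between the two numbers can be seen with a small experiment. This is a sketch assuming GNU coreutils on Linux; the file is a made-up stand-in, not one created by TDB, and the allocated figure depends on the filesystem.

```shell
# Stand-in for an empty index file; the name is made up for this demo.
demo_dir=$(mktemp -d)
truncate -s 8M "$demo_dir/index.dat"

apparent_bytes=$(stat -c %s "$demo_dir/index.dat")     # the size "ls -l" reports
allocated_kb=$(du -k "$demo_dir/index.dat" | cut -f1)  # the space "du" reports

echo "apparent:  $apparent_bytes bytes"
echo "allocated: $allocated_kb KB"
```

On a filesystem that supports sparse files, the allocated figure should be far below the apparent 8 MB.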

In theory, the index segment size is configurable (see 
SystemTDB.SegmentSize) but it isn't tested for in the test suite.

	Andy

On 12/09/11 18:13, Mikhail Sogrin wrote:
> Hi,
>
> With memory mapped TDB storage (the default with a 64-bit JVM), the initial
> size of a TDB store without any data at all is 200 MB, because most of the
> index files are 8 MB each, and there's quite a number of them.
> That may be a good size when loading big data sets, but it is absolutely huge
> if a user expects to load only a bit of data.
>
> In comparison, direct file method (with 32-bit JVM) makes only 8 KB index
> files resulting in only 200 KB usage for an empty database.
>
> Is there a way to configure initial size of index files?
> The only method I could think of was to set 'direct' method, create dataset,
> close it, set method to 'mapped' and open dataset again. But it prints a
> warning "System file mode already determined - setting it has no effect",
> and yes, the second setting does not seem to have any effect.
>
> Kind regards,
> Mikhail
>