Posted to users@jena.apache.org by ja...@kolumbus.fi on 2021/04/08 15:03:43 UTC

Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

Hello,

One more question regarding this old issue. The previous answer said:

> The heap size by default is quite small in the scripts. It might be an
> idea to increase it a bit to give query working space but 0.5 million
> is really not very big.

What would be a suitable heap size in my case?
(And a very basic additional question: if I'm running the JVM and
jena-fuseki in the same Docker container, there's a risk that the JVM would
take all free memory, so I've set the JVM heap size to 2 GB using
JVM_ARGS=-Xmx2g. Which variables should I use to set the heap size
for jena-fuseki?)

Br, Jaana

Andy Seaborne wrote on 10.3.2021 17:04:
> On 10/03/2021 02:33, jaanam@kolumbus.fi wrote:
>> Hi, thanks for your quick answer; please see my answers below.
>> 
>>> How many triples?
>>> And is it new data to replace the old data, or in addition to the
>>> existing data?
>> 
>> 476955 triples. Most of them will be the same as the old data; only
>> some triples may change, and some new triples may be added.
>> 
>>> This is a TDB1 database?
>> 
>> The jena-fuseki UI does not mention TDB1, but the dataset is persistent
>> and not TDB2.
>> 
>> In our use case, in-memory datasets might also work. As far as
>> I've tested on my PC, they seem to work even better than
>> persistent ones. What do you think?
> 
> In-memory should be fine. Obviously, it's lost when the server exits,
> but it sounds like the data isn't the primary copy, and loading 476955
> triples at start-up is not much.
> 
> The heap size by default is quite small in the scripts. It might be an
> idea to increase it a bit to give query working space but 0.5 million
> is really not very big.
> 
>     Andy
> 
>> 
>> Br Jaana
>> 
>> 
>> 
>> Andy Seaborne wrote on 9.3.2021 19:58:
>>> Hi Jaana,
>>> 
>>> On 09/03/2021 11:40, jaanam@kolumbus.fi wrote:
>>>> Hello,
>>>> 
>>>> I've run into the following problem with jena-fuseki (should I create a
>>>> bug ticket?):
>>>> 
>>>> We need to update a jena-fuseki dataset every 5 minutes with a 50 MB
>>>> ttl file.
>>> 
>>> How many triples?
>>> And is it new data to replace the old data, or in addition to the
>>> existing data?
>>> 
>>>> This causes the memory consumption on the machine where jena-fuseki
>>>> is running to increase by gigabytes.
>>>> 
>>>> This was first detected with jena-fuseki 3.8 and later with
>>>> jena-fuseki 3.17.
>>>> 
>>>> To be exact, I ran blankdots/jena-fuseki:fuseki3.17.0 in a
>>>> Docker container, continuously posting that ttl file into the same
>>>> dataset (pxmeta_hub_fed_prod).
>>> 
>>> This is a TDB1 database?
>>> 
>>> TDB2 is better at this - the database still grows but there is a way
>>> to compact the database live.
>>> 
>>> JENA-1987 exposes the compaction in Fuseki.
>>> https://jena.apache.org/documentation/tdb2/tdb2_admin.html
>>> 
>>> The database grows for two reasons. First, it allocates space in sparse
>>> files in 8M chunks, but the space does not count in du until actually
>>> used. Second, the space for deleted data is not fully recycled across
>>> transactions because it may be in use by a concurrent operation. (In
>>> TDB1, block ref counting would be very difficult; in TDB2 the solution
>>> is compaction.)
>>> 
>>>     Andy
>>> 
>>>> 
>>>> See the output of the command "du -h | sort -hr|head -30" below.
>>>> I've attached the shell script that I was running during that time
>>>> period.
>>>> 
>>>> root@3d53dc3fdf8d:/# alias du3="du -h | sort -hr|head -30"
>>>> root@3d53dc3fdf8d:/# du3
>>>> 9.0G    .
>>>> 8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
>>>> 8.5G    ./data/fuseki/databases
>>>> 8.5G    ./data/fuseki
>>>> 8.5G    ./data
>>> 
>>>> root@3d53dc3fdf8d:/# date
>>>> Tue Mar  9 06:02:46 UTC 2021
>>>> root@3d53dc3fdf8d:/#
>>>> 
>>> 
>>>> 3.5G    .
>>>> 3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
>>>> 3.0G    ./data/fuseki/databases
>>>> 3.0G    ./data/fuseki
>>>> 3.0G    ./data
>>>> root@3d53dc3fdf8d:/# date
>>>> Tue Mar  9 05:28:09 UTC 2021
>>>> root@3d53dc3fdf8d:/#
>>>> 
>>>> Br, Jaana
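
Andy's point above about sparse files can be checked with a small experiment. This is only a sketch, assuming GNU coreutils (`truncate`, `du --apparent-size`) and a filesystem that supports sparse files; the 8M size mirrors the chunk size he mentions, and the file name is made up for the illustration. It is not how TDB itself allocates its files.

```shell
#!/bin/sh
# Create an 8 MiB sparse file and compare its apparent size with the
# space that is actually allocated on disk.
tmp=$(mktemp -d)
f="$tmp/sparse.dat"              # hypothetical file name, for illustration

truncate -s 8M "$f"              # sets the length without writing any data

# Apparent size: the length of the file as seen by programs reading it.
apparent_kb=$(du -k --apparent-size "$f" | cut -f1)
# Allocated size: blocks really in use, which is what plain du reports.
used_kb=$(du -k "$f" | cut -f1)

echo "apparent: ${apparent_kb} KiB, allocated: ${used_kb} KiB"
rm -rf "$tmp"
```

On such a filesystem the allocated figure stays far below the apparent one until data is actually written, so growth that shows up in plain `du`, as in the listing above, is space that really is in use.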

Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

Posted by Andy Seaborne <an...@apache.org>.


On 09/04/2021 14:11, jaanam@kolumbus.fi wrote:
> Hi,
> 
> Could you suggest an optimal jena-fuseki heap size for my case? I'm
> sending a 50 MB file to my in-memory jena-fuseki dataset every 5 minutes.

Ultimately, for fine-tuning, the answer is "try it". But allow 2G or so per
dataset.
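
As a starting point for that "try it" step, the rule of thumb can be turned into a number. The sketch below is an assumption-laden illustration, not a recommendation from the thread: the 2048 MB per dataset comes from the answer above, while the function name `suggest_heap_mb` and the 75% cap on container memory (to leave room for non-heap JVM memory) are made up here.

```shell
#!/bin/sh
# suggest_heap_mb DATASETS CONTAINER_MB
# Prints a first-guess heap size in MB: about 2 GB per dataset, but never
# more than 75% of the container's memory (the 75% is an assumed headroom).
suggest_heap_mb() {
    datasets=$1
    container_mb=$2
    want=$((datasets * 2048))
    cap=$((container_mb * 75 / 100))
    if [ "$want" -lt "$cap" ]; then
        echo "$want"
    else
        echo "$cap"
    fi
}

# One in-memory dataset in a container limited to 4 GB.
heap=$(suggest_heap_mb 1 4096)
echo "-Xmx${heap}m"
```

The result is only a first value to measure against; watching the actual heap use under the 5-minute upload cycle is what settles the final setting.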

> 
> Jaana
> 
> (And should this actually be set on the JVM?)
> 
> jaanam@kolumbus.fi wrote on 8.4.2021 18:03:
>> Hello,
>>
>> One more question regarding this old issue. The previous answer said:
>>
>>> The heap size by default is quite small in the scripts. It might be an
>>> idea to increase it a bit to give query working space but 0.5 million
>>> is really not very big.
>>
>> What would be a suitable heap size in my case?
>> (And a very basic additional question: if I'm running the JVM and
>> jena-fuseki in the same Docker container, there's a risk that the JVM
>> would take all free memory, so I've set the JVM heap size to 2 GB
>> using JVM_ARGS=-Xmx2g. Which variables should I use to set the
>> heap size for jena-fuseki?)

I don't understand: are you running a Java application and Fuseki in the
container, or just Fuseki?

If just Fuseki, there is only one JVM process - Fuseki.

See the start-up script for Fuseki configuration:

# JAVA_OPTIONS
#   Extra options to pass to the JVM.
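
A minimal sketch of using that variable, assuming Fuseki is launched through a start-up script that reads JAVA_OPTIONS as the comment above describes. Which variable is honoured depends on the script that actually starts Fuseki in a given image (the earlier message used JVM_ARGS), so it is worth checking that script first. The 2g value is just the figure discussed in this thread.

```shell
#!/bin/sh
# Assumption: the start-up script in use passes JAVA_OPTIONS to the JVM.
JAVA_OPTIONS="-Xmx2g"
export JAVA_OPTIONS

# Optional sanity check, only if a HotSpot-style java is on the PATH:
# print the maximum heap size the JVM derives from these options.
if command -v java >/dev/null 2>&1; then
    java $JAVA_OPTIONS -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -w MaxHeapSize || true
fi

# Then start Fuseki with the start-up script from your installation.
```

Since Fuseki is the only JVM process in the container, this one setting is the heap limit for Fuseki; there is no separate heap setting for Fuseki apart from the JVM's.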


Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

Posted by ja...@kolumbus.fi.

Hi,

Could you suggest an optimal jena-fuseki heap size for my case? I'm
sending a 50 MB file to my in-memory jena-fuseki dataset every 5
minutes.

Jaana

(And should this actually be set on the JVM?)
