Posted to users@jena.apache.org by ja...@kolumbus.fi on 2021/04/08 15:03:43 UTC
Fwd: Re: updating persistent jena-fuseki dataset increases memory
consumption in gigas
Hello,
Still one question regarding this old issue. The previous answer said:
> The heap size by default is quite small in the scripts. It might be an
> idea to increase it a bit to give query working space but 0.5 million
> is really not very big.
What would be the suitable heap size in my case ?
(And then a very basic additional question: if I'm running the JVM and
jena-fuseki in the same docker container, there's a risk that the JVM would
take all the free memory, so I've set the JVM heap size to 2 G using
JVM_ARGS=-Xmx2g. So, which variables should I use to set the heap size
for jena-fuseki ?)
Br, Jaana
Andy Seaborne wrote on 10.3.2021 at 17:04:
> On 10/03/2021 02:33, jaanam@kolumbus.fi wrote:
>> Hi, thanks for your quick answer; please see my answers below!
>>
>>> How many triples?
>>> And is it new data to replace the old data or in addition to the
>>> existing data?
>>
>> 476955 triples; most of it will be just the same as the old data, only
>> some triples may change, and some new triples may be added.
>>
>>> This is a TDB1 database?
>>
>> jena-fuseki UI does not mention TDB1, but this is persistent and not
>> TDB2.
>>
>> But in our use case also memory-based datasets might work, as far as
>> I've been testing in my PC they seem to work even better than
>> persistent ones. What do you think ?
>
> In-memory should be fine. Obviously, it's lost when the server exits
> but it sounds like the data isn't the primary copy and loading 476955
> triples at start up is not big.
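For reference, a non-persistent dataset like the one discussed above can be served directly from memory; a minimal sketch, assuming the standalone fuseki-server script and its `--mem` flag, with an illustrative install path (data would then be re-posted after each start):

```shell
# Sketch: serve an in-memory (non-persistent) dataset with the standalone
# server. The dataset name matches the one used in this thread; the
# install location is an assumption and varies per setup.
FUSEKI_HOME="/opt/fuseki"
START_CMD="$FUSEKI_HOME/fuseki-server --mem /pxmeta_hub_fed_prod"
# "$START_CMD" would be run at container start; the 476955 triples are
# then loaded over HTTP once the server is up.
echo "$START_CMD"
```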
>
> The heap size by default is quite small in the scripts. It might be an
> idea to increase it a bit to give query working space but 0.5 million
> is really not very big.
>
> Andy
>
>>
>> Br Jaana
>>
>>
>>
>> Andy Seaborne wrote on 9.3.2021 at 19:58:
>>> Hi Jaana,
>>>
>>> On 09/03/2021 11:40, jaanam@kolumbus.fi wrote:
>>>> hello,
>>>>
>>>> I've run into the following problem with jena-fuseki (should I create
>>>> a bug ticket ?):
>>>>
>>>> We need to update the jena-fuseki dataset every 5 minutes with a 50
>>>> Mbyte ttl-file.
>>>
>>> How many triples?
>>> And is it new data to replace the old data or in addition to the
>>> existing data?
>>>
>>>> This causes the memory consumption on the machine where jena-fuseki
>>>> is running to increase by gigabytes.
>>>>
>>>> This was 1st detected with jena-fuseki 3.8 and later with
>>>> jena-fuseki 3.17.
>>>>
>>>> To be exact, I executed blankdots/jena-fuseki:fuseki3.17.0 in a
>>>> docker container, continuously posting that ttl-file into the same
>>>> dataset (pxmeta_hub_fed_prod).
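The update loop described above might look roughly like this; a sketch only, since the attached script is not shown - the server URL, file name, and the choice of the SPARQL Graph Store Protocol endpoint are all assumptions about the poster's setup:

```shell
# Sketch of the 5-minute update loop. With the Graph Store Protocol,
# PUT replaces the default graph's contents while POST appends to it.
# All names below are illustrative.
FUSEKI_URL="http://localhost:3030"
DATASET="pxmeta_hub_fed_prod"
GSP_ENDPOINT="$FUSEKI_URL/$DATASET/data?default"
# while true; do
#   curl -X PUT --data-binary @data.ttl \
#        -H 'Content-Type: text/turtle' "$GSP_ENDPOINT"
#   sleep 300
# done
echo "$GSP_ENDPOINT"
```

Repeated POSTs of mostly-identical data would keep re-adding the same triples; a PUT (replace) keeps the dataset at a steady size, which matters for the growth problem described in this thread.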
>>>
>>> This is a TDB1 database?
>>>
>>> TDB2 is better at this - the database still grows but there is a way
>>> to compact the database live.
>>>
>>> JENA-1987 exposes the compaction in Fuseki.
>>> https://jena.apache.org/documentation/tdb2/tdb2_admin.html
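Based on the linked tdb2_admin page, live compaction can be requested through the Fuseki administration protocol; a sketch (the server URL and database name are examples from this thread):

```shell
# Sketch: request TDB2 compaction of one database via the administration
# protocol; the /$/compact/{name} endpoint follows the tdb2_admin
# documentation linked above.
ADMIN_URL='http://localhost:3030/$/compact'
DB="pxmeta_hub_fed_prod"
# curl -X POST "$ADMIN_URL/$DB"
echo "$ADMIN_URL/$DB"
```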
>>>
>>> The database grows for two reasons. First, it allocates space in
>>> sparse files in 8M chunks, but that space does not show up in du until
>>> it is actually used. Second, the space for deleted data is not fully
>>> recycled across transactions, because it may still be in use by a
>>> concurrent operation. (In TDB1, block reference counting would be very
>>> difficult; in TDB2 the solution is compaction.)
>>>
>>> Andy
>>>
>>>>
>>>> See the output of the command "du -h | sort -hr | head -30" below;
>>>> attached is the shell-script that I was executing during that time
>>>> period.
>>>>
>>>> root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
>>>> root@3d53dc3fdf8d:/# du3
>>>> 9.0G .
>>>> 8.5G ./data/fuseki/databases/pxmeta_hub_fed_prod
>>>> 8.5G ./data/fuseki/databases
>>>> 8.5G ./data/fuseki
>>>> 8.5G ./data
>>>
>>>> root@3d53dc3fdf8d:/# date
>>>> Tue Mar 9 06:02:46 UTC 2021
>>>> root@3d53dc3fdf8d:/#
>>>>
>>>
>>>> 3.5G .
>>>> 3.0G ./data/fuseki/databases/pxmeta_hub_fed_prod
>>>> 3.0G ./data/fuseki/databases
>>>> 3.0G ./data/fuseki
>>>> 3.0G ./data
>>>> root@3d53dc3fdf8d:/# date
>>>> Tue Mar 9 05:28:09 UTC 2021
>>>> root@3d53dc3fdf8d:/#
>>>>
>>>> Br, Jaana
Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory
consumption in gigas
Posted by Andy Seaborne <an...@apache.org>.
On 09/04/2021 14:11, jaanam@kolumbus.fi wrote:
> Hi,
>
> Could you suggest an optimal jena-fuseki heap size for my case ? I'm
> sending a 50 MB file to my jena-fuseki memory-based dataset every 5 minutes.
Ultimately, for fine-tuning, the answer is "try" - but about 2G per
dataset is a reasonable starting point.
>
> Jaana
>
> (and should this be set to JVM actually ?)
>
> jaanam@kolumbus.fi wrote on 8.4.2021 at 18:03:
>> Hello,
>>
>> Still one question regarding this old issue. The previous answer said:
>>
>>> The heap size by default is quite small in the scripts. It might be an
>>> idea to increase it a bit to give query working space but 0.5 million
>>> is really not very big.
>>
>> What would be the suitable heap size in my case ?
>> (And then a very basic additional question: if I'm running the JVM and
>> jena-fuseki in the same docker container, there's a risk that the JVM
>> would take all the free memory, so I've set the JVM heap size to 2 G
>> using JVM_ARGS=-Xmx2g. So, which variables should I use to set the
>> heap size for jena-fuseki ? )
I don't understand - are you running a Java application and Fuseki in
the container, or just Fuseki?
If it's just Fuseki, there is only one JVM process - Fuseki itself.
See the start-up script for Fuseki configuration:
# JAVA_OPTIONS
# Extra options to pass to the JVM.
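Concretely, the knob Andy points at might be used like this; a sketch, since which variable the script reads depends on the Fuseki version and start-up script in use:

```shell
# Sketch: give the single Fuseki JVM a 2G heap before launching it.
# The standalone fuseki-server script commonly reads JVM_ARGS, while the
# service wrapper documents JAVA_OPTIONS ("extra options to pass to the
# JVM"); setting whichever your script honours is the whole trick.
export JVM_ARGS="-Xmx2g"
# or, for the service script:
# export JAVA_OPTIONS="-Xmx2g"
# exec ./fuseki-server --mem /pxmeta_hub_fed_prod
echo "$JVM_ARGS"
```

There is no separate "jena-fuseki heap" to configure: the -Xmx value given to this one JVM process is the Fuseki heap.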
>>
>> Br, Jaana
>>
Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory
consumption in gigas
Posted by ja...@kolumbus.fi.
Hi,
Could you suggest an optimal jena-fuseki heap size for my case ? I'm
sending a 50 MB file to my jena-fuseki memory-based dataset every 5
minutes.
Jaana
(and should this actually be set on the JVM ?)