Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2019/01/29 11:32:30 UTC
Out of memory
I'm not able to run a basic read-only script without running out of
memory on the server.
Consumption goes to 7+ GB (VM 10+ GB), then the system kills Fuseki when
it runs out of memory.
All I'm running is a simple SPARQL query that gets a few triples for a resource.
This is run about 50k times.
All settings are default, using GSP.
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Out of memory
Posted by Andy Seaborne <an...@apache.org>.
On 29/01/2019 14:28, Rob Vesse wrote:
> This may be partly a case of a simple-looking query having unexpected execution semantics. Strictly speaking, your query says: select all triples in the specific graph, then join them with this list of values for ?s. The optimiser should, and does appear to, do the right thing and flip the join order, i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specific graph. However, if the query had other elements involved the optimiser might not kick in, so a better query would place the VALUES block before the patterns that use its variables.
>
> This sounds like memory/cache thrashing. From what you have described, running variants on this query 50k times, you are basically walking over your entire dataset extracting it piece by piece?
And as that happens, more and more of the nodes get cached. The node
cache holds a fixed number of entries, so if the literals are big, the
cache is big. The cache is usually 1-2G per database but it can be more.
And then there is workspace - and it might be that the heap is close to
full, meaning the GC is doing a lot of work.
How much free RAM is there before the 50K queries start? (visualvm and
force a GC). visualvm also tells you how much work the GC is doing.
Andy
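On a machine without a desktop session, the GC load asked about above can also be read from the JDK's jstat tool instead of visualvm. The sketch below is only an illustration: gc_summary is a hypothetical helper name, and it assumes the column headers O (old generation occupancy, in percent) and FGC (full GC count) that jstat -gcutil prints on HotSpot JVMs.

```shell
# gc_summary: reads `jstat -gcutil` output on stdin and prints the old
# generation occupancy and the full-GC count from the most recent sample.
# Hypothetical helper; columns are looked up by header name, not position.
gc_summary() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        NF > 0  { old = $(col["O"]); fgc = $(col["FGC"]) }
        END     { printf "old_gen=%s%% full_gcs=%s\n", old, fgc }
    '
}

# Usage (assumes <pid> is the Fuseki JVM; 5 samples, one per second):
#   jstat -gcutil <pid> 1000 5 | gc_summary
```

An old generation that stays close to 100% while the full-GC count keeps climbing would be consistent with the "GC is doing a lot of work" case.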
>
> Assuming the Graph URI and the URIs in your VALUES block change in each query, then every query is looking at a different section of the database, causing a lot of data to be cached and then evicted, both in terms of on-heap memory structures (the node table cache) and potentially also the off-heap memory-mapped files, which may be paged in and out as the code traverses the B-Tree indexes.
>
> Is there also some other query involved that extracts the Graph URIs and Subject URIs of interest and is being executed in parallel with the script? Or has the input for the script been pre-calculated ahead of time, or does it come from elsewhere?
>
> Rob
>
> On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
>
>
> Server:
>
> /usr/bin/java
> -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
> -Xmx5600M -jar fuseki-server.jar --update --port 3030
> --loc=/home/text/tools/jena_data_test/ /ds
>
> No custom configs, default installation package.
>
>
> SPARQL similar to this (returns 5-10 triples):
>
> CONSTRUCT { ?s ?p ?o }
> FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
> WHERE
> {
> ?s ?p ?o
>
> VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
> lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
> lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
> lsr:239c6da0-4c24-4539-a277-c9756d6257ee
> lsr:2ef0190d-6271-447a-992f-6225fc440897
> lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
> lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
> lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
> lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
> lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
> }
>
>
> I worked around this by adding a sleep to the script. So I guess it's about
> the Java memory manager not getting time to free memory? Even with the sleep
> it was barely doable, with memory consumption changing rapidly between 1.5 GB and 6 GB.
>
>
>
> On 29/01/2019 15:50, Andy Seaborne wrote:
> > Mikael,
> >
> > There aren't enough details except to mention the suspects like sorting.
> >
> > With all the questions on the list, I personally don't track the
> > details of each installation so please also remind me of your current
> > setup.
> >
> > Andy
>
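The suggestion in this thread to put the VALUES block ahead of the pattern that uses ?s could look like the sketch below. It only builds the query text. build_query is a hypothetical helper name, and the namespace behind the lsr: prefix is not shown anywhere in the thread, so it is passed in as a parameter.

```shell
# build_query PREFIX_IRI GRAPH_IRI ID...
# Prints a CONSTRUCT query with the VALUES block placed before the triple
# pattern. Hypothetical helper, not part of Jena or Fuseki.
build_query() {
    prefix_iri=$1
    graph_iri=$2
    shift 2
    printf 'PREFIX lsr: <%s>\n' "$prefix_iri"
    printf 'CONSTRUCT { ?s ?p ?o }\n'
    printf 'FROM <%s>\n' "$graph_iri"
    printf 'WHERE\n{\n'
    printf '  VALUES ?s {\n'
    for id in "$@"; do
        printf '    lsr:%s\n' "$id"
    done
    printf '  }\n'
    printf '  ?s ?p ?o\n'
    printf '}\n'
}

# Usage (assumes Fuseki on localhost:3030 with the /ds dataset, as in the
# launch command quoted above; the prefix IRI is a placeholder):
#   curl --data-urlencode "query=$(build_query 'PREFIX-IRI-HERE' \
#       'https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/' \
#       10609f75-5cf3-4544-8fc1-c361778c3bd8)" \
#       http://localhost:3030/ds/query
```

For this particular query the optimiser already flips the join, so the rewrite mainly guards against more complex queries where it might not.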
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for these. I hadn't even heard of this syntax before, so I'll have
to study it...
On 29/01/2019 19:25, Andy Seaborne wrote:
> This case should be optimized to be the flipped join(VALUES, BGP)
>
> (prefix ((lsr: <lsr:>))
>   (sequence
>     (table (vars ?s)
>       (row [?s lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8])
>       (row [?s lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985])
>       (row [?s lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902])
>       (row [?s lsr:239c6da0-4c24-4539-a277-c9756d6257ee])
>       (row [?s lsr:2ef0190d-6271-447a-992f-6225fc440897])
>       (row [?s lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9])
>       (row [?s lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf])
>       (row [?s lsr:6f6802cf-0336-4234-90b8-cc8780058f0d])
>       (row [?s lsr:d1e2751b-4332-4d57-95e4-ca8070c16782])
>       (row [?s lsr:81053775-4722-4a00-b3f7-33d4feb3629b])
>     )
>     (bgp (triple ?s ?p ?o))))
>
> Andy
Re: Out of memory
Posted by Andy Seaborne <an...@apache.org>.
This case should be optimized to be the flipped join(VALUES, BGP)
(prefix ((lsr: <lsr:>))
  (sequence
    (table (vars ?s)
      (row [?s lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8])
      (row [?s lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985])
      (row [?s lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902])
      (row [?s lsr:239c6da0-4c24-4539-a277-c9756d6257ee])
      (row [?s lsr:2ef0190d-6271-447a-992f-6225fc440897])
      (row [?s lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9])
      (row [?s lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf])
      (row [?s lsr:6f6802cf-0336-4234-90b8-cc8780058f0d])
      (row [?s lsr:d1e2751b-4332-4d57-95e4-ca8070c16782])
      (row [?s lsr:81053775-4722-4a00-b3f7-33d4feb3629b])
    )
    (bgp (triple ?s ?p ?o))))
Andy
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I don't have a desktop, but free says
              total        used        free      shared  buff/cache   available
Mem:        8691124     1844328      399084      100032     6447712     6463068
Swap:             0           0           0
when Fuseki is "resting".
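The free figures above are in KiB. A small sketch for turning the Mem: line into GiB follows; mem_summary is a hypothetical helper name, and it assumes the column order shown above (total, used, free, shared, buff/cache, available).

```shell
# mem_summary: reads free(1) output on stdin and prints total, used and
# available memory from the "Mem:" line in GiB. Hypothetical helper; it
# assumes KiB units and that "available" is the seventh field.
mem_summary() {
    awk '/^Mem:/ {
        printf "total=%.1fGiB used=%.1fGiB available=%.1fGiB\n",
               $2 / 1048576, $3 / 1048576, $7 / 1048576
    }'
}

# Usage:
#   free | mem_summary
```

For the numbers posted above this gives roughly 8.3 GiB total, 1.8 GiB used and 6.2 GiB available; most of the buff/cache figure is counted as available because the kernel can reclaim it.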
On 29/01/2019 22:46, Andy Seaborne wrote:
> TDB uses the OS file cache via mmap files.
>
> The files appear as part of the process address space but of course
> they are not part of the heap. It also flexes up and down as needed
> (unlike the heap).
>
> Some sys tools report the total address space - and that is not the
> amount of RAM the process is using.
>
> In top(1) Linux-speak: roughly VIRT and RES (assuming no old-fashioned
> swapping is going on which with java should be avoided at all costs -
> the JVM heap on swap is very bad for performance).
>
> RES is approximately the amount of RAM the process is using.
>
> visualvm allows you to see the heap size. That's the figure to look at
> first.
>
> For Mikael,
>
> -Xmx5600M
> process space 7+gigs
> (VM 10+ gigs)
>
> so
>
> (For Fuseki+TDB it's either heap or mapped files - there isn't use of
> direct memory (RAM, but not heap).)
>
> and start with -Xms5600M as well.
>
> Andy
Re: Out of memory
Posted by Andy Seaborne <an...@apache.org>.
TDB uses the OS file cache via mmap files.
The files appear as part of the process address space but of course they
are not part of the heap. It also flexes up and down as needed (unlike the
heap).
Some sys tools report the total address space - and that is not the amount
of RAM the process is using.
In top(1) Linux-speak: roughly VIRT and RES (assuming no old-fashioned
swapping is going on which with java should be avoided at all costs -
the JVM heap on swap is very bad for performance).
RES is approximately the amount of RAM the process is using.
visualvm allows you to see the heap size. That's the figure to look at first.
For Mikael,
-Xmx5600M
process space 7+gigs
(VM 10+ gigs)
so
(For Fuseki+TDB it's either heap or mapped files - there isn't use of
direct memory (RAM, but not heap).)
and start with -Xms5600M as well.
Andy
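To compare the two figures described above without a desktop, the kernel's per-process status file can be read directly. A minimal sketch, Linux only: proc_mem is a hypothetical helper name, and VmSize and VmRSS are the fields in /proc/<pid>/status that correspond roughly to VIRT and RES in top(1).

```shell
# proc_mem STATUS_FILE
# Prints the virtual size (VmSize) and resident size (VmRSS) found in a
# /proc/<pid>/status file, converted from kB to whole MiB.
# Hypothetical helper; relies on the Linux /proc layout.
proc_mem() {
    awk '
        /^VmSize:/ { virt = $2 }
        /^VmRSS:/  { res = $2 }
        END        { printf "VIRT=%d MiB RES=%d MiB\n", virt / 1024, res / 1024 }
    ' "$1"
}

# Usage (assumes <pid> is the Fuseki JVM):
#   proc_mem /proc/<pid>/status
```

A VIRT figure well above RES is expected with TDB, because the memory-mapped files count towards the address space whether or not their pages are currently in RAM.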
On 29/01/2019 17:41, Dan Pritts wrote:
> It's often misunderstood, but Java programs use memory in addition to the
> configured heap. Fuseki in my experience sometimes uses a LOT more, more
> than I could explain. Some of the folks here (Andy for sure) spent some
> time looking at it with me and weren't able to come to any conclusions.
> You can look through the list archives for the discussion, maybe 6 months
> ago.
>
> I ended up significantly overallocating memory to the instance and being
> done with it.
>
> How much RAM does your instance have? You mentioned -Xmx 5600, and total
> usage of 17GB ram+swap - sounds like you have maybe 8GB ram? I'd try
> 16GB and see how it does; watch the total memory usage.
>
>
>
> On Tue, Jan 29, 2019 at 9:43 AM Mikael Pesonen <mi...@lingsoft.fi>
> wrote:
>
>>
>>
>>
>> On 29/01/2019 16:28, Rob Vesse wrote:
>>> This may be partly a case of a simple-looking query having unexpected
>>> execution semantics. Strictly speaking, your query says: select all
>>> triples in the specific graph, then join them with this list of values
>>> for ?s. The optimiser should, and does appear to, do the right thing and
>>> flip the join order, i.e. it uses the concrete values from the VALUES
>>> block to search for triples with those subjects in the specific graph.
>>> However, if the query had other elements involved the optimiser might
>>> not kick in, so a better query would place the VALUES block before the
>>> patterns that use its variables.
>> Thanks for the reminder on VALUES order.
>>>
>>> This sounds like memory/cache thrashing. From what you have described,
>>> running variants on this query 50k times, you are basically walking over
>>> your entire dataset extracting it piece by piece?
>> The dataset is larger; these small sets (VALUES) come from our
>> external index for similar-document search. The index returns ids, and
>> the related metadata is fetched from Jena.
>>>
>>> Assuming the Graph URI and the URIs in your VALUES block change in each
>>> query, then every query is looking at a different section of the database,
>>> causing a lot of data to be cached and then evicted, both in terms of
>>> on-heap memory structures (the node table cache) and potentially also
>>> the off-heap memory-mapped files, which may be paged in and out as the
>>> code traverses the B-Tree indexes.
>>>
>>> Is there also some other query involved that extracts the Graph URIs and
>>> Subject URIs of interest and is being executed in parallel with the
>>> script? Or has the input for the script been pre-calculated ahead of
>>> time, or does it come from elsewhere?
>> There is no parallelism on our part in this case. Only one PHP script
>> is running and making GSP calls.
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Sorry, I meant -Xmx. Debugging Fuseki is out of my work scope, but it's good
to know that 16GB seems to do the trick. So maybe we'll deploy more cases on
the same bigger server instead of splitting them across smaller ones.
On 05/02/2019 17:48, Rob Vesse wrote:
> And I realise browsing back through the thread that you mentioned that you don't have a desktop in a previous reply. So I presume you mean you only have terminal access to the machine where you are running Fuseki?
>
> In which case you might want to try out jvmtop - https://github.com/patric-r/jvmtop - as an open source command line based JVM profiler
>
> Rob
>
> On 05/02/2019, 15:44, "Rob Vesse" <rv...@dotnetrdf.org> wrote:
>
> -Xms and -Xmx do two different things (the previous email in the thread mentioned -Xmx but then you referenced -Xms in your question).
>
> The former sets the minimum heap size, which is the amount of memory the JVM will allocate for the heap when it starts.
>
> The latter sets the maximum heap size, which is the most memory the JVM will allocate for the heap during runtime. The heap may start smaller than this and grow up to this maximum.
>
> When one/both of these is not set your JVM chooses default values, usually based upon some percentage of the system memory. Exact behaviour will vary between JVMs.
>
> As I think has been suggested earlier in this thread if you are continuing to have issues with memory consumption your best bet to investigate further is to attach a JVM profiler to the running Fuseki process. With that you can take Snapshots of the memory usage over time and inspect them to see where the memory consumption is going.
>
> Visual VM - https://visualvm.github.io - is one such free tool, there are of course other free and proprietary JVM profilers available.
>
> Rob
>
> On 05/02/2019, 11:07, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
>
>
> Tested with 16GB, and java mem usage goes up to 10G (virt 14G).
> Wondering what the java -Xms actually does...
>
> Is there no way to limit memory usage on an 8GB server?
>
>
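The -Xms/-Xmx distinction explained above, together with the earlier advice in this thread to start with -Xms5600M as well, could be applied to the launch command roughly as follows. This is only a sketch: fuseki_cmd is a hypothetical helper name, the paths are the ones quoted earlier in the thread, and the log4j option is left out for brevity.

```shell
# fuseki_cmd HEAP
# Prints (does not run) the launch command from this thread with the
# initial heap (-Xms) pinned to the same value as the maximum heap (-Xmx).
# Hypothetical helper for illustration only.
fuseki_cmd() {
    heap=$1
    printf '%s\n' "/usr/bin/java -Xms${heap} -Xmx${heap} -jar fuseki-server.jar --update --port 3030 --loc=/home/text/tools/jena_data_test/ /ds"
}

# Usage:
#   fuseki_cmd 5600M
```

Both flags only bound the Java heap. As noted earlier in the thread, the memory-mapped TDB files sit outside the heap, so these settings do not cap the total process size.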
> >> on-heap memory structures (the node table cache) and potentially also for
> >> the off heap memory mapped files which may be being paged in and out as the
> >> code traverses the B-Tree indexes.
> >>> Is there also some other query involved that extracts the Graph URIs and
> >> Subject URIs of interest that is being executed in parallel with the
> >> script? Or has the input from the script been pre-calculated ahead of
> >> time, comes from elsewhere etc?
> >> There is no parrallelism from our part in this case. Only one php script
> >> running and making GSP calls.
> >>> Rob
> >>>
> >>> On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi>
> >> wrote:
> >>>
> >>> Server:
> >>>
> >>> /usr/bin/java
> >>>
> >> -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
> >>> -Xmx5600M -jar fuseki-server.jar --update --port 3030
> >>> --loc=/home/text/tools/jena_data_test/ /ds
> >>>
> >>> No custom configs, default installation package.
> >>>
> >>>
> >>> Sparql similar to this (returns 5-10 triplets) :
> >>>
> >>> CONSTRUCT { ?s ?p ?o }
> >>> FROM <
> >> https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
> >>> WHERE
> >>> {
> >>> ?s ?p ?o
> >>>
> >>> VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
> >>> lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
> >>> lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
> >>> lsr:239c6da0-4c24-4539-a277-c9756d6257ee
> >>> lsr:2ef0190d-6271-447a-992f-6225fc440897
> >>> lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
> >>> lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
> >>> lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
> >>> lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
> >>> lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
> >>> }
> >>>
> >>>
> >>> I solved this by adding sleep to script. So I guess it's about the
> >> java
> >>> memory manager not getting time to free memory? Even with sleep it
> >> was
> >>> barely doable, memory consumption changing rapidly between 1,5 gig
> >> - 6 gig.
> >>>
> >>>
> >>> On 29/01/2019 15:50, Andy Seaborne wrote:
> >>> > Mikael,
> >>> >
> >>> > There aren't enough details except to mention the suspects like
> >> sorting.
> >>> >
> >>> > With all the questions on the list, I personally don't track the
> >>> > details of each installation so please also remind me of your
> >> current
> >>> > setup.
> >>> >
> >>> > Andy
> >>> >
> >>> > On 29/01/2019 11:32, Mikael Pesonen wrote:
> >>> >>
> >>> >> I'm not able to run a basic read-only script without running out
> >> of
> >>> >> memory on the server.
> >>> >>
> >>> >> Consumption goes to 7+gigs (VM 10+ gigs), then system kills
> >> Fuseki
> >>> >> when running out of memory.
> >>> >> All I'm running is simple sparql query getting few triples of
> >>> >> resource. This is run for about 50k times.
> >>> >>
> >>> >> All settings are default, using GSP.
> >>> >>
> >>> >>
> >>>
> >>> --
> >>> Lingsoft - 30 years of Leading Language Management
> >>>
> >>> www.lingsoft.fi
> >>>
> >>> Speech Applications - Language Management - Translation - Reader's
> >> and Writer's Tools - Text Tools - E-books and M-books
> >>> Mikael Pesonen
> >>> System Engineer
> >>>
> >>> e-mail: mikael.pesonen@lingsoft.fi
> >>> Tel. +358 2 279 3300
> >>>
> >>> Time zone: GMT+2
> >>>
> >>> Helsinki Office
> >>> Eteläranta 10
> >>> FI-00130 Helsinki
> >>> FINLAND
> >>>
> >>> Turku Office
> >>> Kauppiaskatu 5 A
> >>> FI-20100 Turku
> >>> FINLAND
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >> --
> >> Lingsoft - 30 years of Leading Language Management
> >>
> >> www.lingsoft.fi
> >>
> >> Speech Applications - Language Management - Translation - Reader's and
> >> Writer's Tools - Text Tools - E-books and M-books
> >>
> >> Mikael Pesonen
> >> System Engineer
> >>
> >> e-mail: mikael.pesonen@lingsoft.fi
> >> Tel. +358 2 279 3300
> >>
> >> Time zone: GMT+2
> >>
> >> Helsinki Office
> >> Eteläranta 10
> >> FI-00130 Helsinki
> >> FINLAND
> >>
> >> Turku Office
> >> Kauppiaskatu 5 A
> >> FI-20100 Turku
> >> FINLAND
> >>
> >>
>
> --
> Lingsoft - 30 years of Leading Language Management
>
> www.lingsoft.fi
>
> Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
>
> Mikael Pesonen
> System Engineer
>
> e-mail: mikael.pesonen@lingsoft.fi
> Tel. +358 2 279 3300
>
> Time zone: GMT+2
>
> Helsinki Office
> Eteläranta 10
> FI-00130 Helsinki
> FINLAND
>
> Turku Office
> Kauppiaskatu 5 A
> FI-20100 Turku
> FINLAND
>
>
>
>
>
>
>
>
>
>
>
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
Re: Out of memory
Posted by Rob Vesse <rv...@dotnetrdf.org>.
Browsing back through the thread, I see you mentioned in a previous reply that you don't have a desktop. I presume you mean you only have terminal access to the machine where you are running Fuseki?
In that case you might want to try jvmtop - https://github.com/patric-r/jvmtop - an open source, command-line JVM profiler.
Rob
Re: Out of memory
Posted by Rob Vesse <rv...@dotnetrdf.org>.
-Xms and -Xmx do two different things (the previous email in the thread mentioned -Xmx, but you referenced -Xms in your question).
-Xms sets the minimum heap size, which is the amount of memory the JVM allocates for the heap when it starts.
-Xmx sets the maximum heap size, which is the most memory the JVM will allocate for the heap at runtime. The heap may start smaller than this and grow up to this maximum.
When either or both of these are not set, the JVM chooses default values, usually based on some percentage of the system memory. Exact behaviour varies between JVMs.
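To see what heap limits a JVM actually ended up with, a small program can ask the runtime directly. This is only a minimal sketch: the class name HeapReport and its helper methods are made up for illustration, and it reports on the JVM it runs in, not on a running Fuseki process. The reported maximum is usually close to, but not always exactly, the -Xmx value.

```java
// HeapReport.java (hypothetical name); try: java -Xms256m -Xmx1g HeapReport.java
class HeapReport {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format the three heap figures exposed by java.lang.Runtime.
    static String describe(long max, long committed, long free) {
        return "max heap (ceiling set by -Xmx or the JVM default): " + toMiB(max) + " MiB, "
             + "committed heap (starts near -Xms): " + toMiB(committed) + " MiB, "
             + "used heap: " + toMiB(committed - free) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): most heap the JVM will try to use
        // totalMemory(): heap currently reserved from the OS
        // freeMemory(): unused part of the reserved heap
        System.out.println(describe(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

Running it with and without -Xms/-Xmx shows the defaults the JVM picks on a given machine. None of these figures include memory used outside the heap.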
As I think has been suggested earlier in this thread, if you continue to have issues with memory consumption, your best bet for investigating further is to attach a JVM profiler to the running Fuseki process. With that you can take snapshots of the memory usage over time and inspect them to see where the memory is going.
VisualVM - https://visualvm.github.io - is one such free tool; there are of course other free and proprietary JVM profilers available.
Rob
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Tested with 16GB, and Java memory usage goes up to 10G (virt 14G).
Wondering what the java -Xms option actually does...
Was there no way to limit memory usage for an 8GB server?
Re: Out of memory
Posted by Dan Pritts <da...@umich.edu>.
It's often misunderstood, but Java programs use memory in addition to the
configured heap. Fuseki in my experience sometimes uses a LOT more, more
than I could explain. Some of the folks here (Andy for sure) spent some
time looking at it with me and weren't able to come to any conclusions.
You can look through the list archives for the discussion, maybe 6 months
ago.
I ended up significantly overallocating memory to the instance and being
done with it.
How much RAM does your instance have? You mentioned -Xmx 5600, and total
usage of 17GB ram+swap - sounds like you have maybe 8GB ram? I'd try
16GB and see how it does; watch the total memory usage.
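To make the heap versus non-heap distinction concrete, the standard java.lang.management beans expose both, along with the direct and memory-mapped buffer pools. The following is a minimal sketch, not a diagnosis of Fuseki: the class name MemoryBreakdown is hypothetical, and the program reports only on its own JVM. Getting the same figures from a running Fuseki would need a JMX connection or a profiler.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// MemoryBreakdown (hypothetical name): heap, non-heap and buffer-pool usage of this JVM.
class MemoryBreakdown {
    static String report() {
        StringBuilder sb = new StringBuilder();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        // The heap is the only region capped by -Xmx.
        MemoryUsage heap = mem.getHeapMemoryUsage();
        sb.append("heap used: ").append(heap.getUsed())
          .append(" bytes (max ").append(heap.getMax()).append(")\n");

        // Non-heap covers JVM-managed areas such as metaspace and the code cache.
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        sb.append("non-heap used: ").append(nonHeap.getUsed()).append(" bytes\n");

        // Buffer pools track direct and memory-mapped buffers, which live outside the heap.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            sb.append("buffer pool ").append(pool.getName())
              .append(": ").append(pool.getMemoryUsed()).append(" bytes\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

In a process that memory-maps large files, the mapped pool can be far larger than the heap, which is one way total usage ends up well above the -Xmx value.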
On Tue, Jan 29, 2019 at 9:43 AM Mikael Pesonen <mi...@lingsoft.fi> wrote:
>
>
>
> On 29/01/2019 16:28, Rob Vesse wrote:
> > This may be partly a case of a simple-looking query having unexpected
> > execution semantics. Strictly speaking, your query says select all triples
> > in the specific graph, then join them with this list of values for ?s. Now
> > the optimiser should, and does appear to, do the right thing and flip the
> > join order, i.e. it uses the concrete values from the VALUES block to search
> > for triples with those subjects in the specific graph. However, if the
> > query had other elements involved the optimiser might not kick in; a better
> > query would place the VALUES prior to using the variables defined in the
> > VALUES block.
> Thanks for the reminder on VALUES order
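The reordering suggested above can be sketched as a small query builder that emits the VALUES block before the triple pattern. This is an illustration only: the class ValuesFirstQuery and its build method are hypothetical, the subjects are assumed to be trusted prefixed names (nothing is escaped), and the PREFIX declaration for lsr: is left out because it does not appear in the thread.

```java
import java.util.List;

// ValuesFirstQuery (hypothetical name): builds the CONSTRUCT query with VALUES first.
class ValuesFirstQuery {
    static String build(String graphUri, List<String> subjects) {
        StringBuilder sb = new StringBuilder();
        sb.append("CONSTRUCT { ?s ?p ?o }\n");
        sb.append("FROM <").append(graphUri).append(">\n");
        sb.append("WHERE {\n");
        // Bind ?s to the concrete subjects before the pattern that uses it.
        sb.append("  VALUES ?s {");
        for (String subject : subjects) {
            sb.append(' ').append(subject);
        }
        sb.append(" }\n");
        sb.append("  ?s ?p ?o\n");
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Graph URI and subject ids are taken from the query quoted in this thread;
        // a PREFIX line for lsr: would still have to be added before sending it.
        String query = build(
            "https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/",
            List.of("lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8",
                    "lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985"));
        System.out.print(query);
    }
}
```

The results are the same as for the original query. The only difference is that the join order no longer depends on the optimiser spotting the rewrite.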
> >
> > This sounds like memory/cache thrashing. From what you have described,
> > running variants on this query 50k times, you are basically walking over
> > your entire dataset extracting it piece by piece?
> The dataset is larger; these small sets (VALUES) come from our
> external index for similar-document search. The index returns ids, and the
> related metadata is fetched from Jena.
> >
> > Assuming the Graph URI and the URIs in your VALUES block change in each
> > query then every query is looking at a different section of the database
> > causing a lot of data to be cached and then evicted both in terms of
> > on-heap memory structures (the node table cache) and potentially also for
> > the off heap memory mapped files which may be being paged in and out as the
> > code traverses the B-Tree indexes.
> >
> > Is there also some other query involved that extracts the Graph URIs and
> > Subject URIs of interest that is being executed in parallel with the
> > script? Or has the input from the script been pre-calculated ahead of
> > time, comes from elsewhere etc?
> There is no parallelism on our part in this case. Only one PHP script is
> running and making GSP calls.
> >
> > Rob
> >
> > On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
> >
> >
> > Server:
> >
> > /usr/bin/java
> > -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
> > -Xmx5600M -jar fuseki-server.jar --update --port 3030
> > --loc=/home/text/tools/jena_data_test/ /ds
> >
> > No custom configs, default installation package.
> >
> >
> > SPARQL similar to this (returns 5-10 triples):
> >
> > CONSTRUCT { ?s ?p ?o }
> > FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
> > WHERE
> > {
> > ?s ?p ?o
> >
> > VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
> > lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
> > lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
> > lsr:239c6da0-4c24-4539-a277-c9756d6257ee
> > lsr:2ef0190d-6271-447a-992f-6225fc440897
> > lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
> > lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
> > lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
> > lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
> > lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
> > }
> >
> >
> > I solved this by adding a sleep to the script. So I guess it's about the
> > Java memory manager not getting time to free memory? Even with the sleep
> > it was barely doable, with memory consumption swinging rapidly between
> > 1.5 and 6 GB.
> >
> >
> >
> > On 29/01/2019 15:50, Andy Seaborne wrote:
> > > Mikael,
> > >
> > > There aren't enough details except to mention the suspects like sorting.
> > >
> > > With all the questions on the list, I personally don't track the
> > > details of each installation so please also remind me of your current
> > > setup.
> > >
> > > Andy
> > >
> > > On 29/01/2019 11:32, Mikael Pesonen wrote:
> > >>
> > >> I'm not able to run a basic read-only script without running out of
> > >> memory on the server.
> > >>
> > >> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki
> > >> when running out of memory.
> > >> All I'm running is simple sparql query getting few triples of
> > >> resource. This is run for about 50k times.
> > >>
> > >> All settings are default, using GSP.
> > >>
> > >>
> >
--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
On 29/01/2019 16:28, Rob Vesse wrote:
> This may be partly a case of a simple-looking query having unexpected execution semantics. Strictly speaking, your query says: select all triples in the specific graph, then join them with this list of values for ?s. The optimiser should, and does appear to, do the right thing and flip the join order, i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specific graph. However, if the query had other elements involved the optimiser might not kick in, so a better query would place the VALUES block before the patterns that use the variables it defines.
Thanks for the reminder on VALUES order
>
> This sounds like memory/cache thrashing. From what you have described, running variants on this query 50k times, you are basically walking over your entire dataset extracting it piece by piece?
The dataset is larger; these small sets (VALUES) come from our external
index for similar-document search. The index returns an id, and the related
metadata is fetched from Jena.
>
> Assuming the Graph URI and the URIs in your VALUES block change in each query then every query is looking at a different section of the database causing a lot of data to be cached and then evicted both in terms of on-heap memory structures (the node table cache) and potentially also for the off heap memory mapped files which may be being paged in and out as the code traverses the B-Tree indexes.
>
> Is there also some other query involved that extracts the Graph URIs and Subject URIs of interest that is being executed in parallel with the script? Or has the input from the script been pre-calculated ahead of time, comes from elsewhere etc?
There is no parallelism on our part in this case. Only one PHP script is
running and making GSP calls.
>
> Rob
>
> On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
>
>
> Server:
>
> /usr/bin/java
> -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
> -Xmx5600M -jar fuseki-server.jar --update --port 3030
> --loc=/home/text/tools/jena_data_test/ /ds
>
> No custom configs, default installation package.
>
>
> SPARQL similar to this (returns 5-10 triples):
>
> CONSTRUCT { ?s ?p ?o }
> FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
> WHERE
> {
> ?s ?p ?o
>
> VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
> lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
> lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
> lsr:239c6da0-4c24-4539-a277-c9756d6257ee
> lsr:2ef0190d-6271-447a-992f-6225fc440897
> lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
> lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
> lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
> lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
> lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
> }
>
>
> I solved this by adding a sleep to the script. So I guess it's about the
> Java memory manager not getting time to free memory? Even with the sleep
> it was barely doable, with memory consumption swinging rapidly between
> 1.5 and 6 GB.
>
>
>
> On 29/01/2019 15:50, Andy Seaborne wrote:
> > Mikael,
> >
> > There aren't enough details except to mention the suspects like sorting.
> >
> > With all the questions on the list, I personally don't track the
> > details of each installation so please also remind me of your current
> > setup.
> >
> > Andy
> >
> > On 29/01/2019 11:32, Mikael Pesonen wrote:
> >>
> >> I'm not able to run a basic read-only script without running out of
> >> memory on the server.
> >>
> >> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki
> >> when running out of memory.
> >> All I'm running is simple sparql query getting few triples of
> >> resource. This is run for about 50k times.
> >>
> >> All settings are default, using GSP.
> >>
> >>
>
Re: Out of memory
Posted by Rob Vesse <rv...@dotnetrdf.org>.
This may be partly a case of a simple-looking query having unexpected execution semantics. Strictly speaking, your query says: select all triples in the specific graph, then join them with this list of values for ?s. The optimiser should, and does appear to, do the right thing and flip the join order, i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specific graph. However, if the query had other elements involved the optimiser might not kick in, so a better query would place the VALUES block before the patterns that use the variables it defines.
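For illustration, here is a minimal sketch of the reordering described above, written in Python only so that the query text can be generated and checked. The function name `build_construct_query` and its `prologue` parameter are hypothetical and not part of Jena or of the poster's script. The thread does not show the PREFIX declaration for `lsr:`, so the caller is assumed to supply it through `prologue`.

```python
from typing import Sequence


def build_construct_query(graph_iri: str,
                          subjects: Sequence[str],
                          prologue: str = "") -> str:
    """Return a CONSTRUCT query whose VALUES block precedes the triple pattern.

    `subjects` are SPARQL terms exactly as they should appear in the query,
    e.g. "<https://example.org/a>" or a prefixed name such as "lsr:...".
    `prologue` (an assumption of this sketch) carries any PREFIX lines,
    each terminated by a newline.
    """
    if not subjects:
        raise ValueError("at least one subject is required")
    values_lines = "\n".join(f"    {term}" for term in subjects)
    return (
        f"{prologue}"
        "CONSTRUCT { ?s ?p ?o }\n"
        f"FROM <{graph_iri}>\n"
        "WHERE\n"
        "{\n"
        "  VALUES ?s {\n"          # bindings come first ...
        f"{values_lines}\n"
        "  }\n"
        "  ?s ?p ?o\n"             # ... then the pattern that uses ?s
        "}\n"
    )


if __name__ == "__main__":
    # Graph IRI and subject names are taken from the query quoted in this
    # thread; the PREFIX line for lsr: is not shown there and would have to
    # be passed as `prologue` before this text is sent to the server.
    print(build_construct_query(
        "https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/",
        ["lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8",
         "lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985"],
    ))
```

The result set is the same as for the original query; the only change is that the bindings are stated before the pattern that consumes them, so the intended evaluation order does not depend on the optimiser.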
This sounds like memory/cache thrashing. From what you have described, running variants on this query 50k times, you are basically walking over your entire dataset extracting it piece by piece?
Assuming the Graph URI and the URIs in your VALUES block change in each query, every query is looking at a different section of the database. That causes a lot of data to be cached and then evicted, both in the on-heap memory structures (the node table cache) and potentially also in the off-heap memory-mapped files, which may be paged in and out as the code traverses the B-Tree indexes.
Is there also some other query involved that extracts the Graph URIs and Subject URIs of interest and is being executed in parallel with the script? Or has the input to the script been pre-calculated ahead of time, or does it come from elsewhere?
Rob
On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
Server:
/usr/bin/java
-Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
-Xmx5600M -jar fuseki-server.jar --update --port 3030
--loc=/home/text/tools/jena_data_test/ /ds
No custom configs, default installation package.
SPARQL similar to this (returns 5-10 triples):
CONSTRUCT { ?s ?p ?o }
FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
WHERE
{
?s ?p ?o
VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
lsr:239c6da0-4c24-4539-a277-c9756d6257ee
lsr:2ef0190d-6271-447a-992f-6225fc440897
lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
}
I solved this by adding a sleep to the script. So I guess it's about the
Java memory manager not getting time to free memory? Even with the sleep it
was barely doable, with memory consumption swinging rapidly between 1.5 and
6 GB.
On 29/01/2019 15:50, Andy Seaborne wrote:
> Mikael,
>
> There aren't enough details except to mention the suspects like sorting.
>
> With all the questions on the list, I personally don't track the
> details of each installation so please also remind me of your current
> setup.
>
> Andy
>
> On 29/01/2019 11:32, Mikael Pesonen wrote:
>>
>> I'm not able to run a basic read-only script without running out of
>> memory on the server.
>>
>> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki
>> when running out of memory.
>> All I'm running is simple sparql query getting few triples of
>> resource. This is run for about 50k times.
>>
>> All settings are default, using GSP.
>>
>>
Re: Out of memory
Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Server:
/usr/bin/java
-Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
-Xmx5600M -jar fuseki-server.jar --update --port 3030
--loc=/home/text/tools/jena_data_test/ /ds
No custom configs, default installation package.
SPARQL similar to this (returns 5-10 triples):
CONSTRUCT { ?s ?p ?o }
FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
WHERE
{
?s ?p ?o
VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
lsr:239c6da0-4c24-4539-a277-c9756d6257ee
lsr:2ef0190d-6271-447a-992f-6225fc440897
lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
}
I solved this by adding a sleep to the script. So I guess it's about the
Java memory manager not getting time to free memory? Even with the sleep it
was barely doable, with memory consumption swinging rapidly between 1.5 and
6 GB.
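A minimal sketch of that kind of pacing, in Python rather than the PHP actually used. Everything named here (`run_paced`, `send`, `pause_seconds`) is hypothetical: `send` stands in for whatever function issues one request to Fuseki, and the pause length is an arbitrary placeholder, not a recommended value. It only spaces the requests out and does not change how much the server caches.

```python
import time
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def run_paced(batches: Iterable[Sequence[str]],
              send: Callable[[Sequence[str]], T],
              pause_seconds: float = 0.1,
              sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Call send(batch) once per batch, pausing between consecutive calls.

    `send` is a placeholder for the code that performs one query;
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    results: List[T] = []
    for index, batch in enumerate(batches):
        if index > 0:
            sleep(pause_seconds)  # pause only between requests, not before the first
        results.append(send(batch))
    return results


if __name__ == "__main__":
    # Stand-in sender that just reports the batch size instead of calling a server.
    demo = run_paced([["a", "b"], ["c"]], send=len, pause_seconds=0.01)
    print(demo)
```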
On 29/01/2019 15:50, Andy Seaborne wrote:
> Mikael,
>
> There aren't enough details except to mention the suspects like sorting.
>
> With all the questions on the list, I personally don't track the
> details of each installation so please also remind me of your current
> setup.
>
> Andy
>
> On 29/01/2019 11:32, Mikael Pesonen wrote:
>>
>> I'm not able to run a basic read-only script without running out of
>> memory on the server.
>>
>> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki
>> when running out of memory.
>> All I'm running is simple sparql query getting few triples of
>> resource. This is run for about 50k times.
>>
>> All settings are default, using GSP.
>>
>>
Re: Out of memory
Posted by Andy Seaborne <an...@apache.org>.
Mikael,
There aren't enough details except to mention the suspects like sorting.
With all the questions on the list, I personally don't track the details
of each installation so please also remind me of your current setup.
Andy
On 29/01/2019 11:32, Mikael Pesonen wrote:
>
> I'm not able to run a basic read-only script without running out of
> memory on the server.
>
> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki when
> running out of memory.
> All I'm running is simple sparql query getting few triples of resource.
> This is run for about 50k times.
>
> All settings are default, using GSP.
>
>