Posted to users@jena.apache.org by Mikael Pesonen <mi...@lingsoft.fi> on 2019/01/29 11:32:30 UTC

Out of memory

I'm not able to run a basic read-only script without running out of 
memory on the server.

Consumption goes to 7+ gigs (VM 10+ gigs), then the system kills Fuseki 
when it runs out of memory.
All I'm running is a simple SPARQL query getting a few triples of a 
resource. This is run about 50k times.

All settings are default, using GSP.


-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.pesonen@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND


Re: Out of memory

Posted by Andy Seaborne <an...@apache.org>.

On 29/01/2019 14:28, Rob Vesse wrote:
> This may be partly a case of a simple looking query having unexpected execution semantics.  Strictly speaking your query says select all triples in the specific graph then join them with these list of values for ?s.  Now the optimiser should, and does appear, to do the right thing and flip the join order i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specific graph.  However if the query had other elements involved the optimiser might not kick in, a better query would place the VALUES prior to using the variables defined in the VALUES block.
> 
> This sounds like memory/cache thrashing.  From what you have described, running variants on this query 50k times, you are basically walking over your entire dataset extracting it piece by piece?

And as that happens, more and more of the nodes get cached.  The node 
cache holds a fixed number of entries, so if the literals are big, the 
cache is big.  It is usually 1-2G per database but it can be more.

And then there is workspace - and it might be that the heap is close to 
full, meaning the GC is doing a lot of work.

How much free RAM is there before the 50K queries start? (visualvm and 
force a GC).  visualvm also tells you how much work the GC is doing.

      Andy

> 
> Assuming the Graph URI and the URIs in your VALUES block change in each query then every query is looking at a different section of the database causing a lot of data to be cached and then evicted both in terms of on-heap memory structures (the node table cache) and potentially also for the off heap memory mapped files which may be being paged in and out as the code traverses the B-Tree indexes.
> 
> Is there also some other query involved that extracts the Graph URIs and Subject URIs of interest that is being executed in parallel with the script?  Or has the input from the script been pre-calculated ahead of time, comes from elsewhere etc?
> 
> Rob
> 
> On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:
> 
>      
>      Server:
>      
>      /usr/bin/java
>      -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties
>      -Xmx5600M -jar fuseki-server.jar --update --port 3030
>      --loc=/home/text/tools/jena_data_test/ /ds
>      
>      No custom configs, default installation package.
>      
>      
>      Sparql similar to this (returns 5-10 triplets) :
>      
>      CONSTRUCT { ?s ?p ?o }
>      FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
>      WHERE
>      {
>               ?s ?p ?o
>      
>      VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8
>      lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985
>      lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902
>      lsr:239c6da0-4c24-4539-a277-c9756d6257ee
>      lsr:2ef0190d-6271-447a-992f-6225fc440897
>      lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9
>      lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf
>      lsr:6f6802cf-0336-4234-90b8-cc8780058f0d
>      lsr:d1e2751b-4332-4d57-95e4-ca8070c16782
>      lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
>      }
>      
>      
>      I solved this by adding sleep to script. So I guess it's about the java
>      memory manager not getting time to free memory? Even with sleep it was
>      barely doable, memory consumption changing rapidly between 1,5 gig - 6 gig.
>      
>      
>      
>      On 29/01/2019 15:50, Andy Seaborne wrote:
>      > Mikael,
>      >
>      > There aren't enough details except to mention the suspects like sorting.
>      >
>      > With all the questions on the list, I personally don't track the
>      > details of each installation so please also remind me of your current
>      > setup.
>      >
>      >     Andy
>      >

Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Thanks for these, haven't even heard of this syntax before so have to 
study...

On 29/01/2019 19:25, Andy Seaborne wrote:
> This case should be optimized to be the flipped join(VALUES, BGP)
>
> (prefix ((lsr: <lsr:>))
>   (sequence
>     (table (vars ?s)
>       (row [?s lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8])
>       (row [?s lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985])
>       (row [?s lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902])
>       (row [?s lsr:239c6da0-4c24-4539-a277-c9756d6257ee])
>       (row [?s lsr:2ef0190d-6271-447a-992f-6225fc440897])
>       (row [?s lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9])
>       (row [?s lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf])
>       (row [?s lsr:6f6802cf-0336-4234-90b8-cc8780058f0d])
>       (row [?s lsr:d1e2751b-4332-4d57-95e4-ca8070c16782])
>       (row [?s lsr:81053775-4722-4a00-b3f7-33d4feb3629b])
>     )
>     (bgp (triple ?s ?p ?o))))
>
>     Andy


Re: Out of memory

Posted by Andy Seaborne <an...@apache.org>.
This case should be optimized to be the flipped join(VALUES, BGP)

(prefix ((lsr: <lsr:>))
   (sequence
     (table (vars ?s)
       (row [?s lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8])
       (row [?s lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985])
       (row [?s lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902])
       (row [?s lsr:239c6da0-4c24-4539-a277-c9756d6257ee])
       (row [?s lsr:2ef0190d-6271-447a-992f-6225fc440897])
       (row [?s lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9])
       (row [?s lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf])
       (row [?s lsr:6f6802cf-0336-4234-90b8-cc8780058f0d])
       (row [?s lsr:d1e2751b-4332-4d57-95e4-ca8070c16782])
       (row [?s lsr:81053775-4722-4a00-b3f7-33d4feb3629b])
     )
     (bgp (triple ?s ?p ?o))))

     Andy
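
Rob's suggestion to place VALUES ahead of the pattern that uses its variables can be applied where the client script builds the query text. The following is a minimal sketch under assumptions, not the script used in this thread: the function name build_construct_query and the example.org IRIs are hypothetical, and full IRIs are written out because the expansion of the lsr: prefix is not given here.

```python
def build_construct_query(graph_uri, subject_iris):
    """Return a CONSTRUCT query with the VALUES block ahead of the triple pattern."""
    # Full IRIs in angle brackets are used, so no PREFIX declaration is needed.
    values = "\n".join(f"        <{iri}>" for iri in subject_iris)
    return (
        "CONSTRUCT { ?s ?p ?o }\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        "    VALUES ?s {\n"
        f"{values}\n"
        "    }\n"
        "    ?s ?p ?o\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical graph and subject IRIs, for illustration only.
    print(build_construct_query(
        "https://example.org/graph/",
        ["https://example.org/id/a", "https://example.org/id/b"],
    ))
```

Written this way, the query text already has the shape of the (sequence (table ...) (bgp ...)) plan shown above, so it does not rely on the optimiser flipping the join.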


Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
I haven't got a desktop, but free says

              total        used        free      shared  buff/cache   available
Mem:        8691124     1844328      399084      100032     6447712     6463068
Swap:             0           0           0

when Fuseki is "resting".
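
To read these figures: without options, free reports KiB, buff/cache is mostly memory the kernel can reclaim, and available is the kernel's estimate of what new work can use without swapping. The sketch below is an illustration only; the helper name parse_free is hypothetical and the sample text is the output quoted above.

```python
def parse_free(text):
    """Parse the output of `free` (values in KiB) into {row: {column: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # Rows with fewer values (Swap) only fill the leading columns.
        table[label.rstrip(":")] = dict(zip(columns, (int(v) for v in values)))
    return table


SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:        8691124     1844328      399084      100032     6447712     6463068
Swap:             0           0           0
"""

if __name__ == "__main__":
    mem = parse_free(SAMPLE)["Mem"]
    kib_per_gib = 1024 * 1024
    for name in ("total", "used", "free", "buff/cache", "available"):
        print(f"{name:>10}: {mem[name] / kib_per_gib:.2f} GiB")
```

In this sample, used + free + buff/cache adds up exactly to total, so most of the RAM that is not "free" is file cache rather than memory held by a process.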



Re: Out of memory

Posted by Andy Seaborne <an...@apache.org>.
TDB uses the OS file cache via mmap files.

The files appear as part of the process address space but of course they 
are not part of the heap. The mapped space also flexes up and down as 
needed (unlike the heap).

Some sys tools report the total address space - and that is not the 
amount of RAM the process is using.

In top(1) Linux-speak: roughly VIRT and RES (assuming no old-fashioned 
swapping is going on which with java should be avoided at all costs - 
the JVM heap on swap is very bad for performance).

RES is approximately the amount of RAM the process is actually using.

visualvm allows you see the heap size. That's the figure to look at first.

For Mikael,

-Xmx5600M
process space 7+gigs
(VM 10+ gigs)

so

(For Fuseki+TDB it's either heap or mapped files - there isn't any use 
of direct memory (RAM, but not heap).)

and start with -Xms5600M as well.

     Andy
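
On a machine without a desktop, the two figures that top(1) shows as VIRT and RES can also be read from /proc/<pid>/status on Linux, where they appear as VmSize and VmRSS. The following is a rough sketch under assumptions: the function name vm_sizes_kib is hypothetical, and the sample values are made up to resemble the sizes mentioned in this thread, not measured from the Fuseki process.

```python
import re


def vm_sizes_kib(status_text):
    """Extract VmSize (VIRT) and VmRSS (RES) in KiB from /proc/<pid>/status text."""
    sizes = {}
    for field in ("VmSize", "VmRSS"):
        match = re.search(rf"^{field}:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
        if match:
            sizes[field] = int(match.group(1))
    return sizes


# Made-up sample in the layout of /proc/<pid>/status (not real measurements).
SAMPLE_STATUS = """\
Name:\tjava
VmSize:\t10485760 kB
VmRSS:\t 7340032 kB
"""

if __name__ == "__main__":
    # For a live process, read the text with: open(f"/proc/{pid}/status").read()
    for field, kib in vm_sizes_kib(SAMPLE_STATUS).items():
        print(f"{field}: {kib / (1024 * 1024):.1f} GiB")
```

VmSize includes the whole of every memory-mapped TDB file, whether or not its pages are in RAM, which is why it can be far larger than the heap limit. Heap usage itself still has to come from a JVM tool such as visualvm.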

On 29/01/2019 17:41, Dan Pritts wrote:
> It's often misunderstood, but Java programs use memory in addition to the
> configured heap.  Fuseki in my experience sometimes uses a LOT more, more
> than I could explain.  Some of the folks here (Andy for sure) spent some
> time looking at it with me and weren't able to come to any conclusions.
> You can look through the list archives for the discussion, maybe 6 months
> ago.
> 
> I ended up significantly overallocating memory to the instance and being
> done with it.
> 
> How much RAM does your instance have?  You mentioned -Xmx 5600, and total
> usage of 17GB ram+swap - sounds like you have maybe 8GB ram?    I'd try
> 16GB and see how it does; watch the total memory usage.
> 
> 
> 
> On Tue, Jan 29, 2019 at 9:43 AM Mikael Pesonen <mi...@lingsoft.fi>
> wrote:
> 
>>
>>
>>
>> On 29/01/2019 16:28, Rob Vesse wrote:
>>> This may be partly a case of a simple looking query having unexpected
>> execution semantics.  Strictly speaking your query says select all triples
>> in the specific graph then join them with these list of values for ?s.  Now
>> the optimiser should, and does appear, to do the right thing and flip the
>> join order i.e. it uses the concrete values from the VALUES block to search
>> for triples with those subjects in the specific graph.  However if the
>> query had other elements involved the optimiser might not kick in, a better
>> query would place the VALUES prior to using the variables defined in the
>> VALUES block.
>> Thanks for the reminder on VALUES order
>>>
>>> This sounds like memory/cache thrashing.  From what you have described,
>> running variants on this query 50k times, you are basically walking over
>> your entire dataset extracting it piece by piece?
>> Dataset is larger, these small sets (VALUES) are coming from our
>> external index for similar document search. The index returns ids, and
>> the related metadata is fetched from Jena.
>>>
>>> Assuming the Graph URI and the URIs in your VALUES block change in each
>> query then every query is looking at a different section of the database
>> causing a lot of data to be cached and then evicted both in terms of
>> on-heap memory structures (the node table cache) and potentially also for
>> the off heap memory mapped files which may be being paged in and out as the
>> code traverses the B-Tree indexes.
>>>
>>> Is there also some other query involved that extracts the Graph URIs and
>> Subject URIs of interest that is being executed in parallel with the
>> script?  Or has the input from the script been pre-calculated ahead of
>> time, comes from elsewhere etc?
>> There is no parallelism on our part in this case. Only one php script
>> running and making GSP calls.
>>>
>>> Rob

Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Sorry, I meant -Xmx. Debugging Fuseki is outside the scope of my work, but
it's good to know that 16GB seems to do the trick. So maybe we'll deploy more
cases on the same bigger server instead of splitting them across smaller ones.

On 05/02/2019 17:48, Rob Vesse wrote:
> And I realise browsing back through the thread that you mentioned that you don't have a desktop in a previous reply.  So I presume you mean you only have terminal access to the machine where you are running Fuseki?
>
> In which case you might want to try out jvmtop - https://github.com/patric-r/jvmtop - as an open source command line based JVM profiler
>
> Rob
>


Re: Out of memory

Posted by Rob Vesse <rv...@dotnetrdf.org>.
And I realise, browsing back through the thread, that you mentioned in a previous reply that you don't have a desktop.  So I presume you only have terminal access to the machine where you are running Fuseki?

In which case you might want to try out jvmtop - https://github.com/patric-r/jvmtop - an open source, command-line JVM profiler.

Rob


Re: Out of memory

Posted by Rob Vesse <rv...@dotnetrdf.org>.
-Xms and -Xmx do two different things (the previous email in the thread mentioned -Xmx but then you referenced -Xms in your question).

The former sets the minimum heap size, which is the amount of memory the JVM allocates for the heap when it starts.

The latter sets the maximum heap size, which is the most memory the JVM will allocate for the heap at runtime.  The heap may start smaller than this and grow up to this maximum.

When one or both of these are not set, your JVM chooses default values, usually based on some percentage of the system memory.  Exact behaviour varies between JVMs.
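
To see what the two flags resolve to on a given JVM, here is a minimal sketch (not from the thread; the class name HeapLimits is made up for illustration) that prints the heap figures through the standard java.lang.Runtime API. Started with, say, "java -Xms1g -Xmx5600m HeapLimits", the first figure should be close to the -Xmx value and the second roughly at the -Xms value, though the exact numbers vary between JVMs and garbage collectors. It says nothing about memory used outside the heap.

```java
// Hypothetical helper, not part of Fuseki or Jena: prints the heap limits
// of the JVM it runs in, so the effect of -Xms / -Xmx can be checked.
class HeapLimits {

    // Convert a byte count to whole mebibytes for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (governed by -Xmx or the JVM default).
        System.out.println("max heap (MiB):          " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts near -Xms and can grow).
        System.out.println("committed heap (MiB):    " + toMiB(rt.totalMemory()));
        // Unused part of the currently committed heap.
        System.out.println("free in committed (MiB): " + toMiB(rt.freeMemory()));
    }
}
```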

As I think has been suggested earlier in this thread, if you continue to have issues with memory consumption, your best bet for investigating further is to attach a JVM profiler to the running Fuseki process.  With that you can take snapshots of the memory usage over time and inspect them to see where the memory is going.

VisualVM - https://visualvm.github.io - is one such free tool; there are of course other free and proprietary JVM profilers available.

Rob


Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Tested with 16GB, and Java memory usage goes up to 10G (virt 14G).
I wonder what the java -Xms option actually does...

Was there no way to limit memory usage on the 8GB server?
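
On limiting memory: -Xmx caps only the Java heap. As the reply quoted below points out, a JVM also uses memory outside the configured heap, and elsewhere in this thread the off-heap memory-mapped files are named as one such consumer. Here is a minimal sketch, assuming only the standard java.lang.management beans, of how heap, non-heap and buffer-pool usage can be told apart. The class name JvmMemoryReport is hypothetical, and the code reports on the JVM it runs in; to observe a running Fuseki the same beans would have to be read over JMX or with a profiler.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Fuseki or Jena.
class JvmMemoryReport {

    // Build a plain-text report of heap, non-heap and buffer-pool usage.
    static String report() {
        StringBuilder sb = new StringBuilder();
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

        // The heap is the only area bounded by -Xmx.
        MemoryUsage heap = mem.getHeapMemoryUsage();
        sb.append("heap used/committed bytes: ")
          .append(heap.getUsed()).append('/').append(heap.getCommitted()).append('\n');

        // Non-heap covers JVM-managed areas such as metaspace and the code cache.
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        sb.append("non-heap used/committed bytes: ")
          .append(nonHeap.getUsed()).append('/').append(nonHeap.getCommitted()).append('\n');

        // Buffer pools ("direct", "mapped") live outside the heap; memory-mapped
        // files should show up under "mapped". A value of -1 means unknown.
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            sb.append("buffer pool ").append(pool.getName()).append(": ")
              .append(pool.getMemoryUsed()).append(" bytes in ")
              .append(pool.getCount()).append(" buffers\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing these figures with the resident size the operating system reports for the process gives a rough idea of how much of the total sits outside the heap.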


On 29/01/2019 19:41, Dan Pritts wrote:
> It's often misunderstood, but Java programs use memory in addition to the
> configured heap.  Fuseki in my experience sometimes uses a LOT more, more
> than I could explain.  Some of the folks here (Andy for sure) spent some
> time looking at it with me and weren't able to come to any conclusions.
> You can look through the list archives for the discussion, maybe 6 months
> ago.
>
> I ended up significantly overallocating memory to the instance and being
> done with it.
>
> How much RAM does your instance have?  You mentioned -Xmx 5600, and total
> usage of 17GB ram+swap - sounds like you have maybe 8GB ram?    I'd try
> 16GB and see how it does; watch the total memory usage.

Re: Out of memory

Posted by Dan Pritts <da...@umich.edu>.
It's often misunderstood, but Java programs use memory in addition to the
configured heap.  Fuseki in my experience sometimes uses a LOT more, more
than I could explain.  Some of the folks here (Andy for sure) spent some
time looking at it with me and weren't able to come to any conclusions.
You can look through the list archives for the discussion, maybe 6 months
ago.

I ended up significantly overallocating memory to the instance and being
done with it.

How much RAM does your instance have?  You mentioned -Xmx 5600, and total
usage of 17GB ram+swap - sounds like you have maybe 8GB ram?    I'd try
16GB and see how it does; watch the total memory usage.
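
To make "watch the total memory usage" concrete, here is a minimal sketch that compares what the OS reports for the process with the configured heap cap. It assumes Linux, where /proc/<pid>/status lists VmSize and VmRSS in kB. -Xmx caps only the Java heap (-Xms sets its initial size), so memory-mapped files and other off-heap memory show up in these figures but are not limited by it. The function names and the sample numbers are made up for illustration; they are not measurements from this thread.

```python
# Sketch only: compare a JVM's configured max heap with what the OS reports.
# Assumes Linux, where /proc/<pid>/status lists VmSize and VmRSS in kB.


def parse_proc_status(text):
    """Return {'VmSize': MiB, 'VmRSS': MiB} parsed from /proc/<pid>/status text."""
    wanted = {"VmSize", "VmRSS"}
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:   7340032 kB"
            kilobytes = int(rest.split()[0])
            result[key] = kilobytes / 1024.0
    return result


def beyond_heap_mib(status, xmx_mib):
    """Resident memory that cannot be Java heap, given the -Xmx cap in MiB."""
    return max(0.0, status["VmRSS"] - xmx_mib)


def read_status(pid):
    """Read and parse the status file of a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_proc_status(handle.read())


# Hypothetical sample (not measured): 7 GiB resident, 10 GiB virtual.
SAMPLE = "Name:\tjava\nVmSize:\t10485760 kB\nVmRSS:\t 7340032 kB\n"
sample_status = parse_proc_status(SAMPLE)
```

With -Xmx5600M, beyond_heap_mib(sample_status, 5600) gives the part of the resident set that has to be something other than heap. On a live system, read_status(pid) with the Fuseki process id would supply the real numbers.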



On Tue, Jan 29, 2019 at 9:43 AM Mikael Pesonen <mi...@lingsoft.fi>
wrote:

>
>
>
> On 29/01/2019 16:28, Rob Vesse wrote:
> > This may be partly a case of a simple looking query having unexpected
> > execution semantics.  Strictly speaking your query says select all triples
> > in the specific graph then join them with these list of values for ?s.  Now
> > the optimiser should, and does appear, to do the right thing and flip the
> > join order i.e. it uses the concrete values from the VALUES block to search
> > for triples with those subjects in the specific graph.  However if the
> > query had other elements involved the optimiser might not kick in, a better
> > query would place the VALUES prior to using the variables defined in the
> > VALUES block.
> Thanks for the reminder on VALUES order
> >
> > This sounds like memory/cache thrashing.  From what you have described,
> > running variants on this query 50k times, you are basically walking over
> > your entire dataset extracting it piece by piece?
> Dataset is larger, these small sets (VALUES) are coming from our
> external index for similar document search. Index returns id and related
> metadata is fetched from Jena.
> >
> > Assuming the Graph URI and the URIs in your VALUES block change in each
> > query then every query is looking at a different section of the database
> > causing a lot of data to be cached and then evicted both in terms of
> > on-heap memory structures (the node table cache) and potentially also for
> > the off heap memory mapped files which may be being paged in and out as the
> > code traverses the B-Tree indexes.
> >
> > Is there also some other query involved that extracts the Graph URIs and
> > Subject URIs of interest that is being executed in parallel with the
> > script?  Or has the input from the script been pre-calculated ahead of
> > time, comes from elsewhere etc?
> There is no parallelism from our part in this case. Only one php script
> running and making GSP calls.
> >
> > Rob

-- 
Dan Pritts
ICPSR Computing & Network Services
University of Michigan

Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.


On 29/01/2019 16:28, Rob Vesse wrote:
> This may be partly a case of a simple looking query having unexpected execution semantics.  Strictly speaking your query says select all triples in the specific graph then join them with these list of values for ?s.  Now the optimiser should, and does appear, to do the right thing and flip the join order i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specific graph.  However if the query had other elements involved the optimiser might not kick in, a better query would place the VALUES prior to using the variables defined in the VALUES block.
Thanks for the reminder on VALUES order
>
> This sounds like memory/cache thrashing.  From what you have described, running variants on this query 50k times, you are basically walking over your entire dataset extracting it piece by piece?
The dataset is larger; these small sets (VALUES) come from our
external index for similar-document search. The index returns the IDs, and
the related metadata is fetched from Jena.
>
> Assuming the Graph URI and the URIs in your VALUES block change in each query then every query is looking at a different section of the database causing a lot of data to be cached and then evicted both in terms of on-heap memory structures (the node table cache) and potentially also for the off heap memory mapped files which may be being paged in and out as the code traverses the B-Tree indexes.
>
> Is there also some other query involved that extracts the Graph URIs and Subject URIs of interest that is being executed in parallel with the script?  Or has the input from the script been pre-calculated ahead of time, comes from elsewhere etc?
There is no parallelism on our part in this case. Only one PHP script is
running and making GSP calls.
>
> Rob

Re: Out of memory

Posted by Rob Vesse <rv...@dotnetrdf.org>.
This may be partly a case of a simple-looking query having unexpected execution semantics. Strictly speaking, your query says: select all triples in the specified graph, then join them with this list of values for ?s. Now the optimiser should, and does appear to, do the right thing and flip the join order, i.e. it uses the concrete values from the VALUES block to search for triples with those subjects in the specified graph. However, if the query had other elements involved, the optimiser might not kick in; a better query would place the VALUES block before the patterns that use the variables it defines.
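
As a rough illustration of the VALUES-first ordering, the sketch below builds the same kind of CONSTRUCT query with the VALUES block ahead of the triple pattern. The function name and the prologue parameter are made up for this example. The PREFIX declaration for lsr: is not shown in the original query, so the caller has to supply it.

```python
def build_construct(graph_iri, subjects, prologue=""):
    """Build a CONSTRUCT query with VALUES before the pattern that uses ?s.

    subjects: already-formatted terms, e.g. prefixed names or <IRI>s.
    prologue: PREFIX declarations, if the terms need any (assumption:
              supplied by the caller, since the original omits them).
    """
    if not subjects:
        raise ValueError("need at least one subject")
    values = " ".join(subjects)
    lines = [
        "CONSTRUCT { ?s ?p ?o }",
        "FROM <%s>" % graph_iri,
        "WHERE",
        "{",
        "  VALUES ?s { %s }" % values,  # bind ?s first ...
        "  ?s ?p ?o",                   # ... then match triples for it
        "}",
    ]
    body = "\n".join(lines)
    return (prologue + "\n" + body) if prologue else body


example = build_construct(
    "https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/",
    [
        "lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8",
        "lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985",
    ],
)
```

The result set should be the same as with the original ordering; the point is only that the query no longer relies on the optimiser to reorder the join.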

This sounds like memory/cache thrashing. From what you have described (running variants of this query 50k times), you are basically walking over your entire dataset and extracting it piece by piece?

Assuming the graph URI and the URIs in your VALUES block change with each query, every query looks at a different section of the database. That causes a lot of data to be cached and then evicted, both in the on-heap memory structures (the node table cache) and potentially also in the off-heap memory-mapped files, which may be paged in and out as the code traverses the B-Tree indexes.
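
The caching effect can be sketched with a toy model. This is not Jena's node table cache, just a small LRU cache with a fixed number of entries, written to show two things: an access pattern that never revisits a key gets no hits at all, and a cache bounded by entry count rather than by bytes holds more memory when the cached values are larger. All names and sizes here are invented for illustration.

```python
from collections import OrderedDict


class LruCache:
    """Toy fixed-entry-count LRU cache; a stand-in, not Jena's node cache."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return value

    def held_bytes(self):
        """Total size of the cached values, in bytes."""
        return sum(len(value) for value in self.entries.values())


def run(keys, max_entries, value_size):
    """Replay an access pattern; every loaded value is value_size bytes."""
    cache = LruCache(max_entries)
    for key in keys:
        cache.get(key, lambda k: b"x" * value_size)
    return cache


# Every request touches a new key: the cache fills up and never hits.
walking = run(range(5000), max_entries=1000, value_size=10)
# The same number of requests over 100 keys: almost everything hits.
repeating = run([i % 100 for i in range(5000)], max_entries=1000, value_size=10)
# Same entry limit, larger values: 100 times the memory held.
large_values = run(range(5000), max_entries=1000, value_size=1000)
```

In the walking case the cache does a full load and an eviction for nearly every request while still occupying its maximum size, which is the pattern described above.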

Is there also some other query involved, one that extracts the graph URIs and subject URIs of interest and is executed in parallel with the script? Or has the input to the script been pre-calculated ahead of time, or does it come from elsewhere?

Rob

On 29/01/2019, 14:06, "Mikael Pesonen" <mi...@lingsoft.fi> wrote:

    
    Server:
    
    /usr/bin/java 
    -Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties 
    -Xmx5600M -jar fuseki-server.jar --update --port 3030 
    --loc=/home/text/tools/jena_data_test/ /ds
    
    No custom configs, default installation package.
    
    
    SPARQL similar to this (returns 5-10 triples):
    
    CONSTRUCT { ?s ?p ?o }
    FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
    WHERE
    {
             ?s ?p ?o
    
    VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8 
    lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985 
    lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902 
    lsr:239c6da0-4c24-4539-a277-c9756d6257ee 
    lsr:2ef0190d-6271-447a-992f-6225fc440897 
    lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9 
    lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf 
    lsr:6f6802cf-0336-4234-90b8-cc8780058f0d 
    lsr:d1e2751b-4332-4d57-95e4-ca8070c16782 
    lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
    }
    
    
    I solved this by adding a sleep to the script. So I guess it's about the Java
    memory manager not getting time to free memory? Even with the sleep it was
    barely doable, with memory consumption swinging rapidly between 1.5 GB and 6 GB.
    
    
    

Re: Out of memory

Posted by Mikael Pesonen <mi...@lingsoft.fi>.
Server:

/usr/bin/java 
-Dlog4j.configuration=file:/home/text/tools/apache-jena-fuseki-3.9.0/log4j.properties 
-Xmx5600M -jar fuseki-server.jar --update --port 3030 
--loc=/home/text/tools/jena_data_test/ /ds

No custom configs, default installation package.


SPARQL similar to this (returns 5-10 triples):

CONSTRUCT { ?s ?p ?o }
FROM <https://resource.lingsoft.fi/4f13c609-48b4-4e4d-a40b-2d7946f88234/>
WHERE
{
         ?s ?p ?o

VALUES ?s {lsr:10609f75-5cf3-4544-8fc1-c361778c3bd8 
lsr:88d0bb8c-35d8-4051-a27d-a0d93af77985 
lsr:fc7b2c65-453e-469b-9c5d-8c7ee4ee6902 
lsr:239c6da0-4c24-4539-a277-c9756d6257ee 
lsr:2ef0190d-6271-447a-992f-6225fc440897 
lsr:6aaf601c-ccf4-4e59-9757-1a463db49fa9 
lsr:d7c9dc96-cd61-4a31-b466-bb2491a3ceaf 
lsr:6f6802cf-0336-4234-90b8-cc8780058f0d 
lsr:d1e2751b-4332-4d57-95e4-ca8070c16782 
lsr:81053775-4722-4a00-b3f7-33d4feb3629b}
}


I solved this by adding a sleep to the script. So I guess it's about the Java
memory manager not getting time to free memory? Even with the sleep it was
barely doable, with memory consumption swinging rapidly between 1.5 GB and 6 GB.
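
For reference, the sleep workaround amounts to something like the sketch below, written in Python rather than the PHP actually used. The names chunked, run_throttled and send are made up for the example. send stands for whatever function issues one request to Fuseki, and the pause length is a guess to be tuned. Why the pause helps is not established here, so treat it as a workaround, not a fix.

```python
import time


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_throttled(batches, send, pause_seconds, sleep=time.sleep):
    """Call send(batch) for each batch, pausing between consecutive calls.

    send:  hypothetical callable that performs one request for a batch.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    results = []
    for index, batch in enumerate(batches):
        if index > 0:
            sleep(pause_seconds)  # no pause before the first request
        results.append(send(batch))
    return results
```

Putting more subject IDs into each VALUES block via chunked would also reduce the number of requests, which may matter as much as the pause itself.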



On 29/01/2019 15:50, Andy Seaborne wrote:
> Mikael,
>
> There aren't enough details except to mention the suspects like sorting.
>
> With all the questions on the list, I personally don't track the 
> details of each installation so please also remind me of your current 
> setup.
>
>     Andy

Re: Out of memory

Posted by Andy Seaborne <an...@apache.org>.
Mikael,

There aren't enough details except to mention the suspects like sorting.

With all the questions on the list, I personally don't track the details 
of each installation so please also remind me of your current setup.

     Andy

On 29/01/2019 11:32, Mikael Pesonen wrote:
> 
> I'm not able to run a basic read-only script without running out of 
> memory on the server.
> 
> Consumption goes to 7+gigs (VM 10+ gigs), then system kills Fuseki when 
> running out of memory.
> All I'm running is simple sparql query getting few triples of resource. 
> This is run for about 50k times.
> 
> All settings are default, using GSP.
> 
>