Posted to users@jena.apache.org by nadav hoze <na...@gmail.com> on 2013/09/02 13:51:21 UTC

Heavy queries followed by light queries

hi,

We are running stress tests against our service, whose underlying data
layer is Jena TDB.
One of our tests is to run heavy queries for a long time (about 6 hours)
and afterwards run light queries (we have clients that operate in that mode).
What we see is a huge performance degradation: light queries that usually
take around 0.1-0.2 seconds took more than 3 seconds after the heavy
queries had run.

The heavy query execution itself also degraded badly after only one minute:
each heavy query fetched around 35,000 triples, and for the first minute
each took between 10-40 seconds (which is OK); afterwards it peaked at
200-8000 seconds.
Same thing memory-wise: after a minute, usage jumped from 200 MB to 2.2 GB.

What I would like to know is whether there could be a memory leak in Jena,
or whether Jena objects are cached in some way that we could release.

Here are important details for answering:
jena version: 2.6.4
tdb version: 0.8.9
arq: 2.8.7
we use a single model and no datasets.

Also, could an upgrade to the latest stable Jena version help us here?

Help is much appreciated :)

Thanks,

Nadav

Re: Heavy queries followed by light queries

Posted by nadav hoze <na...@gmail.com>.
Andy hi,

After a long investigation we found a memory leak on our side: we were not
closing QueryExecution objects. We fixed that and noticed some improvement,
but we still reach a point where there is no more memory and our service
hangs.
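For anyone hitting the same leak: the fix is to always close the
QueryExecution, e.g. in a finally block. Below is a minimal sketch against
the Jena 2.x API of that era (the method and query string are illustrative
placeholders, not the actual service code):

```java
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;
import com.hp.hpl.jena.rdf.model.Model;

public class QueryRunner {
    // Runs a SELECT query and guarantees the QueryExecution is closed,
    // even if iteration throws. Without close(), per-execution resources
    // accumulate and the heap slowly fills - the leak described above.
    static void runSelect(Model model, String sparql) {
        Query query = QueryFactory.create(sparql);
        QueryExecution qexec = QueryExecutionFactory.create(query, model);
        try {
            ResultSet rs = qexec.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.nextSolution();
                // ... consume the row inside the try block: the ResultSet
                // is no longer valid once the QueryExecution is closed ...
            }
        } finally {
            qexec.close();
        }
    }
}
```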
We turned on a Java profiler, ran the following test (on a service with
2 GB of heap) and found very interesting results:
we ran the query above with the same data (which returned around 100,000
triples) every 10 seconds, and Jena managed to deal with it!
The allocated and used memory grew to a point but did not "choke"
(1.4 GB allocated / 1 GB used).
But when we ran the same test with 4 different inputs, allocated memory
grew to 1.8 GB and used memory to 1.5 GB, causing serious performance
degradation: results returned after 30 minutes, and afterwards not at all.
In the profiler we saw that most of the used memory is in the old
generation, meaning that a lot of objects are retained and never freed; my
guess is that this is the cache.

So I have 2 questions:

1. Why is it that when we ran the heavy query with the same input we did
not experience memory issues?
2. After a long dig I found the following way to access the model cache:
((EnhGraph) model).getNodeCacheControl()
   Just to see if it had any effect, I cleared it once in a while and
noticed that memory did not increase dramatically, but query execution
times became terribly long.
   Can you recommend a proper way to clear the cache? Is it thread safe?
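For reference, the cache-clearing experiment in question 2 looks roughly
like this. Note that getNodeCacheControl() is an internal Jena 2.x API: the
cast relies on the model implementation extending EnhGraph, and the sketch
assumes the returned CacheControl supports clear():

```java
import com.hp.hpl.jena.enhanced.EnhGraph;
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.util.cache.CacheControl;

public class NodeCacheExperiment {
    // Clears the model's enhanced-node cache. This trades memory for
    // speed: after a clear, every node wrapper must be rebuilt from the
    // graph, which is why query times became "terribly long" above.
    static void clearNodeCache(Model model) {
        CacheControl cache = ((EnhGraph) model).getNodeCacheControl();
        if (cache != null) {
            cache.clear();
        }
    }
}
```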

Thanks,

Nadav




On Wed, Sep 4, 2013 at 5:34 PM, Andy Seaborne <an...@apache.org> wrote:

> On 03/09/13 20:26, nadav hoze wrote:
>
>> OK the bottom line is that I must somehow free memory to prevent such a
>> huge performance degradation.
>> The stress tests we did are of course extreme and what we got in couple of
>> hours will be on client side after couple of weeks.
>> This means that we can somehow from time to time clear data not used.
>>
>
> You do not need to do anything - the internal caches are LRU and the
> operating system will manage memory mapped files.
>
>
>  I read in
>> http://jena.apache.org/documentation/tdb/architecture.html#caching-on-32-and-64-bit-java-systems
>> that on 64-bit "TDB uses memory mapped files, accessed 8M segments, and
>> the
>> operating system handles caching between RAM and disk"
>> and on 32 bit "TDB provides an in-heap LRU cache of B+Tree blocks"  can I
>> force jena to use this cache on 64 bit ? (does the set of the file access
>> mode already does that?)
>>
>
> Yes - call
>
> SystemTDB.setFileMode(FileMode.direct) ;
>
> before anything else calls jena especially TDB.
>
> Unfortunately, you can't resize the caches easily.  In theory, there is a
> properties file named by "tdb:settings" or "com.hp.hpl.jena.tdb.settings".
> You'll need to look in the code for SystemTDB.
>
> Let me know if this has any effect, useful or otherwise.
>
>         Andy
>
>
>> Thanks,
>>
>> Nadav
>> Thanks,
>>
>>
>> On Tue, Sep 3, 2013 at 5:43 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>>  On 03/09/13 07:01, nadav hoze wrote:
>>>
>>>  1. Regarding VM when I said varies from client to client I meant that
>>>> some uses VM and some don't but the 12GB is always for a single machine.
>>>> Also forgot to state that of course other processes works on that
>>>> machine
>>>> beside this service that uses jena, but this service get his shared part
>>>> and I don't think it's a lack of resources issue.
>>>>
>>>>  What I have seen happening on other systems is that the VM
>>> configuration
>>> is limiting the growth of the VM, causing it to not use as much of the
>>> machine as it might.
>>>
>>> Can you see that the whole 12G is being used at all?
>>>
>>> Network drives don't help.
>>>
>>>
>>>
>>>  2. about the matching pattern here it is again, hopes it's OK now (I
>>>> also
>>>> attached it):
>>>>
>>>>
>>> This :
>>> FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }
>>>
>>> is better than:
>>>
>>>
>>> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
>>> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
>>> && ?ontologyConceptDeleted = false))
>>>
>>>
>>>   Just a short explanation before you read the matching pattern:
>>>
>>>> this query should fetch all the triplets with relation subClassOf to a
>>>> given ontologyConcept. it's identifiers are @concept.code and
>>>> @concept.codeSystemId which are basically placeholders which we replace
>>>> in
>>>> our service.
>>>> The OPTIONAL parts you see in the query are for ignoring concepts which
>>>>   are marked as deleted or not bound to the schema.
>>>>
>>>>
>>>> ?ontologyConcept schema:code @concept.code^^xsd:string .
>>>> ?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
>>>>
>>>> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
>>>> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
>>>> && ?ontologyConceptDeleted = false))
>>>> {
>>>> ?child relations:subClassOf ?ontologyConcept .
>>>> OPTIONAL{?child schema:isDeleted ?childDeleted}
>>>> FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted =
>>>> false))
>>>> ?concept relations:equalsTo ?child .
>>>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>> ?conceptDeleted = false))
>>>> ?concept rdf:type schema:Concept
>>>> }
>>>> UNION
>>>> {
>>>> ?concept relations:equalsTo ?ontologyConcept .
>>>> ?concept rdf:type schema:Concept
>>>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>> ?conceptDeleted = false))
>>>> }
>>>>
>>>> 3. About the direct mode, we already use it so no effect there, is there
>>>> a way to clear the memory cache from the model ?
>>>>
>>>>
>>> No but I doubt it would make much difference.  If you clear the cache,
>>> there machine has to go to disk to fetch the data just as if it's doing
>>> cache replacement.
>>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> Nadav
>>>>
>>>>
>>>>
>>>> On Mon, Sep 2, 2013 at 6:21 PM, Andy Seaborne <andy@apache.org <mailto:
>>>> andy@apache.org>> wrote:
>>>>
>>>>      On 02/09/13 14:33, nadav hoze wrote:
>>>>
>>>>          Machine size: 12 GB
>>>>          OS: Windows Server 2008 64 bit
>>>>
>>>>
>>>>
>>>>      I don't have much experience of Windows 64 bit and mmap files -
>>>>      you may find running with 32 bit mode a useful datapoint (this
>>>>      does not use memory mapped files which, from reading around the
>>>>      web, and anecdotal evidence on users@, do not have the same
>>>>      benefits as on Linux).
>>>>
>>>>
>>>>          VM: varies from client to client.
>>>>
>>>>
>>>>      Does this mean that several VMs for running on the same 12G
>>>> hardware?
>>>>      If so, how much RAM is allocate to each VM?
>>>>
>>>>
>>>>          data (in triples): 20,000,000 (3.6 GB)
>>>>          Heap size: 2 GB
>>>>
>>>>
>>>>      How big does the entire JVM process get?  At that scale, the
>>>>      entire DB should be mapped into memory
>>>>
>>>>
>>>>          Driver program : ? (didn't understand)
>>>>
>>>>
>>>>      You say the test program issuing TDB directly so it must be in the
>>>>      same JVM.
>>>>
>>>>      It may be useful to you to run on native hardware to see what
>>>>      effect VM's are having.  It can range from no measurable effect to
>>>>      very significant.
>>>>
>>>>
>>>>          No the database is on a network shared drive (different
>>>> server).
>>>>
>>>>          pattern matching (where clause):
>>>>
>>>>
>>>>      Sorry - this is unreadable and being a partial extract, I can't
>>>>      reformat it.
>>>>
>>>>              Andy
>>>>
>>>>          *?ontologyConcept schema:code @concept.code^^xsd:string .*
>>>>          *?ontologyConcept schema:codeSystemId
>>>>          @concept.codeSystemId^^xsd:****string*
>>>>
>>>>          *OPTIONAL{?ontologyConcept schema:isDeleted
>>>>          ?ontologyConceptDeleted}
>>>>          FILTER(!bound(?****ontologyConceptDeleted) ||
>>>>          (bound(?****ontologyConceptDeleted)
>>>>
>>>>          && ?ontologyConceptDeleted = false))*
>>>>          *{*
>>>>          * ?child relations:subClassOf ?ontologyConcept .*
>>>>          * OPTIONAL{?child schema:isDeleted ?childDeleted}
>>>>
>>>>          FILTER(!bound(?childDeleted) || (bound(?childDeleted) &&
>>>>          ?childDeleted =
>>>>          false))*
>>>>          * ?concept relations:equalsTo ?child .*
>>>>          * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>>          FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>>          ?conceptDeleted = false))*
>>>>          * ?concept rdf:type schema:Concept*
>>>>          *}*
>>>>          *UNION*
>>>>          *{*
>>>>          * ?concept relations:equalsTo ?ontologyConcept .*
>>>>          * ?concept rdf:type schema:Concept*
>>>>          * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>>          FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>>          ?conceptDeleted = false))*
>>>>          *}*
>>>>
>>>>
>>>>          basically all this big fuss is to find all child concepts of a
>>>>          specified
>>>>          parent concept identified by concept.code and
>>>>          concept.codeSystemId.
>>>>          so the  @concept.code and  @concept.codeSystemId you see are
>>>>          replaced in
>>>>          runtime to actual values.
>>>>          all of the optional sections you see are to ignore deleted
>>>>          (logically) or
>>>>          not bound concepts.
>>>>
>>>>          Thanks,
>>>>
>>>>          Nadav
>>>>
>>>>          On Mon, Sep 2, 2013 at 4:14 PM, Andy Seaborne
>>>>
>>>>          <andy@apache.org <ma...@apache.org>> wrote:
>>>>
>>>>              On 02/09/13 12:51, nadav hoze wrote:
>>>>
>>>>                  hi,
>>>>
>>>>                  We are doing stress tests to our service which it's
>>>>                  underlying data layer
>>>>                  is jena TDB.
>>>>                  one of our tests is tor run heavy queries for long
>>>>                  time (about 6 Hrs) and
>>>>                  afterwards run light queries. (we have clients which
>>>>                  are in that mode).
>>>>                  What we witness is a huge performance degradation,
>>>>                  light queries which
>>>>                  usually took around 0.1-0.2 sec after the heavy
>>>>                  queries execution took
>>>>                  more
>>>>                  than 3 seconds.
>>>>
>>>>
>>>>              Not surprising - the heavy queries will have taken over
>>>> the OS
>>>>              cache.(assuming 64 bit - a similar effect occurs on 32
>>>>              bit).  The
>>>>              light-after-heavy is effectively running cold.
>>>>
>>>>                Also the heavy query execution had a huge performance
>>>>              degradation after
>>>>
>>>>                  only one minute:
>>>>                  each heavy query fetched around  35000 triplets and
>>>>                  for the first minutes
>>>>                  it took between 10-40 seconds (which is OK),
>>>>                  afterwards it peaked to
>>>>                  200-8000 seconds.
>>>>                  Same thing memory wise, after a minute it peaked from
>>>>                  200mg to 2.2g.
>>>>
>>>>                  What I would like to know is if there could be memory
>>>>                  leak in jena, or
>>>>                  whether jena objects are cached in some way and maybe
>>>>                  we can release them.
>>>>
>>>>                  Here are important details for answering:
>>>>                  *jena version: 2.6.4*
>>>>                  *tdb version: 0.8.9*
>>>>                  *arq: 2.8.7*
>>>>                  *we use a single model and no datasets.*
>>>>
>>>>
>>>>                  Also can an upgrade to jena latest stable version help
>>>>                  us here ?
>>>>
>>>>
>>>>              You should upgrade anyway. There are bug fixes.  And a
>>>>              different license.
>>>>
>>>>
>>>>
>>>>                  Help is much appreciated :)
>>>>
>>>>
>>>>              All depends on what the heavy query touches in the
>>>>              database (the pattern
>>>>              matching part), the size of the machine, whether anything
>>>>              else is running
>>>>              on the machine, ...
>>>>
>>>>              There are many, many factors:
>>>>
>>>>              What size of the machine?
>>>>              What OS?
>>>>              Is it a VM?
>>>>              How much data (in triples) is there in the DB?
>>>>              Heap size?
>>>>              The driver program is on What
>>>>              the same machine as the database - does this matter?
>>>>              ...
>>>>
>>>>                       Andy
>>>>
>>>>
>>>>                Thanks,
>>>>
>>>>
>>>>                  Nadav
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Heavy queries followed by light queries

Posted by Andy Seaborne <an...@apache.org>.
On 03/09/13 20:26, nadav hoze wrote:
> OK the bottom line is that I must somehow free memory to prevent such a
> huge performance degradation.
> The stress tests we did are of course extreme and what we got in couple of
> hours will be on client side after couple of weeks.
> This means that we can somehow from time to time clear data not used.

You do not need to do anything - the internal caches are LRU and the 
operating system will manage memory mapped files.

> I read in
> http://jena.apache.org/documentation/tdb/architecture.html#caching-on-32-and-64-bit-java-systems
> that on 64-bit "TDB uses memory mapped files, accessed 8M segments, and the
> operating system handles caching between RAM and disk"
> and on 32 bit "TDB provides an in-heap LRU cache of B+Tree blocks"  can I
> force jena to use this cache on 64 bit ? (does the set of the file access
> mode already does that?)

Yes - call

SystemTDB.setFileMode(FileMode.direct) ;

> before anything else calls Jena, especially TDB.
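A minimal sketch of that startup order, assuming the TDB 0.8.x package
layout (the database path is a placeholder):

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.tdb.TDBFactory;
import com.hp.hpl.jena.tdb.base.file.FileMode;
import com.hp.hpl.jena.tdb.sys.SystemTDB;

public class DirectModeStartup {
    public static void main(String[] args) {
        // Must run before any other TDB code executes, otherwise the
        // default file mode (memory-mapped on 64-bit JVMs) is already
        // locked in and this call has no effect.
        SystemTDB.setFileMode(FileMode.direct);

        // Only now touch the database.
        Model model = TDBFactory.createModel("/path/to/tdb-database");
        // ... run queries ...
        model.close();
    }
}
```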

Unfortunately, you can't resize the caches easily.  In theory, there is 
a properties file named by "tdb:settings" or 
"com.hp.hpl.jena.tdb.settings".  You'll need to look in the code for 
SystemTDB.

Let me know if this has any effect, useful or otherwise.

	Andy

>
> Thanks,
>
> Nadav
> Thanks,
>
>
> On Tue, Sep 3, 2013 at 5:43 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 03/09/13 07:01, nadav hoze wrote:
>>
>>> 1. Regarding VM when I said varies from client to client I meant that
>>> some uses VM and some don't but the 12GB is always for a single machine.
>>> Also forgot to state that of course other processes works on that machine
>>> beside this service that uses jena, but this service get his shared part
>>> and I don't think it's a lack of resources issue.
>>>
>> What I have seen happening on other systems is that the VM configuration
>> is limiting the growth of the VM, causing it to not use as much of the
>> machine as it might.
>>
>> Can you see that the whole 12G is being used at all?
>>
>> Network drives don't help.
>>
>>
>>
>>> 2. about the matching pattern here it is again, hopes it's OK now (I also
>>> attached it):
>>>
>>
>> This :
>> FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }
>>
>> is better than:
>>
>>
>> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
>> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
>> && ?ontologyConceptDeleted = false))
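Applied to the patterns in the query, each OPTIONAL/!bound pair collapses
to a single FILTER NOT EXISTS line. NOT EXISTS is SPARQL 1.1 syntax, so on
the old ARQ 2.8.x the query may need to be parsed with ARQ's extended
syntax; a sketch (the prefixes and fragment are illustrative):

```java
import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.query.Syntax;

public class DeletedFilterRewrite {
    // Each OPTIONAL{... ?x schema:isDeleted ?xDeleted} plus its
    // !bound()/bound() FILTER reduces to one NOT EXISTS filter.
    static final String WHERE_FRAGMENT =
        "?child relations:subClassOf ?ontologyConcept .\n" +
        "FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }\n" +
        "FILTER NOT EXISTS { ?child schema:isDeleted true }\n";

    static Query parse(String fullQuery) {
        // Syntax.syntaxARQ enables NOT EXISTS on pre-SPARQL-1.1 ARQ.
        return QueryFactory.create(fullQuery, Syntax.syntaxARQ);
    }
}
```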
>>
>>
>>   Just a short explanation before you read the matching pattern:
>>> this query should fetch all the triplets with relation subClassOf to a
>>> given ontologyConcept. it's identifiers are @concept.code and
>>> @concept.codeSystemId which are basically placeholders which we replace in
>>> our service.
>>> The OPTIONAL parts you see in the query are for ignoring concepts which
>>>   are marked as deleted or not bound to the schema.
>>>
>>>
>>> ?ontologyConcept schema:code @concept.code^^xsd:string .
>>> ?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
>>> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
>>> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
>>> && ?ontologyConceptDeleted = false))
>>> {
>>> ?child relations:subClassOf ?ontologyConcept .
>>> OPTIONAL{?child schema:isDeleted ?childDeleted}
>>> FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted =
>>> false))
>>> ?concept relations:equalsTo ?child .
>>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>> ?conceptDeleted = false))
>>> ?concept rdf:type schema:Concept
>>> }
>>> UNION
>>> {
>>> ?concept relations:equalsTo ?ontologyConcept .
>>> ?concept rdf:type schema:Concept
>>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>> ?conceptDeleted = false))
>>> }
>>>
>>> 3. About the direct mode, we already use it so no effect there, is there
>>> a way to clear the memory cache from the model ?
>>>
>>
>> No, but I doubt it would make much difference.  If you clear the cache,
>> the machine has to go to disk to fetch the data just as if it's doing
>> cache replacement.
>>
>>
>>>
>>> Thanks,
>>>
>>> Nadav
>>>
>>>
>>>
>>> On Mon, Sep 2, 2013 at 6:21 PM, Andy Seaborne <andy@apache.org <mailto:
>>> andy@apache.org>> wrote:
>>>
>>>      On 02/09/13 14:33, nadav hoze wrote:
>>>
>>>          Machine size: 12 GB
>>>          OS: Windows Server 2008 64 bit
>>>
>>>
>>>
>>>      I don't have much experience of Windows 64 bit and mmap files -
>>>      you may find running with 32 bit mode a useful datapoint (this
>>>      does not use memory mapped files which, from reading around the
>>>      web, and anecdotal evidence on users@, do not have the same
>>>      benefits as on Linux).
>>>
>>>
>>>          VM: varies from client to client.
>>>
>>>
>>>      Does this mean that several VMs for running on the same 12G hardware?
>>>      If so, how much RAM is allocate to each VM?
>>>
>>>
>>>          data (in triples): 20,000,000 (3.6 GB)
>>>          Heap size: 2 GB
>>>
>>>
>>>      How big does the entire JVM process get?  At that scale, the
>>>      entire DB should be mapped into memory
>>>
>>>
>>>          Driver program : ? (didn't understand)
>>>
>>>
>>>      You say the test program issuing TDB directly so it must be in the
>>>      same JVM.
>>>
>>>      It may be useful to you to run on native hardware to see what
>>>      effect VM's are having.  It can range from no measurable effect to
>>>      very significant.
>>>
>>>
>>>          No the database is on a network shared drive (different server).
>>>
>>>          pattern matching (where clause):
>>>
>>>
>>>      Sorry - this is unreadable and being a partial extract, I can't
>>>      reformat it.
>>>
>>>              Andy
>>>
>>>          *?ontologyConcept schema:code @concept.code^^xsd:string .*
>>>          *?ontologyConcept schema:codeSystemId
>>>          @concept.codeSystemId^^xsd:**string*
>>>          *OPTIONAL{?ontologyConcept schema:isDeleted
>>>          ?ontologyConceptDeleted}
>>>          FILTER(!bound(?**ontologyConceptDeleted) ||
>>>          (bound(?**ontologyConceptDeleted)
>>>          && ?ontologyConceptDeleted = false))*
>>>          *{*
>>>          * ?child relations:subClassOf ?ontologyConcept .*
>>>          * OPTIONAL{?child schema:isDeleted ?childDeleted}
>>>
>>>          FILTER(!bound(?childDeleted) || (bound(?childDeleted) &&
>>>          ?childDeleted =
>>>          false))*
>>>          * ?concept relations:equalsTo ?child .*
>>>          * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>          FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>          ?conceptDeleted = false))*
>>>          * ?concept rdf:type schema:Concept*
>>>          *}*
>>>          *UNION*
>>>          *{*
>>>          * ?concept relations:equalsTo ?ontologyConcept .*
>>>          * ?concept rdf:type schema:Concept*
>>>          * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>>          FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>>          ?conceptDeleted = false))*
>>>          *}*
>>>
>>>
>>>          basically all this big fuss is to find all child concepts of a
>>>          specified
>>>          parent concept identified by concept.code and
>>>          concept.codeSystemId.
>>>          so the  @concept.code and  @concept.codeSystemId you see are
>>>          replaced in
>>>          runtime to actual values.
>>>          all of the optional sections you see are to ignore deleted
>>>          (logically) or
>>>          not bound concepts.
>>>
>>>          Thanks,
>>>
>>>          Nadav
>>>
>>>          On Mon, Sep 2, 2013 at 4:14 PM, Andy Seaborne
>>>
>>>          <andy@apache.org <ma...@apache.org>> wrote:
>>>
>>>              On 02/09/13 12:51, nadav hoze wrote:
>>>
>>>                  hi,
>>>
>>>                  We are doing stress tests to our service which it's
>>>                  underlying data layer
>>>                  is jena TDB.
>>>                  one of our tests is tor run heavy queries for long
>>>                  time (about 6 Hrs) and
>>>                  afterwards run light queries. (we have clients which
>>>                  are in that mode).
>>>                  What we witness is a huge performance degradation,
>>>                  light queries which
>>>                  usually took around 0.1-0.2 sec after the heavy
>>>                  queries execution took
>>>                  more
>>>                  than 3 seconds.
>>>
>>>
>>>              Not surprising - the heavy queries will have taken over the OS
>>>              cache.(assuming 64 bit - a similar effect occurs on 32
>>>              bit).  The
>>>              light-after-heavy is effectively running cold.
>>>
>>>                Also the heavy query execution had a huge performance
>>>              degradation after
>>>
>>>                  only one minute:
>>>                  each heavy query fetched around  35000 triplets and
>>>                  for the first minutes
>>>                  it took between 10-40 seconds (which is OK),
>>>                  afterwards it peaked to
>>>                  200-8000 seconds.
>>>                  Same thing memory wise, after a minute it peaked from
>>>                  200mg to 2.2g.
>>>
>>>                  What I would like to know is if there could be memory
>>>                  leak in jena, or
>>>                  whether jena objects are cached in some way and maybe
>>>                  we can release them.
>>>
>>>                  Here are important details for answering:
>>>                  *jena version: 2.6.4*
>>>                  *tdb version: 0.8.9*
>>>                  *arq: 2.8.7*
>>>                  *we use a single model and no datasets.*
>>>
>>>
>>>                  Also can an upgrade to jena latest stable version help
>>>                  us here ?
>>>
>>>
>>>              You should upgrade anyway. There are bug fixes.  And a
>>>              different license.
>>>
>>>
>>>
>>>                  Help is much appreciated :)
>>>
>>>
>>>              All depends on what the heavy query touches in the
>>>              database (the pattern
>>>              matching part), the size of the machine, whether anything
>>>              else is running
>>>              on the machine, ...
>>>
>>>              There are many, many factors:
>>>
>>>              What size of the machine?
>>>              What OS?
>>>              Is it a VM?
>>>              How much data (in triples) is there in the DB?
>>>              Heap size?
>>>              Is the driver program on the same
>>>              machine as the database - does this matter?
>>>              ...
>>>
>>>                       Andy
>>>
>>>
>>>                Thanks,
>>>
>>>
>>>                  Nadav
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>


Re: Heavy queries followed by light queries

Posted by nadav hoze <na...@gmail.com>.
OK, the bottom line is that I must somehow free memory to prevent such a
huge performance degradation.
The stress tests we did are of course extreme: what we hit in a couple of
hours, a client will hit after a couple of weeks.
This means we need a way to clear unused data from time to time.
I read in
http://jena.apache.org/documentation/tdb/architecture.html#caching-on-32-and-64-bit-java-systems
that on 64-bit "TDB uses memory mapped files, accessed 8M segments, and the
operating system handles caching between RAM and disk"
and on 32-bit "TDB provides an in-heap LRU cache of B+Tree blocks". Can I
force Jena to use this cache on 64-bit? (Does setting the file access mode
already do that?)

Thanks,

Nadav


On Tue, Sep 3, 2013 at 5:43 PM, Andy Seaborne <an...@apache.org> wrote:

> On 03/09/13 07:01, nadav hoze wrote:
>
>> 1. Regarding the VM: when I said it varies from client to client, I meant
>> that some clients use a VM and some don't, but the 12 GB is always for a
>> single machine.
>> I also forgot to state that other processes run on that machine besides
>> this service that uses Jena, but this service gets its share and I don't
>> think it's a lack-of-resources issue.
>>
> What I have seen happening on other systems is that the VM configuration
> is limiting the growth of the VM, causing it to not use as much of the
> machine as it might.
>
> Can you see that the whole 12G is being used at all?
>
> Network drives don't help.
>
>
>
>> 2. about the matching pattern here it is again, hopes it's OK now (I also
>> attached it):
>>
>
> This :
> FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }
>
> is better than:
>
>
> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
> && ?ontologyConceptDeleted = false))
>
>
>  Just a short explanation before you read the matching pattern:
>> this query should fetch all the triplets with relation subClassOf to a
>> given ontologyConcept. it's identifiers are @concept.code and
>> @concept.codeSystemId which are basically placeholders which we replace in
>> our service.
>> The OPTIONAL parts you see in the query are for ignoring concepts which
>>  are marked as deleted or not bound to the schema.
>>
>>
>> ?ontologyConcept schema:code @concept.code^^xsd:string .
>> ?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
>> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
>> FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
>> && ?ontologyConceptDeleted = false))
>> {
>> ?child relations:subClassOf ?ontologyConcept .
>> OPTIONAL{?child schema:isDeleted ?childDeleted}
>> FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted =
>> false))
>> ?concept relations:equalsTo ?child .
>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>> ?conceptDeleted = false))
>> ?concept rdf:type schema:Concept
>> }
>> UNION
>> {
>> ?concept relations:equalsTo ?ontologyConcept .
>> ?concept rdf:type schema:Concept
>> OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>> ?conceptDeleted = false))
>> }
>>
>> 3. About the direct mode, we already use it so no effect there, is there
>> a way to clear the memory cache from the model ?
>>
>
> No, but I doubt it would make much difference.  If you clear the cache,
> the machine has to go to disk to fetch the data just as if it's doing
> cache replacement.
>
>
>>
>> Thanks,
>>
>> Nadav
>>
>>
>>
>> On Mon, Sep 2, 2013 at 6:21 PM, Andy Seaborne <andy@apache.org <mailto:
>> andy@apache.org>> wrote:
>>
>>     On 02/09/13 14:33, nadav hoze wrote:
>>
>>         Machine size: 12 GB
>>         OS: Windows Server 2008 64 bit
>>
>>
>>
>>     I don't have much experience of Windows 64 bit and mmap files -
>>     you may find running with 32 bit mode a useful datapoint (this
>>     does not use memory mapped files which, from reading around the
>>     web, and anecdotal evidence on users@, do not have the same
>>     benefits as on Linux).
>>
>>
>>         VM: varies from client to client.
>>
>>
>>     Does this mean that several VMs are running on the same 12G hardware?
>>     If so, how much RAM is allocated to each VM?
>>
>>
>>         data (in triples): 20,000,000 (3.6 GB)
>>         Heap size: 2 GB
>>
>>
>>     How big does the entire JVM process get?  At that scale, the
>>     entire DB should be mapped into memory
>>
>>
>>         Driver program : ? (didn't understand)
>>
>>
>>     You say the test program issuing TDB directly so it must be in the
>>     same JVM.
>>
>>     It may be useful to you to run on native hardware to see what
>>     effect VM's are having.  It can range from no measurable effect to
>>     very significant.
>>
>>
>>         No the database is on a network shared drive (different server).
>>
>>         pattern matching (where clause):
>>
>>
>>     Sorry - this is unreadable and being a partial extract, I can't
>>     reformat it.
>>
>>             Andy
>>
>>         *?ontologyConcept schema:code @concept.code^^xsd:string .*
>>         *?ontologyConcept schema:codeSystemId
>>         @concept.codeSystemId^^xsd:**string*
>>         *OPTIONAL{?ontologyConcept schema:isDeleted
>>         ?ontologyConceptDeleted}
>>         FILTER(!bound(?**ontologyConceptDeleted) ||
>>         (bound(?**ontologyConceptDeleted)
>>         && ?ontologyConceptDeleted = false))*
>>         *{*
>>         * ?child relations:subClassOf ?ontologyConcept .*
>>         * OPTIONAL{?child schema:isDeleted ?childDeleted}
>>
>>         FILTER(!bound(?childDeleted) || (bound(?childDeleted) &&
>>         ?childDeleted =
>>         false))*
>>         * ?concept relations:equalsTo ?child .*
>>         * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>         FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>         ?conceptDeleted = false))*
>>         * ?concept rdf:type schema:Concept*
>>         *}*
>>         *UNION*
>>         *{*
>>         * ?concept relations:equalsTo ?ontologyConcept .*
>>         * ?concept rdf:type schema:Concept*
>>         * OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
>>         FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
>>         ?conceptDeleted = false))*
>>         *}*
>>
>>
>>         basically all this big fuss is to find all child concepts of a
>>         specified
>>         parent concept identified by concept.code and
>>         concept.codeSystemId.
>>         so the  @concept.code and  @concept.codeSystemId you see are
>>         replaced in
>>         runtime to actual values.
>>         all of the optional sections you see are to ignore deleted
>>         (logically) or
>>         not bound concepts.
>>
>>         Thanks,
>>
>>         Nadav
>>
>>         On Mon, Sep 2, 2013 <tel:2013> at 4:14 PM, Andy Seaborne
>>
>>         <andy@apache.org <ma...@apache.org>> wrote:
>>
>>             On 02/09/13 12:51, nadav hoze wrote:
>>
>>                 hi,
>>
>>                 We are doing stress tests to our service which it's
>>                 underlying data layer
>>                 is jena TDB.
>>                 one of our tests is tor run heavy queries for long
>>                 time (about 6 Hrs) and
>>                 afterwards run light queries. (we have clients which
>>                 are in that mode).
>>                 What we witness is a huge performance degradation,
>>                 light queries which
>>                 usually took around 0.1-0.2 sec after the heavy
>>                 queries execution took
>>                 more
>>                 than 3 seconds.
>>
>>
>>             Not surprising - the heavy queries will have taken over the OS
>>             cache.(assuming 64 bit - a similar effect occurs on 32
>>             bit).  The
>>             light-after-heavy is effectively running cold.
>>
>>               Also the heavy query execution had a huge performance
>>             degradation after
>>
>>                 only one minute:
>>                 each heavy query fetched around  35000 triplets and
>>                 for the first minutes
>>                 it took between 10-40 seconds (which is OK),
>>                 afterwards it peaked to
>>                 200-8000 seconds.
>>                 Same thing memory wise, after a minute it peaked from
>>                 200mg to 2.2g.
>>
>>                 What I would like to know is if there could be memory
>>                 leak in jena, or
>>                 whether jena objects are cached in some way and maybe
>>                 we can release them.
>>
>>                 Here are important details for answering:
>>                 *jena version: 2.6.4*
>>                 *tdb version: 0.8.9*
>>                 *arq: 2.8.7*
>>                 *we use a single model and no datasets.*
>>
>>
>>                 Also can an upgrade to jena latest stable version help
>>                 us here ?
>>
>>
>>             You should upgrade anyway. There are bug fixes.  And a
>>             different license.
>>
>>
>>
>>                 Help is much appreciated :)
>>
>>
>>             All depends on what the heavy query touches in the
>>             database (the pattern
>>             matching part), the size of the machine, whether anything
>>             else is running
>>             on the machine, ...
>>
>>             There are many, many factors:
>>
>>             What size of the machine?
>>             What OS?
>>             Is it a VM?
>>             How much data (in triples) is there in the DB?
>>             Heap size?
>>             The driver program is on What
>>             the same machine as the database - does this matter?
>>             ...
>>
>>                      Andy
>>
>>
>>               Thanks,
>>
>>
>>                 Nadav
>>
>>
>>
>>
>>
>>
>>
>

Re: Heavy queries followed by light queries

Posted by Andy Seaborne <an...@apache.org>.
On 03/09/13 07:01, nadav hoze wrote:
> 1. Regarding VMs: when I said it varies from client to client, I meant that 
> some clients use a VM and some don't, but the 12GB is always for a single machine.
> Also, I forgot to state that other processes run on that machine 
> besides this service that uses Jena, but the service gets its share 
> and I don't think it's a lack-of-resources issue.
What I have seen happening on other systems is that the VM configuration 
is limiting the growth of the VM, causing it to not use as much of the 
machine as it might.

Can you see that the whole 12G is being used at all?

Network drives don't help.

>
> 2. About the matching pattern, here it is again; hope it's OK now (I 
> also attached it):

This:
FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }

is better than:

OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
FILTER(!bound(?ontologyConceptDeleted) || 
(bound(?ontologyConceptDeleted) && ?ontologyConceptDeleted = false))
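
Applying that rewrite throughout, the whole pattern from the query below could look like this sketch (prefixes for `schema:`, `relations:` and `rdf:` as in the original query are assumed; `@concept.code` and `@concept.codeSystemId` are still the service's placeholders, so this is not directly runnable as-is):

```sparql
?ontologyConcept schema:code @concept.code^^xsd:string .
?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string .
# Keep only concepts with no isDeleted=true triple (covers "unbound" too).
FILTER NOT EXISTS { ?ontologyConcept schema:isDeleted true }
{
  ?child relations:subClassOf ?ontologyConcept .
  FILTER NOT EXISTS { ?child schema:isDeleted true }
  ?concept relations:equalsTo ?child .
  ?concept rdf:type schema:Concept .
  FILTER NOT EXISTS { ?concept schema:isDeleted true }
}
UNION
{
  ?concept relations:equalsTo ?ontologyConcept .
  ?concept rdf:type schema:Concept .
  FILTER NOT EXISTS { ?concept schema:isDeleted true }
}
```

Note that FILTER NOT EXISTS is SPARQL 1.1 syntax, so check that your ARQ version supports it - another reason to upgrade from 2.8.7.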


> Just a short explanation before you read the matching pattern:
> this query should fetch all the triples related by subClassOf to a 
> given ontologyConcept. Its identifiers are @concept.code and 
> @concept.codeSystemId, which are placeholders that we 
> replace in our service.
> The OPTIONAL parts you see in the query are for ignoring concepts 
> which are marked as deleted or not bound to the schema.
>
>
> ?ontologyConcept schema:code @concept.code^^xsd:string .
> ?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
> OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted} 
> FILTER(!bound(?ontologyConceptDeleted) || 
> (bound(?ontologyConceptDeleted) && ?ontologyConceptDeleted = false))
> {
> ?child relations:subClassOf ?ontologyConcept .
> OPTIONAL{?child schema:isDeleted ?childDeleted} 
> FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted 
> = false))
> ?concept relations:equalsTo ?child .
> OPTIONAL{?concept schema:isDeleted ?conceptDeleted} 
> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && 
> ?conceptDeleted = false))
> ?concept rdf:type schema:Concept
> }
> UNION
> {
> ?concept relations:equalsTo ?ontologyConcept .
> ?concept rdf:type schema:Concept
> OPTIONAL{?concept schema:isDeleted ?conceptDeleted} 
> FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) && 
> ?conceptDeleted = false))
> }
>
> 3. About the direct mode, we already use it so no effect there, is 
> there a way to clear the memory cache from the model ?

No, but I doubt it would make much difference.  If you clear the cache, 
the machine has to go to disk to fetch the data, just as it does when 
doing cache replacement.

>
>
> Thanks,
>
> Nadav


Re: Heavy queries followed by light queries

Posted by nadav hoze <na...@gmail.com>.
1. Regarding VMs: when I said it varies from client to client, I meant that
some clients use a VM and some don't, but the 12GB is always for a single machine.
Also, I forgot to state that other processes run on that machine besides
this service that uses Jena, but the service gets its share and I don't
think it's a lack-of-resources issue.

2. About the matching pattern, here it is again; hope it's OK now (I also
attached it):

Just a short explanation before you read the matching pattern:
this query should fetch all the triples related by subClassOf to a
given ontologyConcept. Its identifiers are @concept.code and
@concept.codeSystemId, which are placeholders that we replace in
our service.
The OPTIONAL parts you see in the query are for ignoring concepts which
are marked as deleted or not bound to the schema.


?ontologyConcept schema:code @concept.code^^xsd:string .
?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string
OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
&& ?ontologyConceptDeleted = false))
{
?child relations:subClassOf ?ontologyConcept .
OPTIONAL{?child schema:isDeleted ?childDeleted}
FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted =
false))
?concept relations:equalsTo ?child .
OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
?conceptDeleted = false))
?concept rdf:type schema:Concept
}
UNION
{
?concept relations:equalsTo ?ontologyConcept .
?concept rdf:type schema:Concept
OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
?conceptDeleted = false))
}

3. About the direct mode: we already use it, so no effect there. Is there a
way to clear the memory cache from the model?


Thanks,

Nadav



Re: Heavy queries followed by light queries

Posted by Andy Seaborne <an...@apache.org>.
On 02/09/13 14:33, nadav hoze wrote:
> Machine size: 12 GB
> OS: Windows Server 2008 64 bit

I don't have much experience of Windows 64-bit and mmap files - you may 
find running in 32-bit mode a useful datapoint (32-bit mode does not use 
memory-mapped files, which, from reading around the web and anecdotal 
evidence on users@, do not have the same benefits on Windows as on Linux).

> VM: varies from client to client.

Does this mean that several VMs are running on the same 12G hardware?
If so, how much RAM is allocated to each VM?

> data (in triples): 20,000,000 (3.6 GB)
> Heap size: 2 GB

How big does the entire JVM process get?  At that scale, the entire DB 
should be mapped into memory.

> Driver program : ? (didn't understand)

You say the test program issues queries to TDB directly, so it must be in 
the same JVM.

It may be useful to you to run on native hardware to see what effect 
the VMs are having.  It can range from no measurable effect to very 
significant.

> No the database is on a network shared drive (different server).
>
> pattern matching (where clause):
>

Sorry - this is unreadable and being a partial extract, I can't reformat it.

	Andy



Re: Heavy queries followed by light queries

Posted by nadav hoze <na...@gmail.com>.
Machine size: 12 GB
OS: Windows Server 2008 64 bit
VM: varies from client to client.
data (in triples): 20,000,000 (3.6 GB)
Heap size: 2 GB
Driver program: ? (didn't understand)
No, the database is on a network shared drive (a different server).

pattern matching (where clause):

*?ontologyConcept schema:code @concept.code^^xsd:string .*
*?ontologyConcept schema:codeSystemId @concept.codeSystemId^^xsd:string*
*OPTIONAL{?ontologyConcept schema:isDeleted ?ontologyConceptDeleted}
FILTER(!bound(?ontologyConceptDeleted) || (bound(?ontologyConceptDeleted)
&& ?ontologyConceptDeleted = false))*
*{*
* ?child relations:subClassOf ?ontologyConcept .*
* OPTIONAL{?child schema:isDeleted ?childDeleted}
FILTER(!bound(?childDeleted) || (bound(?childDeleted) && ?childDeleted =
false))*
* ?concept relations:equalsTo ?child .*
* OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
?conceptDeleted = false))*
* ?concept rdf:type schema:Concept*
*}*
*UNION*
*{*
* ?concept relations:equalsTo ?ontologyConcept .*
* ?concept rdf:type schema:Concept*
* OPTIONAL{?concept schema:isDeleted ?conceptDeleted}
FILTER(!bound(?conceptDeleted) || (bound(?conceptDeleted) &&
?conceptDeleted = false))*
*}*

Basically, all this big fuss is to find all the child concepts of a specified
parent concept, identified by concept.code and concept.codeSystemId.
So the @concept.code and @concept.codeSystemId you see are replaced at
runtime with actual values.
All of the OPTIONAL sections are there to ignore logically deleted or
unbound concepts.

Thanks,

Nadav


Re: Heavy queries followed by light queries

Posted by Andy Seaborne <an...@apache.org>.
On 02/09/13 12:51, nadav hoze wrote:
> hi,
>
> We are doing stress tests on our service, whose underlying data layer
> is Jena TDB.
> One of our tests is to run heavy queries for a long time (about 6 hrs) and
> afterwards run light queries (we have clients which operate in that mode).
> What we witness is a huge performance degradation: light queries which
> usually took around 0.1-0.2 sec took more than 3 seconds after the heavy
> query execution.

Not surprising - the heavy queries will have taken over the OS 
cache (assuming 64-bit; a similar effect occurs on 32-bit).  The 
light-after-heavy queries are effectively running cold.

> Also, the heavy query execution itself showed huge performance degradation
> after only one minute:
> each heavy query fetched around 35000 triples, and for the first minutes
> it took between 10-40 seconds (which is OK); afterwards it peaked to
> 200-8000 seconds.
> Memory-wise it was the same: after a minute it peaked from 200 MB to 2.2 GB.
>
> What I would like to know is whether there could be a memory leak in Jena,
> or whether Jena objects are cached in some way and we can release them.
>
> Here are important details for answering:
> *jena version: 2.6.4*
> *tdb version: 0.8.9*
> *arq: 2.8.7*
> *we use a single model and no datasets.*
>
> Also can an upgrade to jena latest stable version help us here ?

You should upgrade anyway. There are bug fixes.  And a different license.

>
> Help is much appreciated :)
>

All depends on what the heavy query touches in the database (the pattern 
matching part), the size of the machine, whether anything else is 
running on the machine, ...

There are many, many factors:

What size of the machine?
What OS?
Is it a VM?
How much data (in triples) is there in the DB?
Heap size?
Is the driver program on the same machine as the database - does this matter?
...

	Andy


> Thanks,
>
> Nadav
>