Posted to users@jena.apache.org by Brandon Sara <br...@collectivemedicaltech.com.INVALID> on 2021/08/11 00:17:16 UTC

Need recommendation for memory settings using Fuseki/Delta server

Can I get some recommendations on how best to tweak/set up memory for my Fuseki servers? Here is my setup:

- I’ve got a single TDB database with at least several million triples (I don’t know the exact count yet, but perhaps tens of millions, maybe hundreds of millions…at the very least, I need it to scale to hundreds of millions).
- Everything is currently put in the default graph (I want to change this…but can’t at this point in time).
- The “TransitiveReasoner” is being used on the dataset.
- Full text indexing over two different fields is enabled using Lucene.
- The servers are running embedded Fuseki via rdf-delta and sync via a central rdf-delta server.
- The simplest of queries won’t finish and runs out of memory with, at the very least, 6 GB of RAM.

Also, should I be tweaking my non-heap memory to be larger for the Fuseki server?

Thanks.

No PHI in Email: PointClickCare and Collective Medical, A PointClickCare Company, policies prohibit sending protected health information (PHI) by email, which may violate regulatory requirements. If sending PHI is necessary, please contact the sender for secure delivery instructions.

Confidentiality Notice: This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Brandon Sara <br...@collectivemedicaltech.com.INVALID>.

As far as I am aware, we are using TDB2. Is there a way for me to verify this?

I’ve also discovered that this happens after a sync has failed because the patch returned a 404. Given that the choice of which patch comes next is automated, it seems quite strange that this would ever happen. It is what was causing my Fuseki servers to lose the current delta patch version (which I reported via GitHub). For some reason, once this scenario occurs, no matter how much RAM I give it, the Fuseki server can never update to the latest patch.

I can’t get any metrics on it because metrics don’t start reporting until the Fuseki server is up (which is why I requested that the initial sync happen in the background and allow the Fuseki server to start right away…it’s all coming together, isn’t it ;) ). The Fuseki server requests the latest patch, which I assume it obtains (I’m able to query the delta server directly and get it just fine, and it returns quite quickly); then it gets stuck and the server startup times out after 1 hour.

A couple of days ago, when this happened, I had debug logging turned on and found that it always stopped at the same point for the patch it was trying to load, in `BlockAccessMapped` (which seemed to be where it was reading the input stream of the patch file). This was always the last log line before it froze:

    TRACE [main-1] org.apache.jena.dboe.base.file.BlockAccessMapped : 262750 => [256, 4964352]
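
On the question of verifying TDB2: the logger in the trace above lives under `org.apache.jena.dboe`, which is the storage layer TDB2 is built on, so that alone points to TDB2. Another check is the on-disk layout. The following is only a rough sketch, assuming the usual layouts (TDB2 keeps its files in `Data-NNNN` subdirectories, while TDB1 writes files such as `nodes.dat` directly into the database directory). The function name `guess_tdb_version` is made up for illustration and is not part of Jena.

```python
import os
import tempfile
from pathlib import Path


def guess_tdb_version(db_dir):
    """Guess the TDB generation from the files in a database directory.

    Heuristic only: it looks at the directory layout, not at Jena itself.
    """
    root = Path(db_dir)
    # Assumption: TDB2 keeps each storage generation in a Data-NNNN subdirectory.
    if any(p.is_dir() for p in root.glob("Data-*")):
        return "TDB2"
    # Assumption: TDB1 writes its node table directly in the database directory.
    if (root / "nodes.dat").exists():
        return "TDB1"
    return "unknown"


if __name__ == "__main__":
    # Demo against a throwaway directory that mimics a TDB2 layout.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "Data-0001"))
        print(guess_tdb_version(tmp))
```

Point it at the real database location from the server configuration. An "unknown" result most likely means the path is not the database directory.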

> On Aug 12, 2021, at 2:13 PM, Andy Seaborne <an...@apache.org> wrote:
>
> And you use TDB1?
>
> TDB1 can use more memory - and between all the other components it might all amount to 6G if the server isn't able to do some of the background tidy-up work for a while.
>
> TDB2 does not have this effect.
>
>    Andy

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Andy Seaborne <an...@apache.org>.

And you use TDB1?

TDB1 can use more memory - and between all the other components it might 
all amount to 6G if the server isn't able to do some of the background 
tidy-up work for a while.

TDB2 does not have this effect.

     Andy

On 12/08/2021 17:19, Brandon Sara wrote:
> I believe that I’ve found the problem. It could actually be two different problems.
>
> One was that, during some local experimentation, I ended up running a VERY long-running update script…which never actually finished. This seems like it could have been why things were not running smoothly locally.
>
> As for another environment, I’ve found that if I have updates that are too large, the sync from the delta server runs out of memory. Some of these patch files were about 30 MB. And as you mentioned, this causes very long-running updates…which cause the memory to run out…but strangely doesn’t crash the server or throw any errors. It consistently stopped at the exact same point (according to the debug logs) in its update every single time I restarted the server.
>
> To remedy this, I’ve split the manual updates that I’m applying into smaller batches, and things seem to be running smoothly again. But this does raise a concern as to what workarounds I would need if I ever had to do a large-scale dynamic insert/delete via an update script.

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Brandon Sara <br...@collectivemedicaltech.com.INVALID>.

I believe that I’ve found the problem. It could actually be two different problems.

One was that, during some local experimentation, I ended up running a VERY long-running update script…which never actually finished. This seems like it could have been why things were not running smoothly locally.

As for another environment, I’ve found that if I have updates that are too large, the sync from the delta server runs out of memory. Some of these patch files were about 30 MB. And as you mentioned, this causes very long-running updates…which cause the memory to run out…but strangely doesn’t crash the server or throw any errors. It consistently stopped at the exact same point (according to the debug logs) in its update every single time I restarted the server.

To remedy this, I’ve split the manual updates that I’m applying into smaller batches, and things seem to be running smoothly again. But this does raise a concern as to what workarounds I would need if I ever had to do a large-scale dynamic insert/delete via an update script.
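
For anyone wanting to try the same batching approach, here is a minimal sketch of the idea, not the script actually used here. It assumes the update can be expressed as a flat list of triples in N-Triples form (one triple per line) and turns them into several small `INSERT DATA` requests instead of one large one. The name `batched_updates` and the batch size are illustrative assumptions.

```python
from itertools import islice


def batched_updates(ntriples_lines, batch_size):
    """Yield SPARQL Update strings, each inserting at most batch_size triples."""
    # Drop blank lines and N-Triples comments; keep one triple per line.
    triples = (
        line.strip()
        for line in ntriples_lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    while True:
        batch = list(islice(triples, batch_size))
        if not batch:
            return
        yield "INSERT DATA {\n" + "\n".join(batch) + "\n}"


if __name__ == "__main__":
    # Five made-up triples split into batches of two.
    demo = [f"<urn:s{i}> <urn:p> <urn:o{i}> ." for i in range(5)]
    for update in batched_updates(demo, 2):
        print(update)
        print("---")
```

Each string would be sent as its own update request, so each one should end up as a separate, smaller patch. Blank node labels are scoped to a single request, so data containing blank nodes cannot simply be cut at arbitrary line boundaries.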

> On Aug 12, 2021, at 7:37 AM, Andy Seaborne <an...@apache.org> wrote:
>
> On 11/08/2021 21:21, Brandon Sara wrote:
>>> 10s of millions triples of RDFS schema and no instance data?
>> Yeah, it’s kinda weird. I inherited this project and am working on fixing much of the structuring, but in the meantime, I need to keep it going as is. We are loading ICD-10 CM, SNOMED CT, and many other medical ontologies/thesauri…hence the large ontology. Pretty much every concept is treated as a class. At this point in time, we are using the ontology itself for some inference and mapping. Eventually, we will be bringing instance data into the KG to do more powerful inference using the medical ontologies I mentioned.
>
> Try running without it as a test.
>
> The transitive reasoner fires up either when the server starts or on the first request (can't remember which).
>
>>> custom:id has super properties?
>> No
>
> From what you've said, that should not take much memory - at the very worst, it populates the node cache, which is an LRU cache, and usually 2G is enough (unless you have a lot of very large literals - many lines of text).
>
>>> is the request causing the database to be sync'ed before the request starts?
>> Yes
>
> That's a source of RAM use if there are large pending updates.
>
> Also try the query
>
> SELECT * {} or ASK{}
>
> which does all the end-to-end stuff for setup and sync but does not touch the data.
>
> The other thing to try is point VisualVM at the process and look for the memory usage and heap usage.
>
>    Andy

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Brandon Sara <br...@collectivemedicaltech.com.INVALID>.

I forgot to mention that after splitting up into smaller batches of updates, memory never exceeded ~3.9GB during or after the updates. So that was another very good sign.

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Andy Seaborne <an...@apache.org>.

On 11/08/2021 21:21, Brandon Sara wrote:
>> 10s of millions triples of RDFS schema and no instance data?
> Yeah, it’s kinda weird. I inherited this project and am working on fixing much of the structuring, but in the meantime, I need to keep it going as is. We are loading ICD-10 CM, SNOMED CT, and many other medical ontologies/thesauri…hence the large ontology. Pretty much every concept is treated as a class. At this point in time, we are using the ontology itself for some inference and mapping. Eventually, we will be bringing instance data into the KG to do more powerful inference using the medical ontologies I mentioned.
> 

Try running without it as a test.

The transitive reasoner fires up either when the server starts or on the
first request (can't remember which).

>> custom:id has super properties?
> No

From what you've said, that should not take much memory - at the very
worst, it populates the node cache, which is an LRU cache, and usually 2G
is enough (unless you have a lot of very large literals - many lines of text).

> 
>> is the request causing the database to be sync'ed before the request starts?
> Yes
> 

That's a source of RAM use if there are large pending updates.

Also try the query

SELECT * {} or ASK{}

which does all the end-to-end stuff for setup and sync but does not 
touch the data.
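
As a rough illustration of running that check from a script: the sketch below builds a SPARQL Protocol POST carrying `ASK {}`. The endpoint URL `http://localhost:3030/ds/sparql` is a placeholder assumption, not the address of the server discussed here, and `build_ask_request` is a made-up helper name.

```python
import urllib.request


def build_ask_request(endpoint_url):
    """Build (but do not send) a SPARQL Protocol POST carrying ASK {}."""
    return urllib.request.Request(
        endpoint_url,
        data=b"ASK {}",
        headers={
            # Direct POST of the query text, per the SPARQL 1.1 Protocol.
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the real dataset URL.
    req = build_ask_request("http://localhost:3030/ds/sparql")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=60)
```

If this request also hangs or exhausts memory, the cost is in setup and sync rather than in the query touching the data.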

The other thing to try is point VisualVM at the process and look for the 
memory usage and heap usage.

     Andy

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Brandon Sara <br...@collectivemedicaltech.com.INVALID>.

> 10s of millions triples of RDFS schema and no instance data?
Yeah, it’s kinda weird. I inherited this project and am working on fixing much of the structuring, but in the meantime, I need to keep it going as is. We are loading ICD-10 CM, SNOMED CT, and many other medical ontologies/thesauri…hence the large ontology. Pretty much every concept is treated as a class. At this point in time, we are using the ontology itself for some inference and mapping. Eventually, we will be bringing instance data into the KG to do more powerful inference using the medical ontologies I mentioned.

> custom:id has super properties?
No

> is the request causing the database to be sync'ed before the request starts?
Yes

On Aug 11, 2021, at 12:44 PM, Andy Seaborne <an...@apache.org> wrote:

On 11/08/2021 19:07, Brandon Sara wrote:
>> What properties are transitive?
> Right now, it is just an ontology…so, everything is properties and classes. So, subClassOf and subPropertyOf exist on nearly every subject node.

10s of millions triples of RDFS schema and no instance data?

>
>> Example query?
>
> PREFIX : <urn:blah/>
> PREFIX custom: <http://example.com/>
>
> SELECT *
> WHERE {
> :42 custom:id ?id
> }
>
>
> (`:42` has only one `custom:id` triple and has no `owl:sameAs` inference…only “TransitiveReasoner” is being used)

custom:id has super properties?


And is the request causing the database to be sync'ed before the request
starts?

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Andy Seaborne <an...@apache.org>.


On 11/08/2021 19:07, Brandon Sara wrote:
>> What properties are transitive?
> Right now, it is just an ontology…so, everything is properties and classes. So, subClassOf and subPropertyOf exist on nearly every subject node.

10s of millions triples of RDFS schema and no instance data?

> 
>> Example query?
> 
> PREFIX : <urn:blah/>
> PREFIX custom: <http://example.com/>
> 
> SELECT *
> WHERE {
>    :42 custom:id ?id
> }
> 
> 
> (`:42` has only one `custom:id` triple and has no `owl:sameAs` inference…only “TransitiveReasoner” is being used)

custom:id has super properties?


And is the request causing the database to be sync'ed before the request 
starts?

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Brandon Sara <br...@collectivemedicaltech.com.INVALID>.

> What properties are transitive?
Right now, it is just an ontology…so, everything is properties and classes. So, subClassOf and subPropertyOf exist on nearly every subject node.

> Example query?

PREFIX : <urn:blah/>
PREFIX custom: <http://example.com/>

SELECT *
WHERE {
  :42 custom:id ?id
}

(`:42` has only one `custom:id` triple and has no `owl:sameAs` inference…only “TransitiveReasoner” is being used)

On Aug 11, 2021, at 8:29 AM, Andy Seaborne <an...@apache.org> wrote:

On 11/08/2021 01:17, Brandon Sara wrote:
> Can I get some recommendations on how to best tweak/setup memory for my fuseki servers? Here is my setup:
>
> - I’ve got a single TDB database with at least several million triples (I don’t know the exact count yet, but perhaps tens of millions, maybe hundreds of millions…at the very least, I need it to scale to hundreds of millions).
> - Everything is currently put in the default graph (I want to change this…but can’t at this point in time).
> - The “TransitiveReasoner” is being used on the dataset.

What properties are transitive?

(and maybe https://jena.apache.org/documentation/rdfs/?)

> - Full text indexing over two different fields is enabled using Lucene.
> - The servers are running embedded Fuseki via rdf-delta and sync via a central rdf-delta server.
> - The simplest of queries won’t finish and runs out of memory with, at the very least, 6 GB of RAM.

Example query?

And is the request causing the database to be sync'ed before the request
starts?

>
> Also, should I be tweaking my non-heap memory to be larger for the Fuseki server?

Unlikely.

Andy

>
> Thanks.

Re: Need recommendation for memory settings using Fuseki/Delta server

Posted by Andy Seaborne <an...@apache.org>.


On 11/08/2021 01:17, Brandon Sara wrote:
> Can I get some recommendations on how to best tweak/setup memory for my fuseki servers? Here is my setup:
> 
> - I’ve got a single TDB database with at least several million triples (I don’t know the exact count yet, but perhaps tens of millions, maybe hundreds of millions…at the very least, I need it to scale to hundreds of millions).
> - Everything is currently put in the default graph (I want to change this…but can’t at this point in time).
> - The “TransitiveReasoner” is being used on the dataset.

What properties are transitive?

(and maybe https://jena.apache.org/documentation/rdfs/?)

> - Full text indexing over two different fields is enabled using Lucene.
> - The servers are running embedded Fuseki via rdf-delta and sync via a central rdf-delta server.
> - The simplest of queries won’t finish and runs out of memory with, at the very least, 6 GB of RAM.

Example query?

And is the request causing the database to be sync'ed before the request 
starts?

> 
> Also, should I be tweaking my non-heap memory to be larger for the Fuseki server?

Unlikely.

     Andy

> 
> Thanks.