Posted to users@jena.apache.org by Adrian Gschwend <ml...@netlabs.org> on 2021/06/29 13:07:19 UTC

Fuseki Graph Store Protocol: Streaming or not?

Hi everyone,

We have automated pipelines that write to Fuseki using the SPARQL Graph
Store Protocol. This seems to work fine for smaller chunks of data, but
when we write a larger dataset of around 15 million triples in one
batch, it fails.

After checking out what happens, we see an OOM error.

We send application/n-triples so I was expecting that it streams it.
When using tdbloader this size is not really an issue at all.

In this particular setup we first used TDB, the machine has 6GB of
memory assigned.

TDB2 seems to behave a bit better: it runs through without OOM, but it
takes 1.5 hours for the job, while it is less than 15 minutes when we
split it into smaller chunks and send it in ~100k-triple batches via the
Graph Store Protocol.
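
For reference, the chunked variant is essentially a split plus a loop of
Graph Store Protocol requests, roughly like this (illustrative sketch;
file names and the endpoint variable are placeholders, not our exact
pipeline):

  # N-Triples is line-based, so a plain line split gives valid chunks
  split -l 100000 scope.nt chunk-

  # POST appends to the graph, so the chunks accumulate in the same graph
  for f in chunk-*; do
    curl -X POST \
         -n \
         -H Content-Type:application/n-triples \
         -T "$f" \
         -G "$SINK_ENDPOINT_URL" \
         --data-urlencode graph=https://some-named-graph/graph/ais-metadata
  done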

Interestingly we never see more than 1GB of RAM used so I'm even more
confused.

Is this OOM error to be expected for large graph-store writes?


regards

Adrian

Re: Fuseki Graph Store Protocol: Streaming or not?

Posted by Andy Seaborne <an...@apache.org>.

On 29/06/2021 20:49, Adrian Gschwend wrote:

> 
> But that reminds me of something else, it's a custom Fuseki version we
> made with Open Telemetry integrated so we can get a lot more tracing:
> 
> https://github.com/zazuko/docker-fuseki-otel
> 
> (we started doing that when having problems that were almost impossible
> to debug otherwise as the final sender will also be somewhere else with
> proxy & other stuff that makes debugging super hard).

Is there a potential for Fuseki to provide this built-in?

Also: it has Prometheus stats:
https://jena.apache.org/documentation/fuseki2/fuseki-server-info.html
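
For example, something like this (assuming a default standalone install
on port 3030, and that I have the metrics endpoint name right):

  curl 'http://localhost:3030/$/metrics'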

     Andy

Re: Fuseki Graph Store Protocol: Streaming or not?

Posted by james anderson <ja...@dydra.com>.
> On 2021-06-30, at 11:56:11, Andy Seaborne <an...@apache.org> wrote:
> 
> Hi Adrian,
> 
> ...
> 
> All I can think of is the larger-than-needed heap growing to take most of the machine and squeezing out the file system cache causing a lot more real I/O.

if it is not accounted for by memory-management losses, then iotop would have to reveal something in the respective read/write rates and i/o wait for the two approaches
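
a sketch of what i would watch while the load runs, flag choice illustrative:

  # accumulated per-process i/o (needs root)
  sudo iotop -o -a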
---
james anderson | james@dydra.com | https://dydra.com

Re: Fuseki Graph Store Protocol: Streaming or not?

Posted by Andy Seaborne <an...@apache.org>.
Hi Adrian,

(Fuseki version number?)

Your data, your script: I get load rates of

Fuseki main
6m44.184s / 34k TPS (triples per second)

Fuseki in the form you used:
7m11.894s / 32k

(only one run each so these two are "the same")

which is about what I'd expect.

Datasets are "publish centric" (indexed for every access pattern), but 
that indexing has an update cost.

So we seem to be down to the fact that as one big file you get 1.5 
hours, but comparable times when it is split into 100k chunks.

That I can't explain.

All I can think of is the larger-than-needed heap growing to take most 
of the machine and squeezing out the file system cache causing a lot 
more real I/O.

Some notes inline ...

     Andy

On 29/06/2021 20:49, Adrian Gschwend wrote:
> On 29.06.21 20:29, Andy Seaborne wrote:
> 
> Hi Andy,
> 
>> I'd expect faster though there are a lot of environmental factors. Lots
>> of questions below ...
> 
> good point
> 
>> I've loaded 200+million on the default setup into a live server using
>> TDB2 before.
> 
> ok good to know. I started to have some doubts, that's why I asked.
> 
>> How is the data being sent? What's the client software?
> 
> In this test it's pure curl to a named graph:
> 
> curl -X PUT \

It's a PUT, which clears the destination graph first (a POST variant 
that appends instead is sketched after the quoted command below).

"Clear" is "delete all", not a fast path, because of current transactions.

>       -n \
>       -H Content-Type:application/n-triples \
>       -T scope.nt \
>       -G $SINK_ENDPOINT_URL \
>       --data-urlencode graph=https://some-named-graph/graph/ais-metadata
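
If the graph does not need to be replaced on every run, POST appends to
the existing contents instead; a minimal variant of your command (same
placeholders):

  curl -X POST -n -H Content-Type:application/n-triples -T scope.nt \
       -G $SINK_ENDPOINT_URL \
       --data-urlencode graph=https://some-named-graph/graph/ais-metadata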
 >
>> Does the data have a lot of long literals?
> 
> What is "long" in that context? It's archival records so they indeed do
> have longer literals, at least partially.

I looked at the start and didn't see anything of note.

"Long" means strings being hundreds of characters long.

>> Is it "same machine" or are sender and server on different machines?
> 
> different machine, endpoint is in a hosted kubernetes cluster. There is
> obviously some overhead because of the line but it should not be a big
> issue in this setup.
> 
>> What does the log have in it? And what's in the log if running "verbose"
>> which prints more HTTP details.
> 
> What would be of interest here?

The server outputs a log file that records each request.

When run "verbose", it also prints the headers:

10:36:04 INFO  Fuseki     :: [1] PUT http://localhost:3030/ds
10:36:04 INFO  Fuseki     :: [1]   => Accept:              */*
10:36:04 INFO  Fuseki     :: [1]   => Expect:              100-continue
10:36:04 INFO  Fuseki     :: [1]   => User-Agent:          curl/7.74.0
10:36:04 INFO  Fuseki     :: [1]   => Host:                localhost:3030
10:36:04 INFO  Fuseki     :: [1]   => Content-Length:      59
10:36:04 INFO  Fuseki     :: [1]   => Content-Type:        application/n-triples

10:36:28 INFO  Fuseki     :: [1] Body: Content-Length=59, Content-Type=application/n-triples, Charset=null => N-Triples : Count=1 Triples=1 Quads=0
10:36:28 INFO  Fuseki     :: [1]   <= Content-Type:        application/json
10:36:28 INFO  Fuseki     :: [1]   <= Content-Length:      61
10:36:28 INFO  Fuseki     :: [1]   <= Server:              Apache Jena Fuseki (4.2.0-SNAPSHOT)
10:36:28 INFO  Fuseki     :: [1] 200 OK (2.677 s)

(different data in this example)
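
Verbose here is just the command-line flag on the standalone server; 
something like (flags from memory, the dataset name /ds is a placeholder):

  ./fuseki-server --verbose --tdb2 --loc=/path/to/database /ds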

But no matter, as I have an example working here now, provided there 
isn't a concurrent access load (which would show up as mixed-in [] 
records in the log).


> But that reminds me of something else, it's a custom Fuseki version we
> made with Open Telemetry integrated so we can get a lot more tracing:
> 
> https://github.com/zazuko/docker-fuseki-otel

404

Fuseki will output a regular NCSA log file of requests as well - by 
default it's off but with log4j2 you can set it to write to a file.

> (we started doing that when having problems that were almost impossible
> to debug otherwise as the final sender will also be somewhere else with
> proxy & other stuff that makes debugging super hard).

Yes!

> 
>> Is the server also live, running queries?
> 
> nothing extraordinary right now no, still early phase.
> 
>> Which form of Fuseki? The low level of HTTP is provided by the web
>> server - Jetty or Tomcat.
> 
> The zip we use is taken from maven, it's
> apache-jena-fuseki-${JENA_VERSION}.zip, not sure what this one is using?

Fuseki comes as:

- a WAR file for Tomcat etc.;

- the standalone server from that zip, which is a "webapp" (it has a UI)
  running on Jetty;

- a server-only "Fuseki main":
  https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-server/
  which is Fuseki, no UI, not a webapp.

All of them are the same core engine doing the same thing.
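
If you ever want to try that form, the jar is runnable directly; 
something like (version is whichever you download, flags from memory):

  java -jar jena-fuseki-server-VER.jar --tdb2 --loc=/path/to/database /ds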

> Source:
> https://github.com/zazuko/docker-fuseki-otel/blob/main/image/Dockerfile#L22

404

>> I presume this is not a setup where the dataset is nested in some other
>> functionality?
>>
>> (and what's the version, though nothing has changed directly but you
>> never know... maybe a dependency)
> 
good point - after we added Open Telemetry I don't think we ever went
back to an unmodified Fuseki.

If the configuration is layered, it has a cost.

> 
>> If the data is available, I can try to run it at my end.
> 
> it is:
> 
> http://ktk.netlabs.org/misc/rdf/scope.nt.gz

Got it!

>> TDB1:: There is a limitation on the single transaction as it requires
>> temporary heap space. With TDB1, sending chunks avoids the limit (unless
>> the server is under other load and can't find the time to flush the
>> transaction to the database from the journal).
> 
> ok that is pretty much what we experienced. In other words in this setup
> TDB1 will always have this limitation, good to know thanks.
> 
>> TDB2:: There is no such limitation, nor is it affected by a concurrent
>> read load holding up freeing resources.
>>
>> In fact, TDB2 does some of the work while the transaction is in-progress
>> that TDB1 does at the end.
> 
> excellent. With TDB1 we never managed to write everything without OOM,
> with TDB2 it's slow but we could write the full batch.
> 
>>> We send application/n-triples so I was expecting that it streams it.
>>
>> Yes, if it can.
> 
> ok
> 
>> Loading in Fuseki is not full "tdbloader" in either TDB1 or TDB2.
> 
> ok I expected tdbloader "cheats" as it's super fast. Not a problem per
> se obviously. We have another setup where we load with tdbloader and
> then replace the instance in kubernetes. No outside writes allowed in
> that setup.

k8s - there is always a chance the I/O path is slower than in my 
unvirtualized test figures.

>> As in -Xmx6G on what size of machine? If 8G-ish, it's going to suffer
>> from lack of space in the disk cache. 2G is likely fine.
> 
> ok will check with my devops colleagues, not sure.
> 
>> Is the storage spinning disk or SSD?
> 
> same
> 
>>> TDB2 seems to behave a bit better: it runs through without OOM, but it
>>> takes 1.5 hours for the job, while it is less than 15 minutes when we
>>> split it into smaller chunks and send it in ~100k-triple batches via the
>>> Graph Store Protocol.
>>
>> That is a bit slow.
> 
> that was my feeling too.
> 
>> If that space is squeezed by Java growing the heap, it can become slow.
> 
> ok will check the setup.
> 
>> TDB2 - there's a reason why it is not TDB1 :-)
> 
> that is very good to know. So far we mainly used it in tdbloader setups
> so apparently the issues with TDB1 were less a problem for our use-cases.
> 
> thanks for the feedback so far!
> 
> regards
> 
> Adrian
> 

Re: Fuseki Graph Store Protocol: Streaming or not?

Posted by Adrian Gschwend <ml...@netlabs.org>.
On 29.06.21 20:29, Andy Seaborne wrote:

Hi Andy,

> I'd expect faster though there are a lot of environmental factors. Lots
> of questions below ...

good point

> I've loaded 200+million on the default setup into a live server using
> TDB2 before.

ok good to know. I started to have some doubts, that's why I asked.

> How is the data being sent? What's the client software?

In this test it's pure curl to a named graph:

curl -X PUT \
     -n \
     -H Content-Type:application/n-triples \
     -T scope.nt \
     -G $SINK_ENDPOINT_URL \
     --data-urlencode graph=https://some-named-graph/graph/ais-metadata

> Does the data have a lot of long literals?

What is "long" in that context? It's archival records so they indeed do
have longer literals, at least partially.

> Is it "same machine" or are sender and server on different machines?

different machine, endpoint is in a hosted kubernetes cluster. There is
obviously some overhead because of the line but it should not be a big
issue in this setup.

> What does the log have in it? And what's in the log if running "verbose"
> which prints more HTTP details.

What would be of interest here?

But that reminds me of something else, it's a custom Fuseki version we
made with Open Telemetry integrated so we can get a lot more tracing:

https://github.com/zazuko/docker-fuseki-otel

(we started doing that when having problems that were almost impossible
to debug otherwise as the final sender will also be somewhere else with
proxy & other stuff that makes debugging super hard).

> Is the server also live, running queries?

nothing extraordinary right now no, still early phase.

> Which form of Fuseki? The low level of HTTP is provided by the web
> server - Jetty or Tomcat.

The zip we use is taken from maven, it's
apache-jena-fuseki-${JENA_VERSION}.zip, not sure what this one is using?

Source:
https://github.com/zazuko/docker-fuseki-otel/blob/main/image/Dockerfile#L22

> I presume this is not a setup where the dataset is nested in some other
> functionality?
> 
> (and what's the version, though nothing has changed directly but you
> never know... maybe a dependency)

good point - after we added Open Telemetry I don't think we ever went
back to an unmodified Fuseki.

> If the data is available, I can try to run it at my end.

it is:

http://ktk.netlabs.org/misc/rdf/scope.nt.gz


> TDB1:: There is a limitation on the single transaction as it requires
> temporary heap space. With TDB1, sending chunks avoids the limit (unless
> the server is under other load and can't find the time to flush the
> transaction to the database from the journal).

ok that is pretty much what we experienced. In other words in this setup
TDB1 will always have this limitation, good to know thanks.

> TDB2:: There is no such limitation, nor is it affected by a concurrent
> read load holding up freeing resources.
> 
> In fact, TDB2 does some of the work while the transaction is in-progress
> that TDB1 does at the end.

excellent. With TDB1 we never managed to write everything without OOM,
with TDB2 it's slow but we could write the full batch.

>> We send application/n-triples so I was expecting that it streams it.
> 
> Yes, if it can.

ok

> Loading in Fuseki is not full "tdbloader" in either TDB1 or TDB2.

ok I expected tdbloader "cheats" as it's super fast. Not a problem per
se obviously. We have another setup where we load with tdbloader and
then replace the instance in kubernetes. No outside writes allowed in
that setup.
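
That offline flow is roughly: bulk-load into a fresh directory, then
point the new instance at it; illustrative paths only:

  # load into a new, empty TDB2 database directory
  tdb2.tdbloader --loc=/data/databases/ds scope.nt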

> As in -Xmx6G on what size of machine? If 8G-ish, it's going to suffer
> from lack of space in the disk cache. 2G is likely fine.

ok will check with my devops colleagues, not sure.

> Is the storage spinning disk or SSD?

same

>> TDB2 seems to behave a bit better: it runs through without OOM, but it
>> takes 1.5 hours for the job, while it is less than 15 minutes when we
>> split it into smaller chunks and send it in ~100k-triple batches via the
>> Graph Store Protocol.
> 
> That is a bit slow.

that was my feeling too.

> If that space is squeezed by Java growing the heap, it can become slow.

ok will check the setup.

> TDB2 - there's a reason why it is not TDB1 :-)

that is very good to know. So far we mainly used it in tdbloader setups
so apparently the issues with TDB1 were less a problem for our use-cases.

thanks for the feedback so far!

regards

Adrian

Re: Fuseki Graph Store Protocol: Streaming or not?

Posted by Andy Seaborne <an...@apache.org>.
Hi Adrian,

I'd expect faster though there are a lot of environmental factors. Lots 
of questions below ...

I just tried loading 25 million BSBM triples into a Fuseki server with TDB2 
on my machine (32G RAM, SATA SSD) and it took 7 minutes (59k triples/s) 
using s-post to send the single file. 11m40s for a named graph (37k 
triples/s). 2G heap.

I've loaded 200+million on the default setup into a live server using 
TDB2 before.

On 29/06/2021 14:07, Adrian Gschwend wrote:
> Hi everyone,
> 
> We have automated pipelines that write to Fuseki using the SPARQL Graph
> Store Protocol. This seems to work fine for smaller chunks of data, but
> when we write a larger dataset of around 15 million triples in one
> batch, it fails.

Details matter though ...

How is the data being sent? What's the client software?
Does the data have a lot of long literals?

Is it loading into the default graph or a named graph?

Is it "same machine" or are sender and server on different machines?

What does the log have in it? And what's in the log if running "verbose" 
which prints more HTTP details.

Is the server also live, running queries?

Which form of Fuseki? The low level of HTTP is provided by the web 
server - Jetty or Tomcat.

I presume this is not a setup where the dataset is nested in some other 
functionality?

(and what's the version, though nothing has changed directly but you 
never know... maybe a dependency)

If the data is available, I can try to run it at my end.

(And also in an emerging update of the GSP code including running on HTTP/2)

> After checking out what happens, we see an OOM error.

TDB1:: There is a limitation on the single transaction as it requires 
temporary heap space. With TDB1, sending chunks avoids the limit (unless 
the server is under other load and can't find the time to flush the 
transaction to the database from the journal).

TDB2:: There is no such limitation, nor is it affected by a concurrent 
read load holding up freeing resources.

In fact, TDB2 does some of the work while the transaction is in-progress 
that TDB1 does at the end.

> We send application/n-triples so I was expecting that it streams it.

Yes, if it can.

> When using tdbloader this size is not really an issue at all.

With tdbloader (TDB1), loading into an empty database is handled 
differently from loading into a non-empty one.

Loading in Fuseki is not full "tdbloader" in either TDB1 or TDB2.

> 
> In this particular setup we first used TDB, the machine has 6GB of
> memory assigned.

As in -Xmx6G on what size of machine? If 8G-ish, it's going to suffer 
from lack of space in the disk cache. 2G is likely fine.
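
With the standalone zip, the heap can be capped via the environment 
variable the launcher script reads; something like (assuming the script 
honours JVM_ARGS, paths illustrative):

  JVM_ARGS="-Xmx2G" ./fuseki-server --tdb2 --loc=/path/to/database /ds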

Is the storage spinning disk or SSD?

> TDB2 seems to behave a bit better: it runs through without OOM, but it
> takes 1.5 hours for the job, while it is less than 15 minutes when we
> split it into smaller chunks and send it in ~100k-triple batches via the
> Graph Store Protocol.

That is a bit slow.

> 
> Interestingly we never see more than 1GB of RAM used so I'm even more
> confused.

TDB2 uses space in two ways - the node table cache and the indexes.

The indexes are not cached in the heap. They are cached by the OS in the 
file system cache and accessed via memory mapping.

If that space is squeezed by Java growing the heap, it can become slow.

> 
> Is this OOM error to be expected for large graph-store writes?

TDB1 - yes.

Also, if the server is in use for reads, the read load can block TDB1 
from doing some of its finalization, which keeps data in memory longer 
(as well as safe in the journal).

TDB2 - there's a reason why it is not TDB1 :-)

     Andy

> 
> 
> regards
> 
> Adrian
>