Posted to users@jena.apache.org by Petr Baudis <pa...@ucw.cz> on 2014/12/21 17:54:50 UTC

Fuseki hangs under heavy SPARQL query load

  Hi!

  I tried to use Apache Fuseki for my QA system, loaded up with part of
DBpedia and set up according to:

	https://github.com/brmson/yodaqa/blob/master/data/dbpedia/README.md

It works beautifully, but the system puts Fuseki under a pretty heavy
load, with several tens of SPARQL queries per second at times, often in
parallel.  And after about an hour on average, Fuseki just hangs up,
still accepting new queries but never generating a result.

  I suspect it might be some kind of deadlock, but I would need some
advice on how to debug it best or what kind of data you would need.

  (If you think a different kind of server would be better for this use
case, I'll be happy to hear suggestions too. :-)  I was using Virtuoso
so far, but with an abysmal experience (self-corrupting database, hangs
of a different kind), and I couldn't get 4store to work; I imported the
data but never got it to return anything in finite time for SPARQL
queries that work with Virtuoso and Fuseki.)

  Thanks,

				Petr Baudis

Re: Fuseki hangs under heavy SPARQL query load

Posted by Petr Baudis <pa...@ucw.cz>.

On Mon, Dec 22, 2014 at 03:03:03AM +0100, Petr Baudis wrote:
> 	8x AMD FX(tm)-8350 Eight-Core Processor

^^^ this is just 8 cores, not 8*8 cores. ;-)

				Petr Baudis

Re: Fuseki hangs under heavy SPARQL query load

Posted by Andy Seaborne <an...@apache.org>.

On 24/12/14 11:04, Petr Baudis wrote:
>    Hi!
>
> On Wed, Dec 24, 2014 at 10:01:41AM +0000, Andy Seaborne wrote:
>> What I think is happening is that if you don't do the close, then
>> the connection isn't returned to the pool and a new one is created
>> when the next request comes in.  Hence lots of connections all the
>> way through to the server.
>
>    Hmm, I don't see reusing of connections even when I do .close(),
> though.  I'll take a look later if I can easily make it reuse connections
> with a custom http client class.
>
>    Still, I think it'd be worthwhile to time out open connections on the
> Fuseki server side after a while.  Otherwise, it is e.g. trivial to DDoS
> a server open to the internet.
>

Do you have a test case?

I wrote a quick test and traced connections getting put back in the pool 
on the client side in tests of 20K requests. 
ManagedClientConnectionImpl does get called to recycle the connection.

But it was a same-machine test (due to where I am ATM).  From memory, 
freeing client and server side resources isn't completely synchronous 
with the local OS and it has been possible to run faster than the OS 
frees up connections.   It might be better to reduce the HttpClient 
configuration.

On the server side this is all inside Jetty.  There is a tension between 
freeing resources and caching.  Maybe the code is asking for cached 
connections too quickly.

	Andy

Re: Fuseki hangs under heavy SPARQL query load

Posted by Petr Baudis <pa...@ucw.cz>.

  Hi!

On Wed, Dec 24, 2014 at 10:01:41AM +0000, Andy Seaborne wrote:
> What I think is happening is that if you don't do the close, then
> the connection isn't returned to the pool and a new one is created
> when the next request comes in.  Hence lots of connections all the
> way through to the server.

  Hmm, I don't see reusing of connections even when I do .close(),
though.  I'll take a look later if I can easily make it reuse connections
with a custom http client class.

  Still, I think it'd be worthwhile to time out open connections on the
Fuseki server side after a while.  Otherwise, it is e.g. trivial to DDoS
a server open to the internet.
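
To make the idle-timeout idea concrete, here is a minimal sketch using plain java.net sockets. It is not how Jetty or Fuseki handle connections; the class and method names (IdleTimeoutDemo, serveOnce, demo) and the 200 ms limit are assumptions chosen for the example. The point is only that a read timeout turns a silent client from a permanently occupied thread into a connection that gets closed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class IdleTimeoutDemo {
    // Accepts one connection and waits at most idleMillis for the first
    // request byte. Returns true if the connection was dropped as idle.
    static boolean serveOnce(ServerSocket server, int idleMillis) throws IOException {
        try (Socket conn = server.accept()) {
            conn.setSoTimeout(idleMillis);   // bound every blocking read on this socket
            InputStream in = conn.getInputStream();
            try {
                in.read();                   // without the timeout this could block forever
                return false;                // the client sent data (or closed) in time
            } catch (SocketTimeoutException idle) {
                return true;                 // idle too long; try-with-resources closes the socket
            }
        }
    }

    // Starts a loopback server plus a client that connects and never sends
    // a request, then reports whether the server dropped it.
    static boolean demo(int idleMillis) {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket silentClient = new Socket(server.getInetAddress(), server.getLocalPort())) {
            return serveOnce(server, idleMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("dropped idle connection: " + demo(200));
    }
}
```

A real server would apply the same limit between requests on a kept-alive connection as well, which is the case that matters for the hang described in this thread.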

-- 
				Petr Baudis
	If you do not work on an important problem, it's unlikely
	you'll do important work.  -- R. Hamming
	http://www.cs.virginia.edu/~robins/YouAndYourResearch.html

Re: Fuseki hangs under heavy SPARQL query load

Posted by Andy Seaborne <an...@apache.org>.

Hi Petr,

Thanks for the update.

Jena uses Apache HttpComponents Client (HttpClient) via code in 
org.apache.jena.riot.web.HttpOp.

It should be using a caching ClientConnectionManager.  The caching isn't 
very high by default.

Or you can use your own setup via HttpOp.setDefaultHttpClient.

What I think is happening is that if you don't do the close, then the 
connection isn't returned to the pool and a new one is created when the 
next request comes in.  Hence lots of connections all the way through to 
the server.

	Andy

On 23/12/14 20:40, Petr Baudis wrote:
> On Tue, Dec 23, 2014 at 09:31:24AM +0000, Andy Seaborne wrote:
>> You can do it from the command line with the original tool that was
>> swept up into jvisualvm:
>>
>> jstack ProcessId > stack_dump
>>
>> (IIRC it's officially unsupported these days, but my Java 7 and 8
>> installations have it)
>
> Thanks for the hint!  So I saw 1024 threads with
>
> 	Thread 11824: (state = IN_NATIVE)
> 	 - sun.nio.ch.FileDispatcherImpl.read0(java.io.FileDescriptor, long, int) @bci=0 (Compiled frame; information may be imprecise)
> 	 - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int) @bci=4, line=39 (Compiled frame)
> 	 - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher) @bci=114, line=223 (Compiled frame)
> 	 - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher) @bci=48, line=197 (Compiled frame)
> 	 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=234, line=379 (Compiled frame)
> 	 - org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=64, line=235 (Compiled frame)
> 	 - org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=9, line=242 (Compiled frame)
> 	 - org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1044 (Compiled frame)
> 	 - org.eclipse.jetty.http.HttpParser.parseNext() @bci=177, line=298 (Compiled frame)
> 	 - org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
> 	 - org.eclipse.jetty.server.BlockingHttpConnection.handle() @bci=51, line=72 (Compiled frame)
> 	 - org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.run() @bci=129, line=298 (Compiled frame)
> 	 - org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
> 	 - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
> 	 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>
> and 2 management threads, and the cause is pretty obvious at that point
> - yes, I simply ran out of sockets!  It seems Fuseki does not time out
> inactive connections, but of course the true fault lies in my code which
> creates so many of them - or rather never closes the connections.
>
> That was caused by me misreading
>
> 	https://jena.apache.org/documentation/query/app_api.html
>
> and not doing qexec.close() even though I did not use the try (...) { }
> construct.  (It would be nice if I could reuse the same HTTP connection
> for multiple *different* queries - but I don't think that's possible
> here, or is it?)
>
> Thanks,
>

Re: Fuseki hangs under heavy SPARQL query load

Posted by Petr Baudis <pa...@ucw.cz>.

On Tue, Dec 23, 2014 at 09:31:24AM +0000, Andy Seaborne wrote:
> You can do it from the command line with the original tool that was
> swept up into jvisualvm:
> 
> jstack ProcessId > stack_dump
> 
> (IIRC it's officially unsupported these days, but my Java 7 and 8
> installations have it)

Thanks for the hint!  So I saw 1024 threads with

	Thread 11824: (state = IN_NATIVE)
	 - sun.nio.ch.FileDispatcherImpl.read0(java.io.FileDescriptor, long, int) @bci=0 (Compiled frame; information may be imprecise)
	 - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int) @bci=4, line=39 (Compiled frame)
	 - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher) @bci=114, line=223 (Compiled frame)
	 - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher) @bci=48, line=197 (Compiled frame)
	 - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=234, line=379 (Compiled frame)
	 - org.eclipse.jetty.io.nio.ChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=64, line=235 (Compiled frame)
	 - org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.fill(org.eclipse.jetty.io.Buffer) @bci=9, line=242 (Compiled frame)
	 - org.eclipse.jetty.http.HttpParser.fill() @bci=322, line=1044 (Compiled frame)
	 - org.eclipse.jetty.http.HttpParser.parseNext() @bci=177, line=298 (Compiled frame)
	 - org.eclipse.jetty.http.HttpParser.parseAvailable() @bci=1, line=235 (Compiled frame)
	 - org.eclipse.jetty.server.BlockingHttpConnection.handle() @bci=51, line=72 (Compiled frame)
	 - org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.run() @bci=129, line=298 (Compiled frame)
	 - org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=608 (Compiled frame)
	 - org.eclipse.jetty.util.thread.QueuedThreadPool$3.run() @bci=47, line=543 (Compiled frame)
	 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)

and 2 management threads, and the cause is pretty obvious at that point
- yes, I simply ran out of sockets!  It seems Fuseki does not time out
inactive connections, but of course the true fault lies in my code which
creates so many of them - or rather never closes the connections.
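
For anyone repeating this diagnosis on a large dump, the counting can be scripted. The following is a rough sketch that assumes the dump uses the same layout as the excerpt above (a "Thread N: (state = X)" header followed by " - frame" lines); other jstack output modes use a different layout and would need different parsing. The class name StackDumpSummary is made up for the example.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StackDumpSummary {
    // Groups threads by "state | topmost frame" and counts each group.
    static Map<String, Integer> summarize(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String state = null;                 // state from the most recent thread header
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("Thread ") && line.contains("(state = ")) {
                int start = line.indexOf("(state = ") + "(state = ".length();
                int end = line.indexOf(')', start);
                state = end > start ? line.substring(start, end) : "UNKNOWN";
            } else if (line.startsWith("- ") && state != null) {
                // The first frame after a header is the top of that thread's stack.
                String frame = line.substring(2);
                int paren = frame.indexOf('(');
                String method = paren > 0 ? frame.substring(0, paren) : frame;
                counts.merge(state + " | " + method, 1, Integer::sum);
                state = null;                // skip the remaining frames of this thread
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StackDumpSummary.java <stack_dump file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        summarize(lines).forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

Run against a dump like the one described here, a single group holding nearly all threads, with a socket read at the top of the stack, points at connections waiting for requests rather than at query execution.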

That was caused by me misreading

	https://jena.apache.org/documentation/query/app_api.html

and not doing qexec.close() even though I did not use the try (...) { }
construct.  (It would be nice if I could reuse the same HTTP connection
for multiple *different* queries - but I don't think that's possible
here, or is it?)
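
To illustrate the close() point, here is a minimal, self-contained sketch of why a missing close() leaks one connection per query. FakeQueryExecution is a hypothetical stand-in invented for this example, not a Jena class; it only counts how many executions are still open. With Jena, the same try (...) shape would wrap the real QueryExecution, assuming the version in use supports try-with-resources; otherwise a finally block that calls qexec.close() does the same job.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CloseDemo {
    // Hypothetical stand-in for a remote query execution that holds one
    // HTTP connection until close() is called. Not a Jena class.
    static class FakeQueryExecution implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeQueryExecution() { OPEN.incrementAndGet(); }

        String execSelect() { return "result"; }

        @Override
        public void close() { OPEN.decrementAndGet(); }
    }

    // Leaky variant: close() is never called, so each query leaves one
    // connection open.
    static void runLeaky(int queries) {
        for (int i = 0; i < queries; i++) {
            FakeQueryExecution qexec = new FakeQueryExecution();
            qexec.execSelect();
        }
    }

    // try-with-resources calls close() on every path, including exceptions.
    static void runClosed(int queries) {
        for (int i = 0; i < queries; i++) {
            try (FakeQueryExecution qexec = new FakeQueryExecution()) {
                qexec.execSelect();
            }
        }
    }

    public static void main(String[] args) {
        runClosed(1024);
        System.out.println("open after closed run: " + FakeQueryExecution.OPEN.get()); // 0
        runLeaky(1024);
        System.out.println("open after leaky run:  " + FakeQueryExecution.OPEN.get()); // 1024
    }
}
```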

Thanks,

-- 
				Petr Baudis
	If you do not work on an important problem, it's unlikely
	you'll do important work.  -- R. Hamming
	http://www.cs.virginia.edu/~robins/YouAndYourResearch.html

Re: Fuseki hangs under heavy SPARQL query load

Posted by Andy Seaborne <an...@apache.org>.

On 22/12/14 22:58, Petr Baudis wrote:
>    Hi!
>
> On Mon, Dec 22, 2014 at 08:46:39PM +0000, Andy Seaborne wrote:
>> If there are a few connections (<=2) and large numbers of small
>> queries issued over each connection, and assuming there are no sorts
>> and no timeouts set, then the execution of the query should be all on
>> the thread that it came in on.  And you have 8 (shame it's not 8*8!)
>> cores.  Do you have a couple of example queries you can share?
>
> Sure!  It is typically something like
>
> 	PREFIX  dc:   <http://purl.org/dc/elements/1.1/> PREFIX  :     <http://dbpedia.org/resource/> PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX  dbpedia2: <http://dbpedia.org/property/> PREFIX  foaf: <http://xmlns.com/foaf/0.1/> PREFIX  dbo:  <http://dbpedia.org/ontology/> PREFIX  owl:  <http://www.w3.org/2002/07/owl#> PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#> PREFIX  dbpedia: <http://dbpedia.org/> PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>  SELECT  ?t WHERE   {   { ?res rdfs:label "California"@en }     UNION       { ?redir dbo:wikiPageRedirects ?res .         ?redir rdfs:label "California"@en       }     ?res rdf:type ?t     FILTER ( ! regex(str(?res), "^http://dbpedia.org/resource/[^_]*:", "i") )   }
>
> or variations for different labels.
>
>> Does the CPU load increase to start with, then drop off?
>> Fuseki/TDB is typically CPU-busy once the OS has warmed up and the
>> working set of index files is in memory.
>
>    I see no obvious CPU load variations.  A lot of the queries are
> repeated (so quickly warmed cache) and the server runs the user software
> itself too.
>
>> Maybe the first thing to try is to point jvisualvm (in the JDK) or
>> some other monitoring tool at the Fuseki process and see if there is
>> any evidence. The thread dump would be useful. (jconsole even has a
>> "Detect Deadlock" which I have never used but the button label is
>> suggestive)
>
>    Hmm, seems like that requires a GUI.  I can give that a whirl at the
> end of the week as I have only remote access to the machine until then.
>

You can do it from the command line with the original tool that was swept 
up into jvisualvm:

jstack ProcessId > stack_dump

(IIRC it's officially unsupported these days, but my Java 7 and 8 
installations have it)

	Andy

Re: Fuseki hangs under heavy SPARQL query load

Posted by Petr Baudis <pa...@ucw.cz>.

  Hi!

On Mon, Dec 22, 2014 at 08:46:39PM +0000, Andy Seaborne wrote:
> If there are a few connections (<=2) and large numbers of small
> queries issued over each connection, and assuming there are no sorts
> and no timeouts set, then the execution of the query should be all on
> the thread that it came in on.  And you have 8 (shame it's not 8*8!)
> cores.  Do you have a couple of example queries you can share?

Sure!  It is typically something like

	PREFIX  dc:   <http://purl.org/dc/elements/1.1/>
	PREFIX  :     <http://dbpedia.org/resource/>
	PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
	PREFIX  dbpedia2: <http://dbpedia.org/property/>
	PREFIX  foaf: <http://xmlns.com/foaf/0.1/>
	PREFIX  dbo:  <http://dbpedia.org/ontology/>
	PREFIX  owl:  <http://www.w3.org/2002/07/owl#>
	PREFIX  xsd:  <http://www.w3.org/2001/XMLSchema#>
	PREFIX  dbpedia: <http://dbpedia.org/>
	PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
	PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>

	SELECT  ?t
	WHERE
	  { { ?res rdfs:label "California"@en }
	    UNION
	    { ?redir dbo:wikiPageRedirects ?res .
	      ?redir rdfs:label "California"@en
	    }
	    ?res rdf:type ?t
	    FILTER ( ! regex(str(?res), "^http://dbpedia.org/resource/[^_]*:", "i") )
	  }

or variations for different labels.

> Does the CPU load increase to start with, then drop off?
> Fuseki/TDB is typically CPU-busy once the OS has warmed up and the
> working set of index files is in memory.

  I see no obvious CPU load variations.  A lot of the queries are
repeated (so quickly warmed cache) and the server runs the user software
itself too.

> Maybe the first thing to try is to point jvisualvm (in the JDK) or
> some other monitoring tool at the Fuseki process and see if there is
> any evidence. The thread dump would be useful. (jconsole even has a
> "Detect Deadlock" which I have never used but the button label is
> suggestive)

  Hmm, seems like that requires a GUI.  I can give that a whirl at the
end of the week as I have only remote access to the machine until then.

-- 
				Petr Baudis
	If you do not work on an important problem, it's unlikely
	you'll do important work.  -- R. Hamming
	http://www.cs.virginia.edu/~robins/YouAndYourResearch.html

Re: Fuseki hangs under heavy SPARQL query load

Posted by Andy Seaborne <an...@apache.org>.

On 22/12/14 02:03, Petr Baudis wrote:
>    Hi!
>
> On Sun, Dec 21, 2014 at 08:30:02PM +0000, Andy Seaborne wrote:
>> On 21/12/14 16:54, Petr Baudis wrote:
>>> It works beautifully, but the system puts Fuseki under a pretty heavy
>>> load, with several tens of SPARQL queries per second at times, often in
>>> parallel.  And after about an hour on average, Fuseki just hangs up,
>>> still accepting new queries but never generating a result.
> ..snip..
>
>> What is the setup in terms of hardware (RAM size, number of CPUs etc
>> etc), operating system and versions?  The details do matter here.
>
>    This is:
>
> 	24GiB RAM
> 	8x AMD FX(tm)-8350 Eight-Core Processor
> 	Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
> 	(Debian Wheezy, with some leaf Jessie packages mixed in)
> 	Fuseki 1.1.1 (binary distribution)
> 	Jena 2.12.1 (used for tdbloader)
>
> The SPARQL endpoint is publicly available at
>
> 	http://pasky.or.cz:3030/dbpedia/query
>
> (right now I just run fuseki in a loop and kill it every 10 minutes
> as a stopgap measure).
>
>> This may be JENA-801 [1].
>
>    Hmm. I see, that's interesting.  Just to clarify, though - what I'm
> seeing is a hard hang, the Fuseki process is not consuming any CPU and
> no queries are ever answered (at least in the order of hours).  It is
> not simply a performance degradation, which I get the impression is
> what JENA-801 is about.
>
>    (Also, while I'm hitting Fuseki with a lot of queries, I believe there
> should never be more than two connections + queries going on at the same
> time.  The queries are pretty simple, typically take 2-3ms to service.)
>
>    (Also, I really do just queries, no updates, I'm running in read-only
> mode.)
>
>    (Also, no messages like Java GC notifications are printed on Fuseki's
> console in the event of this deadlock.)
>
>    So this really seems quite different to what I'm reading there and in
> JENA-689, JENA-703.
>
>> That's why I'd like to see what has been done to know if the update
>> changes were also tried out.  If for your usage, it is just query
>> load, I can build a special for you to try out (or you can: replace
>> the body of CacheLRU with CacheGuava body, add dependency to ARQ and
>> build with maven).
>
>    If in the light of the above you still think trying this out makes
> sense, I will be happy to do that in the course of next few days.  If
> building a special for me would be easy for you, I'd appreciate that,
> but otherwise I can give it a try.
>

Certainly not JENA-689 or JENA-703, which are update related and so not 
obviously relevant here. JENA-801 is an issue about locking on the node 
table, and that synchronization happens in the read-only situation as 
well, which is why I thought it might be related.

If there are a few connections (<=2) and large numbers of small queries 
issued over each connection, and assuming there are no sorts and no 
timeouts set, then the execution of each query should be all on the 
thread that it came in on.  And you have 8 (shame it's not 8*8!) cores.  
Do you have a couple of example queries you can share?

Does the CPU load increase to start with, then drop off?  Fuseki/TDB is 
typically CPU-busy once the OS has warmed up and the working set of 
index files is in memory.

Maybe the first thing to try is to point jvisualvm (in the JDK) or some 
other monitoring tool at the Fuseki process and see if there is any 
evidence. The thread dump would be useful. (jconsole even has a "Detect 
Deadlock" which I have never used but the button label is suggestive)
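
For reference, the JDK exposes the same kind of information programmatically through java.lang.management. The sketch below is only an illustration: it inspects the JVM it runs in, so checking a separate Fuseki process this way would need a remote JMX connection (not shown) or a tool such as jvisualvm or jconsole. The class name DeadlockCheck is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

class DeadlockCheck {
    // Number of threads the JVM reports as deadlocked on monitors or
    // ownable synchronizers; 0 when none are found.
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads();   // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    // Counts live threads by Java thread state.
    static Map<Thread.State, Integer> threadsByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.println("threads by state:   " + threadsByState());
    }
}
```

One caveat: a thread blocked in a socket read is reported as RUNNABLE and is not a deadlock, so a zero count here does not rule out a hang; the per-thread stacks are still the more informative evidence.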

	Andy

Re: Fuseki hangs under heavy SPARQL query load

Posted by Petr Baudis <pa...@ucw.cz>.

  Hi!

On Sun, Dec 21, 2014 at 08:30:02PM +0000, Andy Seaborne wrote:
> On 21/12/14 16:54, Petr Baudis wrote:
> >It works beautifully, but the system puts Fuseki under a pretty heavy
> >load, with several tens of SPARQL queries per second at times, often in
> >parallel.  And after about an hour on average, Fuseki just hangs up,
> >still accepting new queries but never generating a result.
..snip..

> What is the setup in terms of hardware (RAM size, number of CPUs etc
> etc), operating system and versions?  The details do matter here.

  This is:

	24GiB RAM
	8x AMD FX(tm)-8350 Eight-Core Processor
	Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt2-1 (2014-12-08) x86_64 GNU/Linux
	(Debian Wheezy, with some leaf Jessie packages mixed in)
	Fuseki 1.1.1 (binary distribution)
	Jena 2.12.1 (used for tdbloader)

The SPARQL endpoint is publicly available at

	http://pasky.or.cz:3030/dbpedia/query

(right now I just run fuseki in a loop and kill it every 10 minutes
as a stopgap measure).

> This may be JENA-801 [1].

  Hmm. I see, that's interesting.  Just to clarify, though - what I'm
seeing is a hard hang, the Fuseki process is not consuming any CPU and
no queries are ever answered (at least on the order of hours).  It is
not simply a performance degradation, which I get the impression is
what JENA-801 is about.

  (Also, while I'm hitting Fuseki with a lot of queries, I believe there
should never be more than two connections + queries going on at the same
time.  The queries are pretty simple, typically take 2-3ms to service.)

  (Also, I really do just queries, no updates, I'm running in read-only
mode.)

  (Also, no messages like Java GC notifications are printed on Fuseki's
console in the event of this deadlock.)

  So this really seems quite different to what I'm reading there and in
JENA-689, JENA-703.

> That's why I'd like to see what has been done to know if the update
> changes were also tried out.  If for your usage, it is just query
> load, I can build a special for you to try out (or you can: replace
> the body of CacheLRU with CacheGuava body, add dependency to ARQ and
> build with maven).

  If in the light of the above you still think trying this out makes
sense, I will be happy to do that in the course of next few days.  If
building a special for me would be easy for you, I'd appreciate that,
but otherwise I can give it a try.

-- 
				Petr Baudis
	If you do not work on an important problem, it's unlikely
	you'll do important work.  -- R. Hamming
	http://www.cs.virginia.edu/~robins/YouAndYourResearch.html

Re: Fuseki hangs under heavy SPARQL query load

Posted by Andy Seaborne <an...@apache.org>.

On 21/12/14 16:54, Petr Baudis wrote:
>    Hi!
>
>    I tried to use Apache Fuseki for my QA system, loaded up with part of
> DBpedia and set up according to:
>
> 	https://github.com/brmson/yodaqa/blob/master/data/dbpedia/README.md
>
> It works beautifully, but the system puts Fuseki under a pretty heavy
> load, with several tens of SPARQL queries per second at times, often in
> parallel.  And after about an hour on average, Fuseki just hangs up,
> still accepting new queries but never generating a result.
>
>    I suspect it might be some kind of deadlock, but I would need some
> advice on how to debug it best or what kind of data you would need.
>
>    (If you think for this usecase, a different kind of server would be
> better, I'll be happy to hear suggestions too. :-)  I was using Virtuoso
> so far, but with abysmal experience (self-corrupting database, hangs of
> different kind), and couldn't get 4store to work; I imported the data
> but never made it to return any data in finite time with SPARQL queries
> that work with Virtuoso and Fuseki.)
>
>    Thanks,
>
> 				Petr Baudis
>

Hi Petr,

What is the setup in terms of hardware (RAM size, number of CPUs etc 
etc), operating system and versions?  The details do matter here.

This may be JENA-801 [1].

If so, there is a suggested fix but the code contribution hasn't arrived 
yet.

I believe the reporter of JENA-801 (Bala Kolla) used something related 
to this to replace CacheLRU:

https://github.com/afs/AFS-Dev/blob/master/src%2Fmain%2Fjava%2Fprojects%2Fcache%2FCacheGuava.java

though if we are going to use Guava Cache (which is highly likely) then 
there is an even better way to use it in the presence of updates.  That's 
why I'd like to see what has been done, to know if the update changes 
were also tried out.  If your usage is just query load, I can 
build a special for you to try out (or you can: replace the body of 
CacheLRU with the CacheGuava body, add the dependency to ARQ and build with maven).

	Andy

[1]
https://issues.apache.org/jira/browse/JENA-801