Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2020/07/01 12:59:47 UTC

Re: repeated ThriftConvert WARN visit: Unrecognized:

Presumably the client is in java using RDFConnectionFactory.connectFuseki?
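
For reference, a minimal sketch of that client setup, assuming the Jena 
3.14 API (the service URL is illustrative):

    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionFactory;

    public class ConnectCheck {
        public static void main(String[] args) {
            // connectFuseki enables Fuseki-specific behaviour, notably
            // RDF Thrift for data transfer; RDFConnection is AutoCloseable.
            try (RDFConnection conn = RDFConnectionFactory
                    .connectFuseki("http://localhost:3030/dataset")) {
                conn.querySelect("SELECT * { ?s ?p ?o } LIMIT 1",
                                 row -> System.out.println(row));
            }
        }
    }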

Do you have the data from 120611?
Could the request have got truncated?

The fact that later PUTs stall suggests (but does not prove) that the 
earlier operation broke somehow; I'd expect a stacktrace in the log.

Overlapping should be fine - as they are on the same database, one will 
happen before the other even though they are separate graphs.

     Andy

On 30/06/2020 22:29, Chris Tomlinson wrote:
> Hello,
> 
> Running jena/fuseki 3.14.0, compiled and running under openjdk version “1.8.0_252”.
> 
> Just a few hours ago, fuseki, which has been running on an AWS ec2 instance for almost a year, logged around 199M occurrences of (times are UTC):
> 
>> [2020-06-30 14:12:50] Fuseki INFO [120610] 200 OK (8 ms)
>> [2020-06-30 14:12:51] Fuseki INFO [120611] PUT http://buda1.bdrc.io:13180/fuseki/corerw/data?graph=http://purl.bdrc.io/graph/PersonUIShapes
>> [2020-06-30 14:12:52] Fuseki INFO [120612] PUT http://buda1.bdrc.io:13180/fuseki/corerw/data?graph=http://purl.bdrc.io/graph/ontologySchema
>> [2020-06-30 14:12:53] Fuseki INFO [120611] Body: Content-Length=73984, Content-Type=application/rdf+thrift, Charset=null => RDF-THRIFT : Count=839 Triples=839 Quads=0
>> [2020-06-30 14:12:53] Fuseki INFO [120611] 200 OK (2.106 s)
>> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >
> 
> . . .
> 
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized: <RDF_StreamRow >
>> 30-Jun-2020 16:21:34.866 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server version:        Apache Tomcat/8.0.53
>> 30-Jun-2020 16:21:34.868 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server built:          Jun 29 2018 14:42:45 UTC
>> 30-Jun-2020 16:21:34.868 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Server number:         8.0.53.0
>> 30-Jun-2020 16:21:34.868 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Name:               Linux
>> 30-Jun-2020 16:21:34.868 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log OS Version:            4.9.0-8-amd64
>> 30-Jun-2020 16:21:34.868 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Architecture:          amd64
>> 30-Jun-2020 16:21:34.869 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Java Home:             /usr/lib/jvm/java-8-openjdk-amd64/jre
>> 30-Jun-2020 16:21:34.869 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Version:           1.8.0_252-8u252-b09-1~deb9u1-b09
>> 30-Jun-2020 16:21:34.869 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:            Oracle Corporation
>> 30-Jun-2020 16:21:34.869 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE:         /usr/local/fuseki/tomcat
> 
> Until I saw the issue and restarted fuseki.
> 
> An interesting thing I see in the trace is the overlapping PUTs just as fuseki wigged out. There were no changes in the code or in the graphs that were being PUT as part of a restart of an app server.
> 
> Fuseki didn't go completely unresponsive, except that later PUTs were indefinitely delayed - which is how I noticed the issue.
> 
> We saw this once before on 29 April and didn’t delve into it at the time.
> 
> Has anyone seen this sort of thing before?
> 
> Thanks,
> Chris
> 
> 
> 

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.
> On Jul 2, 2020, at 15:36, Andy Seaborne <an...@apache.org> wrote:
> 
> 
>> On 01/07/2020 17:34, Chris Tomlinson wrote:
>> Hi Andy,
>> 
>>>> On Jul 1, 2020, at 7:59 AM, Andy Seaborne <andy@apache.org <ma...@apache.org>> wrote:
>>> 
>>> Presumably the client is in java using RDFConnectionFactory.connectFuseki?
>> 
>> Yes, the clients are in Java 1.8, also on Debian 4.9. Fuseki is running on one ec2 instance; some of the clients are on the same instance and others are on another ec2 instance in the same region. The clients use the following pattern:
> 
> And the client code is jena 3.14.0 as well?

Yes

> 
>> 
>>>         fuConnBuilder = RDFConnectionFuseki.create()
>>>                 .destination(baseUrl)
>>>                 .queryEndpoint(baseUrl+"/query")
>>>                 .gspEndpoint(baseUrl+"/data")
>>>                 .updateEndpoint(baseUrl+"/update");
>>> 
>>>         fuConn = fuConnBuilder.build();
>>>         if (!fuConn.isInTransaction()) {
>>>             fuConn.begin(ReadWrite.WRITE);
>>>         }
>>>         fuConn.put(graphName, model);
>>>         fuConn.commit();
> 
> fuConn isn't shared across threads, is it?

No

Chris

> 
>    Andy
> 

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.
On 01/07/2020 17:34, Chris Tomlinson wrote:
> Hi Andy,
>
>> On Jul 1, 2020, at 7:59 AM, Andy Seaborne <andy@apache.org 
>> <ma...@apache.org>> wrote:
>>
>> Presumably the client is in java using 
>> RDFConnectionFactory.connectFuseki?
>
> Yes, the clients are in Java 1.8, also on Debian 4.9. Fuseki is running 
> on one ec2 instance; some of the clients are on the same instance and 
> others are on another ec2 instance in the same region. The clients 
> use the following pattern:

And the client code is jena 3.14.0 as well?

>
>>         fuConnBuilder = RDFConnectionFuseki.create()
>>                 .destination(baseUrl)
>>                 .queryEndpoint(baseUrl+"/query")
>>                 .gspEndpoint(baseUrl+"/data")
>>                 .updateEndpoint(baseUrl+"/update");
>>
>>         fuConn = fuConnBuilder.build();
>>         if (!fuConn.isInTransaction()) {
>>             fuConn.begin(ReadWrite.WRITE);
>>         }
>>         fuConn.put(graphName, model);
>>         fuConn.commit();

fuConn isn't shared across threads, is it?

    Andy
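
For reference, a minimal sketch of the same write done with Txn, assuming 
Jena 3.14 (Txn wraps begin/commit and aborts on failure, so an error cannot 
leave the connection sitting in an open write transaction; the builder setup 
is as in the snippet above and the class name is illustrative):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionFuseki;
    import org.apache.jena.system.Txn;

    public class PutGraph {
        static void putGraph(String baseUrl, String graphName, Model model) {
            // One connection per operation; nothing shared across threads.
            try (RDFConnection conn = RDFConnectionFuseki.create()
                    .destination(baseUrl)
                    .gspEndpoint(baseUrl + "/data")
                    .build()) {
                // begin(WRITE) ... commit, with abort on an exception.
                Txn.executeWrite(conn, () -> conn.put(graphName, model));
            }
        }
    }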


Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.

On 03/07/2020 17:02, Chris Tomlinson wrote:
> Hi Andy,
> 
>> On Jul 3, 2020, at 8:41 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>>
>>
>> On 02/07/2020 21:55, Chris Tomlinson wrote:
>>> grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" catalina.out
>>
>>
>> Is there any signature as to when they occur?  Two PUTs overlapping, certain usage by your clients (which probably isn't visible in the logs)?  earlier connections broken? high load on the server? Time of day? Anything else that looks like a characteristic?
> 
> Our issue <https://github.com/buda-base/buda-base/issues/30> has a bit of a discussion of the first occurrence on 30 April which might shed some light. The one thing in common in the two occurrences is that in both cases the graph being PUT was the ontologySchema, but there were changes to that graph between 30 April and 30 June, so it was not the same graph in terms of content. Also, there were no overlapping PUTs in the first case.

This is ignorable:

30-Apr-2020 12:46:56.941 WARNING [localhost-startStop-2] 
org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks 
Unable to determine string representation of value of type 
[org.apache.jena.tdb.transaction.DatasetGraphTxn]
  java.lang.UnsupportedOperationException: Transactional.begin(TxnType
         at 
org.apache.jena.sparql.core.TransactionalNotSupported.begin(TransactionalNotSupported.java:66)
	at 
org.apache.jena.tdb.store.DatasetGraphTDB.begin(DatasetGraphTDB.java:265)

because the later log shows it's part of the Tomcat shutdown report.

BTW Fuseki does not participate deeply in a clean web shutdown so TDB 
does not release its ThreadLocals on shutdown and they are involved in 
transactions.

Other than that, I'm lost.

Strange things happen, possibly one, possibly two cases, with no obvious 
cause.

Using RDFConnectionRemote will change the situation (no thrift used), 
but the Thrift error seems to be a symptom, not a reason.

PersonUIShapes.ttl has been sent many times without problems.

     Andy
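
If trying that route, a minimal sketch of the RDFConnectionRemote variant, 
assuming Jena 3.14 (same builder style as the Fuseki one; standard syntaxes 
such as Turtle and the SPARQL result formats are used on the wire instead of 
RDF Thrift; the URL is illustrative):

    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionRemote;

    public class RemoteVariant {
        public static void main(String[] args) {
            String baseUrl = "http://localhost:3030/corerw";  // illustrative
            // Same endpoints as the Fuseki builder; only the negotiated
            // content types on the wire change.
            try (RDFConnection conn = RDFConnectionRemote.create()
                    .destination(baseUrl)
                    .queryEndpoint(baseUrl + "/query")
                    .gspEndpoint(baseUrl + "/data")
                    .updateEndpoint(baseUrl + "/update")
                    .build()) {
                System.out.println(conn.queryAsk("ASK {}"));  // smoke test
            }
        }
    }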

> [remainder of quoted messages elided - they appear in full below]

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Andy,

> On Jul 3, 2020, at 8:41 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> 
> 
> On 02/07/2020 21:55, Chris Tomlinson wrote:
> > grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" catalina.out
> 
> 
> Is there any signature as to when they occur?  Two PUTs overlapping, certain usage by your clients (which probably isn't visible in the logs)?  earlier connections broken? high load on the server? Time of day? Anything else that looks like a characteristic?

Our issue <https://github.com/buda-base/buda-base/issues/30> has a bit of a discussion of the first occurrence on 30 April which might shed some light. The one thing in common in the two occurrences is that in both cases the graph being PUT was the ontologySchema, but there were changes to that graph between 30 April and 30 June, so it was not the same graph in terms of content. Also, there were no overlapping PUTs in the first case.

No unusual load - in fact lightly loaded is the typical situation except during the daily migration of legacy data which sometimes may have a lot of changed graphs to PUT. We haven’t seen the WARNings happen at that time.

Since we’ve only had 2 occurrences of the issue it’s a bit hard to establish a pattern.

Thanks,
Chris


>    Andy
> 
> 
> On 03/07/2020 00:13, Chris Tomlinson wrote:
>> [exchange elided - quoted in full below]


Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.

On 02/07/2020 21:55, Chris Tomlinson wrote:
 > grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" 
catalina.out


Is there any signature as to when they occur?  Two PUTs overlapping, 
certain usage by your clients (which probably isn't visible in the 
logs)?  earlier connections broken? high load on the server? Time of 
day? Anything else that looks like a characteristic?

     Andy


On 03/07/2020 00:13, Chris Tomlinson wrote:
> [message elided - it appears in full immediately below]

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.

> On Jul 2, 2020, at 17:44, Andy Seaborne <an...@apache.org> wrote:
> 
> 
> 
> On 02/07/2020 21:55, Chris Tomlinson wrote:
>>> From what I can see, it (WARN) isn't database related.
>> No, it seems to me to be related to getting the payload off the wire.
> 
> I think you said the same payload had been sent before.
> ??

Yes, a copy/clone of the same payload, i.e., the serialization of the given graph, has been sent many times w/o issue.



> 
> ...
> 
>>> Even the concurrency looks OK because it locally writes a buffer so the HTTP length is available.
> 
> (in case of corruption, not repeat, is happening)
> 
>> So it seems to me that there may be an opportunity for some sort of robustification in RDFConnection. There is evidently a loop somewhere that doesn't terminate, retrying the parsing repeatedly or something like that. The payload is finite so there would appear to be a test that repeatedly fails but doesn't make progress in consuming the payload.
> 
> RDFConnection (client-side) is sending, not parsing.

I'm referring to the Fuseki receiving end of the connection, where the WARNing is being logged.



> The WARN says that an empty <RDF_StreamRow > was seen.
> 
> There is no information about the stalled transactions although not finishing the write would explain this:
> 
> 30-Jun-2020 16:21:30.778
> 
> java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> 
> so it's waiting for input. What's the proxy/reverse-proxy setup?

None. For the client on the same ec2 instance, obviously none; and for the client on a second ec2 instance, we have nothing between our two internal ec2's

In the current situation, the two precipitating PUTs are from a client on the same ec2 instance.



> The code writes the payload to a ByteArrayOutputStream and sends those bytes. That's how it gets the length for the HTTP header.
> 
> https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615
> 
> (run Fuseki with "verbose" to see the headers ... but it is quite verbose)
> 
> It sent something, so the RDF->Thrift->bytes conversion had finished, and then it sent the bytes.

As I tried to clarify, my remarks were w.r.t. the Fuseki/receiving end where the issue is getting logged. Not the sending/client end.

Chris

> Anyway - you have the source code ... :-)
> 
>    Andy

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.

On 02/07/2020 21:55, Chris Tomlinson wrote:
>> From what I can see, it (WARN) isn't database related.
> 
> No, it seems to me to be related to getting the payload off the wire.

I think you said the same payload had been sent before.
??

...

>> Even the concurrency looks OK because it locally writes a buffer so the HTTP length is available.

(in case of corruption, not repeat, is happening)

> So it seems to me that there may be an opportunity for some sort of robustification in RDFConnection. There is evidently a loop somewhere that doesn't terminate, retrying the parsing repeatedly or something like that. The payload is finite so there would appear to be a test that repeatedly fails but doesn't make progress in consuming the payload.

RDFConnection (client-side) is sending, not parsing.

The WARN says that an empty <RDF_StreamRow > was seen.

There is no information about the stalled transactions although not 
finishing the write would explain this:

30-Jun-2020 16:21:30.778

java.io.BufferedInputStream.read(BufferedInputStream.java:345)
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)

so it's waiting for input. What's the proxy/reverse-proxy setup?


The code writes the payload to a ByteArrayOutputStream and sends those 
bytes. That's how it gets the length for the HTTP header.

https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615

(run Fuseki with "verbose" to see the headers ... but it is quite verbose)

It sent something, so the RDF->Thrift->bytes conversion had finished, and 
then it sent the bytes.

Anyway - you have the source code ... :-)

     Andy
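
For what it's worth, a minimal round-trip sketch of that encoding path, 
assuming Jena 3.14 (buffer to a ByteArrayOutputStream as the client does, 
then parse the same bytes back; the input file name is illustrative):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;

    public class ThriftRoundTrip {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            RDFDataMgr.read(model, "data.ttl");   // illustrative input

            // Serialize to RDF Thrift in memory - the buffer size is what
            // becomes the Content-Length header on the PUT.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            RDFDataMgr.write(out, model, Lang.RDFTHRIFT);
            System.out.println("Content-Length would be: " + out.size());

            // Parse the same bytes back, as the receiving end does.
            Model copy = ModelFactory.createDefaultModel();
            RDFDataMgr.read(copy, new ByteArrayInputStream(out.toByteArray()),
                            Lang.RDFTHRIFT);
            System.out.println("Round-tripped triples: " + copy.size());
        }
    }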

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.
> On Jul 2, 2020, at 15:33, Andy Seaborne <an...@apache.org> wrote:
> 
> There may be two things going on ... or one
> 
> The WARN
> 
> This is broken input data, but from looking at the Jena code, I'm not seeing how an encoding error on its own can send an unset RDF_StreamRow (which is what prints as <RDF_StreamRow >). The Jena code is quite simple and by inspection I can't see a path that sends that. And in Thrift you can't send unset data anyway - the write path throws a client-side exception.
> 
> So I'm wondering about the path from client to server, especially with the broken pipes earlier, which might impact a reverse proxy or tomcat itself. Maybe there is junk on the wire.
> 
> 
> The W-transactions
> This is TDB1 presumably.
> 
> The lack of response suggests they are backing up, and that can happen in TDB1 if the engine database is blocked from writing to the storage. That said - it does log if that is happening, unless I/O is blocked at the OS level.
> 
>>> On 02/07/2020 18:37, Chris Tomlinson wrote:
>> Hi Andy,
>>>> On Jul 2, 2020, at 10:22 AM, Andy Seaborne <an...@apache.org> wrote:
>>> The log does not contain "Unrecognized".
>>> Is the "ThriftConvert WARN visit:" message from a different process?
>> Sorry. I should have been much clearer:
>>>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s removed.
> 
> I wasn't clear whether "removed" meant they came from stderr or another source.

grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" catalina.out


> 
>> The event starts at
>>    line 11305 "[2020-06-30 14:12:51] Fuseki     INFO  [120611] PUT http://buda1. . . .”
>> with the first WARNing at 14:12:53 after the "[120611] 200 OK (2.106 s)”;
>> and ends with the stacktraces resulting from “systemctl restart fuseki.service” at :
>>    line 12940 "30-Jun-2020 16:21:28.445"
>> and the stacktraces end at startup:
>>    line 12939 "[2020-06-30 16:21:26] Fuseki     INFO  [121275] PUT http://buda1. . . . "
>> I have a 55.1MB gzip’d file with all of the ~199M "ThriftConvert WARN”s. It is 15GB+ uncompressed. The rate of the WARNing messages is around 20K - 25K per second.
>> If you need the entire file, where should I upload it?
> 
> From what I can see, it (WARN) isn't database related.

No, it seems to me to be related to getting the payload off the wire.

> 
>>> There are some broken pipe errors much earlier but they happen if clients go away untidily.  They are on query responses not PUTs.
>>> The fact 120612 doesn't even print "Body:”
>> I think we conclude from this that the RDFConnection became unstable processing the incoming payload of 7536 triples in the ontologySchema graph. We have significantly larger graphs that are PUT using the same pattern as I displayed earlier.
>> The same source bits/bytes loaded w/o issue before and after this issue, so it seems likely that there was an undetected transmission error that RDFConnection did not get notified of and was not able to recover from. Maybe that's peculiar to the Thrift encoding?
> 
> Caveat a bug in thrift, no.  Either it works or it doesn't and your system works most of the time.

Yes, this is a very intermittent problem.


> 
>>> You could run with RDFConnectionRemote - what RDFConnectionFuseki does is switch to thrift encodings; RDFConnectionRemote uses the standard ones.
>> We’ll take this under consideration. The problem occurred once at the end of April and now at the end of June. Not very frequent given the amount of PUTs that we do on a daily basis. It’s not what I would call effectively repeatable.
>> Is there a reason that we should distrust Thrift in general? Is it somehow more susceptible to undetected transmission errors than the standard encodings?
> 
> Not that I know of. I'm suggesting it to change the situation to get more information.

Ok. I'm not sure we'll make a change just now as we're in the middle of a bit of a crunch.

> 
>> Do we need to consider avoiding Thrift everywhere?
> 
> If it works for months, and it is a simple code path, that suggests to me it is working.
> 
> Even the concurrency looks OK because it locally writes a buffer so the HTTP length is available.

So it seems to me that there may be an opportunity for some sort of robustification in RDFConnection. There is evidently a loop somewhere that doesn't terminate, retrying the parsing repeatedly or something like that. The payload is finite so there would appear to be a test that repeatedly fails but doesn't make progress in consuming the payload.

Thanks,
Chris


> 
>   Andy
> 
>> Thanks,
>> Chris
>>>    Andy
>>> On 01/07/2020 17:34, Chris Tomlinson wrote:
>>>> Hi Andy,
>>>>> On Jul 1, 2020, at 7:59 AM, Andy Seaborne <andy@apache.org <ma...@apache.org>> wrote:
>>>>> Presumably the client is in java using RDFConnectionFactory.connectFuseki?
>>>> Yes, the clients are in Java 1.8, also on Debian 4.9. Fuseki is running on one ec2 instance; some of the clients are on the same instance and others are on another ec2 instance in the same region. The clients use the following pattern:
>>>>>        fuConnBuilder = RDFConnectionFuseki.create()
>>>>>                .destination(baseUrl)
>>>>>                .queryEndpoint(baseUrl+"/query")
>>>>>                .gspEndpoint(baseUrl+"/data")
>>>>>                .updateEndpoint(baseUrl+"/update");
>>>>>        fuConn = fuConnBuilder.build();
>>>>>        if (!fuConn.isInTransaction()) {
>>>>>            fuConn.begin(ReadWrite.WRITE);
>>>>>        }
>>>>>        fuConn.put(graphName, model);
>>>>>        fuConn.commit();
>>>>> Do you have the data from 120611?
>>>> The data for the PUT of 120611 can be retrieved via http://purl.bdrc.io/graph/PersonUIShapes.ttl. The source is person.ui.shapes.ttl <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.ui.shapes.ttl> which is loaded via OntDocumentManager w/ setProcessImports(true) from GH.
>>>> From the log, 120611 appears to have completed successfully with 839 triples (the expected number); however, since then there have been several reloads of that graph during restarts and so on - we’re in development - so the exact bits in the 120611 PUT are not specifically available. This particular data has not changed in the past couple of weeks.
>>>> As for 120612, it never completed; the source data has not had any changes since 2020-06-29, before the “failure” or after, and there have been numerous reloads w/o issue before and since. Only during the period 14:12 - 16:21 do we see the PUTs hang.
>>>>> Could the request have got truncated?
>>>> Referring to 120611, I don’t see any evidence of truncation. The expected number of triples, 839, is reported in the log.
>>>>> The fact that later PUTs stall suggests (but does not prove) that the earlier operation broke somehow; I'd expect a stacktrace in the log.
>>>> There may be a stacktrace at the very end when I restarted fuseki. There were quite a few threads aborted w/ stacktraces but I don’t know how to interpret which might have been associated with 120612 which appears to have never completed receiving ontologySchema since no
>>>>    Fuseki INFO [120612] Body: . . .
>>>> appears in the log. There are another 13 occurrences of PUTs which hang during the period 14:12 - 16:21.
>>>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s removed. It shows the stacktraces resulting from the “systemctl restart fuseki.service” at 30-Jun-2020 16:21:28.445 line# 12940 to the startup at 30-Jun-2020 16:21:34.866 line# 18856.
>>>> The log also shows that just the PUTs hung; GETs and POSTs (of queries) continued to be processed, perhaps a little less rapidly than usual.
>>>>> Overlapping should be fine - as they are on the same database, one will happen before the other even though they are separate graphs.
>>>> As we expected.
>>>> It seems that this broken behavior is an unusual occurrence.
>>>> Thank you for your help. It’s rather perplexing.
>>>> Chris
>>>>>   Andy
>>>>> On 30/06/2020 22:29, Chris Tomlinson wrote:
>>>>>> [original post elided - quoted in full at the top of the thread]

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.
There may be two things going on ... or one

The WARN

This is broken input data, but from looking at the Jena code, I'm not 
seeing how an encoding error on its own can send an unset RDF_StreamRow 
(which is what prints as <RDF_StreamRow >). The Jena code is quite simple 
and by inspection I can't see a path that sends that. And in Thrift you 
can't send unset data anyway - the write path throws a client-side 
exception.
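
A minimal illustration of that print form, assuming the Thrift-generated 
wire class shipped in jena-arq (the toString output shown is the one 
reported in the WARN above):

    import org.apache.jena.riot.thrift.wire.RDF_StreamRow;

    public class UnsetRow {
        public static void main(String[] args) {
            // A freshly constructed Thrift union has no field set.
            RDF_StreamRow row = new RDF_StreamRow();
            System.out.println(row.isSet());  // false - no field chosen
            System.out.println(row);          // <RDF_StreamRow > per the WARN
        }
    }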

So I'm wondering about the path from client to server, especially with 
the broken pipes earlier, which might impact a reverse proxy or tomcat 
itself. Maybe there is junk on the wire.


The W-transactions
This is TDB1 presumably.

The lack of response suggests they are backing up, and that can happen in 
TDB1 if the engine database is blocked from writing to the storage. 
That said - it does log if that is happening, unless I/O is blocked at 
the OS level.

On 02/07/2020 18:37, Chris Tomlinson wrote:
> Hi Andy,
> 
>> On Jul 2, 2020, at 10:22 AM, Andy Seaborne <an...@apache.org> wrote:
>>
>> The log does not contain "Unrecognized".
>> Is the "ThriftConvert WARN visit:" message from a different process?
> 
> Sorry. I should have been much clearer:
> 
>>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s removed.

I wasn't clear whether "removed" meant they came from stderr or another source.

> 
> The event starts at
> 
>      line 11305 "[2020-06-30 14:12:51] Fuseki     INFO  [120611] PUT http://buda1. . . .”
> 
> with the first WARNing at 14:12:53 after the "[120611] 200 OK (2.106 s)”;
> 
> and ends with the stacktraces resulting from “systemctl restart fuseki.service” at :
> 
>      line 12940 "30-Jun-2020 16:21:28.445"
>   
> and the stacktraces end at startup:
> 
>      line 12939 "[2020-06-30 16:21:26] Fuseki     INFO  [121275] PUT http://buda1. . . . "
> 
> I have a 55.1MB gzip’d file with all of the ~199M "ThriftConvert WARN”s. It is 15GB+ uncompressed. The rate of the WARNing messages is around 20K - 25K per second.
> 
> If you need the entire file, where should I upload it?

From what I can see, it (WARN) isn't database related.

> 
>> There are some broken pipe errors much earlier but they happen if clients go away untidily.  They are on query responses not PUTs.
>>
>> The fact 120612 doesn't even print "Body:”
> 
> I think we conclude from this that the RDFConnection became unstable processing the incoming payload of 7536 triples in the ontologySchema graph. We have significantly larger graphs that are PUT using the same pattern as I displayed earlier.
> 
> The same source bits/bytes loaded w/o issue before and after this issue, so it seems likely that there was an undetected transmission error that RDFConnection did not get notified of and was not able to recover from. Maybe that's peculiar to the Thrift encoding?

Caveat a bug in thrift, no.  Either it works or it doesn't and your 
system works most of the time.

> 
> 
> 
>> You could run with RDFConnectionRemote - what RDFConnectionFuseki does is switch to thrift encodings; RDFConnectionRemote uses the standard ones.
> 
> We’ll take this under consideration. The problem occurred once at the end of April and now at the end of June. Not very frequent given the amount of PUTs that we do on a daily basis. It’s not what I would call effectively repeatable.
> 
> Is there a reason that we should distrust Thrift in general? Is it somehow more susceptible to undetected transmission errors than the standard encodings?

Not that I know of. I'm suggesting it to change the situation to get 
more information.

> 
> Do we need to consider avoiding Thrift everywhere?

If it works for months, and it is a simple code path, that suggests to me 
it is working.

Even the concurrency looks OK because it locally writes a buffer so the 
HTTP length is available.

     Andy

> 
> Thanks,
> Chris
> 
> 
>>      Andy
>>
>>
>> On 01/07/2020 17:34, Chris Tomlinson wrote:
>>> [message and original post elided - quoted in full earlier in the thread]
> 

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Andy,

> On Jul 2, 2020, at 10:22 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> The log does not contain "Unrecognized".
> Is the "ThriftConvert WARN visit:" message from a different process?

Sorry. I should have been much clearer:

>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s removed.

The event starts at 

    line 11305 "[2020-06-30 14:12:51] Fuseki     INFO  [120611] PUT http://buda1. . . .”

with the first WARNing at 14:12:53 after the "[120611] 200 OK (2.106 s)”;

and ends with the stacktraces resulting from “systemctl restart fuseki.service” at :

    line 12940 "30-Jun-2020 16:21:28.445"
 
and the stacktraces end at startup:

    line 12939 "[2020-06-30 16:21:26] Fuseki     INFO  [121275] PUT http://buda1. . . . "

I have a 55.1MB gzip’d file with all of the ~199M "ThriftConvert WARN”s. It is 15GB+ uncompressed. The rate of the WARNing messages is around 20K - 25K per second.

If you need the entire file, where should I upload it?



> There are some broken pipe errors much earlier but they happen if clients go away untidily.  They are on query responses not PUTs.
> 
> The fact 120612 doesn't even print "Body:”

I think we conclude from this that the RDFConnection became unstable processing the incoming payload of 7536 triples in the ontologySchema graph. We have significantly larger graphs that are PUT using the same pattern as I displayed earlier. 

The same source bits/bytes loaded w/o issue before and after this issue, so it seems likely that there was an undetected transmission error that RDFConnection did not get notified of and was not able to recover from. Maybe that's peculiar to the Thrift encoding?



> You could run with RDFConnectionRemote - what RDFConnectionFuseki does is switch to thrift encodings; RDFConnectionRemote uses the standard ones.

We’ll take this under consideration. The problem occurred once at the end of April and now at the end of June. Not very frequent given the amount of PUTs that we do on a daily basis. It’s not what I would call effectively repeatable.

Is there a reason that we should distrust Thrift in general? Is it somehow more susceptible to undetected transmission errors than the standard encodings?

Do we need to consider avoiding Thrift everywhere?

Thanks,
Chris


>     Andy
> 
> 
> On 01/07/2020 17:34, Chris Tomlinson wrote:
>> [message and original post elided - quoted in full earlier in the thread]


Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Andy Seaborne <an...@apache.org>.
The log does not contain "Unrecognized".
Is the "ThriftConvert WARN visit:" message from a different process?

There are some broken pipe errors much earlier, but those happen when 
clients go away untidily. They are on query responses, not PUTs.

The fact that 120612 doesn't even print "Body:" suggests the request body never fully arrived.

You could run with RDFConnectionRemote instead - what RDFConnectionFuseki 
does is switch to the Thrift encodings; RDFConnectionRemote uses the standard ones.
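
For example, a minimal sketch of the switch (assuming the same builder 
pattern as the client code below, with baseUrl, graphName and model taken 
from that snippet; the PUT then goes over the wire in a standard syntax 
such as Turtle rather than as application/rdf+thrift):

    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionRemote;
    import org.apache.jena.rdfconnection.RDFConnectionRemoteBuilder;

    RDFConnectionRemoteBuilder builder = RDFConnectionRemote.create()
            .destination(baseUrl)
            .queryEndpoint(baseUrl + "/query")
            .gspEndpoint(baseUrl + "/data")
            .updateEndpoint(baseUrl + "/update");

    // try-with-resources ensures the connection is closed even on failure
    try (RDFConnection conn = builder.build()) {
        conn.begin(ReadWrite.WRITE);
        conn.put(graphName, model);   // standard content type, not RDF-Thrift
        conn.commit();
    }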

     Andy


On 01/07/2020 17:34, Chris Tomlinson wrote:
> [...]

Re: repeated ThriftConvert WARN visit: Unrecognized:

Posted by Chris Tomlinson <ch...@gmail.com>.
Hi Andy,

> On Jul 1, 2020, at 7:59 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> Presumably the client is in java using RDFConnectionFactory.connectFuseki?

Yes, the clients are in Java 1.8, also on Debian 4.9. Fuseki is running on one ec2 instance; some of the clients are on the same instance and others are on another ec2 instance in the same region. The clients use the following pattern:

>         fuConnBuilder = RDFConnectionFuseki.create()
>                 .destination(baseUrl)
>                 .queryEndpoint(baseUrl+"/query")
>                 .gspEndpoint(baseUrl+"/data")
>                 .updateEndpoint(baseUrl+"/update");
> 
>         fuConn = fuConnBuilder.build();
>         if (!fuConn.isInTransaction()) {
>             fuConn.begin(ReadWrite.WRITE);
>         }
>         fuConn.put(graphName, model);
>         fuConn.commit();
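
(As an aside, a more defensive form of the same pattern - a sketch, not 
our exact production code - so that a PUT which throws gets aborted and 
the connection closed, rather than leaving the write transaction open; 
fuConnBuilder, graphName and model as above:)

    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdfconnection.RDFConnection;

    try (RDFConnection conn = fuConnBuilder.build()) {
        conn.begin(ReadWrite.WRITE);
        try {
            conn.put(graphName, model);
            conn.commit();
        } catch (Exception e) {
            conn.abort();   // release the write transaction instead of leaving it open
            throw e;
        }
    }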




> Do you have the data from 120611?

The data for the PUT of 120611 can be retrieved via http://purl.bdrc.io/graph/PersonUIShapes.ttl. The source is person.ui.shapes.ttl <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.ui.shapes.ttl>, which is loaded via OntDocumentManager w/ setProcessImports(true) from GitHub.
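
(Roughly, the loading pattern looks like this - a sketch rather than our 
exact code, with docUrl standing in for the raw GitHub URL of the .ttl file:)

    import org.apache.jena.ontology.OntDocumentManager;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntModelSpec;
    import org.apache.jena.rdf.model.ModelFactory;

    OntDocumentManager dm = new OntDocumentManager();
    dm.setProcessImports(true);                   // follow owl:imports while reading
    OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
    spec.setDocumentManager(dm);
    OntModel ontModel = ModelFactory.createOntologyModel(spec);
    ontModel.read(docUrl, "TURTLE");              // docUrl is a placeholder here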

From the log, 120611 appears to have completed successfully with 839 triples (the expected number); however, there have been several reloads of that graph since then, during restarts and so on - we’re in development - so the exact bytes of the 120611 PUT are no longer available. This particular data has not changed in the past couple of weeks.

As for 120612, it never completed. The source data has not changed since 2020-06-29, before the “failure”, and there have been numerous reloads w/o issue both before and since. Only during the period 14:12 - 16:21 do we see the PUTs hang.


> Could the request have got truncated?

Referring to 120611, I don’t see any evidence of truncation. The expected number of triples, 839, is reported in the log.



> The fact that later PUTs stall suggests (does not prove) that the earlier operation broke somehow, but I'd expect a stacktrace in the log.

There may be a stacktrace at the very end, from when I restarted fuseki. There were quite a few threads aborted w/ stacktraces, but I don’t know how to tell which might have been associated with 120612, which appears never to have completed receiving ontologySchema, since no 

    Fuseki INFO [120612] Body: . . .

appears in the log. There are another 13 occurrences of PUTs that hung during the period 14:12 - 16:21.

I have attached the catalina.out with the 199M or so "ThriftConvert WARN"s removed. It shows the stacktraces resulting from the “systemctl restart fuseki.service” at 30-Jun-2020 16:21:28.445 (line# 12940) through the startup at 30-Jun-2020 16:21:34.866 (line# 18856).

The log also shows that only the PUTs hung; GETs and POSTs (of queries) continued to be processed, perhaps a little less rapidly than usual.





> Overlapping should be fine - as they are the same database, one will happen before the other even though they are separate graphs.

As we expected.

It seems that this broken behavior is an unusual occurrence.

Thank you for your help. It’s rather perplexing.

Chris



>    Andy
> 
> On 30/06/2020 22:29, Chris Tomlinson wrote:
>> [...]