Posted to users@jena.apache.org by Piotr Nowara <pi...@gmail.com> on 2020/09/10 06:11:28 UTC

Re: RDF Delta/Zookeeper freezing issue

Andy, Rob,

thanks for your explanations and suggestions - now I understand the issue
much better.

I tried different approaches to solve it, but the freeze keeps
occurring when the leading Zookeeper instance is down/stopped permanently.
(BTW: shutting down even 2 out of 3 RDF Delta servers has no negative
impact on SPARQL execution, which is great.)

Andy,
to answer your question about when the freeze happens:

For SELECTs it always occurs between the second and third (last) log entry:

[2020-09-10 05:52:13] Fuseki INFO [13] POST http://localhost:3031/ds
[2020-09-10 05:52:13] Fuseki INFO [13] Query = SELECT * {?s ?p ?o}
[2020-09-10 05:52:23] Fuseki INFO [13] 200 OK (10.023 s)

For INSERTs the log looks like this:
[2020-09-10 05:48:50] Fuseki INFO [7] POST http://localhost:3031/ds
[2020-09-10 05:49:00] HTTP INFO Send patch id:758bcf (165 bytes) -> ds:090688
[2020-09-10 05:49:09] Fuseki INFO [7] 200 OK (19.059 s)
[2020-09-10 05:50:15] Fuseki INFO [10] POST http://localhost:3031/ds
[2020-09-10 05:50:25] HTTP INFO Send patch id:472bf1 (165 bytes) -> ds:090688
[2020-09-10 05:50:44] Fuseki INFO [10] 200 OK (29.042 s)
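
(For completeness: the timings above come straight from the Fuseki log; they
can also be reproduced with a plain HTTP client. A minimal, untested Java
sketch is below - the endpoint and query are the ones from the logs, and
everything else, including the class name and the 60-second client timeout,
is illustrative.)

// Hypothetical timing probe for the Fuseki endpoint shown in the logs above.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

public class FusekiTimingProbe {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://localhost:3031/ds";
        String body = "query=" + URLEncoder.encode("SELECT * { ?s ?p ?o }",
                StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        Instant start = Instant.now();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        long ms = Duration.between(start, Instant.now()).toMillis();
        System.out.println("HTTP " + response.statusCode() + " in " + ms + " ms");
    }
}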

Thanks,
Piotr

Tue, 18 Aug 2020 at 22:03 Andy Seaborne <an...@apache.org> wrote:

> Piotr,
>
> It will depend on how long zookeeper takes to resync. One of the factors
> is how big the zookeeper database has become, because when a ZK server
> starts, it has to process the snapshot and any logs to catch up.
>
> But while it is syncing, the other zk servers should still provide
> (read-only) service - maybe the Fuseki server is choosing the fresh zk
> server, so it waits, whereas if it went to another server it would be OK
> (for a read transaction). I thought the new one would not service requests
> until it is ready, but I'm not sure.
>
> At what point in the Fuseki log file does the freeze happen? After the
> HTTP request is received, or as it exits (i.e. is the freeze before or
> after the middle log line of the three for a request)?
>
> With 3 zk servers, the system can survive one outage.
> With 5 zk servers it can survive one outage and still allow writes, or two
> out and be read-only.
>
> You can also configure zk in more complex primary-secondary configurations.
>
> A load balancer between the Fuseki servers and the zk servers may help.
>
>      Andy
>
>
> On 17/08/2020 17:09, Piotr Nowara wrote:
> > Hi,
> >
> > We are testing RDF Delta with three Zookeeper instances. Sometimes when
> > we kill one of those Zookeeper instances, Fuseki freezes for about 30
> > seconds, which is bad. Is this expected? Will increasing the number of
> > Zookeeper instances help to avoid such issues?
>
> Yes and no.
>
> >
> > Thanks,
> > Piotr Nowara
> >
>

Re: RDF Delta/Zookeeper freezing issue

Posted by Andy Seaborne <an...@apache.org>.

On 10/09/2020 07:11, Piotr Nowara wrote:
> Andy, Rob,
> 
> thanks for your explanations and suggestions - now I understand the issue
> much better.
> 
> I tried different approaches to solve it, but the freeze keeps
> occurring when the leading Zookeeper instance is down/stopped permanently.
> (BTW: shutting down even 2 out of 3 RDF Delta servers has no negative
> impact on SPARQL execution, which is great.)

I think this is to do with the way zookeeper works. I don't know (= 
can't remember) if ZK has a way to shut down gracefully that hands over 
the leader role without a full election with timeouts. It does look like 
exactly 10s timeouts are happening somewhere. (Same situation as the 30s 
you reported originally?)

If the way you stop the leader is abrupt, then the system is going to 
have to apply some kind of timeout, because as far as the other servers 
can tell it may just be a transitory network glitch.
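
I can't say offhand which setting the exact 10s corresponds to, but on the
client side the relevant knobs are the ZooKeeper session and connection
timeouts. Below is a minimal, generic Apache Curator sketch of those knobs
(illustrative values only - this is not necessarily how RDF Delta wires up
its own ZK client):

// Generic Curator client - illustrative only, not RDF Delta's internal setup.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkClientTimeouts {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.builder()
                .connectString("zk1:2181,zk2:2181,zk3:2181") // list all ensemble members
                .sessionTimeoutMs(10_000)      // requested session timeout (server may clamp it)
                .connectionTimeoutMs(5_000)    // wait this long when (re)establishing a connection
                .retryPolicy(new ExponentialBackoffRetry(250, 3)) // retry failed ops with backoff
                .build();
        client.start();
        client.blockUntilConnected();          // blocks while the ensemble is re-electing a leader
        System.out.println("connected");
        client.close();
    }
}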

Did you try any changes to the zk server configuration like Rob's [2] link?
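
For reference, the server-side timing and housekeeping settings live in
zoo.cfg. An illustrative sketch (example values only, not a recommendation -
initLimit/syncLimit are measured in ticks, and autopurge keeps the
snapshot/txn-log history small so a restarted server resyncs faster):

# zoo.cfg - illustrative values only
tickTime=2000                  # base time unit (ms); client session timeout bounds derive from it
initLimit=10                   # ticks a follower may take to connect and sync to the leader
syncLimit=5                    # ticks a follower may lag behind the leader before it is dropped
autopurge.snapRetainCount=3    # keep only the most recent snapshots...
autopurge.purgeInterval=1      # ...and purge older snapshots/txn logs every hour
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888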

     Andy
