Posted to user@flink.apache.org by Kien Truong <du...@gmail.com> on 2017/10/07 03:44:01 UTC

RocksDB segfault inside timer when accessing/clearing state

Hi,

We are using processing timers to implement some state cleanup logic.
After switching from the FsStateBackend to RocksDB, we encounter a lot of segfaults from the Time Trigger threads when accessing/clearing state values.
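
For context, our cleanup logic follows roughly this pattern (a simplified sketch, not our exact code; the class name, state name, and TTL value are illustrative):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Simplified sketch: keep per-key state and clear it from a
// processing-time timer once the key has been idle for TTL_MS.
public class StateCleanupFunction extends ProcessFunction<String, String> {

    private static final long TTL_MS = 3600_000; // illustrative TTL: 1 hour

    private transient ValueState<Long> lastSeen;

    @Override
    public void open(Configuration parameters) {
        lastSeen = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastSeen", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastSeen.update(now);
        // Schedule a cleanup check; it fires on the Time Trigger thread.
        ctx.timerService().registerProcessingTimeTimer(now + TTL_MS);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        // These state accesses from the timer thread are where we see the crashes.
        Long last = lastSeen.value();
        if (last != null && timestamp >= last + TTL_MS) {
            lastSeen.clear();
        }
    }
}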

We currently use the latest 1.3-SNAPSHOT, with the patch upgrading RocksDB to 5.6.1, because the segfaults happen less frequently with this version than with the original FRocksDB.

Perhaps there's a race condition here. Any insights would be much appreciated.

Best regards,
Kien


Re: RocksDB segfault inside timer when accessing/clearing state

Posted by Kien Truong <du...@gmail.com>.
Hi Stefan,

I guess this is the case. Our cluster is a bit overloaded network-wise, so sometimes a TaskManager gets disconnected, which causes a restart of the entire job, leading to multiple segfaults in the other task managers and prolonging recovery.

We're upgrading the network; hopefully the problem will go away :)
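
In case it helps anyone else: as a stopgap until then, one could probably also relax the Akka death-watch settings in flink-conf.yaml so that short network hiccups are less likely to mark a TaskManager as dead (values illustrative, to be tuned for the cluster):

akka.watch.heartbeat.interval: 10 s
akka.watch.heartbeat.pause: 120 s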


Thanks,

Kien


On 10/9/2017 7:27 AM, Stefan Richter wrote:
> Hi,
>
> I would assume that those segfaults are only observed *after* a job is already in the process of canceling? This is a known problem, but currently „accepted“ behaviour after discussions with Stephan and Aljoscha (in CC).
>
> From that discussion, the background is that the native RocksDB resource is disposed somewhere in the process of cancelation, and the timers are executed in a different thread than the main event processing loop that is exited in cancelation. We would currently either have to a) wait for all timer events to finish before cancelation, or b) somehow synchronize every access to the RocksDB resource field.
>
> The problem with option a) is that it can delay cancelation for an uncertain amount of time, and we want to cancel asap so that the job can restart immediately in case of failover. Option b) introduces additional costs per access under normal operations to avoid a problem after the point that a job is already canceling.
>
> Personally, I also absolutely don’t like the idea of accepting this faulty behaviour and would be in favour of a „cleaner“ solution, maybe somehow reworking how the timer events are executed or how they interact with normal processing.
>
> Best,
> Stefan
>
>> On 07.10.2017 at 05:44, Kien Truong <du...@gmail.com> wrote:
>>
>> Hi,
>>
>> We are using processing timers to implement some state cleanup logic.
>> After switching from the FsStateBackend to RocksDB, we encounter a lot of segfaults from the Time Trigger threads when accessing/clearing state values.
>>
>> We currently use the latest 1.3-SNAPSHOT, with the patch upgrading RocksDB to 5.6.1, because the segfaults happen less frequently with this version than with the original FRocksDB.
>>
>> Perhaps there's a race condition here. Any insights would be much appreciated.
>>
>> Best regards,
>> Kien
>>

Re: RocksDB segfault inside timer when accessing/clearing state

Posted by Stefan Richter <s....@data-artisans.com>.
Hi,

I would assume that those segfaults are only observed *after* a job is already in the process of canceling? This is a known problem, but currently „accepted“ behaviour after discussions with Stephan and Aljoscha (in CC).

From that discussion, the background is that the native RocksDB resource is disposed somewhere in the process of cancelation, and the timers are executed in a different thread than the main event processing loop that is exited in cancelation. We would currently either have to a) wait for all timer events to finish before cancelation, or b) somehow synchronize every access to the RocksDB resource field.

The problem with option a) is that it can delay cancelation for an uncertain amount of time, and we want to cancel asap so that the job can restart immediately in case of failover. Option b) introduces additional costs per access under normal operations to avoid a problem after the point that a job is already canceling.
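
To make option b) concrete, such a guard could look roughly like the following (an illustrative sketch only, not actual Flink code; the class name and the choice of a read/write lock are assumptions):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Illustrative: guard a shared native RocksDB handle against concurrent disposal.
class GuardedDbHandle {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private RocksDB db; // the shared native resource

    GuardedDbHandle(RocksDB db) {
        this.db = db;
    }

    // Every state access (event loop and timer threads) would go through here.
    byte[] get(byte[] key) throws RocksDBException {
        lock.readLock().lock();
        try {
            if (db == null) {
                throw new IllegalStateException("Backend already disposed");
            }
            return db.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Called during cancelation; blocks until in-flight accesses have finished.
    void dispose() {
        lock.writeLock().lock();
        try {
            if (db != null) {
                db.close();
                db = null;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}

Every get() would then pay the read-lock cost, which is exactly the per-access overhead under normal operations mentioned above.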

Personally, I also absolutely don’t like the idea of accepting this faulty behaviour and would be in favour of a „cleaner“ solution, maybe somehow reworking how the timer events are executed or how they interact with normal processing.

Best,
Stefan

> On 07.10.2017 at 05:44, Kien Truong <du...@gmail.com> wrote:
> 
> Hi,
> 
> We are using processing timers to implement some state cleanup logic.
> After switching from the FsStateBackend to RocksDB, we encounter a lot of segfaults from the Time Trigger threads when accessing/clearing state values.
> 
> We currently use the latest 1.3-SNAPSHOT, with the patch upgrading RocksDB to 5.6.1, because the segfaults happen less frequently with this version than with the original FRocksDB.
> 
> Perhaps there's a race condition here. Any insights would be much appreciated.
> 
> Best regards,
> Kien
>