Posted to dev@bahir.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/05/17 21:47:00 UTC

[jira] [Commented] (BAHIR-177) Siddhi Library state recovery causes an Exception

    [ https://issues.apache.org/jira/browse/BAHIR-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842627#comment-16842627 ] 

ASF subversion and git services commented on BAHIR-177:
-------------------------------------------------------

Commit dd02c6dfda2a1bd42a6f79ec4c7ba273216640a0 in bahir-flink's branch refs/heads/master from Dominik Wosin
[ https://gitbox.apache.org/repos/asf?p=bahir-flink.git;h=dd02c6d ]

[BAHIR-177] Fixes state recovery/size of the recovered queue

Two issues are meant to be fixed in this PR:

- As described in BAHIR-177, state recovery of Bahir
operators currently depends on randomly generated IDs,
which makes it impossible to recover state properly.
The change makes the operators use the outStreamId
instead of random names.

- The size of the queue recovered in restoreQueuerState()
was equal to the actual size (number of elements) of the
snapshot queue. If the queue was empty, the method would
try to create a queue with an initial capacity of 0, which
the PriorityQueue constructor in Java rejects.
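
The second point can be reproduced outside of Flink. The sketch below is only an illustration and not the code from the commit: the class QueueCapacityDemo and the helper newQueueFor are hypothetical names, and clamping the capacity to at least 1 is one assumed way to avoid the problem. It relies only on the documented behaviour of java.util.PriorityQueue, whose constructor throws IllegalArgumentException when the initial capacity is less than 1.

```java
import java.util.PriorityQueue;

class QueueCapacityDemo {

    // Hypothetical helper: sizes the restored queue from the snapshot,
    // but never asks for a capacity below 1.
    static <T extends Comparable<T>> PriorityQueue<T> newQueueFor(int snapshotSize) {
        return new PriorityQueue<>(Math.max(1, snapshotSize));
    }

    public static void main(String[] args) {
        try {
            // Mirrors restoring from an empty snapshot queue.
            new PriorityQueue<String>(0);
        } catch (IllegalArgumentException e) {
            System.out.println("initial capacity 0 was rejected");
        }

        // The clamped variant yields a usable, empty queue.
        PriorityQueue<String> restored = newQueueFor(0);
        System.out.println("restored queue is empty: " + restored.isEmpty());
    }
}
```

The initial capacity is only a sizing hint, so a queue created with capacity 1 still grows as elements are added.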

Closes #51


> Siddhi Library state recovery causes an Exception
> -------------------------------------------------
>
>                 Key: BAHIR-177
>                 URL: https://issues.apache.org/jira/browse/BAHIR-177
>             Project: Bahir
>          Issue Type: Bug
>            Reporter: Dominik Wosiński
>            Assignee: Dominik Wosiński
>            Priority: Blocker
>
> Currently, Flink offers a way to store state, and this is used by the Siddhi library. The problem is that Siddhi internally relies on operator IDs that are generated automatically when the _SiddhiAppRuntime_ is initialized. This means that when the job is restarted, new operator IDs are assigned for Siddhi, yet Flink has stored the state under the old IDs.
> Siddhi uses the operator ID to look up the state in a Map:
> _snapshotable.restoreState(snapshots.get(snapshotable.getElementId()));_
> Siddhi does not null-check the retrieved value, so _restoreState_ throws an NPE, which is caught and rethrown as _CannotRestoreSiddhiAppStateException_. Any Flink job that hits this issue goes into an infinite restart loop.
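
The failure mode described in the issue can be sketched with a plain Map. This is a simplified illustration and not Siddhi or Bahir code: the class StateLookupDemo, the method restore, the use of UUIDs as the "generated" IDs, and the stream name "outputStream" are all assumptions made for the example. It shows that a snapshot stored under one generated ID cannot be found under a freshly generated one, whereas a stable key survives the restart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class StateLookupDemo {

    // Hypothetical stand-in for a restore step. It checks for a missing
    // snapshot instead of dereferencing a null value.
    static boolean restore(Map<String, byte[]> snapshots, String elementId) {
        byte[] state = snapshots.get(elementId);
        if (state == null) {
            return false; // nothing was stored under this ID
        }
        return true; // a real operator would deserialize the state here
    }

    public static void main(String[] args) {
        Map<String, byte[]> snapshots = new HashMap<>();

        // Before the restart: state is stored under a generated ID.
        String idBeforeRestart = UUID.randomUUID().toString();
        snapshots.put(idBeforeRestart, new byte[] {1});

        // After the restart: a new ID is generated, so the lookup misses.
        String idAfterRestart = UUID.randomUUID().toString();
        System.out.println("generated ID found: " + restore(snapshots, idAfterRestart));

        // With a stable key (assumed here to be a stream name) the lookup hits.
        String stableId = "outputStream";
        snapshots.put(stableId, new byte[] {2});
        System.out.println("stable ID found: " + restore(snapshots, stableId));
    }
}
```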



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)