Posted to issues@flink.apache.org by "yan zhou (JIRA)" <ji...@apache.org> on 2018/06/04 23:30:00 UTC
[jira] [Comment Edited] (FLINK-9524) NPE from ProcTimeBoundedRangeOver.scala
[ https://issues.apache.org/jira/browse/FLINK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501026#comment-16501026 ]
yan zhou edited comment on FLINK-9524 at 6/4/18 11:29 PM:
----------------------------------------------------------
I would love to work on this bug if it's confirmed. It took me a while to add and adjust the trace log so that it is easy to understand what's happening.
was (Author: yzandrew):
Here is the trace log I added to ProcTimeBoundedRangeOver.scala. It should explain how the NPE happens:
_[ts:1528149296456] [label:state_ttl_update] register for cleanup at 1528150096456(CLEANUP_TIME_1), because of Row:(orderId:001,userId:U123)_
_[ts:1528149296456] [label:register_pt] register for process input at 1528149296457, because of Row:(orderId:001,userId:U123)_
_[ts:1528149296458] [label:state_apply] ontimer at 1528149296457, apply Row:(orderId:001,userId:U123) to accumulator_
_[ts:1528149885813] [label:state_ttl_update] register at 1528150685813(CLEANUP_TIME_2), because of Row:(orderId:002,userId:U123)_
_[ts:1528149885813] [label:register_pt] register for process input at 1528149885814, because of Row:(orderId:002,userId:U123)_
_[ts:1528149885814] [label:state_apply] ontimer at 1528149885814, apply Row:(orderId:002,userId:U123) to accumulator_
_[ts:1528150096460] [label:NO_ELEMENTS_IN_STATE] ontimer at 1528150096456(CLEANUP_TIME_1), bypass needToCleanupState check, however rowMapState is {key:1528150096455, value:[]}_
_[ts:1528150685815] [label:state_timeout] ontimer at 1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState [{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}]_
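The sequence in the trace can be replayed with a small toy model. This is only a sketch under stated assumptions, not Flink code: `StaleCleanupTimerReplay`, `Outcome`, and `replay` are hypothetical names, a plain `Option` stands in for the cleanup-time state and a mutable map for rowMapState, and the 800000 ms retention is simply the difference between the arrival and cleanup timestamps in the trace. It assumes, as the trace suggests, that a timer is treated as a cleanup timer only when it equals the most recently registered cleanup time, and that any other timer looks up the rows buffered one millisecond earlier.

```scala
import scala.collection.mutable

// Toy model of the timer bookkeeping; NOT Flink code. All names are illustrative.
object StaleCleanupTimerReplay {
  val RetentionMs = 800000L // cleanup timestamp minus arrival timestamp in the trace

  sealed trait Outcome
  case object ApplyRows extends Outcome     // process-input timer that finds buffered rows
  case object CleanupState extends Outcome  // timer equal to the latest cleanup time
  case object NoRowsInState extends Outcome // stale cleanup timer: nothing buffered at t - 1

  def replay(arrivals: Seq[(Long, String)]): List[(Long, Outcome)] = {
    var latestCleanup = Option.empty[Long]              // stands in for the cleanup-time state
    val rowMap = mutable.Map.empty[Long, List[String]]  // stands in for rowMapState
    val timers = mutable.SortedSet.empty[Long]          // every registered timer, stale or not
    val fired = mutable.ArrayBuffer.empty[(Long, Outcome)]

    // Fire all timers that are due at or before `now`, in timestamp order.
    def fireUpTo(now: Long): Unit =
      while (timers.nonEmpty && timers.head <= now) {
        val t = timers.head
        timers -= t
        val outcome: Outcome =
          if (latestCleanup.contains(t)) { rowMap.clear(); CleanupState }
          else if (rowMap.contains(t - 1)) ApplyRows
          else NoRowsInState
        fired += t -> outcome
      }

    for ((ts, row) <- arrivals.sortBy(_._1)) {
      fireUpTo(ts)
      // A new arrival overwrites the cleanup time, but the older cleanup timer stays registered.
      latestCleanup = Some(ts + RetentionMs)
      timers += ts + RetentionMs
      rowMap(ts) = row :: rowMap.getOrElse(ts, Nil)
      timers += ts + 1 // process-input timer one millisecond after arrival
    }
    fireUpTo(Long.MaxValue)
    fired.toList
  }
}
```

Replaying the two arrivals from the trace (1528149296456 and 1528149885813) classifies the timer at 1528150096456 as `NoRowsInState`, which corresponds to the NO_ELEMENTS_IN_STATE line above, while the timer at 1528150685813 is the only one treated as a cleanup.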
> NPE from ProcTimeBoundedRangeOver.scala
> ---------------------------------------
>
> Key: FLINK-9524
> URL: https://issues.apache.org/jira/browse/FLINK-9524
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.5.0
> Reporter: yan zhou
> Priority: Major
> Attachments: npe_from_ProcTimeBoundedRangeOver.txt
>
>
> The class _ProcTimeBoundedRangeOver_ throws an NPE if _minRetentionTime_ and _maxRetentionTime_ are set to values greater than 1.
> Please see [^npe_from_ProcTimeBoundedRangeOver.txt] for the details of the exception. Below is a short description of the cause:
> * When the first event for a key arrives, the cleanup time is registered with _timerService_ and recorded in _cleanupTimeState_. If a second event with the same key arrives before the cleanup time, the value in _cleanupTimeState_ is updated and a new timer is registered with _timerService_. So now we have two registered cleanup timers: one registered because of the first event, the other because of the second.
> * However, when _onTimer_ fires for the first cleanup timer, the value in _cleanupTimeState_ has already been updated to the second cleanup time. The timer therefore bypasses the _needToCleanupState_ check, runs through the remaining code of _onTimer_ (which is intended to update the accumulator and emit output), and causes the NPE.
> _RowTimeBoundedRangeOver_ has very similar logic to _ProcTimeBoundedRangeOver_, but it does not throw an NPE in the same situation: to avoid the exception, it simply adds a null check before running the logic that updates the accumulator.
>
>
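The null check mentioned for _RowTimeBoundedRangeOver_ can be illustrated with a minimal sketch. It is not the actual Flink code: `NullCheckSketch`, `applyUnguarded`, and `applyGuarded` are hypothetical names, and a `java.util.HashMap` stands in for the keyed map state on the assumption that the lookup returns null when no rows were buffered for the requested timestamp.

```scala
import java.util.{HashMap => JHashMap, List => JList}

// Illustrative only: a java.util.HashMap stands in for the keyed map state.
object NullCheckSketch {
  type RowMap = JHashMap[Long, JList[String]]

  // Mirrors the failing path: the lookup result is dereferenced unconditionally,
  // so a stale cleanup timer (nothing buffered at timerTs - 1) throws an NPE.
  def applyUnguarded(rowMap: RowMap, timerTs: Long): Int =
    rowMap.get(timerTs - 1).size()

  // Guarded variant: a timer that finds no buffered rows becomes a no-op.
  // Returns the number of rows that would be applied to the accumulator.
  def applyGuarded(rowMap: RowMap, timerTs: Long): Int = {
    val current = rowMap.get(timerTs - 1)
    if (current == null) 0 else current.size()
  }
}
```

With only the row for 1528149885813 buffered, calling `applyGuarded` with the stale timer timestamp 1528150096456 returns 0 instead of throwing, whereas `applyUnguarded` fails with a NullPointerException for the same input.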
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)