Posted to issues@flink.apache.org by "yan zhou (JIRA)" <ji...@apache.org> on 2018/06/04 23:30:00 UTC

[jira] [Comment Edited] (FLINK-9524) NPE from ProcTimeBoundedRangeOver.scala

    [ https://issues.apache.org/jira/browse/FLINK-9524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501026#comment-16501026 ] 

yan zhou edited comment on FLINK-9524 at 6/4/18 11:29 PM:
----------------------------------------------------------

I would love to work on this bug if it's confirmed. It took me a while to add/adjust the trace logging so that it is easy to understand what's happening.


was (Author: yzandrew):
Here is the trace log I added to ProcTimeBoundedRangeOver.scala. It should explain how the NPE happens:

 
_[ts:1528149296456] [label:state_ttl_update] register for cleanup at 1528150096456(CLEANUP_TIME_1), because of Row:(orderId:001,userId:U123)_
_[ts:1528149296456] [label:register_pt] register for process input at 1528149296457, because of Row:(orderId:001,userId:U123)_
_[ts:1528149296458] [label:state_apply] ontimer at 1528149296457, apply Row:(orderId:001,userId:U123) to accumulator_
 
_[ts:1528149885813] [label:state_ttl_update] register at 1528150685813(CLEANUP_TIME_2), because of Row:(orderId:002,userId:U123)_
_[ts:1528149885813] [label:register_pt] register for process input at 1528149885814, because of Row:(orderId:002,userId:U123)_
_[ts:1528149885814] [label:state_apply] ontimer at 1528149885814, apply Row:(orderId:002,userId:U123) to accumulator_
 
_[ts:1528150096460] [label:NO_ELEMENTS_IN_STATE] ontimer at 1528150096456(CLEANUP_TIME_1), bypass needToCleanupState check, however rowMapState is {key:1528150096455, value:[]}_
 
_[ts:1528150685815] [label:state_timeout] ontimer at 1528150685813(CLEANUP_TIME_2), clean/empty the rowMapState [{key:1528149885813, value:[Row:(orderId:002,userId:U123)]}]_
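
To make the timeline above easier to follow without running a Flink job, below is a minimal plain-Scala simulation of the timer bookkeeping the trace suggests. It does not use Flink at all; names such as pendingTimers, cleanupTime, rowMapState and the 800-second retention are illustrative stand-ins chosen to mirror the log, and the condition under which the real operator refreshes its cleanup timer is simplified away.

{code:scala}
import scala.collection.mutable

// Plain-Scala simulation of the timer bookkeeping suggested by the trace above.
// All names are illustrative stand-ins for the per-key state of the operator.
object StaleCleanupTimerDemo {

  // Processing-time timers the timer service would keep pending for this key.
  private val pendingTimers = mutable.SortedSet.empty[Long]
  // Stand-in for cleanupTimeState: remembers only the latest cleanup time.
  private var cleanupTime: Option[Long] = None
  // Stand-in for rowMapState: rows buffered per element timestamp.
  private val rowMapState = mutable.Map.empty[Long, List[String]]

  // 800s retention, matching the gap between event and cleanup timestamps
  // in the trace (1528150096456 - 1528149296456 = 800000 ms).
  private val retention = 800000L

  private def processElement(row: String, now: Long): Unit = {
    rowMapState(now) = rowMapState.getOrElse(now, Nil) :+ row
    // Refresh the cleanup time in state and register a new cleanup timer.
    // The previously registered cleanup timer is NOT removed and stays pending.
    val newCleanupTime = now + retention
    pendingTimers += newCleanupTime
    cleanupTime = Some(newCleanupTime)
    // Per-element timer that triggers the actual aggregation one ms later.
    pendingTimers += (now + 1)
    println(s"[$now] $row: process at ${now + 1}, cleanup at $newCleanupTime")
  }

  private def onTimer(timestamp: Long): Unit = {
    if (cleanupTime.contains(timestamp)) {
      println(s"[$timestamp] cleanup timer matches cleanupTime -> clean up state")
    } else {
      rowMapState.get(timestamp - 1) match {
        case Some(rows) if rows.nonEmpty =>
          println(s"[$timestamp] process $rows")
        case _ =>
          // The situation the trace labels NO_ELEMENTS_IN_STATE: a stale
          // cleanup timer (registered for the first event, then superseded)
          // falls through here with nothing to process.
          println(s"[$timestamp] stale cleanup timer, no elements for ${timestamp - 1}")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    processElement("Row(orderId:001,userId:U123)", 1528149296456L)
    processElement("Row(orderId:002,userId:U123)", 1528149885813L)
    // Fire the pending timers in timestamp order, as the runtime would.
    pendingTimers.foreach(onTimer)
  }
}
{code}

Running it fires four timers in the same order as the trace; the third one (the cleanup timer registered for the first event) no longer matches the remembered cleanup time and finds no buffered rows, which is the point where the real operator ends up dereferencing null.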

> NPE from ProcTimeBoundedRangeOver.scala
> ---------------------------------------
>
>                 Key: FLINK-9524
>                 URL: https://issues.apache.org/jira/browse/FLINK-9524
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.5.0
>            Reporter: yan zhou
>            Priority: Major
>         Attachments: npe_from_ProcTimeBoundedRangeOver.txt
>
>
> The class _ProcTimeBoundedRangeOver_ throws an NPE if _minRetentionTime_ and _maxRetentionTime_ are set to values greater than 1.
> Please see [^npe_from_ProcTimeBoundedRangeOver.txt] for the details of the exception. Below is a short description of the cause:
>  * When the first event for a key arrives, the cleanup time is registered with the _timerService_ and recorded in _cleanupTimeState_. If a second event with the same key arrives before the cleanup time, the value in _cleanupTimeState_ is updated and a new timer is registered with the _timerService_. So now there are two registered cleanup timers: one registered because of the first event, the other because of the second event.
>  * However, when the _onTimer_ method fires for the first cleanup timer, the _cleanupTimeState_ value has already been updated to the second cleanup time. So the timer bypasses the _needToCleanupState_ check, yet runs through the remaining code of _onTimer_ (which is intended to update the accumulator and emit output) and causes an NPE.
> _RowTimeBoundedRangeOver_ has very similar logic to _ProcTimeBoundedRangeOver_, but it does not hit the same NPE: to avoid the exception, it simply adds a null check before running the logic that updates the accumulator (a minimal sketch of such a guard follows the quoted description below).
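
For illustration, here is a minimal sketch of the kind of null guard mentioned in the last paragraph. It is plain Scala rather than the actual _RowTimeBoundedRangeOver_/_ProcTimeBoundedRangeOver_ source; the names _rowMapState_, _currentTime_ and _currentElements_ only mirror the description above, and a java.util.HashMap stands in for the keyed MapState whose get() can return null.

{code:scala}
// Illustrative sketch only, not the Flink source.
object NullGuardSketch {

  // Stand-in for the keyed MapState[Long, JList[Row]]: get() may return null.
  private val rowMapState = new java.util.HashMap[Long, java.util.List[String]]()

  def onElementTimer(timestamp: Long): Unit = {
    // Elements registered this timer one millisecond before it fires.
    val currentTime = timestamp - 1
    val currentElements = rowMapState.get(currentTime)

    // The guard described above: a stale cleanup timer finds no elements for
    // currentTime, so return early instead of iterating over null.
    if (currentElements == null) {
      println(s"[$timestamp] nothing to process, returning early")
      return
    }

    val iter = currentElements.iterator()
    while (iter.hasNext) {
      println(s"[$timestamp] accumulate ${iter.next()}")
    }
  }

  def main(args: Array[String]): Unit = {
    rowMapState.put(1528149296456L, java.util.Arrays.asList("Row(orderId:001,userId:U123)"))
    onElementTimer(1528149296457L) // per-element timer: rows found and processed
    onElementTimer(1528150096456L) // stale cleanup timer: null lookup, guarded early return
  }
}
{code}

The second call in main() models the stale cleanup timer from the trace: the lookup returns null, and the early return plays the role of the null check the description attributes to the row-time variant.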



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)