Posted to dev@atlas.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/18 18:07:01 UTC

[jira] [Commented] (ATLAS-2634) Large Notification Messages: Avoid Processing of Already Processed Messages

    [ https://issues.apache.org/jira/browse/ATLAS-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619490#comment-16619490 ] 

ASF subversion and git services commented on ATLAS-2634:
--------------------------------------------------------

Commit 2f7348988b992e8a9e5a71cf1a483803fa7d6db8 in atlas's branch refs/heads/branch-0.8 from [~ashutoshm]
[ https://git-wip-us.apache.org/repos/asf?p=atlas.git;h=2f73489 ]

ATLAS-2634: Avoid duplicate message processing.

Signed-off-by: Ashutosh Mestry <am...@hortonworks.com>
(cherry picked from commit f29a2b7bb2b555e68d7f5e2b43221f85877aa39c)


> Large Notification Messages: Avoid Processing of Already Processed Messages
> ---------------------------------------------------------------------------
>
>                 Key: ATLAS-2634
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2634
>             Project: Atlas
>          Issue Type: Bug
>          Components:  atlas-core
>    Affects Versions: trunk
>            Reporter: Ashutosh Mestry
>            Assignee: Ashutosh Mestry
>            Priority: Major
>             Fix For: 1.0.0
>
>         Attachments: ATLAS-2634-Large-Notification-Message-Processing-avo.patch
>
>
> *Scenario*
> If a hook encounters a message larger than Kafka can handle, it compresses the message, splits it, or does both, to bring it down to a size that Kafka can handle.
> When Atlas encounters such a message while processing messages from the hook, it uses the appropriate strategy to restore the message to its original form.
> When a message of this type is processed, processing may take longer than the threshold mandated by Kafka for commit. If processing exceeds the threshold, Kafka will resend that message.
> This causes the message to be reprocessed.
> Given this, the message may be stuck in the queue forever or, at the very least, be reprocessed several times (at least twice).
>  
> *Solution*
>  * Record the message IDs of large messages.
>  ** For messages with no version number, calculate the MD5 hash of the message and use that as the message ID.
>  * If a message with the same ID is encountered again, commit it without processing it again.
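
The solution steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patch: the class name LargeMessageDeduplicator, its method names, and the bounded cache size are all assumptions made for this example. Only the MD5 fallback for messages without an ID and the "skip if already seen" check come from the description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an Atlas class: remembers the IDs of large
// messages that were processed so that a redelivered copy can be
// committed without being processed a second time.
final class LargeMessageDeduplicator {
    private final Map<String, Boolean> processedIds;

    // Assumption: the set of remembered IDs is bounded; once it holds more
    // than maxEntries, the oldest recorded ID is dropped.
    LargeMessageDeduplicator(final int maxEntries) {
        this.processedIds = new LinkedHashMap<String, Boolean>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // MD5 hash of the payload as a lowercase hex string.
    static String md5Hex(String payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Use the message's own ID when it has one; otherwise fall back to
    // the MD5 hash of the payload.
    static String resolveId(String messageId, String payload) {
        return (messageId == null || messageId.isEmpty()) ? md5Hex(payload) : messageId;
    }

    // True if a message with the same resolved ID was recorded earlier.
    synchronized boolean isProcessed(String messageId, String payload) {
        return processedIds.containsKey(resolveId(messageId, payload));
    }

    // Record the resolved ID once processing of the message has finished.
    synchronized void markProcessed(String messageId, String payload) {
        processedIds.put(resolveId(messageId, payload), Boolean.TRUE);
    }
}
```

In a consumer loop, the caller would check isProcessed(...) before handling a large message, commit the offset directly when it returns true, and otherwise process the message and then call markProcessed(...). Recording the ID only after processing completes is a design choice of this sketch, so that a message whose processing failed is not skipped on redelivery.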



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)