Posted to dev@atlas.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2016/05/05 12:47:12 UTC

[jira] [Commented] (ATLAS-629) Kafka messages in ATLAS_HOOK might be lost in HA mode at the instant of failover.

    [ https://issues.apache.org/jira/browse/ATLAS-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272262#comment-15272262 ] 

Hemanth Yamijala commented on ATLAS-629:
----------------------------------------

Started looking at the approach to fix this problem. With Kafka's (old) high-level consumer, we only get *at-most-once* delivery because the offsets read from the partitions are auto-committed by default. So if a message is read and its offset auto-committed, but the server reboots before the metadata ingest completes, that message is lost for processing.
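
To make the failure mode concrete, here is a minimal sketch of the old high-level consumer path with the default auto-commit behaviour. The topic, group and ZooKeeper values are placeholders I made up, not necessarily what Atlas configures:

{code:java}
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class AutoCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "atlas");                   // placeholder
        // auto.commit.enable defaults to true: offsets are committed on a
        // timer, independently of whether ingest has finished.

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("ATLAS_HOOK", 1));
        ConsumerIterator<byte[], byte[]> it =
            streams.get("ATLAS_HOOK").get(0).iterator();

        while (it.hasNext()) {
            byte[] message = it.next().message();
            // If the auto-commit timer fires here and the server is killed
            // before ingest(message) completes, the message is never replayed.
            ingest(message);
        }
    }

    private static void ingest(byte[] message) { /* metadata ingest */ }
}
{code}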

To fix this issue, I am looking at *at-least-once* delivery semantics with Kafka, under the assumption that *message processing can be idempotent on the server*. Given that we use transactions in Titan and also have create-or-update semantics, this may be mostly true - but I am not really sure and will need to test.

To move to at-least-once processing, the predominant approach people seem to follow is to:
* Disable auto commit.
* Create one ConsumerConnector per partition of a topic.

The latter is needed because the old high-level consumer does not support committing offsets per partition: it can only commit all offsets read across all the partitions it is connected to [(Reference 1)|http://grokbase.com/t/kafka/users/144b80h269/consumerconnector-commitoffsets]. One consumer connector per partition has been suggested by Kafka experts in many threads [(Reference 2)|http://mail-archives.apache.org/mod_mbox/kafka-users/201409.mbox/%3CCAHBV8WeYj8ce6G5J0k3a1hGgdNskGv3bsaP8JXSM=kWBnuJ4GQ@mail.gmail.com%3E]. A rough sketch of this setup follows below.
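
Roughly, this is what I have in mind - a sketch only, with the partition count, topic and group names being assumptions on my part, not working Atlas code. With one connector per partition (each holding a single stream), {{commitOffsets()}} - which commits everything a connector has read - becomes effectively a per-partition commit:

{code:java}
import java.util.Collections;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class AtLeastOnceSketch {
    public static void main(String[] args) {
        // Assumption: we know (or can look up) the partition count of ATLAS_HOOK.
        int numPartitions = 1;
        for (int i = 0; i < numPartitions; i++) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181"); // placeholder
            props.put("group.id", "atlas");                   // placeholder
            props.put("auto.commit.enable", "false");         // step 1: disable auto commit

            // Step 2: one connector with a single stream; group rebalancing
            // should leave each connector owning one partition.
            final ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            final KafkaStream<byte[], byte[]> stream =
                connector.createMessageStreams(Collections.singletonMap("ATLAS_HOOK", 1))
                         .get("ATLAS_HOOK").get(0);

            new Thread(new Runnable() {
                public void run() {
                    for (MessageAndMetadata<byte[], byte[]> mm : stream) {
                        ingest(mm.message());      // must be idempotent: a crash
                                                   // before commit causes a replay
                        connector.commitOffsets(); // commit only after ingest succeeds
                    }
                }
            }).start();
        }
    }

    private static void ingest(byte[] message) { /* metadata ingest */ }
}
{code}

A crash between {{ingest}} and {{commitOffsets}} means the message is re-delivered after failover, which is exactly why the idempotency assumption above matters.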

The other option could be to move to the newer consumer API in Kafka (0.9+), which (I think) provides better options for handling a per-partition commit. However, the new consumer is still marked beta, so I am not really sure; I can check with some Kafka committers internally.
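
For comparison, here is a sketch of how a per-partition commit could look with the new consumer, which exposes it directly via {{commitSync(Map)}}. Again, the configuration values and names are placeholders, not Atlas code:

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "atlas");                   // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("ATLAS_HOOK"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    ingest(record.value()); // idempotent metadata ingest
                    // Commit only this partition, after processing; offset + 1
                    // because the committed offset is the next one to read.
                    consumer.commitSync(Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1)));
                }
            }
        }
    }

    private static void ingest(byte[] value) { /* metadata ingest */ }
}
{code}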

For now, I will try out the first approach and see. In the meantime, happy to hear feedback from others.

> Kafka messages in ATLAS_HOOK might be lost in HA mode at the instant of failover.
> ---------------------------------------------------------------------------------
>
>                 Key: ATLAS-629
>                 URL: https://issues.apache.org/jira/browse/ATLAS-629
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Hemanth Yamijala
>            Assignee: Hemanth Yamijala
>            Priority: Critical
>             Fix For: 0.7-incubating
>
>
> Write data to Kafka continuously from the Hive hook - this can be done with a script that constantly creates tables. Bring down the active instance with kill -9. Ensure writes continue after the passive instance becomes active. The expectation is that the number of tables created and the number of tables in Atlas match.
> In one test, 180 tables were written and we switched over 6 times from one instance to the other. 1 table of the lot was lost, i.e. 179 tables made it into Atlas and 1 did not.


