Posted to dev@kafka.apache.org by "nicu marasoiu (JIRA)" <ji...@apache.org> on 2014/07/26 20:42:38 UTC

[jira] [Comment Edited] (KAFKA-1510) Force offset commits when migrating consumer offsets from zookeeper to kafka

    [ https://issues.apache.org/jira/browse/KAFKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075447#comment-14075447 ] 

nicu marasoiu edited comment on KAFKA-1510 at 7/26/14 6:40 PM:
---------------------------------------------------------------

Hi,
The isAutoCommit argument seems to work exactly the other way around: apparently it is "false" when called from the scheduled auto commit and "true" when called from zkConsConn.commitOffsets()?

So the migration of offsets from zk to kafka is: enable dual commit and set storage to kafka, restart the consumers, wait for the offset commits to be copied to kafka, and then turn dual commit off.
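
To make the two phases concrete, here is a minimal sketch of the consumer properties for each one. The key offsets.storage is the one named in the issue description; dual.commit.enabled is, as far as I know, the matching 0.8.2 consumer property, but treat that name as an assumption. The class and method names are purely illustrative.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds the consumer settings for each migration phase.
class MigrationConfigSketch {

    // Phase 1: store offsets in kafka and, because dual commit is on, in zookeeper as well.
    static Properties dualCommitPhase() {
        Properties props = new Properties();
        props.setProperty("offsets.storage", "kafka");
        props.setProperty("dual.commit.enabled", "true"); // assumed property name
        return props;
    }

    // Phase 2: once every consumer has committed to kafka, stop committing to zookeeper.
    static Properties kafkaOnlyPhase() {
        Properties props = new Properties();
        props.setProperty("offsets.storage", "kafka");
        props.setProperty("dual.commit.enabled", "false"); // assumed property name
        return props;
    }
}
```

The consumers would be restarted once with the first set of properties and, after the offsets have reached kafka, once more with the second.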

Currently the offsets are copied to kafka only when data flows, so for the purpose of this task we need to add one or two more cases in which kafka gets the offsets: on shutdown, and perhaps periodically.
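
A rough sketch of the decision this implies, assuming the rule from the issue description (force a commit on rebalance, on shutdown, and after every N auto-commits while dual commit is enabled). ForcedCommitPolicy, Trigger and shouldCommit are hypothetical names for illustration and are not the actual consumer connector code; resetting the counter on every commit is also my own choice.

```java
// Hypothetical policy object: decides whether a commit should be sent at all.
final class ForcedCommitPolicy {

    enum Trigger { AUTO_COMMIT, REBALANCE, SHUTDOWN }

    private final boolean dualCommitEnabled;
    private final int forceEveryNAutoCommits;
    private int autoCommitsSinceLastCommit = 0;

    ForcedCommitPolicy(boolean dualCommitEnabled, int forceEveryNAutoCommits) {
        if (forceEveryNAutoCommits < 1) {
            throw new IllegalArgumentException("forceEveryNAutoCommits must be >= 1");
        }
        this.dualCommitEnabled = dualCommitEnabled;
        this.forceEveryNAutoCommits = forceEveryNAutoCommits;
    }

    // Returns true when offsets should be committed for this trigger.
    boolean shouldCommit(boolean offsetsChanged, Trigger trigger) {
        if (trigger == Trigger.AUTO_COMMIT) {
            autoCommitsSinceLastCommit++;
        }
        // Forcing applies only while migrating, i.e. while dual commit is enabled.
        boolean forced = dualCommitEnabled
                && (trigger != Trigger.AUTO_COMMIT
                    || autoCommitsSinceLastCommit >= forceEveryNAutoCommits);
        boolean commit = offsetsChanged || forced;
        if (commit) {
            autoCommitsSinceLastCommit = 0; // assumption: any commit restarts the interval
        }
        return commit;
    }
}
```

With dual commit off this reduces to today's behaviour, where a commit happens only if the offsets have moved.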

So this task applies only when storage == kafka and dualCommit == true, right?

I would first ask why the new offsets are written to zookeeper only if the write to kafka succeeded?

I would write to both at all times, and perhaps keep two checkpoint structures, one for kafka and one for zookeeper.



was (Author: nmarasoi):
Hi,
I am confused. Let me state my understanding so we can check which parts are right and which are not.

So the migration of offsets from zk to kafka is: enable dual commit, leave storage set to zookeeper, wait for the offset commits to be copied to kafka, and then switch storage to kafka, yes?

Currently the offsets are copied to kafka only when data flows, so for the purpose of this task we need to add one or two more cases in which kafka gets the offsets: on shutdown, and perhaps periodically.

So this task applies only when storage == kafka and dualCommit == true, right?

I would first ask why the new offsets are written to zookeeper only if the write to kafka succeeded?

I would write to both at all times, and perhaps keep two checkpoint structures, one for kafka and one for zookeeper.

So if I presume correctly, I don't think the phrase "in addition to setting offsets.storage to kafka" in the first sentence of the task description is correct. Logically, unless there is a tool to copy offsets, you first enable dual commit, wait for all clients to restart in dual-commit mode and for the offsets to sync, then switch storage to kafka (so not at the same time, as the description suggests), and then restart the clients in kafka-only mode, yes?

While waiting for clarification I have started coding, but it is better that I fully understand first.

> Force offset commits when migrating consumer offsets from zookeeper to kafka
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-1510
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1510
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>            Reporter: Joel Koshy
>            Assignee: nicu marasoiu
>              Labels: newbie
>             Fix For: 0.8.2
>
>
> When migrating consumer offsets from ZooKeeper to kafka, we have to turn on dual-commit (i.e., the consumers will commit offsets to both zookeeper and kafka) in addition to setting offsets.storage to kafka. However, when we commit offsets we only commit offsets if they have changed (since the last commit). For low-volume topics or for topics that receive data in bursts offsets may not move for a long period of time. Therefore we may want to force the commit (even if offsets have not changed) when migrating (i.e., when dual-commit is enabled) - we can add a minimum interval threshold (say force commit after every 10 auto-commits) as well as on rebalance and shutdown.
> Also, I think it is safe to switch the default for offsets.storage from zookeeper to kafka and set the default to dual-commit (for people who have not migrated yet). We have deployed this to the largest consumers at linkedin and have not seen any issues so far (except for the migration caveat that this jira will resolve).



--
This message was sent by Atlassian JIRA
(v6.2#6252)