Posted to dev@kafka.apache.org by "Joel Koshy (JIRA)" <ji...@apache.org> on 2011/07/26 20:29:09 UTC

[jira] [Created] (KAFKA-75) Corp replica does not shutdown on IO error

Corp replica does not shutdown on IO error
------------------------------------------

                 Key: KAFKA-75
                 URL: https://issues.apache.org/jira/browse/KAFKA-75
             Project: Kafka
          Issue Type: Improvement
            Reporter: Joel Koshy


The embedded consumer in the corp replica uses the low-level Log API to create the replica. An append can fail with an IO error that is currently caught and ignored, leaving behind a corrupt log file.

The proposed fix is to switch to the high-level producer API to create the replica. Not only would this avoid the above issue, it would also fit better with the current design of the replication enhancement for kafka (http://linkedin.jira.com/browse/KAFKA-23), since the low-level Log API is not replication-aware. Another advantage is that compression is exposed at the producer API level. One caveat with this approach: the async producer drops events when its queue is full. That behavior is unsuitable for the embedded consumer, so we can expose a configuration option in the producer to allow for (queue-level) blocking semantics.
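
A minimal, self-contained sketch of the queue-level semantics in question, using only java.util.concurrent. The class and flag names below are illustrative assumptions, not the Kafka async-producer code; the put() branch is the blocking behavior the proposed configuration option would enable, while the offer() branch mirrors the current drop-when-full behavior.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: "drop when full" vs. "block when full" enqueue semantics
// for a bounded event queue. Names are hypothetical; this is not the Kafka
// producer implementation.
public class EnqueueSemanticsSketch {

    private final BlockingQueue<String> queue;
    private final boolean blockOnQueueFull; // stands in for the proposed producer config option

    EnqueueSemanticsSketch(int queueSize, boolean blockOnQueueFull) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.blockOnQueueFull = blockOnQueueFull;
    }

    // Returns false only if the event was dropped.
    boolean enqueue(String event) throws InterruptedException {
        if (blockOnQueueFull) {
            queue.put(event);      // blocks until space frees up; nothing is lost
            return true;
        }
        return queue.offer(event); // drop-when-full style: returns false if the queue is full
    }

    public static void main(String[] args) throws InterruptedException {
        EnqueueSemanticsSketch dropping = new EnqueueSemanticsSketch(1, false);
        dropping.enqueue("event-1");
        System.out.println("event-2 kept? " + dropping.enqueue("event-2")); // false: silently dropped
    }
}

For the embedded consumer, the blocking variant applies backpressure instead of losing data, which is why a queue-level blocking option is proposed.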



[jira] [Closed] (KAFKA-75) Kafka mirror does not shutdown on IO error

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy closed KAFKA-75.
---------------------------

    Resolution: Fixed

This issue has been subsumed by the fix for KAFKA-74.


[jira] [Commented] (KAFKA-75) Corp replica does not shutdown on IO error

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071264#comment-13071264 ] 

Joel Koshy commented on KAFKA-75:
---------------------------------

If we use a producer to mirror the source cluster, we will need a producer configuration in addition to the embedded consumer configuration. There are a few alternatives:

- One ConsumerConfig and one ProducerConfig, i.e., allow mirroring only one source cluster at a time. If you need mirrors of multiple source clusters, you have to set up one target kafka cluster per source cluster.
- Only one (global) ConsumerConfig, allowing multiple values (one per source cluster) for the zk.connect property. However, with this approach you cannot easily override global configs such as socket.buffersize/timeout (such overrides are often useful for cross-DC mirrors), topic blacklists (when we do auto-discovery of topics - SNA-6887), etc.
- An array of ConsumerConfigs (one per source cluster to mirror) and one ProducerConfig. We can either instantiate multiple embedded consumers or just multiple consumer connectors. (A rough sketch of this layout appears at the end of this comment.)

More considerations for the ProducerConfig:

- With only one ProducerConfig, identically named topics from different clusters will get mixed on the target cluster. We could have an array of ConsumerConfig/ProducerConfig pairs to avoid this.
- If it makes sense, we can also make the ProducerConfig optional to allow a smoother deployment of the next release (with the above changes in place). In that case, the ProducerConfig would inherit the ZkConfig of the local kafka broker.
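
As a rough, hypothetical sketch of the third alternative (an array of consumer configs plus a single producer config), the snippet below just loads one producer properties file and one consumer properties file per source cluster. The command-line convention and file names are illustrative assumptions, not the actual mirror configuration format.

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative only: one set of consumer properties per source cluster to
// mirror, plus a single set of producer properties for the target cluster.
public class MirrorConfigSketch {

    public static void main(String[] args) throws IOException {
        // hypothetical usage:
        //   MirrorConfigSketch producer.properties consumer-clusterA.properties consumer-clusterB.properties ...
        if (args.length < 2) {
            System.err.println("need one producer properties file and at least one consumer properties file");
            return;
        }

        Properties producerProps = load(args[0]);

        // One ConsumerConfig per source cluster; each file can carry its own
        // zk.connect, socket buffer/timeout overrides, topic blacklist, etc.
        List<Properties> consumerProps = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            consumerProps.add(load(args[i]));
        }

        System.out.println("would mirror " + consumerProps.size()
                + " source cluster(s) into the target described by " + args[0]);
        // A real mirror would build one embedded consumer (or consumer connector)
        // per entry in consumerProps and a single producer from producerProps.
    }

    private static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props;
    }
}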



[jira] [Updated] (KAFKA-75) Kafka mirror does not shutdown on IO error

Posted by "Joel Koshy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Koshy updated KAFKA-75:
----------------------------

    Description: 
The embedded consumer in the kafka mirror implementation uses the low-level Log API to create the replica. An append can fail with an IO error that is currently caught and ignored, leaving behind a corrupt log file.

The proposed fix is to switch to the high-level producer API to create the replica. Not only would this avoid the above issue, it would also fit better with the current design of the replication enhancement for kafka (http://linkedin.jira.com/browse/KAFKA-23), since the low-level Log API is not replication-aware. Another advantage is that compression is exposed at the producer API level. One caveat with this approach: the async producer drops events when its queue is full. That behavior is unsuitable for the embedded consumer, so we can expose a configuration option in the producer to allow for (queue-level) blocking semantics.


  was:
The embedded consumer in the corp replica uses the low-level Log API to create the replica. An append can fail with an IO error that is currently caught and ignored, leaving behind a corrupt log file.

The proposed fix is to switch to the high-level producer API to create the replica. Not only would this avoid the above issue, it would also fit better with the current design of the replication enhancement for kafka (http://linkedin.jira.com/browse/KAFKA-23), since the low-level Log API is not replication-aware. Another advantage is that compression is exposed at the producer API level. One caveat with this approach: the async producer drops events when its queue is full. That behavior is unsuitable for the embedded consumer, so we can expose a configuration option in the producer to allow for (queue-level) blocking semantics.


        Summary: Kafka mirror does not shutdown on IO error  (was: Corp replica does not shutdown on IO error)
