Posted to jira@kafka.apache.org by "Apurva Mehta (JIRA)" <ji...@apache.org> on 2017/06/20 00:15:00 UTC
[jira] [Updated] (KAFKA-5477) TransactionalProducer sleeps unnecessarily long during back to back transactions
[ https://issues.apache.org/jira/browse/KAFKA-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apurva Mehta updated KAFKA-5477:
--------------------------------
Description:
I am running some perf tests for EOS and there is a severe perf impact with our default configs.
Here is the issue.
# When we commit a transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final `CompleteCommit` message asynchronously.
# In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client.
# The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms, so the producer will sleep for 100ms before sending the `AddPartitions` request again. This puts a floor on the latency for back-to-back transactions.
Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'.
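To make the floor concrete, here is a rough back-of-the-envelope sketch. It assumes (this is an assumption, not a measurement) that every back-to-back transaction hits exactly one `CONCURRENT_TRANSACTIONS` retry and that all other costs are ignored. The class and method names are hypothetical and are not part of the Kafka client.

```java
// Hypothetical helper, not Kafka code: estimates the cost of the retry backoff
// under the assumption that each transaction sleeps a fixed number of times.
class BackoffFloor {

    // Total time (ms) spent sleeping across `transactions` back-to-back
    // transactions, each incurring `retriesPerTxn` backoff sleeps.
    static long minSleepMs(int transactions, int retriesPerTxn, long retryBackoffMs) {
        return (long) transactions * retriesPerTxn * retryBackoffMs;
    }

    // Upper bound on transactions per second for a single producer when the
    // backoff sleeps are the only cost considered.
    static double maxTxnPerSecond(int retriesPerTxn, long retryBackoffMs) {
        return 1000.0 / (retriesPerTxn * retryBackoffMs);
    }

    public static void main(String[] args) {
        System.out.println(minSleepMs(100, 1, 100));   // 10000 ms slept over 100 transactions
        System.out.println(maxTxnPerSecond(1, 100));   // 10.0 with the 100ms default
        System.out.println(maxTxnPerSecond(1, 10));    // 100.0 with a 10ms backoff
    }
}
```

Under these assumptions, the 100ms default caps a single producer at about 10 back-to-back transactions per second, regardless of how quickly the markers are actually written.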
The options are:
# Do nothing, and let Streams override `retry.backoff.ms` in its producer to 10ms when EOS is enabled (since they have a HOTFIX patch out anyway).
# Introduce a special non-configurable `transactionRetryBackoffMs` variable, hard-coded to a low value, which applies to all transactional requests.
# Do nothing and fix it properly in 0.11.0.1.
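As a rough illustration of the first option, the override could look something like the sketch below. `retry.backoff.ms` is the real producer config key. The class, the method, and the `eosEnabled` flag are hypothetical stand-ins for however Streams builds its producer configs, and this is not the actual Streams patch.

```java
import java.util.Properties;

// Hypothetical sketch of option 1: lower the producer retry backoff only when
// exactly-once semantics (EOS) is enabled. Not the actual Streams code.
class EosProducerOverrides {

    // Value taken from the option text: 10ms instead of the 100ms default.
    static final String EOS_RETRY_BACKOFF_MS = "10";

    // Returns producer config overrides; empty when EOS is disabled so the
    // producer keeps its default retry.backoff.ms.
    static Properties producerOverrides(boolean eosEnabled) {
        Properties overrides = new Properties();
        if (eosEnabled) {
            overrides.setProperty("retry.backoff.ms", EOS_RETRY_BACKOFF_MS);
        }
        return overrides;
    }

    public static void main(String[] args) {
        System.out.println(producerOverrides(true));   // {retry.backoff.ms=10}
        System.out.println(producerOverrides(false));  // {}
    }
}
```

Note that an override like this lowers the backoff for every retried producer request, not only the expected `CONCURRENT_TRANSACTIONS` retry, which is the trade-off relative to the second option.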
was:
I am running some perf tests for EOS and there is a severe perf impact with our default configs.
Here is the issue.
# When we commit a transaction, the producer sends an `EndTxn` request to the coordinator. The coordinator writes the `PrepareCommit` message to the transaction log and then returns the response to the client. It writes the transaction markers and the final `CompleteCommit` message asynchronously.
# In the meantime, if the client starts another transaction, it will send an `AddPartitions` request on the next `Sender.run` loop. If the markers haven't been written yet, the coordinator will return a retriable `CONCURRENT_TRANSACTIONS` error to the client.
# The current behavior in the producer is to sleep for `retryBackoffMs` before retrying the request. The current default for this is 100ms, so the producer will sleep for 100ms before sending the `AddPartitions` request again. This puts a floor on the latency for transactions.
Ideally, we don't want to sleep the full 100ms in this particular case, because the retry is 'expected'.
The options are:
# Do nothing, and let Streams override `retry.backoff.ms` in its producer to 10ms when EOS is enabled (since they have a HOTFIX patch out anyway).
# Introduce a special non-configurable `transactionRetryBackoffMs` variable, hard-coded to a low value, which applies to all transactional requests.
# Do nothing and fix it properly in 0.11.0.1.
> TransactionalProducer sleeps unnecessarily long during back to back transactions
> --------------------------------------------------------------------------------
>
> Key: KAFKA-5477
> URL: https://issues.apache.org/jira/browse/KAFKA-5477
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.11.0.0
> Reporter: Apurva Mehta
> Assignee: Apurva Mehta