Posted to users@kafka.apache.org by Sergi Vladykin <se...@gmail.com> on 2019/11/23 20:32:12 UTC

Transactional Producer

Hi!

I have two questions related to transactional producers:

1. Is it OK to mix transactional and non-transactional approaches with a
single KafkaProducer instance? I mean sometimes I want to publish multiple
messages transactionally, but often just a single message. Starting a
transaction to publish a single message looks inefficient. What is the
recommended approach here?

2. If I publish multiple messages to multiple partitions in a single
transaction, is it guaranteed that either all of them or none of them are
published? Is it possible to end up with only half of the messages published
to half of the partitions in some failure scenario?

Sergi

Re: Transactional Producer

Posted by Sergi Vladykin <se...@gmail.com>.
Hi!

> I think we need to step back a little and understand what you are trying
> to achieve; that will help us give you an accurate answer.
>

Sure, I'm working on a pet project: a simple key-value database
replicated over Kafka.
I have already implemented simple atomic updates like putIfAbsent, but now I
want to support transactional updates for multiple keys.
Thus, I'm trying to understand the limitations of Kafka transactions and how
to apply them correctly to this task.


> What order can I expect for these published messages?
> - This depends on different factors, like linger, batch size, buffers, and
> even network latency.
>

Obviously, I'm not much interested in the case where both the batch size and
linger are huge, everything is batched together, and the transactions are
published one after another.
I'm looking at the extreme case where there is no batching at all and
records are sent one after another in the described order, so that the
records of the two transactions interleave.
Since the producer API lets us get the offset of a published record
before committing, it makes me think that this interleaving must be
possible.


> We should get all the records in the order of their offsets, so we would
> be able to consume A but would not be able
> to consume B until X is either committed or aborted?
> - This depends on the consumer's isolation level, the partitions assigned
> to the consumer, ...
>

As I wrote in the original message, we are talking about a single
partition.
And obviously it must be a read_committed consumer, otherwise transactions
are useless.


> If you give a little bit more context about what you are trying to achieve,
> probably we can help you further.
>

Thanks a lot for your help!

BTW, here is the link to the project if you are interested:
https://github.com/svladykin/ReplicaMap

Sergi



Re: Transactional Producer

Posted by Jonathan Santilli <jo...@gmail.com>.
Hello Sergi,

I think we need to step back a little and understand what you are trying
to achieve; that will help us give you an accurate answer.

What order can I expect for these published messages?
- This depends on different factors, like linger, batch size, buffers, and
even network latency.

We should get all the records in the order of their offsets, so we would
be able to consume A but would not be able
to consume B until X is either committed or aborted?
- This depends on the consumer's isolation level, the partitions assigned
to the consumer, ...

If you give a little bit more context about what you are trying to achieve,
probably we can help you further.

Cheers!
--
Jonathan



-- 
Santilli Jonathan

Re: Transactional Producer

Posted by Sergi Vladykin <se...@gmail.com>.
Thanks a lot for your help!

Another question about ordering and visibility.

Let's say we have two transactional producers with different transactional
ids. They both publish records to the same partition like this:

thread1: startTx
thread1: record A
thread2: startTx
thread2: record X
thread1: record B
thread1: commit
thread2: record Y
thread2: commit

What order can I expect for these published messages?
It looks like an interleaved AXBY order should be possible (if the
records were not batched together).
But what if thread2 hangs for a long time right before its commit while
thread1 commits successfully?
We should get all the records in the order of their offsets, so we would
be able to consume A but would not be able
to consume B until X is either committed or aborted?
Is my understanding right?

Will the same happen when we have one transactional and one
non-transactional producer publishing to the same partition?

Sergi
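
For the single-partition scenario above, the visibility rules for a
read_committed consumer can be sketched with a toy log. The sketch assumes
the behaviour the Kafka consumer documentation describes as the last stable
offset (LSO): a read_committed consumer reads only up to the first record of
the earliest still-open transaction and skips records of aborted
transactions. The tuple layout and function names are hypothetical, and each
transaction id is assumed to be used only once.

```python
# Toy model of one partition. Each log entry is (kind, tx_id, value):
#   ("record", "tx1", "A")  record written inside transaction tx1
#   ("record", None, "N")   record from a non-transactional producer
#   ("commit", "tx1", None) control marker ending tx1 ("abort" likewise)


def last_stable_offset(log):
    """Offset of the first record of the earliest open transaction,
    or len(log) when no transaction is open."""
    first_offset_of_open_tx = {}
    for offset, (kind, tx_id, _) in enumerate(log):
        if kind == "record" and tx_id is not None:
            first_offset_of_open_tx.setdefault(tx_id, offset)
        elif kind in ("commit", "abort"):
            first_offset_of_open_tx.pop(tx_id, None)
    return min(first_offset_of_open_tx.values(), default=len(log))


def read_committed(log):
    """Values a read_committed consumer could see: everything below the
    LSO, in offset order, minus records of aborted transactions."""
    lso = last_stable_offset(log)
    aborted = {tx_id for kind, tx_id, _ in log if kind == "abort"}
    return [
        value
        for kind, tx_id, value in log[:lso]
        if kind == "record" and tx_id not in aborted
    ]


# thread1 has committed, thread2 hangs right before its commit.
hanging = [
    ("record", "tx1", "A"),
    ("record", "tx2", "X"),
    ("record", "tx1", "B"),
    ("commit", "tx1", None),
]
# thread2 eventually sends Y and commits.
finished = hanging + [("record", "tx2", "Y"), ("commit", "tx2", None)]
```

In this model `read_committed(hanging)` returns only A: B is already
committed but sits behind the open transaction that wrote X, so it is held
back until that transaction commits or aborts. `read_committed(finished)`
returns A, X, B, Y in offset order. A non-transactional record appended
after X is held back in the same way, because the LSO bounds every record in
the partition and not only transactional ones.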


Re: Transactional Producer

Posted by Jonathan Santilli <jo...@gmail.com>.
Hello Sergi,

1. Is it OK to mix transactional and non-transactional approaches with a
single KafkaProducer instance?
- This is not possible: a transactional producer cannot send data outside
a transaction.

I mean sometimes I want to publish multiple messages transactionally, but
often just a single message.
Starting a transaction to publish a single message looks inefficient.
What is the recommended approach here?
- Try to batch the records if possible; otherwise you need to begin and
commit a transaction, even for a single record.
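
The call-order rule described above can be modeled as a small state machine.
This is only a sketch in Python: the class and method names are hypothetical
stand-ins that mirror the begin/send/commit sequence, not the Kafka client
API, and nothing is sent anywhere.

```python
class ToyTransactionalProducer:
    """Toy model of the call-order contract only; not a Kafka client."""

    def __init__(self):
        self.in_transaction = False
        self.pending = []    # records sent inside the open transaction
        self.committed = []  # records made visible by a commit

    def begin_transaction(self):
        if self.in_transaction:
            raise RuntimeError("transaction already open")
        self.in_transaction = True

    def send(self, record):
        # A transactional producer rejects sends outside a transaction.
        if not self.in_transaction:
            raise RuntimeError("send() called outside a transaction")
        self.pending.append(record)

    def commit_transaction(self):
        if not self.in_transaction:
            raise RuntimeError("no open transaction")
        self.committed.extend(self.pending)
        self.pending.clear()
        self.in_transaction = False


def publish(producer, records):
    """Wrap any number of records, even a single one, in one transaction."""
    producer.begin_transaction()
    for record in records:
        producer.send(record)
    producer.commit_transaction()
```

With this shape, `publish(producer, ["single"])` and
`publish(producer, ["a", "b"])` go through the same path, so the caller never
needs a separate non-transactional code path on the same producer.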

2. If I publish multiple messages to multiple partitions in a single
transaction, is it guaranteed that either all of them or none of them are
published?
- Yes, this is the power of transactions: all or nothing.

Is it possible to end up with only half of the messages published to half
of the partitions in some failure scenario?
- No, this is not possible if you are using the transaction correctly and
consumers read with isolation.level=read_committed.
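
The all-or-nothing guarantee is about what consumers can see, which a toy
model can illustrate. In this sketch (hypothetical names, not Kafka code) a
transaction appends records to several partitions and then gets a single
commit or abort decision. A read_committed reader sees either all of its
records or none, while a read_uncommitted reader still sees records of the
aborted transaction because they are physically in the log. Offset ordering
and open transactions are deliberately left out.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model of multi-partition transaction visibility."""

    def __init__(self):
        self.logs = defaultdict(list)  # partition -> [(tx_id, value), ...]
        self.outcome = {}              # tx_id -> "commit" or "abort"

    def append(self, tx_id, partition, value):
        self.logs[partition].append((tx_id, value))

    def end_transaction(self, tx_id, outcome):
        # One decision per transaction covers every partition it touched.
        self.outcome[tx_id] = outcome

    def read_committed(self, partition):
        return [
            value
            for tx_id, value in self.logs[partition]
            if self.outcome.get(tx_id) == "commit"
        ]

    def read_uncommitted(self, partition):
        return [value for _, value in self.logs[partition]]


cluster = ToyCluster()

# tx1 writes to two partitions and commits.
cluster.append("tx1", 0, "a0")
cluster.append("tx1", 1, "a1")
cluster.end_transaction("tx1", "commit")

# tx2 writes to the same two partitions and aborts.
cluster.append("tx2", 0, "b0")
cluster.append("tx2", 1, "b1")
cluster.end_transaction("tx2", "abort")
```

Here a read_committed reader of either partition gets only the record from
tx1, and neither partition exposes half of tx2.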

Please take a look at this simple gist with different KafkaProducer
scenarios; I hope this helps:
https://gist.github.com/jonathansantilli/3b69ebbcd24e7a30f66db790ef648f99


Cheers!
--
Jonathan





-- 
Santilli Jonathan