Posted to dev@pulsar.apache.org by 丛搏 <bo...@apache.org> on 2022/04/12 12:44:36 UTC

[DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Hi Pulsar community,

I have opened a PIP to discuss a max active transaction limit for the
transaction coordinator.
link: https://github.com/apache/pulsar/issues/15133


Thanks,
Bo

Re: [DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Posted by Michael Marshall <mm...@apache.org>.
Copying the contents of the PIP to this thread:

----

- **Status**: Discussion
- **Author**: Bo Cong
- **Pull Request**:
- **Mailing List discussion**:
- **Release**: 2.11

# Motivation

Currently, the transaction coordinator does not limit the number of
active transactions, which may cause the following problems:

- A large number of active transactions puts heavy pressure on memory.
- The number of transactions a single TC can handle is limited, so the
number of active transactions cannot grow without bound.
- Ending a transaction has to wait for the TP or TB to recover
successfully, so many end requests will be pending in the TP or TB
while the TC doesn't know the state of the TB or TP. This wastes
machine resources, and a large number of pending TB or TP requests can
cause an OOM.

## Implementation

### Add config

Add maxActiveTransactions to broker.conf:

```properties
# The max active transactions in one transaction coordinator
maxActiveTransactions=0
```



### How to handle the number of active transactions reaching maxActiveTransactions?



One option is to return an exception to the client when
maxActiveTransactions is reached. This has several disadvantages:

1. The broker has to add a ReachMaxActiveTxnException that is returned
when the limit is reached. Clients need to catch this exception before
doing anything else, so every client has to handle
ReachMaxActiveTxnException.
2. A client that receives this exception will not stop opening
transactions, because it doesn't know when the TC will recover. It
will retry immediately, and if the TC can't recover, the client will
keep retrying, which makes no sense.

### Design

When an open txn request arrives after maxActiveTransactions has been
reached, the coordinator doesn't return any response for it and simply
ignores the request. This way, the broker doesn't need to add any
exception for this config.
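
As a rough illustration of the admission check described above, here is a
minimal Java sketch. The class `ActiveTxnLimiter` and its methods are
hypothetical names made up for this example and are not part of the Pulsar
code base. It also assumes that a value of 0 means "no limit", which is how
the default is described in the compatibility section.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the actual Pulsar transaction coordinator code.
class ActiveTxnLimiter {
    // Assumption: 0 (the proposed default) disables the limit.
    private final long maxActiveTransactions;
    private final AtomicLong active = new AtomicLong();

    ActiveTxnLimiter(long maxActiveTransactions) {
        this.maxActiveTransactions = maxActiveTransactions;
    }

    /**
     * Returns true if the open request is admitted. A false result models
     * the coordinator ignoring the request without sending any response.
     */
    boolean tryOpen() {
        if (maxActiveTransactions <= 0) {
            active.incrementAndGet();
            return true;
        }
        while (true) {
            long current = active.get();
            if (current >= maxActiveTransactions) {
                return false; // limit reached: drop the request silently
            }
            if (active.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Called when a transaction commits, aborts, or times out. */
    void onTransactionEnded() {
        active.decrementAndGet();
    }

    long activeCount() {
        return active.get();
    }
}
```

The compare-and-set loop keeps the count from overshooting the limit when
several open requests race; a real coordinator may well track active
transactions differently.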



#### How does this affect the client?

If the broker doesn't return a response for this request, the open txn
operation will time out. The coordinator client has a semaphore that
controls txn operations (open, add produce topic, add ack topic, end
txn). Within the timeout period, the coordinator client can only open
as many txns as the semaphore allows; any other request will be
blocked. So this design solves these two problems:

1. No exception needs to be added.
2. The client will not retry infinitely.
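
The interaction between the semaphore and the timeout can be modelled with
a small Java sketch (Java 9+ for `orTimeout`). `BoundedTxnClient` is a
hypothetical class written for this illustration, not the real coordinator
client. Its response future is never completed, which stands in for a
coordinator that ignores the request, and it reports a missing permit with
an empty `Optional`, whereas a real client would more likely block or queue.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a coordinator client; names are illustrative only.
class BoundedTxnClient {
    private final Semaphore permits;
    private final long timeoutMillis;

    BoundedTxnClient(int maxPendingOps, long timeoutMillis) {
        this.permits = new Semaphore(maxPendingOps);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Tries to send an open txn request. Returns an empty Optional when no
     * permit is free; otherwise a future that fails once the timeout expires.
     */
    Optional<CompletableFuture<Long>> openTxn() {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        // Never completed here: models a coordinator that sends no response.
        CompletableFuture<Long> response = new CompletableFuture<>();
        return Optional.of(response
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Release the permit on success, failure, or timeout.
                .whenComplete((txnId, error) -> permits.release()));
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With two permits and an unresponsive coordinator, the third `openTxn()` call
gets no permit until one of the first two requests times out, so the client
cannot flood the coordinator with retries.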

#### Worries

You may be worried that this design will hurt the client-side
experience, because open transaction requests will keep timing out and
other txn operations will be blocked. I think this worry is
unnecessary: in that situation, you should consider increasing the
performance of the cluster or finding and fixing the problematic
client.



### Flow chart

![image](https://user-images.githubusercontent.com/39078850/162964277-6342ae82-1691-48b5-af84-18bb7a422ff1.png)



### Compatibility, Deprecation, and Migration Plan

maxActiveTransactions defaults to 0. With the default value, opening
transactions is not blocked.

### Test Plan

When maxActiveTransactions is reached, a client's open txn request
will time out.

### Rejected Alternatives

Return an exception to the client when maxActiveTransactions is
reached. This has several disadvantages:

1. The broker has to add a ReachMaxActiveTxnException that is returned
when the limit is reached. Clients need to catch this exception before
doing anything else, so every client has to handle
ReachMaxActiveTxnException.
2. A client that receives this exception will not stop opening
transactions, because it doesn't know when the TC will recover. It
will retry immediately, and if the TC can't recover, the client will
keep retrying, which makes no sense.


Re: [DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Posted by Ran Gao <rg...@apache.org>.
Bo, Great work!

+1

best,
Ran Gao


Re: [DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Posted by 丛搏 <co...@gmail.com>.
Hi Haiting,

We have an active txn count metric, but not an "active transaction
usage percent" metric. I think we can use the active txn count to
implement monitoring and alerting.

PR : https://github.com/apache/pulsar/pull/10651

Thanks,
Bo

On Tue, Apr 19, 2022 at 11:38, Haiting Jiang <ji...@apache.org> wrote:
>
> Hi Bo,
>
> Do we have metrics like "active transaction usage percent" to set up monitoring and alerts?
> This is necessary, since the situation is critical once the limit is reached.
>
> Thanks,
> Haiting

Re: [DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Posted by PengHui Li <pe...@apache.org>.
+1

Penghui

On Tue, Apr 19, 2022 at 11:38 AM Haiting Jiang <ji...@apache.org>
wrote:

> Hi Bo,
>
> Do we have metrics like "active transaction usage percent" to set up
> monitoring and alerts?
> This is necessary, since the situation is critical once the limit is reached.
>
> Thanks,
> Haiting

Re: [DISCUSS] [PIP-154] Max active transaction limitation for transaction coordinator

Posted by Haiting Jiang <ji...@apache.org>.
Hi Bo,

Do we have metrics like "active transaction usage percent" to set up monitoring and alerts?
This is necessary, since the situation is critical once the limit is reached.

Thanks,
Haiting
