Posted to dev@pulsar.apache.org by Xiangying Meng <xi...@apache.org> on 2023/09/18 14:26:06 UTC

Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Hi Dave,
This is an external request. Apache Paimon has added support for Kafka
but not yet for Pulsar, so the Paimon community would like to
integrate Pulsar.

Furthermore, when integrating Pulsar into Paimon, we would like to be
able to configure the isolation level, as Kafka allows, so that
uncommitted transaction logs can be read.
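
To make the Kafka comparison concrete: Kafka consumers choose between `isolation.level=read_committed` and `read_uncommitted`. Below is a minimal Python sketch of only the visibility rule the two levels imply. It is a toy model, not Pulsar or Kafka code: `IsolationLevel`, `TxnState`, `Message`, and `visible` are hypothetical names, and it deliberately ignores ordering (how an open transaction can hold back later messages).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class IsolationLevel(Enum):
    READ_COMMITTED = "read_committed"
    READ_UNCOMMITTED = "read_uncommitted"


class TxnState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


@dataclass(frozen=True)
class Message:
    payload: str
    txn_id: Optional[str] = None  # None means a non-transactional message


def visible(messages: List[Message], txn_states: Dict[str, TxnState],
            level: IsolationLevel) -> List[str]:
    """Return the payloads a consumer at `level` is allowed to see."""
    out = []
    for msg in messages:
        if msg.txn_id is None or level is IsolationLevel.READ_UNCOMMITTED:
            # Plain messages are always visible; read-uncommitted sees everything.
            out.append(msg.payload)
        elif txn_states[msg.txn_id] is TxnState.COMMITTED:
            # Read-committed only sees transactional messages once committed.
            out.append(msg.payload)
    return out


log = [
    Message("deposit-1"),
    Message("transfer-debit", txn_id="t1"),
    Message("deposit-2"),
    Message("refund", txn_id="t2"),
]
states = {"t1": TxnState.OPEN, "t2": TxnState.ABORTED}

print(visible(log, states, IsolationLevel.READ_COMMITTED))
print(visible(log, states, IsolationLevel.READ_UNCOMMITTED))
```

With `t1` still open and `t2` aborted, the read-committed view contains only the two plain deposits, while the read-uncommitted view also includes the open and the aborted messages.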

Additional context can be found in the following link:
https://github.com/apache/incubator-paimon/issues/765

Sincerely,
Xiangying

On Mon, Sep 18, 2023 at 10:30 AM Dave Fisher <wa...@comcast.net> wrote:
>
> My concern is that this pip allows consumers to change the processing rules for transactions in ways that a producer might find unexpected.
>
> I think that if this proceeds, the scope needs to expand so that producers/admins must proactively allow transactions to be consumed uncommitted.
>
> I am also interested in the use case that motivates this change.
>
> Best,
> Dave
>
> Sent from my iPhone
>
> > On Sep 17, 2023, at 8:50 AM, hzh0425 <hz...@apache.org> wrote:
> >
> > Hi, all
> >
> > This PR contributes PIP-298: https://github.com/apache/pulsar/pull/21114
> >
> > This PIP implements the Read Committed and Read Uncommitted isolation levels for Pulsar transactions, allowing consumers to configure the isolation level when they are built.
> >
> > For more details, please refer to pip-298.md.
> >
> > I hope everyone can help review and discuss this PIP so that it can enter the discussion stage.
> >
> > Thanks
> > Zhangheng Huang
>
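
One way to read Dave's suggestion that producers/admins should have to opt in is as a policy gate in front of the level a consumer requests. The Python sketch below assumes a hypothetical per-topic flag, `allow_uncommitted_reads`, set by a producer or admin; neither that flag nor `effective_isolation` exists in Pulsar, and this illustrates the proposal in the reply above, not anything defined by PIP-298.

```python
from enum import Enum


class IsolationLevel(Enum):
    READ_COMMITTED = "read_committed"
    READ_UNCOMMITTED = "read_uncommitted"


class IsolationNotAllowed(Exception):
    """Raised when a consumer asks for a level the topic owner has not enabled."""


def effective_isolation(requested: IsolationLevel,
                        allow_uncommitted_reads: bool) -> IsolationLevel:
    """Resolve the isolation level a subscription actually gets.

    `allow_uncommitted_reads` stands in for a hypothetical topic/namespace
    policy that a producer or admin must switch on explicitly.
    """
    if requested is IsolationLevel.READ_UNCOMMITTED and not allow_uncommitted_reads:
        raise IsolationNotAllowed("uncommitted reads are not enabled for this topic")
    return requested
```

Rejecting the request, rather than silently downgrading it to read-committed, makes the mismatch visible to whoever owns the consumer.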

Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by hzh0425 <hz...@apache.org>.
Thanks, Dave and Xiangying.
Does anyone have any other questions or concerns?



---- Replied Message ----
| From | Xiangying Meng<xi...@apache.org> |
| Date | 09/20/2023 15:50 |
| To | dev@pulsar.apache.org |
| Cc | |
| Subject | Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level |
Hi, all,

Let's consider another example:

**System**: Financial Transaction System

**Operations**: A large volume of deposit and withdrawal operations and
a small number of transfer operations.

**Roles**:

- **Client A1**
- **Client A2**
- **User Account B1**
- **User Account B2**
- **Request Topic C**
- **Real-time Monitoring System D**
- **Business Processing System E**

**Client Operations**:

- **Withdrawal**: Client A1 decreases the deposit amount from User
Account B1 or B2.
- **Deposit**: Client A1 increases the deposit amount in User Account B1 or B2.
- **Transfer**: Client A2 decreases the deposit amount from User
Account B1 and increases it in User Account B2, or vice versa.

**Real-time Monitoring System D**: Obtains the latest data from
Request Topic C as quickly as possible to monitor transaction data and
changes in bank reserves in real time. This is necessary for timely
anomaly detection and real-time decision-making.

**Business Processing System E**: Reads data from Request Topic C,
then applies the operations to User Accounts B1 and B2.

**User Scenario**: Client A1 sends a large number of deposit and
withdrawal requests to Request Topic C. Client A2 writes a small
number of transfer requests to Request Topic C.

In this case, Business Processing System E needs the read-committed
isolation level to ensure consistent operations and exactly-once
semantics. Real-time Monitoring System D does not care if a small
number of transfer requests are incomplete (dirty data). What it
cannot tolerate is a large number of deposit and withdrawal requests
being delayed by a small number of transfer requests, because
currently uncommitted transaction messages can block the reading of
committed transaction messages.

In this case, it is necessary to set different isolation levels for
different consumers/subscriptions.
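
The blocking described above can be modeled in a few lines. In this minimal Python sketch, a read-committed subscription may not read past the first entry of a still-open transaction (similar in spirit to Kafka's last stable offset; our understanding is that Pulsar's transaction buffer bounds dispatch with a comparable max read position), while a read-uncommitted subscription reads everything. `Entry`, `max_read_position`, and `read` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    kind: str                     # "deposit", "withdrawal" or "transfer"
    txn_id: Optional[str] = None  # transfers are written inside a transaction


def max_read_position(log: List[Entry], txn_states: Dict[str, str]) -> int:
    """Index of the first entry that belongs to a still-open transaction.

    Returns len(log) when no transaction is open. A read-committed reader must
    stop here: the open transaction's outcome is unknown and order is preserved,
    so every later entry (even a plain deposit) has to wait.
    """
    for i, entry in enumerate(log):
        if entry.txn_id is not None and txn_states[entry.txn_id] == "open":
            return i
    return len(log)


def read(log: List[Entry], txn_states: Dict[str, str], read_committed: bool) -> List[str]:
    """Return the entry kinds one subscription can currently read."""
    if not read_committed:
        # Read-uncommitted: everything, including open and aborted transactions.
        return [e.kind for e in log]
    limit = max_read_position(log, txn_states)
    # Read-committed: stop at the limit and skip aborted transactional entries.
    return [
        e.kind
        for e in log[:limit]
        if e.txn_id is None or txn_states[e.txn_id] == "committed"
    ]


# Client A2 opens a transfer, then Client A1 keeps sending deposits/withdrawals.
log = [Entry("deposit"), Entry("transfer", "t1")] + [Entry("deposit")] * 3 + [Entry("withdrawal")]
states = {"t1": "open"}

print("System E (read committed):", read(log, states, read_committed=True))
print("System D (read uncommitted):", read(log, states, read_committed=False))
```

While the single transfer stays open, System E can read only the first deposit and System D sees all six entries; once the transfer commits or aborts, E catches up.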

Thanks,
Xiangying

On Tue, Sep 19, 2023 at 11:35 PM 杨国栋 <ya...@gmail.com> wrote:
>
> Hi Dave and Xiangying,
> Thanks for all your support.
>
> Let me add some background.
>
> Apache Paimon uses message queues as external log systems, and Paimon's
> changelog can also be consumed from the message queue.
> By default, changelog entries in the message queue become visible to
> consumers only after a snapshot, and a snapshot has the same life cycle
> as a message queue transaction.
> However, users can consume the changelog immediately by reading
> uncommitted messages, without waiting for the next snapshot.
> This reduces changelog latency, but it relies on reading uncommitted
> messages in Kafka or another message queue.
> So we hope Pulsar can support the Read Uncommitted isolation level.
>
> Setting aside Paimon's application scenarios, let's discuss the Read
> Uncommitted isolation level itself.
>
> Read Uncommitted isolation brings certain risks, but it also makes
> messages readable immediately.
> Reading committed data ensures accuracy, while reading uncommitted data
> ensures real-time performance (at the cost of some duplicate or dirty
> messages).
> Real-time performance is what these users need; how to handle dirty
> messages should be decided by the application.
>
> Complete and accurate data is still available through the Read Committed
> isolation level.
>
> Sincerely yours.
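
As a sketch of what leaving dirty messages to the application could look like, the Python class below keeps a low-latency view over read-uncommitted messages: it drops redeliveries by a unique message key (for example, producer name plus sequence ID) and buffers transactional messages as provisional until their outcome is known. It assumes the application can learn each transaction's outcome from some other source (for instance a read-committed subscription or Paimon's snapshot metadata); that source, and every name here, is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

MessageKey = Tuple[str, int]  # e.g. (producer name, sequence ID)


@dataclass
class ProvisionalView:
    """Application-side view for a read-uncommitted consumer (hypothetical)."""

    seen: Set[MessageKey] = field(default_factory=set)
    final: List[str] = field(default_factory=list)  # non-transactional or committed
    pending: Dict[str, List[str]] = field(default_factory=dict)  # txn id -> payloads

    def on_message(self, key: MessageKey, payload: str, txn_id: Optional[str]) -> bool:
        """Record one delivered message; return False if it was a duplicate."""
        if key in self.seen:
            return False  # redelivery or repeated message: ignore it
        self.seen.add(key)
        if txn_id is None:
            self.final.append(payload)
        else:
            self.pending.setdefault(txn_id, []).append(payload)
        return True

    def on_txn_outcome(self, txn_id: str, committed: bool) -> None:
        """Promote a transaction's messages on commit, discard them on abort."""
        payloads = self.pending.pop(txn_id, [])
        if committed:
            self.final.extend(payloads)

    def latest(self) -> List[str]:
        """Low-latency view: settled data plus still-provisional data."""
        return self.final + [p for ps in self.pending.values() for p in ps]


if __name__ == "__main__":
    demo = ProvisionalView()
    demo.on_message(("A1", 1), "deposit 100", None)
    demo.on_message(("A2", 1), "transfer 50", "t1")
    demo.on_message(("A1", 1), "deposit 100", None)  # duplicate, ignored
    print(demo.latest())
    demo.on_txn_outcome("t1", committed=False)
    print(demo.final)
```

The consumer can show `latest()` immediately for low latency and rely on `final` wherever accuracy matters, which mirrors the trade-off between the two isolation levels.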

[DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by hzh0425 <hz...@apache.org>.
Hi Dave,

Thanks for your support.
We have completed the relevant documentation: https://github.com/apache/pulsar-site/pull/712

PIP: https://github.com/apache/pulsar/pull/21114
Please take a look when you have time.


Thanks,
zhangheng
---- Replied Message ----
| From | Xiangying Meng<xi...@apache.org> |
| Date | 9/26/2023 09:40 |
| To | <de...@pulsar.apache.org> |
| Subject | Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level |
Hi Dave,

Thanks for your support.
I also think this should only be for the master branch.

Thanks,
Xiangying




Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Xiangying Meng <xi...@apache.org>.
Hi Dave,

Thanks for your support.
I also think this should only be for the master branch.

Thanks,
Xiangying

On Tue, Sep 26, 2023 at 9:34 AM Dave Fisher <wa...@comcast.net> wrote:
>
> Hi -
>
> OK. I’ll agree, but I think the PIP ought to include documentation. There should also be clear communication about this use case and how to use it.
>
> > On Sep 25, 2023, at 6:23 PM, Xiangying Meng <xi...@apache.org> wrote:
> >
> > Hi Dave,
> > The uncommitted transactions do not impact actual users' bank accounts.
> > Business Processing System E only reads committed transactional
> > messages and operates on users' accounts. It needs exactly-once semantics.
> > Real-time Monitoring System D reads uncommitted transactional
> > messages. It does not need exactly-once semantics.
> >
> > They use different subscriptions and choose different isolation
> > levels. One needs transactional guarantees; the other does not.
> > In general, not all subscriptions to the same topic
> > require transactional guarantees.
> > Some want low latency without the exactly-once guarantee, and
> > some strictly require the exactly-once guarantee.
> > We are just providing a new option for different subscriptions. This
> > should not be a breaking change, right?
>
> Not a breaking change, but it does add to the API.
>
> We should discuss whether this PIP is only for master (3.2) or whether it may be cherry-picked to current versions.
>
> >
> > Looking forward to your reply.
>
> Thank you,
> Dave
>

Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Dave Fisher <wa...@comcast.net>.
Hi -

OK. I’ll agree, but I think the PIP ought to include documentation. There should also be clear communication about this use case and how to use it.

> On Sep 25, 2023, at 6:23 PM, Xiangying Meng <xi...@apache.org> wrote:
> 
> Hi Dave,
> The uncommitted transactions do not impact actual users' bank accounts.
> Business Processing System E only reads committed transactional
> messages and operates on users' accounts. It needs exactly-once semantics.
> Real-time Monitoring System D reads uncommitted transactional
> messages. It does not need exactly-once semantics.
> 
> They use different subscriptions and choose different isolation
> levels. One needs transactional guarantees; the other does not.
> In general, not all subscriptions to the same topic
> require transactional guarantees.
> Some want low latency without the exactly-once guarantee, and
> some strictly require the exactly-once guarantee.
> We are just providing a new option for different subscriptions. This
> should not be a breaking change, right?

Not a breaking change, but it does add to the API.

We should discuss whether this PIP is only for master (3.2) or whether it may be cherry-picked to current versions.
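
To illustrate why an additive option need not break existing consumers: if the new setting defaults to today's read-committed behavior (our reading of the proposal; the PIP text is authoritative), code that never sets it is unaffected. The Python sketch below is a toy builder; `ConsumerBuilder`, `isolation_level`, and `ConsumerConfig` are hypothetical names and do not reproduce Pulsar's Java API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class IsolationLevel(Enum):
    READ_COMMITTED = "read_committed"
    READ_UNCOMMITTED = "read_uncommitted"


@dataclass(frozen=True)
class ConsumerConfig:
    topic: str
    subscription: str
    # Assumed default: consumers that never set the option keep today's
    # read-committed behavior, which is what makes the change additive.
    isolation_level: IsolationLevel = IsolationLevel.READ_COMMITTED


class ConsumerBuilder:
    """Toy builder; method names are illustrative, not Pulsar's Java API."""

    def __init__(self, topic: str, subscription: str) -> None:
        self._config = ConsumerConfig(topic, subscription)

    def isolation_level(self, level: IsolationLevel) -> "ConsumerBuilder":
        # Opting in is explicit and per subscription.
        self._config = replace(self._config, isolation_level=level)
        return self

    def build(self) -> ConsumerConfig:
        return self._config


existing = ConsumerBuilder("request-topic-c", "business-e").build()
monitor = (
    ConsumerBuilder("request-topic-c", "monitor-d")
    .isolation_level(IsolationLevel.READ_UNCOMMITTED)
    .build()
)
print(existing.isolation_level, monitor.isolation_level)
```

The existing subscription keeps read-committed semantics without any code change, while only the subscription that explicitly opts in reads uncommitted data.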

> 
> Looking forward to your reply.

Thank you,
Dave
> 
> Thanks,
> Xiangying
> 


Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Xiangying Meng <xi...@apache.org>.
Hi Dave,
The uncommitted transactions do not impact actual users' bank accounts.
Business Processing System E only reads committed transactional
messages and operates on users' accounts. It needs exactly-once semantics.
Real-time Monitoring System D reads uncommitted transactional
messages. It does not need exactly-once semantics.

They use different subscriptions and choose different isolation
levels. One needs transactional guarantees; the other does not.
In general, not all subscriptions to the same topic
require transactional guarantees.
Some want low latency without the exactly-once guarantee, and
some strictly require the exactly-once guarantee.
We are just providing a new option for different subscriptions. This
should not be a breaking change, right?
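
A toy Python model of the two subscriptions in this example, covering both sides of the discussion: System E, reading committed data, is unaffected by an aborted transfer, whereas folding the same stream at read-uncommitted (as System D sees it) produces wrong per-account balances. `Op` and `apply_ops` are hypothetical, and the model assumes every transaction has already committed or aborted, so it ignores ordering and blocking.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Op:
    account: str
    delta: int
    txn_id: Optional[str] = None  # transfer legs carry a transaction id


def apply_ops(ops: List[Op], txn_states: Dict[str, str], read_committed: bool) -> Dict[str, int]:
    """Fold operations into account balances as one subscription would see them."""
    balances = {"B1": 0, "B2": 0}
    for op in ops:
        if read_committed and op.txn_id is not None and txn_states[op.txn_id] != "committed":
            continue  # read-committed never observes open or aborted transactions
        balances[op.account] += op.delta
    return balances


ops = [
    Op("B1", +100),       # deposit from Client A1
    Op("B2", +40),        # deposit from Client A1
    Op("B1", -30, "t1"),  # transfer from Client A2, debit leg
    Op("B2", +30, "t1"),  # transfer from Client A2, credit leg
    Op("B1", -50, "t2"),  # a second transfer that is later aborted
    Op("B2", +50, "t2"),
]
states = {"t1": "committed", "t2": "aborted"}

business_e = apply_ops(ops, states, read_committed=True)
monitor_d = apply_ops(ops, states, read_committed=False)
print("E:", business_e, "D:", monitor_d)
```

Because the two legs of each transfer net to zero, D's total across B1 and B2 still matches E's in this data; only D's per-account figures are dirty, which is why they should not be used to operate real accounts.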

Looking forward to your reply.

Thanks,
Xiangying

On Tue, Sep 26, 2023 at 4:09 AM Dave Fisher <wa...@apache.org> wrote:
>
> So you are willing to let uncommitted transactions impact actual users' bank accounts? Are you sure that there is no other way to bypass uncommitted records? Letting uncommitted records through is not exactly-once.
>
> Are you ready to rewrite Pulsar’s documentation to explain how normal users can avoid allowing this?
>
> Best,
> Dave
>

Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Dave Fisher <wa...@apache.org>.

> On Sep 20, 2023, at 12:50 AM, Xiangying Meng <xi...@apache.org> wrote:
> 
> Hi, all,
> 
> Let's consider another example:
> 
> **System**: Financial Transaction System
> 
> **Operations**: A large volume of deposit and withdrawal operations and
> a small number of transfer operations.
> 
> **Roles**:
> 
> - **Client A1**
> - **Client A2**
> - **User Account B1**
> - **User Account B2**
> - **Request Topic C**
> - **Real-time Monitoring System D**
> - **Business Processing System E**
> 
> **Client Operations**:
> 
> - **Withdrawal**: Client A1 decreases the deposit amount from User
> Account B1 or B2.
> - **Deposit**: Client A1 increases the deposit amount in User Account B1 or B2.
> - **Transfer**: Client A2 decreases the deposit amount from User
> Account B1 and increases it in User Account B2, or vice versa.
> 
> **Real-time Monitoring System D**: Obtains the latest data from
> Request Topic C as quickly as possible to monitor transaction data and
> changes in bank reserves in real time. This is necessary for timely
> anomaly detection and real-time decision-making.
> 
> **Business Processing System E**: Reads data from Request Topic C,
> then applies the operations to User Accounts B1 and B2.
> 
> **User Scenario**: Client A1 sends a large number of deposit and
> withdrawal requests to Request Topic C. Client A2 writes a small
> number of transfer requests to Request Topic C.
> 
> In this case, Business Processing System E needs the read-committed
> isolation level to ensure consistent operations and exactly-once
> semantics. Real-time Monitoring System D does not care if a small
> number of transfer requests are incomplete (dirty data). What it
> cannot tolerate is a large number of deposit and withdrawal requests
> being delayed by a small number of transfer requests, because
> currently uncommitted transaction messages can block the reading of
> committed transaction messages.

So you are willing to let uncommitted transactions impact actual users' bank accounts? Are you sure that there is no other way to bypass uncommitted records? Letting uncommitted records through is not exactly-once.

Are you ready to rewrite Pulsar’s documentation to explain how normal users can avoid allowing this?

Best,
Dave


> 
> In this case, it is necessary to set different isolation levels for
> different consumers/subscriptions.
> 
> Thanks,
> Xiangying


Re: [DISCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Xiangying Meng <xi...@apache.org>.
Hi, all,

Let's consider another example:

**System**: Financial Transaction System

**Operations**: A large volume of deposit and withdrawal operations and a
small number of transfer operations.

**Roles**:

- **Client A1**
- **Client A2**
- **User Account B1**
- **User Account B2**
- **Request Topic C**
- **Real-time Monitoring System D**
- **Business Processing System E**

**Client Operations**:

- **Withdrawal**: Client A1 decreases the deposit amount from User
Account B1 or B2.
- **Deposit**: Client A1 increases the deposit amount in User Account B1 or B2.
- **Transfer**: Client A2 decreases the deposit amount from User
Account B1 and increases it in User Account B2. Or vice versa.

**Real-time Monitoring System D**: Obtains the latest data from
Request Topic C as quickly as possible to monitor transaction data and
changes in bank reserves in real-time. This is necessary for the
timely detection of anomalies and real-time decision-making.

**Business Processing System E**: Reads data from Request Topic C,
then actually applies the operations to User Accounts B1 and B2.

**User Scenario**: Client A1 sends a large number of deposit and
withdrawal requests to Request Topic C. Client A2 writes a small
number of transfer requests to Request Topic C.

In this case, Business Processing System E needs the read-committed
isolation level to ensure consistent operations and exactly-once
semantics. The real-time monitoring system does not mind if a small
number of transfer requests are incomplete (dirty data). What it
cannot tolerate is a large number of deposit and withdrawal requests
being held back from real-time display because of a small number of
in-flight transfer requests (currently, uncommitted transaction
messages can block the reading of committed transaction messages).

In this case, it is necessary to set different isolation levels for
different consumers/subscriptions.
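The blocking behaviour described above can be illustrated with a small, self-contained Java model. It is a simplified sketch under stated assumptions, not Pulsar's broker code: it assumes a read-committed view is capped at the first entry of any still-open transaction (a simplified "max read position") and skips aborted entries, while a read-uncommitted view returns every entry. `IsolationDemo` and all of its methods are hypothetical names chosen for illustration; the A1 traffic is modeled as non-transactional for simplicity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of one topic (Request Topic C); NOT Pulsar's broker code.
// Requires Java 16+ for the nested record.
public class IsolationDemo {

    enum TxnState { OPEN, COMMITTED, ABORTED }

    // One log entry; txnId == null marks a non-transactional message.
    record Entry(String payload, String txnId) {}

    private final List<Entry> log = new ArrayList<>();
    private final Map<String, TxnState> txns = new HashMap<>();

    public void begin(String txnId)  { txns.put(txnId, TxnState.OPEN); }
    public void commit(String txnId) { txns.put(txnId, TxnState.COMMITTED); }
    public void abort(String txnId)  { txns.put(txnId, TxnState.ABORTED); }

    public void append(String payload, String txnId) { log.add(new Entry(payload, txnId)); }

    // Simplified "max read position": index of the first entry that belongs to a
    // still-open transaction, or the log size if no transaction is open.
    int maxReadPosition() {
        for (int i = 0; i < log.size(); i++) {
            String txn = log.get(i).txnId();
            if (txn != null && txns.get(txn) == TxnState.OPEN) {
                return i;
            }
        }
        return log.size();
    }

    // READ_COMMITTED view: nothing at or after the first open transaction is
    // visible, and entries from aborted transactions are skipped.
    public List<String> readCommitted() {
        List<String> out = new ArrayList<>();
        int limit = maxReadPosition();
        for (int i = 0; i < limit; i++) {
            Entry e = log.get(i);
            if (e.txnId() == null || txns.get(e.txnId()) == TxnState.COMMITTED) {
                out.add(e.payload());
            }
        }
        return out;
    }

    // READ_UNCOMMITTED view: every entry in log order, regardless of transaction state.
    public List<String> readUncommitted() {
        List<String> out = new ArrayList<>();
        for (Entry e : log) {
            out.add(e.payload());
        }
        return out;
    }

    public static void main(String[] args) {
        IsolationDemo topicC = new IsolationDemo();
        topicC.append("deposit B1 +100", null);   // Client A1, non-transactional
        topicC.begin("t1");                       // Client A2 starts a transfer B1 -> B2
        topicC.append("withdraw B1 -50", "t1");
        topicC.append("deposit B2 +200", null);   // more A1 traffic behind the open transfer
        topicC.append("withdraw B1 -30", null);

        System.out.println("E (read committed):   " + topicC.readCommitted());
        System.out.println("D (read uncommitted): " + topicC.readUncommitted());

        topicC.append("deposit B2 +50", "t1");    // second half of the transfer
        topicC.commit("t1");
        System.out.println("E after t1 commits:   " + topicC.readCommitted());
    }
}
```

In this model, System E's read-committed view stays at the first deposit while transfer `t1` is open, even though later deposits and withdrawals are already in the log; System D's read-uncommitted view sees them at once, together with the half-finished transfer. Once `t1` commits, E catches up with all five entries.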

Thanks,
Xiangying

On Tue, Sep 19, 2023 at 11:35 PM 杨国栋 <ya...@gmail.com> wrote:
>
> Hi Dave and Xiangying,
> Thanks for all your support.
>
> Let me add some background.
>
> Apache Paimon uses a message queue as an external log system, and
> Paimon's changelog can also be consumed from that message queue.
> By default, changelog entries that Paimon writes to the message queue are
> visible to consumers only after the next snapshot. A snapshot has the same
> life cycle as a message queue transaction.
> However, users can consume the changelog immediately by reading uncommitted
> messages, without waiting for the next snapshot.
> This reduces changelog latency, but it relies on being able to read
> uncommitted messages in Kafka or another message queue.
> So we hope Pulsar can support a Read Uncommitted isolation level.
>
> Setting aside Paimon's use case, let's discuss the Read Uncommitted
> isolation level itself.
>
> Read Uncommitted isolation brings certain security risks, but it also
> makes messages readable immediately.
> Reading committed data ensures accuracy, while reading uncommitted data
> ensures real-time visibility (at the cost of possible duplicate or
> dirty messages).
> Real-time visibility is what these users need; how to handle dirty
> messages should be left to the application.
>
> Complete and accurate data is still available through the Read Committed
> isolation level.
>
> Sincerely yours.

Re: [DISSCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by 杨国栋 <ya...@gmail.com>.
Hi Dave and Xiangying,
Thanks for all your support.

Let me add some background.

Apache Paimon uses a message queue as an external log system, and
Paimon's changelog can also be consumed from that message queue.
By default, changelog entries that Paimon writes to the message queue are
visible to consumers only after the next snapshot. A snapshot has the same
life cycle as a message queue transaction.
However, users can consume the changelog immediately by reading uncommitted
messages, without waiting for the next snapshot.
This reduces changelog latency, but it relies on being able to read
uncommitted messages in Kafka or another message queue.
So we hope Pulsar can support a Read Uncommitted isolation level.

Setting aside Paimon's use case, let's discuss the Read Uncommitted
isolation level itself.

Read Uncommitted isolation brings certain security risks, but it also
makes messages readable immediately.
Reading committed data ensures accuracy, while reading uncommitted data
ensures real-time visibility (at the cost of possible duplicate or
dirty messages).
Real-time visibility is what these users need; how to handle dirty
messages should be left to the application.

Complete and accurate data is still available through the Read Committed
isolation level.
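If Read Uncommitted were offered, one way an application could handle dirty messages itself is to keep two views: a provisional one that reflects every record as soon as it is readable, and a committed one that changes only when the record's transaction is known to have committed. The Java sketch below assumes the application is told each transaction's outcome through hypothetical `onCommit`/`onAbort` callbacks; how, or whether, Pulsar would surface that information is not specified in this thread. `DirtyReadHandler` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical application-side handler for a READ_UNCOMMITTED changelog.
// Assumes transaction outcomes arrive via onCommit/onAbort (not a real Pulsar API).
public class DirtyReadHandler {

    // Balances built only from data known to be committed.
    private final Map<String, Long> committed = new HashMap<>();
    // Deltas seen but not yet resolved, grouped by transaction id.
    private final Map<String, List<Map.Entry<String, Long>>> pending = new HashMap<>();

    // Called for every record as soon as it is readable, dirty or not.
    public void onRecord(String txnId, String account, long delta) {
        if (txnId == null) {
            committed.merge(account, delta, Long::sum);   // non-transactional: final at once
        } else {
            pending.computeIfAbsent(txnId, k -> new ArrayList<>())
                   .add(Map.entry(account, delta));
        }
    }

    // Transaction committed: fold its deltas into the committed view.
    public void onCommit(String txnId) {
        List<Map.Entry<String, Long>> deltas = pending.remove(txnId);
        if (deltas == null) {
            return;
        }
        for (Map.Entry<String, Long> d : deltas) {
            committed.merge(d.getKey(), d.getValue(), Long::sum);
        }
    }

    // Transaction aborted: the dirty deltas are simply discarded.
    public void onAbort(String txnId) {
        pending.remove(txnId);
    }

    public long committedBalance(String account) {
        return committed.getOrDefault(account, 0L);
    }

    // Low-latency view: committed balance plus deltas from unresolved transactions.
    public long provisionalBalance(String account) {
        long total = committedBalance(account);
        for (List<Map.Entry<String, Long>> deltas : pending.values()) {
            for (Map.Entry<String, Long> d : deltas) {
                if (d.getKey().equals(account)) {
                    total += d.getValue();
                }
            }
        }
        return total;
    }
}
```

A monitoring dashboard would read `provisionalBalance`, while anything that must be exact reads `committedBalance`. Duplicate messages, which the message above also mentions, would need separate handling (for example, de-duplication by a message key); that is outside this sketch.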

Sincerely yours.

Re: [DISSCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Dave Fisher <wa...@comcast.net>.
Let me try a different approach. Please see the definition of a Pulsar Transaction - https://pulsar.apache.org/docs/3.1.x/transactions/

If uncommitted messages are consumed, that definition no longer holds. If a consumer is going to be allowed to break that definition, then the producer, which may abort the transaction, ought to have given permission.

You never know what content is in that transaction, and it might be aborted at a user's choice for any reason.

If Kafka follows a different policy with transactions, then perhaps you should look into the Kafka on Pulsar / Starlight for Kafka protocol handlers.

If we want Pulsar to allow Kafka-style behavior, then let's be clear about the implications along with the expectations.

Best,
Dave

> On Sep 18, 2023, at 6:41 PM, Xiangying Meng <xi...@apache.org> wrote:
> 
> Hi Dave,
> 
> I greatly appreciate your perspective, yet it leaves me with some
> uncertainties that I am eager to address. Why would the introduction
> of isolation levels constitute an insecure action?
> 
>> I think if this proceeds then the scope needs to be expanded to producers/admins needing to proactively allow transactions to be consumed uncommitted.
> 
> We are merely presenting an option to users. In my view, establishing
> isolation-level controls for producers and administrators is
> unnecessary, and I am not inclined to implement it, since it adds
> nothing of substance.
> 
> Sincerely,
> Xiangying
> 
>> On Mon, Sep 18, 2023 at 10:58 PM Dave Fisher <wa...@comcast.net> wrote:
>> 
>> Thanks. So, this is to support exfiltration of uncommitted transaction data? This is IMO wrong and a security risk.
>> 
>> Pulsar already supports CDC through IO Connectors.
>> 
>> Kafka can be wrong about these isolation levels.
>> 
>> There is really no information in those Paimon issues. How is Paimon’s ability to support Pulsar broken by this edge case?
>> 
>> Best,
>> Dave
>> 
>>>> On Sep 18, 2023, at 7:26 AM, Xiangying Meng <xi...@apache.org> wrote:
>>> 
>>> Hi Dave,
>>> This is an external request. Paimon has added support for Kafka but
>>> has not yet incorporated support for Pulsar. Therefore, the Paimon
>>> community desires to integrate Pulsar.
>>> 
>>> Furthermore, when integrating Pulsar into Paimon, it is desired to
>>> enable the ability to configure isolation levels, similar to Kafka, to
>>> support reading uncommitted transaction logs.
>>> 
>>> Additional context can be found in the following link:
>>> https://github.com/apache/incubator-paimon/issues/765
>>> 
>>> Sincerely,
>>> Xiangying
>>> 
>>>> On Mon, Sep 18, 2023 at 10:30 AM Dave Fisher <wa...@comcast.net> wrote:
>>>> 
>>>> My concern is that this pip allows consumers to change the processing rules for transactions in ways that a producer might find unexpected.
>>>> 
>>>> I think if this proceeds then the scope needs to be expanded to producers/admins needing to proactively allow transactions to be consumed uncommitted.
>>>> 
>>>> I am also interested in the use case that motivates this change.
>>>> 
>>>> Best,
>>>> Dave
>>>> 
>>>>>> On Sep 17, 2023, at 8:50 AM, hzh0425 <hz...@apache.org> wrote:
>>>>> 
>>>>> Hi, all
>>>>> 
>>>>> This PR contributes to PIP-298: https://github.com/apache/pulsar/pull/21114
>>>>> 
>>>>> This PIP implements Read Committed and Read Uncommitted isolation levels for Pulsar transactions, allowing consumers to configure the isolation level when the consumer is built.
>>>>> 
>>>>> For more details, please refer to pip-298.md
>>>>> 
>>>>> I hope everyone can help review and discuss this PIP so that it can enter the discussion stage.
>>>>> 
>>>>> Thanks
>>>>> Zhangheng Huang
>>>> 
>> 

Re: [DISSCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Xiangying Meng <xi...@apache.org>.
Hi Dave,

I greatly appreciate your perspective, yet it leaves me with some
uncertainties that I am eager to address. Why would the introduction
of isolation levels constitute an insecure action?

> I think if this proceeds then the scope needs to be expanded to producers/admins needing to proactively allow transactions to be consumed uncommitted.

We are merely presenting an option to users. In my view, establishing
isolation-level controls for producers and administrators is
unnecessary, and I am not inclined to implement it, since it adds
nothing of substance.

Sincerely,
Xiangying

On Mon, Sep 18, 2023 at 10:58 PM Dave Fisher <wa...@comcast.net> wrote:
>
> Thanks. So, this is to support exfiltration of uncommitted transaction data? This is IMO wrong and a security risk.
>
> Pulsar already supports CDC through IO Connectors.
>
> Kafka can be wrong about these isolation levels.
>
> There is really no information in those Paimon issues. How is Paimon’s ability to support Pulsar broken by this edge case?
>
> Best,
> Dave
>
> > On Sep 18, 2023, at 7:26 AM, Xiangying Meng <xi...@apache.org> wrote:
> >
> > Hi Dave,
> > This is an external request. Paimon has added support for Kafka but
> > has not yet incorporated support for Pulsar. Therefore, the Paimon
> > community desires to integrate Pulsar.
> >
> > Furthermore, when integrating Pulsar into Paimon, it is desired to
> > enable the ability to configure isolation levels, similar to Kafka, to
> > support reading uncommitted transaction logs.
> >
> > Additional context can be found in the following link:
> > https://github.com/apache/incubator-paimon/issues/765
> >
> > Sincerely,
> > Xiangying
> >
> >> On Mon, Sep 18, 2023 at 10:30 AM Dave Fisher <wa...@comcast.net> wrote:
> >>
> >> My concern is that this pip allows consumers to change the processing rules for transactions in ways that a producer might find unexpected.
> >>
> >> I think if this proceeds then the scope needs to be expanded to producers/admins needing to proactively allow transactions to be consumed uncommitted.
> >>
> >> I am also interested in the use case that motivates this change.
> >>
> >> Best,
> >> Dave
> >>
> >>>> On Sep 17, 2023, at 8:50 AM, hzh0425 <hz...@apache.org> wrote:
> >>>
> >>> Hi, all
> >>>
> >>> This PR contributes to PIP-298: https://github.com/apache/pulsar/pull/21114
> >>>
> >>> This PIP implements Read Committed and Read Uncommitted isolation levels for Pulsar transactions, allowing consumers to configure the isolation level when the consumer is built.
> >>>
> >>> For more details, please refer to pip-298.md
> >>>
> >>> I hope everyone can help review and discuss this PIP so that it can enter the discussion stage.
> >>>
> >>> Thanks
> >>> Zhangheng Huang
> >>
>

Re: [DISSCUSS] PIP-298: Consumer supports specifying consumption isolation level

Posted by Dave Fisher <wa...@comcast.net>.
Thanks. So, this is to support exfiltration of uncommitted transaction data? This is IMO wrong and a security risk.

Pulsar already supports CDC through IO Connectors.

Kafka can be wrong about these isolation levels.

There is really no information in those Paimon issues. How is Paimon’s ability to support Pulsar broken by this edge case?

Best,
Dave

> On Sep 18, 2023, at 7:26 AM, Xiangying Meng <xi...@apache.org> wrote:
> 
> Hi Dave,
> This is an external request. Paimon has added support for Kafka but
> has not yet incorporated support for Pulsar. Therefore, the Paimon
> community desires to integrate Pulsar.
> 
> Furthermore, when integrating Pulsar into Paimon, it is desired to
> enable the ability to configure isolation levels, similar to Kafka, to
> support reading uncommitted transaction logs.
> 
> Additional context can be found in the following link:
> https://github.com/apache/incubator-paimon/issues/765
> 
> Sincerely,
> Xiangying
> 
>> On Mon, Sep 18, 2023 at 10:30 AM Dave Fisher <wa...@comcast.net> wrote:
>> 
>> My concern is that this pip allows consumers to change the processing rules for transactions in ways that a producer might find unexpected.
>> 
>> I think if this proceeds then the scope needs to be expanded to producers/admins needing to proactively allow transactions to be consumed uncommitted.
>> 
>> I am also interested in the use case that motivates this change.
>> 
>> Best,
>> Dave
>> 
>>>> On Sep 17, 2023, at 8:50 AM, hzh0425 <hz...@apache.org> wrote:
>>> 
>>> Hi, all
>>> 
>>> This PR contributes to PIP-298: https://github.com/apache/pulsar/pull/21114
>>> 
>>> This PIP implements Read Committed and Read Uncommitted isolation levels for Pulsar transactions, allowing consumers to configure the isolation level when the consumer is built.
>>> 
>>> For more details, please refer to pip-298.md
>>> 
>>> I hope everyone can help review and discuss this PIP so that it can enter the discussion stage.
>>> 
>>> Thanks
>>> Zhangheng Huang
>>