Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/22 13:52:38 UTC

[GitHub] [pulsar] Huanli-Meng commented on a change in pull request #12085: [website][upgrade] feat: docs migration - transactions

Huanli-Meng commented on a change in pull request #12085:
URL: https://github.com/apache/pulsar/pull/12085#discussion_r713938784



##########
File path: site2/website-next/docs/txn-how.md
##########
@@ -0,0 +1,154 @@
+---
+id: txn-how
+title: How transactions work?
+sidebar_label: How transactions work?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+This section describes transaction components and how the components work together. For the complete design details, see [PIP-31: Transactional Streaming](https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx).
+
+## Key concept
+
+It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
+
+### Transaction coordinator
+
+The transaction coordinator (TC) is a module running inside a Pulsar broker. 
+
+* It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. 
+  
+* It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
+
+### Transaction log
+
+All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
+
+The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions). 
+
+### Transaction buffer
+
+Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted. 
+
+Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics.  After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
+
+### Transaction ID
+
+Transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.

Review comment:
       ```suggestion
   The transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the transaction coordinator ID, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.
   ```
   One more comment on expressions such as "transaction coordinator (TC)":
   First, I think the first letter of each word should be capitalized.
   Second, since you introduce an abbreviation for the phrase, you should use TC in the rest of the docs.
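
To make the bit layout in the suggestion concrete, here is a minimal sketch, assuming the 128-bit ID is treated as one unsigned integer whose top 16 bits hold the coordinator ID and whose remaining 112 bits hold the per-coordinator sequence number. The class `TxnIdLayout` and its methods are hypothetical names for illustration only; they are not Pulsar's implementation.

```java
import java.math.BigInteger;

/** Hypothetical helper illustrating the 128-bit layout described in the doc text. */
final class TxnIdLayout {
    static final int COORDINATOR_BITS = 16;                  // highest 16 bits
    static final int SEQUENCE_BITS = 128 - COORDINATOR_BITS; // remaining 112 bits
    private static final BigInteger SEQUENCE_MASK =
            BigInteger.ONE.shiftLeft(SEQUENCE_BITS).subtract(BigInteger.ONE);

    private TxnIdLayout() {}

    /** Packs a coordinator ID and a sequence number into one 128-bit value. */
    static BigInteger pack(int coordinatorId, BigInteger sequence) {
        if (coordinatorId < 0 || coordinatorId >= (1 << COORDINATOR_BITS)) {
            throw new IllegalArgumentException("coordinator ID must fit in 16 bits");
        }
        if (sequence.signum() < 0 || sequence.bitLength() > SEQUENCE_BITS) {
            throw new IllegalArgumentException("sequence must fit in 112 bits");
        }
        return BigInteger.valueOf(coordinatorId).shiftLeft(SEQUENCE_BITS).or(sequence);
    }

    /** Extracts the coordinator ID from the highest 16 bits. */
    static int coordinatorId(BigInteger txnId) {
        return txnId.shiftRight(SEQUENCE_BITS).intValue();
    }

    /** Extracts the per-coordinator sequence number from the low 112 bits. */
    static BigInteger sequence(BigInteger txnId) {
        return txnId.and(SEQUENCE_MASK);
    }
}
```

Under this assumed layout, an ID alone is enough to tell which coordinator issued it, which is presumably what the last sentence of the paragraph is getting at.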

##########
File path: site2/website-next/docs/txn-how.md
##########
@@ -0,0 +1,154 @@
+---
+id: txn-how
+title: How transactions work?
+sidebar_label: How transactions work?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+This section describes transaction components and how the components work together. For the complete design details, see [PIP-31: Transactional Streaming](https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx).
+
+## Key concept
+
+It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
+
+### Transaction coordinator
+
+The transaction coordinator (TC) is a module running inside a Pulsar broker. 
+
+* It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. 
+  
+* It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
+
+### Transaction log
+
+All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
+
+The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions). 
+
+### Transaction buffer
+
+Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted. 
+
+Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics.  After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.

Review comment:
       ```suggestion
   The transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics. After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
   ```
   BTW, I think the following sentences repeat what the previous paragraph already says:
   "After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded."

##########
File path: site2/website-next/docs/txn-how.md
##########
@@ -0,0 +1,154 @@
+---
+id: txn-how
+title: How transactions work?
+sidebar_label: How transactions work?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+This section describes transaction components and how the components work together. For the complete design details, see [PIP-31: Transactional Streaming](https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx).
+
+## Key concept
+
+It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
+
+### Transaction coordinator
+
+The transaction coordinator (TC) is a module running inside a Pulsar broker. 
+
+* It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. 
+  
+* It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
+
+### Transaction log
+
+All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
+
+The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions). 
+
+### Transaction buffer
+
+Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted. 
+
+Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics.  After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
+
+### Transaction ID
+
+Transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.
+
+### Pending acknowledge state
+
+Pending acknowledge state maintains message acknowledgments within a transaction before a transaction completes. If a message is in the pending acknowledge state, the message cannot be acknowledged by other transactions until the message is removed from the pending acknowledge state.
+
+The pending acknowledge state is persisted to the pending acknowledge log (cursor ledger). A new broker can restore the state from the pending acknowledge log to ensure the acknowledgement is not lost.    
+
+## Data flow
+
+At a high level, the data flow can be split into several steps:
+
+1. Begin a transaction.
+   
+2. Publish messages with a transaction.
+   
+3. Acknowledge messages with a transaction.
+   
+4. End a transaction.
+
+To help you debug or tune the transaction for better performance, review the following diagrams and descriptions. 
+
+### 1. Begin a transaction
+
+Before introducing the transaction in Pulsar, a producer is created and then messages are sent to brokers and stored in data logs. 
+
+![](/assets/txn-3.png)
+
+Let’s walk through the steps for _beginning a transaction_.
+
+| Step  |  Description  | 

Review comment:
       1. Could we use an ordered list instead of putting the steps in a table? Same comment for the following sections.
   2. Update "transaction ID" to TxnID, since you have already introduced it. Check the whole doc and update it accordingly.
   

##########
File path: site2/website-next/docs/txn-how.md
##########
@@ -0,0 +1,154 @@
+---
+id: txn-how
+title: How transactions work?
+sidebar_label: How transactions work?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+This section describes transaction components and how the components work together. For the complete design details, see [PIP-31: Transactional Streaming](https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx).
+
+## Key concept
+
+It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
+
+### Transaction coordinator
+
+The transaction coordinator (TC) is a module running inside a Pulsar broker. 
+
+* It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. 
+  
+* It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
+
+### Transaction log
+
+All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
+
+The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions). 
+
+### Transaction buffer
+
+Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted. 
+
+Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics.  After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
+
+### Transaction ID
+
+Transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.
+
+### Pending acknowledge state
+
+Pending acknowledge state maintains message acknowledgments within a transaction before a transaction completes. If a message is in the pending acknowledge state, the message cannot be acknowledged by other transactions until the message is removed from the pending acknowledge state.
+
+The pending acknowledge state is persisted to the pending acknowledge log (cursor ledger). A new broker can restore the state from the pending acknowledge log to ensure the acknowledgement is not lost.    
+
+## Data flow
+
+At a high level, the data flow can be split into several steps:
+
+1. Begin a transaction.

Review comment:
       Start?

##########
File path: site2/website-next/sidebars.json
##########
@@ -81,6 +85,17 @@
         "tiered-storage-azure",
         "tiered-storage-aliyun"
       ]
+    },
+    {
+      "type": "category",
+      "label": "Transactions",
+      "items": [
+        "txn-why",

Review comment:
       I think it would be more appropriate to put "What are transactions" first, followed by "Why transactions", and so on.

##########
File path: site2/website-next/docs/txn-how.md
##########
@@ -0,0 +1,154 @@
+---
+id: txn-how
+title: How transactions work?
+sidebar_label: How transactions work?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+This section describes transaction components and how the components work together. For the complete design details, see [PIP-31: Transactional Streaming](https://docs.google.com/document/d/145VYp09JKTw9jAT-7yNyFU255FptB2_B2Fye100ZXDI/edit#heading=h.bm5ainqxosrx).
+
+## Key concept
+
+It is important to know the following key concepts, which is a prerequisite for understanding how transactions work.
+
+### Transaction coordinator
+
+The transaction coordinator (TC) is a module running inside a Pulsar broker. 
+
+* It maintains the entire life cycle of transactions and prevents a transaction from getting into an incorrect status. 
+  
+* It handles transaction timeout, and ensures that the transaction is aborted after a transaction timeout.
+
+### Transaction log
+
+All the transaction metadata persists in the transaction log. The transaction log is backed by a Pulsar topic. If the transaction coordinator crashes, it can restore the transaction metadata from the transaction log.
+
+The transaction log stores the transaction status rather than actual messages in the transaction (the actual messages are stored in the actual topic partitions). 
+
+### Transaction buffer
+
+Messages produced to a topic partition within a transaction are stored in the transaction buffer (TB) of that topic partition. The messages in the transaction buffer are not visible to consumers until the transactions are committed. The messages in the transaction buffer are discarded when the transactions are aborted. 
+
+Transaction buffer stores all ongoing and aborted transactions in memory. All messages are sent to the actual partitioned Pulsar topics.  After transactions are committed, the messages in the transaction buffer are materialized (visible) to consumers. When the transactions are aborted, the messages in the transaction buffer are discarded.
+
+### Transaction ID
+
+Transaction ID (TxnID) identifies a unique transaction in Pulsar. The transaction ID is 128-bit. The highest 16 bits are reserved for the ID of the transaction coordinator, and the remaining bits are used for monotonically increasing numbers in each transaction coordinator. It is easy to locate the transaction crash with the TxnID.
+
+### Pending acknowledge state
+
+Pending acknowledge state maintains message acknowledgments within a transaction before a transaction completes. If a message is in the pending acknowledge state, the message cannot be acknowledged by other transactions until the message is removed from the pending acknowledge state.
+
+The pending acknowledge state is persisted to the pending acknowledge log (cursor ledger). A new broker can restore the state from the pending acknowledge log to ensure the acknowledgement is not lost.    

Review comment:
       pending acknowledgment log?
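
For readers of this thread, the rule in the quoted paragraph (a message held in the pending acknowledge state cannot be acknowledged by another transaction until it is released) can be sketched roughly as follows. This is an in-memory illustration under my own assumptions; `PendingAckSketch` and its methods are hypothetical and do not model the persisted log or Pulsar's actual classes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the pending acknowledge state for one subscription. */
final class PendingAckSketch {
    // message ID -> ID of the transaction currently holding the pending ack
    private final Map<Long, String> pendingAcks = new HashMap<>();
    // message IDs whose acknowledgment has been committed
    private final Set<Long> acked = new HashSet<>();

    /** Returns true if the transaction holds the pending ack after this call. */
    boolean ack(String txnId, long messageId) {
        if (acked.contains(messageId)) {
            return false; // already acknowledged by a committed transaction
        }
        String owner = pendingAcks.putIfAbsent(messageId, txnId);
        return owner == null || owner.equals(txnId);
    }

    /** Commit: the transaction's pending acks become real acknowledgments. */
    void commit(String txnId) {
        Iterator<Map.Entry<Long, String>> it = pendingAcks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, String> entry = it.next();
            if (entry.getValue().equals(txnId)) {
                acked.add(entry.getKey());
                it.remove();
            }
        }
    }

    /** Abort: the transaction's pending acks are released for other transactions. */
    void abort(String txnId) {
        pendingAcks.values().removeIf(txnId::equals);
    }
}
```

In this sketch, a second transaction's `ack` on the same message is rejected while the first one is still open, succeeds after the first one aborts, and is rejected for good once any transaction commits its ack.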

##########
File path: site2/website-next/docs/txn-use.md
##########
@@ -0,0 +1,96 @@
+---
+id: txn-use
+title: How to use transactions?
+sidebar_label: How to use transactions?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Transaction API
+
+The transaction feature is primarily a server-side and protocol-level feature. You can use the transaction feature via the [transaction API](https://pulsar.apache.org/api/admin/), which is available in **Pulsar 2.8.0 or later**. 

Review comment:
       ```suggestion
   The transaction feature is primarily a server-side and protocol-level feature. You can use the transaction feature via the [transaction API](https://pulsar.apache.org/api/admin/), which is available in **Pulsar 2.8.0 or higher**. 
   ```
   One more comment: as far as I know, transactions are not a stable feature in Pulsar 2.8.0. Please confirm with engineering whether the Pulsar version should be changed.

##########
File path: site2/website-next/docs/txn-use.md
##########
@@ -0,0 +1,96 @@
+---
+id: txn-use
+title: How to use transactions?
+sidebar_label: How to use transactions?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Transaction API
+
+The transaction feature is primarily a server-side and protocol-level feature. You can use the transaction feature via the [transaction API](https://pulsar.apache.org/api/admin/), which is available in **Pulsar 2.8.0 or later**. 
+
+To use the transaction API, you do not need any additional settings in the Pulsar client. **By default**, transactions is **disabled**. 
+
+Currently, transaction API is only available for **Java** clients. Support for other language clients will be added in the future releases.
+
+## Quick start
+
+This section provides an example of how to use the transaction API to send and receive messages in a Java client. 
+
+1. Start Pulsar 2.8.0 or later. 
+
+2. Enable transaction. 
+
+    Change the configuration in the `broker.conf` file.

Review comment:
       Can we also change the configuration in the `standalone.conf` file? I noticed that for batch messages, the transaction configs can be updated in either the `broker.conf` or the `standalone.conf` file.

##########
File path: site2/website-next/docs/txn-what.md
##########
@@ -0,0 +1,63 @@
+---
+id: txn-what
+title: What are transactions?
+sidebar_label: What are transactions?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+Transactions strengthen the message delivery semantics of Apache Pulsar and [processing guarantees of Pulsar Functions](https://pulsar.apache.org/docs/en/next/functions-overview/#processing-guarantees). The Pulsar Transaction API supports atomic writes and acknowledgments across multiple topics. 
+
+Transactions allow:
+
+- A producer to send a batch of messages to multiple topics where all messages in the batch are eventually visible to any consumer, or none are ever visible to consumers. 
+
+- End-to-end exactly-once semantics (execute a `consume-process-produce` operation exactly once).
+
+## Transaction semantics
+
+Pulsar transactions have the following semantics: 
+
+* All operations within a transaction are committed as a single unit.
+   
+  * Either all messages are committed, or none of them are. 
+
+  * Each message is written or processed exactly once, without data loss or duplicates (even in the event of failures). 
+
+  * If a transaction is aborted, all the writes and acknowledgments in this transaction rollback.

Review comment:
       ```suggestion
     * If a transaction is aborted, all the writes and acknowledgments in this transaction roll back.
   ```
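
As an aside on the semantics being described, here is a minimal sketch of the commit and abort behavior, assuming a simplified in-memory buffer in which writes stay invisible until commit and are dropped on abort. `TransactionBufferSketch` is a hypothetical name; the real transaction buffer works with topic partitions and is more involved than this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of all-or-nothing visibility for transactional writes. */
final class TransactionBufferSketch {
    // transaction ID -> messages written in that still-open transaction
    private final Map<String, List<String>> pending = new HashMap<>();
    // messages that consumers are allowed to read
    private final List<String> visible = new ArrayList<>();

    /** Records a write inside an open transaction; not yet visible to consumers. */
    void append(String txnId, String message) {
        pending.computeIfAbsent(txnId, k -> new ArrayList<>()).add(message);
    }

    /** Commit: every message of the transaction becomes visible at once. */
    void commit(String txnId) {
        List<String> messages = pending.remove(txnId);
        if (messages != null) {
            visible.addAll(messages);
        }
    }

    /** Abort: every message of the transaction is discarded (rolled back). */
    void abort(String txnId) {
        pending.remove(txnId);
    }

    /** Returns a copy of the messages consumers can currently read. */
    List<String> visibleToConsumers() {
        return new ArrayList<>(visible);
    }
}
```

With two open transactions, committing one and aborting the other leaves only the committed transaction's messages readable, which matches the "all or none" wording in the list above.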

##########
File path: site2/website-next/versioned_docs/version-2.7.3/transactions-api.md
##########
@@ -0,0 +1,155 @@
+---
+id: transactions-api
+title: Transactions API (Developer Preview)
+sidebar_label: Transactions API
+original_id: transactions-api
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+All messages in a transaction is available only to consumers after the transaction is committed. If a transaction is aborted, all the writes and acknowledgments in this transaction rollback. 
+
+Currently, Pulsar transaction is a developer preview feature. It is disabled by default. You can enable the feature and use transactions in your application in development environment.
+
+## Prerequisites
+1. To enable transactions in Pulsar, you need to configure the parameter in the `broker.conf` file.
+
+```

Review comment:
       The code should be indented. Same comment for the following step.

##########
File path: site2/website-next/docs/txn-what.md
##########
@@ -0,0 +1,63 @@
+---
+id: txn-what
+title: What are transactions?
+sidebar_label: What are transactions?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+Transactions strengthen the message delivery semantics of Apache Pulsar and [processing guarantees of Pulsar Functions](https://pulsar.apache.org/docs/en/next/functions-overview/#processing-guarantees). The Pulsar Transaction API supports atomic writes and acknowledgments across multiple topics. 
+
+Transactions allow:
+
+- A producer to send a batch of messages to multiple topics where all messages in the batch are eventually visible to any consumer, or none are ever visible to consumers. 
+
+- End-to-end exactly-once semantics (execute a `consume-process-produce` operation exactly once).
+
+## Transaction semantics
+
+Pulsar transactions have the following semantics: 
+
+* All operations within a transaction are committed as a single unit.
+   
+  * Either all messages are committed, or none of them are. 
+
+  * Each message is written or processed exactly once, without data loss or duplicates (even in the event of failures). 
+
+  * If a transaction is aborted, all the writes and acknowledgments in this transaction rollback.
+  
+* A group of messages in a transaction can be received from, produced to, and acknowledged by multiple partitions.
+  
+  * Consumers are only allowed to read committed (acked) messages. In other words, the broker does not deliver transactional messages which are part of an open transaction or messages which are part of an aborted transaction.
+    
+  * Message writes across multiple partitions are atomic.
+    
+  * Message acks across multiple subscriptions are atomic. A message is acked successfully only once by a consumer under the subscription when acknowledging the message with the transaction ID.
+
+## Transactions and stream processing
+
+Stream processing on Pulsar is a `consume-process-produce` operation on Pulsar topics:
+
+* `Consume`: a source operator that runs a Pulsar consumer reads messages from one or multiple Pulsar topics.
+  
+* `Process`: a processing operator transforms the messages. 
+  
+* `Produce`: a sink operator that runs a Pulsar producer writes the resulting messages to one or multiple Pulsar topics.
+
+![](/assets/txn-2.png)
+
+Pulsar transactions support end-to-end exactly-once stream processing, which means messages are not lost from a source operator and messages are not duplicated to a sink operator.
+
+## Use case
+
+Prior to Pulsar 2.8.0, there was no easy way to build stream processing applications with Pulsar to achieve exactly-once processing guarantees. With the transaction introduced in Pulsar 2.8.0, the following services support exactly-once semantics:
+
+* [Pulsar Flink connector](https://flink.apache.org/2021/01/07/pulsar-flink-connector-270.html)
+
+    Prior to Pulsar 2.8.0, if you want to build stream applications using Pulsar and Flink, the Pulsar Flink connector only supported exactly-once source connector and at-least-once sink connector, which means the highest processing guarantee for end-to-end was at-least-once, there was possibility that the resulting messages from streaming applications produce duplicated messages to the resulting topics in Pulsar.

Review comment:
       ```suggestion
       Prior to Pulsar 2.8.0, if you wanted to build stream applications using Pulsar and Flink, the Pulsar Flink connector only supported an exactly-once source connector and an at-least-once sink connector, which means the highest end-to-end processing guarantee was at-least-once. It was possible for streaming applications to produce duplicated messages to the resulting topics in Pulsar.
   ```

##########
File path: site2/website-next/docs/txn-use.md
##########
@@ -0,0 +1,96 @@
+---
+id: txn-use
+title: How to use transactions?
+sidebar_label: How to use transactions?
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+## Transaction API
+
+The transaction feature is primarily a server-side and protocol-level feature. You can use the transaction feature via the [transaction API](https://pulsar.apache.org/api/admin/), which is available in **Pulsar 2.8.0 or later**. 
+
+To use the transaction API, you do not need any additional settings in the Pulsar client. **By default**, transactions is **disabled**. 
+
+Currently, transaction API is only available for **Java** clients. Support for other language clients will be added in the future releases.
+
+## Quick start
+
+This section provides an example of how to use the transaction API to send and receive messages in a Java client. 
+
+1. Start Pulsar 2.8.0 or later. 
+
+2. Enable transaction. 
+
+    Change the configuration in the `broker.conf` file.
+
+    ```
+    transactionCoordinatorEnabled=true
+    ```
+
+    If you want to enable batch messages in transactions, follow the steps below.

Review comment:
       ```suggestion
       If you want to enable batch messages in transactions, set `acknowledgmentAtBatchIndexLevelEnabled` to `true` in the `broker.conf` or `standalone.conf` file.
   ```



