Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/04/26 13:04:44 UTC

[GitHub] [pulsar] liangyepianzhou opened a new issue, #15334: [PIP]Reuse transaction buffer reader

liangyepianzhou opened a new issue, #15334:
URL: https://github.com/apache/pulsar/issues/15334

   ## Motivation
   Currently, each namespace has a system topic that stores the `TransactionBuffer` snapshots of all topics under that namespace. When the broker recovers, the transaction buffer of every topic in the namespace creates its own reader to read the snapshots from the system topic, so a large number of read operations reach BookKeeper. Because every topic in the namespace performs the same reads, all topics under a namespace can share one reader, which reduces the read pressure on BookKeeper.
   
   ## Goal
   Add a `map<String, List> snapshotBuffer` to the `TransactionBufferSnapshotReader` to cache snapshots, so that all topic transaction buffers in a namespace can share one `TransactionBufferSnapshotReader`. This reduces the read pressure on BookKeeper when the broker recovers.
   The sections below briefly describe the current implementation and the replacement scheme we plan to adopt.
   ### Existing implementation
   As shown in the figure below, in the current implementation each topic has a `TransactionBuffer`, and during recovery each `TransactionBuffer` creates a `transactionBufferReader` and a managed ledger reader. Each topic reads the snapshots in `TransactionBufferSystemTopic` from the first position to the last position.
   This creates a large number of readers that all repeat the same reads, which wastes memory and puts heavy read pressure on BookKeeper.
   <img width="1176" alt="image" src="https://user-images.githubusercontent.com/55571188/165258137-f4ec1c97-8d2c-4d0e-9e68-93ede08404f4.png">
   ### Alternative
   As shown in the figure below, we will add a map to the `TransactionBufferSnapshotReader` to cache the data that the reader has read from BookKeeper. The topic name is the key of the map, and the ordered list of snapshots read for that topic is the value.
   The `TransactionBuffer`s of all topics under a namespace share one `TransactionBufferSnapshotReader` and one managed ledger reader.
   <img width="1450" alt="image" src="https://user-images.githubusercontent.com/55571188/165263555-2985cd28-bacf-4b54-9528-b33a538614bf.png">
   ### Process flow
   1. When a topic's `TransactionBuffer` asks the `TransactionBufferSnapshotReader` for data, the reader first checks whether the cache map holds unprocessed snapshots for that topic.
   2. If it does, the reader returns the oldest unprocessed snapshot to the caller and removes it from the cache map.
   3. If it does not, the reader reads the next snapshot from the `TransactionBufferSystemTopic`.
   4. If that snapshot belongs to a different topic, the reader adds it to the cache map and continues reading.
   5. If that snapshot belongs to this topic, the reader returns it directly to the caller.
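   The five steps above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the proposed Pulsar code: the class `SharedSnapshotReaderSketch`, the nested `Snapshot` type, the `readNext` method, and the `systemTopicReads` counter are all hypothetical names, and an in-memory iterator stands in for the real reader on `TransactionBufferSystemTopic`.

   ```java
   import java.util.ArrayDeque;
   import java.util.HashMap;
   import java.util.Iterator;
   import java.util.List;
   import java.util.Map;
   import java.util.Queue;

   // Hypothetical sketch of one reader shared by all topics of a namespace.
   class SharedSnapshotReaderSketch {

       // Hypothetical stand-in for one snapshot entry in the system topic.
       static final class Snapshot {
           final String topic;
           final long sequence;

           Snapshot(String topic, long sequence) {
               this.topic = topic;
               this.sequence = sequence;
           }
       }

       // Assumption: an iterator replaces the real system topic reader.
       private final Iterator<Snapshot> systemTopic;
       // Cache map: topic name -> snapshots read but not yet consumed, oldest first.
       private final Map<String, Queue<Snapshot>> snapshotBuffer = new HashMap<>();
       // Counts reads that actually hit the (simulated) system topic.
       int systemTopicReads = 0;

       SharedSnapshotReaderSketch(List<Snapshot> systemTopicEntries) {
           this.systemTopic = systemTopicEntries.iterator();
       }

       // Returns the next snapshot for the topic, or null when none is left.
       synchronized Snapshot readNext(String topic) {
           // Steps 1 and 2: serve the oldest cached snapshot of this topic, if any.
           Queue<Snapshot> cached = snapshotBuffer.get(topic);
           if (cached != null && !cached.isEmpty()) {
               Snapshot oldest = cached.poll();
               if (cached.isEmpty()) {
                   snapshotBuffer.remove(topic);
               }
               return oldest;
           }
           // Step 3: otherwise read from the system topic.
           while (systemTopic.hasNext()) {
               Snapshot next = systemTopic.next();
               systemTopicReads++;
               if (next.topic.equals(topic)) {
                   return next; // Step 5: the snapshot belongs to the caller.
               }
               // Step 4: cache snapshots of other topics and keep reading.
               snapshotBuffer.computeIfAbsent(next.topic, k -> new ArrayDeque<>()).add(next);
           }
           return null;
       }

       public static void main(String[] args) {
           SharedSnapshotReaderSketch reader = new SharedSnapshotReaderSketch(List.of(
                   new Snapshot("topic-a", 1), new Snapshot("topic-b", 1),
                   new Snapshot("topic-a", 2), new Snapshot("topic-b", 2)));
           Snapshot first = reader.readNext("topic-b");
           System.out.println(first.topic + " #" + first.sequence
                   + " after " + reader.systemTopicReads + " system topic reads");
       }
   }
   ```

   In this sketch, a snapshot that one topic skips over is read from the system topic only once and is later served to its owner from the cache, which is the saving the proposal is after.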
   ### Alternative Feasibility Analysis
   #### From the perspective of memory usage
   * A map is added to store snapshots that have not been processed yet. Because the CPU processes snapshots much faster than they can be read from BookKeeper, the map is expected to stay small and occupy limited memory.
   * The number of `TransactionBufferSnapshotReaders` and the number of managed ledger readers are reduced.
   So overall, memory usage in the new scheme increases slightly, but the increase is expected to be negligible.
   #### From an IO perspective
   * Let x be the number of IO operations needed to read the `transactionBufferSystemTopic` from its start position to its end position, m the number of namespaces, and n the number of topics under one namespace.
   * The number of IO operations then drops from `m * n * x` to `m * x`, a reduction by a factor of n.
   Therefore, the IO pressure in the new scheme will be greatly reduced.
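   As a worked illustration with hypothetical numbers (chosen for the example, not measured), take m = 10 namespaces, n = 1000 topics per namespace, and x = 50 IO operations per full read of the system topic:

   ```latex
   \begin{aligned}
   \text{one reader per topic:} \quad & m \cdot n \cdot x = 10 \cdot 1000 \cdot 50 = 500{,}000 \\
   \text{one reader per namespace:} \quad & m \cdot x = 10 \cdot 50 = 500 \\
   \text{ratio:} \quad & \frac{m \cdot n \cdot x}{m \cdot x} = n = 1000
   \end{aligned}
   ```

   The saving grows linearly with the number of topics per namespace, assuming each namespace's system topic is read from start to end exactly once during recovery.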
   #### From the perspective of recovery performance
   * Because the CPU processes data much faster than data can be read from BookKeeper, the time recovery takes is dominated by the time spent reading snapshots from BookKeeper.
   * Reads are slower under high IO concurrency than under low IO concurrency, so the new implementation, which issues fewer concurrent reads, can read the snapshot data from BookKeeper faster.
   * Reading from the cache is faster than reading from BookKeeper, which reduces the time threads spend waiting.
   Therefore, transaction buffer recovery is expected to be faster in the new scheme.
   #### From the perspective of modification of code
   Only the implementation of `TransactionBufferReader` and the logic that creates and uses `TransactionBufferReader` need to be modified.
   The code changes are small.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] github-actions[bot] commented on issue #15334: [PIP-159]Reuse transaction buffer reader

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #15334:
URL: https://github.com/apache/pulsar/issues/15334#issuecomment-1140610357

   The issue had no activity for 30 days, mark with Stale label.




[GitHub] [pulsar] liangyepianzhou closed issue #15334: [PIP-159]Reuse transaction buffer reader

Posted by "liangyepianzhou (via GitHub)" <gi...@apache.org>.
liangyepianzhou closed issue #15334: [PIP-159]Reuse transaction buffer reader
URL: https://github.com/apache/pulsar/issues/15334

