Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/04/20 13:44:34 UTC

[GitHub] [pulsar] liangyepianzhou opened a new pull request, #15239: [doc][tiered storage] read data from filesystem

liangyepianzhou opened a new pull request, #15239:
URL: https://github.com/apache/pulsar/pull/15239

   ### Motivation & Modification
   The current Filesystem Offload documentation does not explain how to read the data directly after it has been offloaded, so this PR adds a section that covers it.
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
     
   - [ ] `no-need-doc` 
   (Please explain why)
     
   - [x] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-added`
   (Docs have been already added)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860550392


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
+  ```
+  Can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system
+  ```shell
+     MapFile.Reader reader = new MapFile.Reader(new Path(dataFilePath),  configuration); 
+  ```
+* Read data as `LedgerEntry` from FileSystem.

Review Comment:
   ```suggestion
   2. Read the data as `LedgerEntry` from the filesystem.
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860550108


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
+  ```
+  Can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system

Review Comment:
   ```suggestion
   1. Create a reader to read both `MapFile` with a new path and the `configuration` of the filesystem.
   ```





[GitHub] [pulsar] Anonymitaet commented on pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
Anonymitaet commented on PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#issuecomment-1104615062

   @momo-jun could you please help review this PR? Thanks




[GitHub] [pulsar] codelipenghui commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r857405505


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem

Review Comment:
   ```suggestion
   ## Read offloaded data from fileSystem
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860552485


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`

Review Comment:
   ```suggestion
   * `storageBasePath` is the value of `hadoop.tmp.dir`, which is configured in `broker.conf` or `filesystem_offload_core_site.xml`.
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860554269


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem

Review Comment:
   ```suggestion
   ## Read offloaded data from filesystem
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860555604


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.

Review Comment:
   This sentence can be moved.





[GitHub] [pulsar] momo-jun commented on pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#issuecomment-1111837698

   @liangyepianzhou I left some suggestions to make the instructions easier to read and understand, mainly by changing the ordered/unordered lists, plus other minor fixes. To show how they fit together, here is a preview of the section with my suggestions applied.
   
   The offloaded data is stored as `MapFile` in the following new path of the filesystem:
   …
     * `storageBasePath` is the value of `hadoop.tmp.dir`, which is configured in `broker.conf` or `filesystem_offload_core_site.xml`.
     * `managedLedgerName` is the ledger name of the persistentTopic manager. 
         ...
        You can use the following method to get `managedLedgerName`.
         …
   
   To read data out as ledger entries from the filesystem, complete the following steps.
   1. Create a reader to read both `MapFile`  with a new path and the `configuration` of the filesystem.
       ...
   2. Read the data as `LedgerEntry` from the filesystem.
       ...
   3. Deserialize the `ledgerEntry` to `Message`.
       ...
   
   Note that the capitalization of `LedgerEntry` in step 2 and `ledgerEntry` in step 3 is different, which may confuse users.
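
For readers following the thread, the path rule quoted in the diff can be sketched in plain Java. This is a minimal illustration and not part of the PR: the class `OffloadPathSketch` and both helper methods are hypothetical, the topic-name conversion only mirrors the single example given in the doc (`persistent://public/default/topics-name` becomes `public/default/persistent/topics-name`), and it does not attempt to reproduce any additional encoding that `TopicName#getPersistenceNamingEncoding` may apply. The base path, ledger ID, and UUID in `main` are made-up values.

```java
import java.util.UUID;

// Hypothetical helper, for illustration only; it does not use the Pulsar or Hadoop APIs.
final class OffloadPathSketch {

    // Assumption: rearranges "domain://tenant/namespace/topic" into
    // "tenant/namespace/domain/topic", matching the example in the doc.
    static String managedLedgerName(String topic) {
        String[] parts = topic.split("://", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected domain://tenant/namespace/topic: " + topic);
        }
        String[] rest = parts[1].split("/", 3);
        if (rest.length != 3) {
            throw new IllegalArgumentException("expected tenant/namespace/topic: " + parts[1]);
        }
        return rest[0] + "/" + rest[1] + "/" + parts[0] + "/" + rest[2];
    }

    // Same concatenation as the path rule quoted in the diff.
    static String dataFilePath(String storageBasePath, String managedLedgerName,
                               long ledgerId, UUID uuid) {
        return storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
    }

    public static void main(String[] args) {
        String name = managedLedgerName("persistent://public/default/topics-name");
        // Made-up base path, ledger ID, and UUID, used only to show the shape of the result.
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(dataFilePath("/tmp/pulsar-offload", name, 42L, uuid));
    }
}
```

In a real reader, the resulting string would be the `dataFilePath` passed to `new Path(...)` in step 1, and the actual `ledgerId` and `uuid` would have to come from the offloaded ledger's metadata rather than from constants.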




[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860549920


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:

Review Comment:
   ```suggestion
   The data is offloaded to the following new path as `MapFile`:
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860553114


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger

Review Comment:
   * `managedLedgerName` is the ledger name of the persistentTopic manager.





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860550108


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
+  ```
+  Can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system

Review Comment:
   ```suggestion
   To read data out as ledger entries from the filesystem, complete the following steps.
   1. Create a reader to read both `MapFile` with a new path and the `configuration` of the filesystem.
   ```





[GitHub] [pulsar] codelipenghui commented on pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#issuecomment-1111668136

   @Anonymitaet Please help review this PR.




[GitHub] [pulsar] liangyepianzhou commented on pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
liangyepianzhou commented on PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#issuecomment-1105105443

   @zymap Could you please review this doc?




[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860551436


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
+  ```
+  Can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system
+  ```shell
+     MapFile.Reader reader = new MapFile.Reader(new Path(dataFilePath),  configuration); 
+  ```
+* Read data as `LedgerEntry` from FileSystem.
+  ```java
+     LongWritable key = new LongWritable();
+     BytesWritable value = new BytesWritable();
+     key.set(nextExpectedId - 1);
+     reader.seek(key);
+     reader.next(key, value);
+     int length = value.getLength();
+     long entryId = key.get();
+     ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(length, length);
+     buf.writeBytes(value.copyBytes());
+     LedgerEntryImpl ledgerEntry = LedgerEntryImpl.create(ledgerId, entryId, length, buf);
+  ```
+* Deserialize the `ledgerEntry` to `Message`.

Review Comment:
   ```suggestion
   3. Deserialize the `ledgerEntry` to `Message`.
   ```
   
   Which one is correct, `LedgerEntry` or `ledgerEntry`? The difference between step 2 and step 3 may confuse users.





[GitHub] [pulsar] congbobo184 commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
congbobo184 commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r855693668


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,40 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` and `filesystem_offload_core_site.xml`

Review Comment:
   ```suggestion
       1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
   ```



##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,40 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` and `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is  public/default/persistent/topics-name.
+  ```
+  Considering the iteration of versions,  you can use the following method to get the managedLedgerName:

Review Comment:
   ```suggestion
     Can use the following method to get the managedLedgerName:
   ```



##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,40 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` and `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is  public/default/persistent/topics-name.

Review Comment:
   ```suggestion
        managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
   ```





[GitHub] [pulsar] liangyepianzhou commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
liangyepianzhou commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r855938656


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,40 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` and `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is  public/default/persistent/topics-name.
+  ```
+  Considering the iteration of versions,  you can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system
+  ```shell
+     MapFile.Reader reader = new MapFile.Reader(new Path(dataFilePath),  configuration); 
+  ```
+* Read data as `LedgerEntry` from FileSystem.
+  ```java
+     LongWritable key = new LongWritable();
+     BytesWritable value = new BytesWritable();
+     key.set(nextExpectedId - 1);
+     reader.seek(key);
+     reader.next(key, value);
+     int length = value.getLength();
+     long entryId = key.get();
+     ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(length, length);
+     buf.writeBytes(value.copyBytes());
+     LedgerEntryImpl ledgerEntry = LedgerEntryImpl.create(ledgerId, entryId, length, buf);

Review Comment:
   Great advice. I have added the deserialization steps, please review again when you have time.





[GitHub] [pulsar] Anonymitaet merged pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
Anonymitaet merged PR #15239:
URL: https://github.com/apache/pulsar/pull/15239




[GitHub] [pulsar] zymap commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
zymap commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r855694523


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,40 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` and `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is  public/default/persistent/topics-name.
+  ```
+  Considering the iteration of versions,  you can use the following method to get the managedLedgerName:
+  ```shell
+     String managedLedgerName = TopicName.get("persistent://public/default/topics-name").getPersistenceNamingEncoding(); 
+  ```
+
+* Create a reader to read `MapFile` according to the above path and the `configuration` of the file system
+  ```shell
+     MapFile.Reader reader = new MapFile.Reader(new Path(dataFilePath),  configuration); 
+  ```
+* Read data as `LedgerEntry` from FileSystem.
+  ```java
+     LongWritable key = new LongWritable();
+     BytesWritable value = new BytesWritable();
+     key.set(nextExpectedId - 1);
+     reader.seek(key);
+     reader.next(key, value);
+     int length = value.getLength();
+     long entryId = key.get();
+     ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(length, length);
+     buf.writeBytes(value.copyBytes());
+     LedgerEntryImpl ledgerEntry = LedgerEntryImpl.create(ledgerId, entryId, length, buf);

Review Comment:
   What do users intend to do after they read the entry? The entry is a serialized message; will they deserialize it into a message themselves?





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860553611


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:
+  ```properties
+    path = storageBasePath + "/" + managedLedgerName + "/" + ledgerId + "-" + uuid.toString();
+  ```
+    1. storageBasePath is the value of `hadoop.tmp.dir`, configured in `broker.conf` or `filesystem_offload_core_site.xml`
+    2. managedLedgerName is the name of the persistentTopic manager Ledger
+  ```shell
+     managedLedgerName of persistent://public/default/topics-name is public/default/persistent/topics-name.
+  ```
+  Can use the following method to get the managedLedgerName:

Review Comment:
   ```suggestion
     You can use the following method to get `managedLedgerName`:
   ```





[GitHub] [pulsar] momo-jun commented on a diff in pull request #15239: [doc][tiered storage] read data from filesystem

Posted by GitBox <gi...@apache.org>.
momo-jun commented on code in PR #15239:
URL: https://github.com/apache/pulsar/pull/15239#discussion_r860549920


##########
site2/docs/tiered-storage-filesystem.md:
##########
@@ -520,4 +520,66 @@ Execute the following commands in the repository where you download Pulsar tarba
 
     And the **Capacity Used** is changed from 4 KB to 116.46 KB.
 
-    ![](assets/FileSystem-8.png)
\ No newline at end of file
+    ![](assets/FileSystem-8.png)
+
+## Read offloaded data from fileSystem
+
+This section provides detailed instructions about how to read data out as Ledger Entry in the file system.
+
+* The data was offloaded as `MapFile` to the following path:

Review Comment:
   ```suggestion
   The offloaded data is stored as `MapFile` in the following new path of the filesystem:
   ```


