You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/07/06 09:39:39 UTC

[GitHub] [pulsar] Aegeaner opened a new pull request #7461: Add readCompact option

Aegeaner opened a new pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461


   <!--
   ### Contribution Checklist
   
   Master Issue: #6734
   
   ### Motivation
   
   
   *Pulsar SQL should support readCompact on topics.*
   
   ### Modifications
   
   *Add a Split in PulsarSplitManager to handle compacted data, and a option readCompact for user to open it.*
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#issuecomment-654898275


   The compacted data is stored in a separate ledger, I think the cursor can't read data from that ledger. You can see https://github.com/apache/pulsar/blob/53407fc598286690790727635bb5067a7ac108e7/pulsar-broker/src/main/java/org/apache/pulsar/compaction/CompactedTopicImpl.java#L104 .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 commented on pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
gaoran10 commented on pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#issuecomment-654883130


   1. I think we couldn't add the config `pulsar.read-compact`, this config is a global config. It is up to users to query the raw data or the compacted data, so we could add metadata for users to add in the SQL statement.
   Such as `select price from pulsar.'public/default'.'stock-topic' where __compacted_topic__ = true` is used to query compacted data. Refer to the `PulsarInternalColumn.java`.
   
   2. The compacted data should be read from two parts, one is the compacted ledger and the other are normal ledgers, like the compacted Topic reading. So we should check the topic compacted position first, and generate splits through the compacted ledger and normal ledgers. One split should have one property such as `read-compacted` indicate this split is used to read compacted ledger or normal topic ledgers. We could use the normal topic name to get its compacted ledger name.
   
   @jiazhai @codelipenghui Please confirm.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui closed pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
codelipenghui closed pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on a change in pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
merlimat commented on a change in pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#discussion_r450453525



##########
File path: managed-ledger/src/main/java/org/apache/bookkeeper/mledger/ManagedLedgerFactory.java
##########
@@ -160,4 +161,9 @@ void asyncOpenReadOnlyCursor(String managedLedgerName, Position startPosition, M
      */
     void shutdown() throws InterruptedException, ManagedLedgerException;
 
+    /**
+     * Get the bookkeeper.
+     */
+    BookKeeper getBookKeeper();

Review comment:
       We shouldn't expose the BK client on the interface here




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#issuecomment-721454646


   Looks like there are no updates, I will close it first.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui closed pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
codelipenghui closed pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#issuecomment-654896700


   The compacted data is stored in a separate ledger, I think the cursor can't read data from that ledger. You can see https://github.com/apache/pulsar/blob/53407fc598286690790727635bb5067a7ac108e7/pulsar-broker/src/main/java/org/apache/pulsar/compaction/CompactedTopicImpl.java#L104 .


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] gaoran10 edited a comment on pull request #7461: Add readCompact option

Posted by GitBox <gi...@apache.org>.
gaoran10 edited a comment on pull request #7461:
URL: https://github.com/apache/pulsar/pull/7461#issuecomment-654883130


   1. I think we couldn't add the config `pulsar.read-compact`, this config is a global config. It is up to users to query the raw data or the compacted data, so we could add metadata for users to add in the SQL statement.
   Such as `select price from pulsar.'public/default'.'stock-topic' where __compacted_topic__ = true` is used to query compacted data. Refer to the `PulsarInternalColumn.java`.
   
   2. The compacted data should be read from two parts, one is the compacted ledger and the other are normal ledgers, like the compacted Topic reading. So we should check the topic compacted position first, and generate splits through the compacted ledger and normal ledgers. One split should have one property such as `readCompacted` indicate this split is used to read compacted ledger or normal topic ledgers. We could use the normal topic name to get its compacted ledger name.
   
   @jiazhai @codelipenghui Please confirm.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org