Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/05/19 23:30:42 UTC

[GitHub] [pulsar] lukestephenson opened a new issue #6995: Decouple original topic and compacted topic retention policies

lukestephenson opened a new issue #6995:
URL: https://github.com/apache/pulsar/issues/6995


   **Is your feature request related to a problem? Please describe.**
   
   When publishing to a topic that is compacted, the producer needs to publish the full state of the entity being published. 
   
   For example, it wouldn't make sense to publish an event like "User added item to shopping cart" to a compacted topic, because a consumer couldn't reconstruct the state from only the last message for that user id.  Each time the producer publishes, it needs to publish a message like "User shopping cart state".
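
A minimal sketch of that point, using a toy in-memory model rather than the Pulsar client. The `compact` helper and the message shapes are hypothetical and only illustrate "keep the last message per key":

```python
def compact(messages):
    """Toy model of compaction: keep only the last value seen for each key."""
    latest = {}
    for key, value in messages:
        latest[key] = value
    return latest


# Delta-style events: after compaction only the last delta survives,
# so the earlier "book" addition is lost.
delta_events = [
    ("user-1", {"event": "item_added", "item": "book"}),
    ("user-1", {"event": "item_added", "item": "pen"}),
]

# State-style events: every message carries the full cart,
# so the last message alone is enough to rebuild the state.
state_events = [
    ("user-1", {"cart": ["book"]}),
    ("user-1", {"cart": ["book", "pen"]}),
]

compacted_deltas = compact(delta_events)
compacted_state = compact(state_events)
```

With delta events the compacted view knows only about the pen; with state events it holds the whole cart.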
   
   Consuming from a compacted topic is not just a consumer decision.  The producer also needs to allow for it in advance.
   
   Pulsar has the following compacted topic support (https://pulsar.apache.org/docs/en/cookbooks-compaction/#when-should-i-use-compacted-topics):
   > - They can read from the "original," non-compacted topic in case they need access to "historical" values, i.e. the entirety of the topic's messages.
   > - They can read from the compacted topic if they only want to see the most up-to-date messages.
   
   I intend to have all consumers of these messages on the compacted version of this topic.  As such, it would be very wasteful to continue to store the complete history of messages on the original topic.
   
   However, we can't set a retention policy on the original topic without that also affecting the compacted topic (https://pulsar.apache.org/docs/en/concepts-topic-compaction/#compaction):
   > Topic compaction does, however, respect retention. If retention has removed a message from the message backlog of a topic, the message will also not be readable from the compacted topic ledger.
   
   So it sounds like if I set a retention policy to remove old messages from the "original" topic, that will also remove messages from the compacted topic.
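
To make the concern concrete, here is a toy model of the behavior as the docs describe it versus the decoupled behavior being requested. It is not Pulsar's implementation; `compact`, `apply_retention`, and the sequence numbers are assumptions made up for illustration:

```python
def compact(messages):
    """Toy compaction: keep the last value per key."""
    latest = {}
    for _seq, key, value in messages:
        latest[key] = value
    return latest


def apply_retention(messages, min_seq):
    """Toy retention: drop every message older than min_seq."""
    return [m for m in messages if m[0] >= min_seq]


backlog = [
    (1, "user-1", "cart-v1"),
    (2, "user-2", "cart-v1"),
    (3, "user-1", "cart-v2"),
]

# Coupled (as the docs read): a message removed by retention is also
# unreadable from the compacted view, so user-2 disappears.
coupled_view = compact(apply_retention(backlog, min_seq=3))

# Decoupled (what this issue asks for): the compacted view keeps the
# latest value per key even after the original backlog is trimmed.
decoupled_view = compact(backlog)
trimmed_backlog = apply_retention(backlog, min_seq=3)
```

In the coupled model `user-2` is lost even though its latest value was never superseded.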
   
   **Describe the solution you'd like**
   I'd like storage to only be consumed for the compacted topic.  Given nothing will be consuming from the original topic, it can be cleared.
   
   **Describe alternatives you've considered**
   While having both is promoted as a feature, producers already need to be aware that the messages they publish will be compacted, so I'm not sure what the benefit is of having both the original and the compacted topic.  Could producers instead decide to publish only to a compacted topic?
   
   **Additional context**
   This was originally discussed on slack: https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1589871154137500
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
merlimat commented on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-898998128


   @lukestephenson What is described in this issue (and in the docs) was actually not the intended behavior (as it's not practically useful). The behavior was fixed in #11287 and scheduled to be released in 2.8.1 in the next few days.





[GitHub] [pulsar] vsly-ru commented on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
vsly-ru commented on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-894788124


   @lukestephenson Did you find a way of achieving this? 
   I'm facing the same problem, but it sounds like Pulsar still can't be configured so that a compacted topic becomes a key:value store that is distributed and integrated into existing Pulsar infrastructure. 
   We're considering periodic topic migration: run compaction, read the compacted topic and write it to a new topic, leave a "moved" message in the old topic with the new topic name, then delete the old topic after a while, once all readers/consumers have migrated.
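
The migration steps above could be sketched roughly as follows. This is an in-memory illustration only: topics are plain Python lists, and the `__moved__` marker key and the `migrate` helper are hypothetical names, not Pulsar APIs:

```python
def compact(messages):
    """Toy compaction: keep the last value per key."""
    latest = {}
    for key, value in messages:
        latest[key] = value
    return latest


def migrate(old_topic, new_topic_name):
    """Copy the compacted view into a new topic and leave a marker behind."""
    # Read the compacted view of the old topic and write it to the new one.
    new_topic = list(compact(old_topic).items())
    # Leave a "moved" message so late readers can find the new topic.
    old_topic.append(("__moved__", new_topic_name))
    return new_topic


old_topic = [("a", 1), ("b", 2), ("a", 3)]
new_topic = migrate(old_topic, "topic-v2")
```

The sketch ignores concurrent producers; anything published to the old topic during the copy would need to be handled separately.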





[GitHub] [pulsar] lukestephenson commented on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
lukestephenson commented on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-898996251


   Hi @vsly-ru, we didn't proceed with Pulsar at the time as I didn't feel it was mature enough.  That was a year ago, and Pulsar may be more stable now.  I wrote up https://medium.com/zendesk-engineering/evaluating-apache-pulsar-92e6ed3fc792 with more details.
   
   What you are suggesting sounds like it could work.  How will you move the topic producers across to the new topic?  It could be tricky to manage ordering guarantees when you switch the regular producer and the topic migration process over to the new topic (unless you can stop the producers with some lock during this period).





[GitHub] [pulsar] lukestephenson commented on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
lukestephenson commented on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-698032989


   Just came across https://streamnative.io/blog/tech/2020-07-08-pulsar-vs-kafka-part-1#topic-log-compaction:~:text=Topic%20(Log)%20Compaction which, as I read it, makes it sound like this is already supported.
   
   > By doing this, Pulsar allows for non-compacted data to have a retention policy, keeping control over unbounded growth, but still allowing periodic compaction to generate the most recent materialized view around. 
   
   However, this appears to conflict with what is stated in the pulsar docs:
   https://pulsar.apache.org/docs/en/concepts-topic-compaction/#__docusaurus:~:text=respect%20retention
   
   > Topic compaction does, however, respect retention. If retention has removed a message from the message backlog of a topic, the message will also not be readable from the compacted topic ledger.
   
   





[GitHub] [pulsar] vsly-ru edited a comment on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
vsly-ru edited a comment on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-894788124


   @lukestephenson Did you find a way of achieving this? 
   I'm facing the same problem, but it seems like Pulsar still can't be configured so that a compacted topic becomes a key:value store integrated into a distributed Pulsar infrastructure. 
   We're considering periodic topic migration: run compaction, read the compacted topic and write it to a new topic, leave a "moved" message in the old topic with the new topic's name. Then you can delete the old one after a while.





[GitHub] [pulsar] vsly-ru commented on issue #6995: Decouple original topic and compacted topic retention policies

Posted by GitBox <gi...@apache.org>.
vsly-ru commented on issue #6995:
URL: https://github.com/apache/pulsar/issues/6995#issuecomment-899094102


   @lukestephenson Luckily, we have only one producer per compacted topic at a time.
   With multiple producers you would indeed need some kind of lock or service window to perform the migration. I've also heard of a _hack_: enable retention and run a cron job to _touch_ every key by republishing its last value. However, it also requires a lock. 
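
A rough sketch of that "touch" idea, again as an in-memory toy rather than Pulsar code. The timestamps, `max_age`, and all helper names are assumptions for illustration:

```python
def latest_per_key(messages):
    """Last value seen for each key."""
    latest = {}
    for _ts, key, value in messages:
        latest[key] = value
    return latest


def touch_all_keys(messages, now):
    """Republish each key's last value with a fresh timestamp."""
    touched = [(now, key, value) for key, value in latest_per_key(messages).items()]
    return messages + touched


def apply_retention(messages, now, max_age):
    """Toy time-based retention: keep messages no older than max_age."""
    return [m for m in messages if now - m[0] <= max_age]


topic = [(0, "a", "old-a"), (5, "b", "new-b")]
topic = touch_all_keys(topic, now=10)           # the cron job runs at t=10
retained = apply_retention(topic, now=12, max_age=5)
surviving = latest_per_key(retained)
```

Without the touch at t=10, both keys would have aged out by t=12. The lock is needed because a producer writing between the read and the republish could otherwise have its newer value overwritten by the stale one.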
   
    @merlimat wow, great news! Sounds like exactly what we need!

