You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/05/11 03:06:33 UTC

[GitHub] [pulsar] equanz opened a new pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

equanz opened a new pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534


   Master Issue: https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
   
   ### Motivation
   
   Please see the PIP document.
   This is a part of implementations. **Before review it, please check https://github.com/apache/pulsar/pull/10279. After https://github.com/apache/pulsar/pull/10279 is merged, I will rebase to top of master branch.**
   
   ### Modifications
   
   * Change PartitionedTopicStats to support partial partitioned producer like [this(#10279)](https://github.com/equanz/pulsar/blob/a04a7bec47ed78258ef9bdc1b0120f987762f2f6/pulsar-client/src/main/java/org/apache/pulsar/client/impl/customroute/PartialRoundRobinMessageRouterImpl.java)
   
   For backward compatibility reason, I introduce new producer property `producerStatsKey`. If this property is same, then associate publisher stats as same producer at partitioned producer stats.
   
   I think about using `producerName` instead of introducing `producerStatsKey`, but I think we should not use it. Because currently `producerName` is configured by the system(different for each partition) or yourself(same between all partitions). Therefore, if we use `producerName` as association key, we should change system behavior (I think this is breaking changes) or request to user that "you should set `producerName` yourself if you want to associate publisher stats per partitioned producer".
   
   If any suggestions, please comment to this PR.
   
   ### Verifying this change
   
   * [ ] Make sure that the change passes the CI checks.
   
   This change added tests and can be verified as follows:
   
   * Added test for partitioned producer stats
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API: (no)
     - The schema: (yes)
       - https://github.com/apache/pulsar/blob/c0a48454bf4199943c1ca7cf821e6351843a55ee/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/PublisherStats.java#L49
     - The default values of configurations: (no)
     - The wire protocol: (yes)
       - https://github.com/apache/pulsar/blob/c0a48454bf4199943c1ca7cf821e6351843a55ee/pulsar-common/src/main/proto/PulsarApi.proto#L259
       - https://github.com/apache/pulsar/blob/c0a48454bf4199943c1ca7cf821e6351843a55ee/pulsar-common/src/main/proto/PulsarApi.proto#L486
       - https://github.com/apache/pulsar/blob/c0a48454bf4199943c1ca7cf821e6351843a55ee/pulsar-common/src/main/proto/PulsarApi.proto#L636
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   * Does this pull request introduce a new feature? (no)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-1031138035


   Close this PR because https://github.com/apache/pulsar/pull/12401 has been merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-899931286


   @Vanlightly 
   > Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go.
   
   I think so too. Therefore, not only partial producer stats, but also total producer stats will be calculated correctly by this feature.
   
   > Is that summary correct?
   
   Yes.
   
   > Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)?
   
   We can choose the strategy described above, but I think it will break some existing behavior(currently, internal producer's name aren't same. / if we use the same producerName in the partitioned topic, then they will be aggregated.).
   
   If this change is approved in the community, I'll change the strategy.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-918004541


   I'll try to implement this feature based on above comment (use the producerName for aggregation and use the same strategy of starting a single producer).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz closed pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz closed pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-898542014


   @Vanlightly Of course. I think the main issue is stats collecting logic( https://github.com/apache/pulsar/blob/3dd9ec528b07ca495a42c020b3bf0824dc67426f/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/stats/TopicStatsImpl.java#L184-L193 ).
   
   It causes these issues.
   
   1. If each partition has different number of producers, then calculate publisher stats separetelly
      - e.g. if full partitioned producer and partial partitioned producer(or simply single producer) are connected to the partitioned topic which has 2 partitions, then broker calclulates 3 publisher stats each other
      - expect: each partitioned producer is accumulated as single publisher stats
   2. If each partition has same number of producer, then calculate publisher stats as single partitioned producer
     - e.g. if each partition has single producer(all producers were created separetelly), then broker calculates publisher stats as single publisher
     - expect: all producers are accumulated separetelly
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz edited a comment on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz edited a comment on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-899931286


   @Vanlightly 
   > Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go.
   
   I think so too. Therefore, not only partial producer stats, but also total producer stats will be calculated correctly by this feature.
   
   > Is that summary correct?
   
   Yes.
   
   > Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)?
   
   We can choose the strategy described above, but I think it will break some existing behavior(currently, internal producer's name aren't same. / if we use the same producerName in the partitioned topic, then they will be aggregated.).
   
   If the change described above is approved in the community, I'll change the strategy.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz edited a comment on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz edited a comment on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-898542014


   @Vanlightly Of course. I think the main issue is stats collecting logic.
   https://github.com/apache/pulsar/blob/3dd9ec528b07ca495a42c020b3bf0824dc67426f/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/stats/TopicStatsImpl.java#L184-L193
   
   It causes these issues.
   
   1. If each partition has different number of producers, then calculate publisher stats separetelly
      - e.g. if full partitioned producer and partial partitioned producer(or simply single producer) are connected to the partitioned topic which has 2 partitions, then broker calclulates 3 publisher stats each other
      - expect: each partitioned producer is accumulated as single publisher stats
   2. If each partition has same number of producer, then calculate publisher stats as single partitioned producer
      - e.g. if each partition has single producer(all producers were created separetelly), then broker calculates publisher stats as single publisher
      - expect: all producers are accumulated separetelly
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Vanlightly commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-898476318


   @equanz Could you elaborate on what the problem is? If each partitioned producer creates producers on a subset of partitions, how do the stats become incorrect?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Vanlightly commented on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-899386900


   @equanz Let me know if I understand correctly. We publish aggregated stats, where we aggregate by partitioned producer, and optionally we also publish stats per partition with no aggregation. So the issue is that currently the stats collection code expects each partition to have the same number of producers on each partition, and uses that as a way of aggregating the stats for each partitioned producer. 
   
   So we need another way of aggregating by partitioned producer. Moreover, it looks to me like the current strategy is already broken seeing as the numbers won't always match (neither numbers nor order) such as when producers come and go.
   
   Using producer name as an aggregation key is not good because if the producer name is not set by the user, then each internal producer will have a different producer name (generated by the broker).
   
   So the fix is to add another producer identity (producerStatsKey) that is shared across all producers of the same partitioned producer. Then aggregate on that. 
   
   The producerStatsKey is set by the broker, to ensure it is globally unique. The partitioned producer creates a single producer when the partitioned producer is started. Then when additional internal producers for other partitions are required, they are created with the same  producerStatsKey as the first producer. That way they all share the same key.
   
   Is that summary correct?
   
   Additional question: 
   - Could we not use the producerName for aggregation and use the same strategy of starting a single producer, then making all others inherit their name from the first (when the user doesn't set the name)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] equanz edited a comment on pull request #10534: [PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer

Posted by GitBox <gi...@apache.org>.
equanz edited a comment on pull request #10534:
URL: https://github.com/apache/pulsar/pull/10534#issuecomment-898542014


   @Vanlightly Of course. I think the main issue is stats collecting logic.
   https://github.com/apache/pulsar/blob/3dd9ec528b07ca495a42c020b3bf0824dc67426f/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/stats/TopicStatsImpl.java#L184-L193
   
   It causes these issues.
   
   1. If each partition has different number of producers, then calculate publisher stats separetelly
      - e.g. if full partitioned producer and partial partitioned producer(or simply single producer) are connected to the partitioned topic which has 2 partitions, then broker calclulates 3 publisher stats each other
      - expect: each partitioned producer is accumulated as single publisher stats
   2. If each partition has same number of producer, then calculate publisher stats as single partitioned producer
     - e.g. if each partition has single producer(all producers were created separetelly), then broker calculates publisher stats as single publisher
     - expect: all producers are accumulated separetelly
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org