Posted to commits@pulsar.apache.org by "YanshuoH (via GitHub)" <gi...@apache.org> on 2024/03/15 09:41:17 UTC

[D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

GitHub user YanshuoH created a discussion: Geo-Replication: how to sync topics and policies between clusters

Hello guys,

We have chosen the `Asynchronous geo-replication in Pulsar` mode (cf. https://pulsar.apache.org/docs/3.2.x/concepts-replication/ ) for geo-replication, for several reasons. Mainly, we would like to use geo-replication to create replicated clusters in different regions and form a global mesh.

While testing two clusters in standalone mode by following the instructions in https://pulsar.apache.org/docs/3.2.x/administration-geo/, I realized that there is no indication of how the remote namespace and topics are created and handled.

Assume I have two Pulsar clusters named `z1` and `z2`. When I enabled geo-replication for namespace `public/repl`, I noticed the following in the `z1` broker's log:

```
{"reason":"Namespace not found"}
```

Then I tried to create my first topic `persistent://public/repl/t1` under that namespace and got the same error in the broker log. The same thing happened when producing a message to the topic, and the error message kept repeating.

So I created the namespace `public/repl` in `z2`, and the replication went well.

Yet I noticed that the topic auto-created in `z2` is:

```
+-----------------------------------------+---------------+
|               TOPIC NAME                | PARTITIONED ? |
+-----------------------------------------+---------------+
| persistent://public/repl/t1-partition-1 | N             |
+-----------------------------------------+---------------+
```

which is not a partitioned topic. And with `auto-topic-creation` disabled in `z2`, a replication error along the lines of `topic not found` appears in `z1`.
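One way to spot this situation is to look for non-partitioned topics whose names end in the `-partition-N` suffix that Pulsar uses for the individual partitions of a partitioned topic. The sketch below is only an illustration: `find_orphan_partitions` and its inputs are hypothetical names, not part of Pulsar, and it assumes the two topic lists were collected separately (for example with `pulsar-admin topics list` and `pulsar-admin topics list-partitioned-topics` on `z2`).

```python
import re

# Pulsar names the partitions of a partitioned topic "<topic>-partition-<index>".
_PARTITION_SUFFIX = re.compile(r"^(?P<parent>.+)-partition-(?P<index>\d+)$")


def find_orphan_partitions(topics, partitioned_topics):
    """Return {parent: [indexes]} for topics that look like partitions but
    whose parent is not registered as a partitioned topic.

    `topics` and `partitioned_topics` are plain lists of topic names
    (hypothetical inputs, e.g. collected from the admin CLI on z2).
    """
    known_parents = set(partitioned_topics)
    orphans = {}
    for name in topics:
        match = _PARTITION_SUFFIX.match(name)
        if match is None:
            continue  # an ordinary non-partitioned topic
        parent = match.group("parent")
        if parent not in known_parents:
            orphans.setdefault(parent, []).append(int(match.group("index")))
    return {parent: sorted(indexes) for parent, indexes in orphans.items()}
```

Called with the single topic from the listing above and an empty list of partitioned topics, this reports `persistent://public/repl/t1` as a parent with only partition index 1 present.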

Pardon me for the long read. My questions are:
1. Is there a way to sync metadata (namespaces / topics and even the policies / auth) between clusters? Or is it the user's responsibility to keep the two clusters' structure / policies (metadata) in sync? (e.g. `z1`'s `t1` has 4 partitions and we expect `z2`'s `t1` to be the same.)
2. Should we (or should we not) expect an error when sending messages to a topic that is expected to be replicated among clusters but fails to replicate? (e.g. sending a message to `z1`'s `t1` while the topic or even the namespace does not exist in `z2`)

Thank you.

GitHub link: https://github.com/apache/pulsar/discussions/22278

----
This is an automatically sent email for commits@pulsar.apache.org.
To unsubscribe, please send an email to: commits-unsubscribe@pulsar.apache.org


Re: [D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

Posted by "slawrencemd (via GitHub)" <gi...@apache.org>.
GitHub user slawrencemd added a comment to the discussion: Geo-Replication: how to sync topics and policies between clusters

If the global config store is not used, how would topics/policies be synced between peered clusters? Is there an automatic mechanism to create/mirror the entities? I remember there is a 'global' flag for certain policies assigned to topics. Is that the mechanism?

Or is that the tradeoff - global config store vs _manual_ entity management in peered clusters?

![image](https://github.com/apache/pulsar/assets/151540338/70e56376-5a50-4438-96ff-e032e40a5071)


GitHub link: https://github.com/apache/pulsar/discussions/22278#discussioncomment-9059185



Re: [D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

Posted by "asafm (via GitHub)" <gi...@apache.org>.
GitHub user asafm added a comment to the discussion: Geo-Replication: how to sync topics and policies between clusters

From my understanding, there is a central Configuration Store, i.e. a single shared ZK, which stores the metadata of topics/namespaces/tenants. You should configure both clusters to use the same configuration store (ZK). So: one ZK (metadata) per cluster and one shared ZK (configuration store).

GitHub link: https://github.com/apache/pulsar/discussions/22278#discussioncomment-8829254



Re: [D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

Posted by "YanshuoH (via GitHub)" <gi...@apache.org>.
GitHub user YanshuoH added a comment to the discussion: Geo-Replication: how to sync topics and policies between clusters

Thank you for the reply.

GitHub link: https://github.com/apache/pulsar/discussions/22278#discussioncomment-8837782



Re: [D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
GitHub user lhotari added a comment to the discussion: Geo-Replication: how to sync topics and policies between clusters

> If the global config store is not used, how would topics/policies be synced between peered clusters?

@slawrencemd good questions.

A topic will get created by the replicator that pushes messages from the source cluster to the target cluster. There's no absolute need to replicate the policies. For synchronizing policies, there's an alternative to the global config store as part of ["PIP-136: Sync Pulsar policies across multiple clouds"](https://github.com/apache/pulsar/issues/16424). I haven't used that, so I'm not sure if it's fully implemented and documented.
["PIP-188: Cluster migration or Blue-Green cluster deployment support in Pulsar"](https://github.com/apache/pulsar/issues/16551) might also contain some changes related to replicating topics and policies across clusters.
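If the target cluster is managed by hand instead, the partition layout can be compared before replication starts. The following is a minimal sketch under stated assumptions: `plan_partition_sync` is a hypothetical helper, and the `{topic: partitions}` dictionaries are assumed to have been gathered from each cluster beforehand. It only prints the `pulsar-admin topics create-partitioned-topic` commands an operator could run against the target; it does not talk to any cluster.

```python
def plan_partition_sync(source, target):
    """Compare partition counts ({topic: partitions}) between two clusters.

    Returns (to_create, mismatched):
      to_create  - {topic: partitions} missing on the target
      mismatched - {topic: (source_count, target_count)} present on both
                   sides but with different partition counts
    """
    to_create = {}
    mismatched = {}
    for topic, partitions in source.items():
        if topic not in target:
            to_create[topic] = partitions
        elif target[topic] != partitions:
            mismatched[topic] = (partitions, target[topic])
    return to_create, mismatched


# Example data (assumed): t1 has 4 partitions on z1 and does not exist on z2.
z1 = {"persistent://public/repl/t1": 4}
z2 = {}
to_create, mismatched = plan_partition_sync(z1, z2)
for topic, partitions in to_create.items():
    # Print the admin command an operator could run against z2.
    print(f"pulsar-admin topics create-partitioned-topic {topic} -p {partitions}")
```

Topics that exist on both sides with different partition counts are returned separately in `mismatched`, since resolving them is a judgment call rather than a simple create.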


GitHub link: https://github.com/apache/pulsar/discussions/22278#discussioncomment-9059421



Re: [D] Geo-Replication: how to sync topics and policies between clusters [pulsar]

Posted by "lhotari (via GitHub)" <gi...@apache.org>.
GitHub user lhotari added a comment to the discussion: Geo-Replication: how to sync topics and policies between clusters

btw. The global configuration store isn't mandatory. (Question: https://github.com/apache/pulsar/discussions/22456)

GitHub link: https://github.com/apache/pulsar/discussions/22278#discussioncomment-9057622
