Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/01/08 09:11:02 UTC

Slack digest for #general - 2019-01-08

2019-01-07 09:11:04 UTC - Sijie Guo: Or check your bookie nodes to see if the bookies are running or not 
----
2019-01-07 09:20:55 UTC - bossbaby: <https://gist.github.com/tuan6956/cf05fc21fa733b6ef92ce86923b56dde>
----
2019-01-07 09:21:12 UTC - bossbaby: please check it and help me
----
2019-01-07 09:28:49 UTC - Ali Ahmed: you only have one bookie it seems
----
2019-01-07 09:30:00 UTC - bossbaby: I found the error and fixed it,
it runs successfully now
+1 : bossbaby
----
2019-01-07 09:30:06 UTC - Ali Ahmed: ok
+1 : bossbaby
----
2019-01-07 09:31:39 UTC - bossbaby: "If you deploy Pulsar in a one-node cluster, you should update the replication settings in conf/broker.conf to 1" is described in the documentation. But the default is 2, so I fixed it and ran it again
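For reference, a minimal sketch of the one-node overrides in conf/broker.conf being described above (property names from the stock broker.conf; shown only as an illustration):
```
# conf/broker.conf -- overrides for a one-node / one-bookie cluster (defaults are 2)
managedLedgerDefaultEnsembleSize=1
managedLedgerDefaultWriteQuorum=1
managedLedgerDefaultAckQuorum=1
```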
----
2019-01-07 09:31:44 UTC - bossbaby: Thank you all, bro
----
2019-01-07 09:32:19 UTC - Ali Ahmed: if you need a one node cluster just use standalone mode
----
2019-01-07 09:32:28 UTC - Ali Ahmed: it will set up everything correctly
+1 : bossbaby
----
2019-01-07 09:55:00 UTC - Yuvaraj Loganathan: Right now we are thinking of one topic per customer under a namespace, with topic names like `customer-data-<customer-id>`. The consumer will consume using a pattern subscription `customer-data-*`. Say there are two topics matching the subscription, `customer-data-1` and `customer-data-2`. For every message I call an external service. The external service may throttle, let's say for `customer-data-1`. When the external service throttles, I would like to stop consuming messages from `customer-data-1` for some time and continue on the `customer-data-2` topic, which is not throttled. With the Pulsar client, if I don't acknowledge messages for the `customer-data-1` topic and continuously acknowledge for the `customer-data-2` topic, will I keep getting data for `customer-data-2` without getting blocked?
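For context, a minimal sketch of the pattern-subscription setup described above, using the Java client (the service URL, namespace, subscription name, and the callExternalService helper are placeholders, not from this thread):
```java
import java.util.regex.Pattern;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;

public class CustomerDataConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")            // placeholder broker URL
                .build();

        // One pattern subscription covering all per-customer topics in the namespace
        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://public/default/customer-data-.*"))
                .subscriptionName("external-service-sub")          // placeholder subscription name
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            boolean throttled = callExternalService(msg);          // hypothetical helper
            if (!throttled) {
                consumer.acknowledge(msg);                         // only ack when the call went through
            }
            // Unacknowledged messages remain in the backlog for redelivery.
        }
    }

    private static boolean callExternalService(Message<byte[]> msg) {
        return false; // stand-in for the real throttling-aware call
    }
}
```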
----
2019-01-07 09:57:49 UTC - bossbaby: I don't know why the "To run Pulsar on bare metal" tutorial needs 3 separate ZooKeeper nodes on 3 VMs. I think 1 ZK + 1 bookie + 1 broker on 1 VM is enough
----
2019-01-07 11:35:36 UTC - Yuvaraj Loganathan: @Sijie Guo <https://github.com/apache/pulsar/issues/3317> Any help would be highly appreciated. Our dev pipeline is blocked because of this. We are not able to compile either.
----
2019-01-07 12:28:52 UTC - Yifan: @Yuvaraj Loganathan which version of Python are you using? 3.6 doesn't work
----
2019-01-07 12:35:10 UTC - Yuvaraj Loganathan: @Yifan Yes it is 3.6 :face_palm: Let me check with 3.7
----
2019-01-07 12:38:26 UTC - Yuvaraj Loganathan: It works awesome! Thanks @Yifan
+1 : Yifan, Sijie Guo
----
2019-01-07 12:39:46 UTC - Yuvaraj Loganathan: Closed the issue.
----
2019-01-07 15:53:15 UTC - Grant Wu: @Sijie Guo B u m p.
----
2019-01-07 15:53:38 UTC - Grant Wu: Wait, why doesn’t the client work with Python 3.6?
----
2019-01-07 15:57:04 UTC - Grant Wu: Because Zookeeper is designed to run in a cluster/multi-server setup to provide a voting quorum.  See <https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkMulitServerSetup>
----
2019-01-07 15:57:21 UTC - Grant Wu: If you want to run a standalone setup for development purposes, `pulsar standalone` probably suffices for your needs?
+1 : Matteo Merli
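For reference, standalone mode is started with a single command from the Pulsar distribution directory:
```
bin/pulsar standalone
```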
----
2019-01-07 16:09:35 UTC - Matteo Merli: For macOS we only publish the wheel files for 2.7 and 3.7. These are the versions of Python that come with either macOS or Homebrew
----
2019-01-07 16:10:33 UTC - Matteo Merli: It would be nice to have an environment with all the combinations, to use when doing releases 
----
2019-01-07 16:12:22 UTC - Grant Wu: Oh, so this doesn’t apply to Linux, okay
----
2019-01-07 16:26:31 UTC - Matteo Merli: Yes, in Linux we build in Docker containers and have all combinations 
----
2019-01-07 16:30:37 UTC - Grant Wu: or @Matteo Merli do you think you could look into this? :confused:
----
2019-01-07 16:37:14 UTC - Matteo Merli: Passing buck to @Jerry Peng ;)
----
2019-01-07 16:44:53 UTC - Romain Castagnet: Hi. When I activate SSL connections on the brokers, I get this warning before an SSL handshake error: "org.apache.pulsar.broker.service.ServerCnx - [/XX.XX.XX.XX:41818] Got exception TooLongFrameException : Adjusted frame length exceeds 5242880: 369295620 - discarded". Yesterday morning this error disappeared and it seemed to start working. Since I tried to activate authentication, the error has appeared again. I don't understand why. Have you had a similar problem?
----
2019-01-07 16:53:08 UTC - Chris Miller: Is there any reason why ConsumerImpl.hasMessageAvailable() is not part of the Consumer interface?
----
2019-01-07 16:56:20 UTC - Matteo Merli: No technical reason; it's more a matter of semantics. Consumer is the API for a managed subscription, where the server knows and controls which messages you're consuming. In general, applications don't need to know when they're caught up with the publishers
----
2019-01-07 16:57:50 UTC - Matteo Merli: By contrast, the Reader is completely unmanaged. A common use case is to create a reader to do a scan on the topic, starting from a given point and reading up to “now”
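A minimal sketch of that scan pattern with the Java Reader API, assuming a client version where the Reader exposes hasMessageAvailable() (topic and service URL are placeholders):
```java
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class TopicScan {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")             // placeholder broker URL
                .build();

        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")     // placeholder topic
                .startMessageId(MessageId.earliest)                // "a given point": earliest, or a stored MessageId
                .create();

        // Read until caught up with "now", then stop.
        while (reader.hasMessageAvailable()) {
            Message<byte[]> msg = reader.readNext();
            // process msg ...
        }

        reader.close();
        client.close();
    }
}
```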
----
2019-01-07 16:58:08 UTC - Chris Miller: I see, thanks. Maybe I'm looking at things the wrong way then. I'd like to have a consumer that can read some history up until the most recent message. Sounds like I need a Reader instead
----
2019-01-07 16:59:43 UTC - Chris Miller: I asked some related questions on Friday about this, wondering when you might use Consumer.seek() vs Reader, and why Reader wasn't a super-interface of Consumer
----
2019-01-07 17:00:59 UTC - Chris Miller: I don't suppose there's a "best practices" doc somewhere detailing these sorts of common patterns?
----
2019-01-07 17:02:38 UTC - Chris Miller: One thing that's missing from both Consumer and Reader is seeking to a timestamp. The admin API has that via resetCursor(). I guess it's not an efficient operation and therefore not so suitable for client use?
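For reference, a sketch of the admin-side operation being referred to here (topic, subscription name, and time window are placeholders):
```
bin/pulsar-admin topics reset-cursor persistent://my-tenant/my-ns/my-topic \
  --subscription my-sub \
  --time 2h
```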
----
2019-01-07 17:06:30 UTC - Grant Wu: I’ve actually asked about this before and I think it was stated that it was a reasonable thing to ask for
+1 : Chris Miller
----
2019-01-07 17:14:48 UTC - Grant Wu: I think it may have been lost due to the history limit :disappointed:
----
2019-01-07 17:16:54 UTC - Chris Miller: History limit?
----
2019-01-07 17:20:22 UTC - Grant Wu: Yes, Slack limits free plans to 10k messages
----
2019-01-07 17:25:46 UTC - Grant Wu: There are archives sent to the mailing list but I don't know how to search that
----
2019-01-07 17:50:06 UTC - Chris Miller: Oh, haha sorry I thought you were referring to some sort of history limit in Pulsar :slightly_smiling_face:
----
2019-01-07 18:02:22 UTC - Evan Nelson: @Evan Nelson has joined the channel
----
2019-01-07 18:58:30 UTC - Jerry Peng: @Grant Wu ok let me investigate
pray : Grant Wu
----
2019-01-07 21:57:57 UTC - Jerry Peng: @Grant Wu before we can get the topic name in python functions we need to complete this first:
<https://github.com/apache/pulsar/issues/3322>
since there is currently no way to get the topic name from a message using the C++/Python API
----
2019-01-07 21:59:47 UTC - Grant Wu: oh no :disappointed:
----
2019-01-07 21:59:52 UTC - Grant Wu: Okay, good to know
----
2019-01-07 22:23:32 UTC - Jerry Peng: Though this should be pretty easy to add
----
2019-01-07 23:01:47 UTC - Emma Pollum: What IP does pulsar use for geo replication? Does it utilize the service url of the cluster to replicate to, or something else?
----
2019-01-07 23:12:23 UTC - Matteo Merli: It will use the ServiceURL for the other cluster as specified in the “clusters” metadata
----
2019-01-07 23:12:50 UTC - Matteo Merli: e.g. the metadata you specify with the `initialize-cluster-metadata` command
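For reference, a sketch of that command, showing the service URLs that replication to this cluster will use (cluster name and hostnames are placeholders):
```
bin/pulsar initialize-cluster-metadata \
  --cluster us-west \
  --zookeeper zk1.example.com:2181 \
  --configuration-store zk1.example.com:2181 \
  --web-service-url http://pulsar.example.com:8080 \
  --broker-service-url pulsar://pulsar.example.com:6650
```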
----
2019-01-07 23:14:07 UTC - Emma Pollum: Thank you!
----
2019-01-08 02:22:59 UTC - Kevin DiVincenzo: @Kevin DiVincenzo has joined the channel
----
2019-01-08 02:27:27 UTC - Kevin DiVincenzo: Hi - I have a question regarding namespaces/topics.

For my use-case, I want to create topics like: `persistent://{tenant}/{namespace}/{topic}/someId`. There is no problem creating these topics from the Java client, using `pulsar-perf produce ...`, etc.

The problem is that all of the admin functionality doesn't seem to work when you nest the topics one layer deeper than just `persistent://{tenant}/{namespace}/{topic}`. `{namespace}/{topic}` doesn't appear to be a valid namespace (expected), but if I do `pulsar-admin topics list {tenant}/{namespace}`, I get back an empty list.
----
2019-01-08 02:28:14 UTC - Kevin DiVincenzo: For what it's worth - this use case is for event-sourcing / integrating with Akka Persistence.
----
2019-01-08 02:30:03 UTC - bossbaby: so, will Pulsar use 1 of the 3 to back it up, or use all of them?
----
2019-01-08 02:33:18 UTC - Kevin DiVincenzo: So is this not possible / not supported?
----
2019-01-08 02:37:05 UTC - Sijie Guo: @Kevin DiVincenzo I think there is something related to encoding and decoding “/”.
----
2019-01-08 02:37:30 UTC - Sijie Guo: I would recommend if possible trying to avoid using “/” for now
----
2019-01-08 02:37:37 UTC - Sijie Guo: but this is a bug we definitely need to fix
----
2019-01-08 02:37:38 UTC - bossbaby: I deployed 3 nodes in 1 cluster, but I have a question: will 2 of the 3 nodes back up and store copies of the data to handle failures, is that right?
----
2019-01-08 02:38:23 UTC - Kevin DiVincenzo: @Sijie Guo Ahh - so with the Admin API not being able to encode/decode `/` properly in the namespace name?
----
2019-01-08 02:38:41 UTC - bossbaby: I don't know; should I deploy 1 cluster with 3 brokers, or 3 clusters with 2 of them added to 1 cluster?
----
2019-01-08 02:42:56 UTC - Sijie Guo: @Kevin DiVincenzo:

> so with the Admin API not being able to encode/decode `/` properly in the namespace name?

it should already encode and decode “/”. However, “/” is used to distinguish namespace, tenant and topic, as well as to distinguish the v1 topic format from the v2 topic format, so there might be something in the REST server that doesn't handle the encoding properly. (Feel free to create a GitHub issue for that.)

so I strongly recommend avoiding “/” in topic names for now, until we identify the issue and fix it properly
----
2019-01-08 02:44:56 UTC - Kevin DiVincenzo: @Sijie Guo - Before I go down the path of using some other delimiter (e.g. `-`), is it safe to assume that there currently isn't a better way to represent this _{eventlog_name}_ *{delimiter}* _{actual pulsar topic}_ relationship within Pulsar?
----
2019-01-08 02:45:45 UTC - Sijie Guo: can you use `{eventlog_name}` as a namespace?
----
2019-01-08 02:45:48 UTC - Kevin DiVincenzo: I'm planning on using the multi-topic subscription to aggregate all of the child topics into the event log FWIW
----
2019-01-08 02:47:12 UTC - Kevin DiVincenzo: Well each entity in the aggregate root (e.g. event-log) has its own _persistenceId_ (artifact of the Akka Persistence system). Each entity needs to be able to traverse the topic (by sequenceId) for various purposes.
----
2019-01-08 02:47:46 UTC - Kevin DiVincenzo: So you might have 5 assets in some event-log called "asset", each with their own unique persistenceId
----
2019-01-08 02:49:39 UTC - Kevin DiVincenzo: If some other service wanted to read the whole log of assets (vs. a single asset), I was just using the `.topicsPattern(...)` method on the client with `<persistent://tenant/namespace/assets/.*>` as the pattern.
----
2019-01-08 02:49:58 UTC - Kevin DiVincenzo: All of this is actually already working in my little demo (before I build it out to a proper SDK).
----
2019-01-08 02:50:28 UTC - Kevin DiVincenzo: It was just the namespace navigation / admin topics list stuff that had me stumped.
----
2019-01-08 02:52:20 UTC - Kevin DiVincenzo: @Sijie Guo - I guess before I go further down the rabbit hole with this, are there any current limitations for the `MultiTopicsConsumerImpl`?
----
2019-01-08 02:52:30 UTC - Kevin DiVincenzo: E.g. problems with reading from thousands of topics?
----
2019-01-08 02:54:36 UTC - Kevin DiVincenzo: You have the `property` field on the message builder, so I was also thinking of tagging messages with their `persistenceId` - the downside is that to see the history for just a single `persistenceId`, you'd have to traverse the topic (e.g. with the Reader interface), filter only the messages for that `persistenceId`, and then create some sort of mapping to a logical sequenceId for only those messages.
----
2019-01-08 02:54:59 UTC - Kevin DiVincenzo: Straightforward to do, I guess, but I was trying to avoid it if possible / not necessary.
----
2019-01-08 02:55:04 UTC - Sijie Guo: ah:

1) I would suggest you use other delimiters, such as “-” or “_”. So in your use case, your regex would be “<persistent://tenant/namespace/assets_.*>”.
2)
> problems with reading from thousands of topics?

there shouldn't be a problem reading from thousands of topics, but the number of topics will be bounded by the resources of your client machine, such as memory.
----
2019-01-08 02:55:38 UTC - Sijie Guo: it depends on your use case
----
2019-01-08 02:56:02 UTC - Kevin DiVincenzo: ^^ - perfect thanks. I'm assuming that bounding the client receive buffer to something ~reasonable~ small like `10` should fix #2?
----
2019-01-08 02:57:11 UTC - bossbaby: I deployed 3 nodes in 1 cluster, but I have a question: will 2 of the 3 nodes back up and store copies of the data to handle failures, is that right?
----
2019-01-08 02:58:36 UTC - Kevin DiVincenzo: IOW - is that receive buffer per individual topic (assuming yes based on your response) or is it shared between all topics?
----
2019-01-08 02:58:38 UTC - bossbaby: each of my nodes runs 1 bookie (3 bookies across the 3 nodes)
----
2019-01-08 03:07:12 UTC - Kevin DiVincenzo: Actually never mind - everything seems to be working fine, up to 1,000 topics. I guess if the number of topics in an event-log ever needs to exceed that number, we'll just use multiple readers.
----
2019-01-08 03:07:17 UTC - Kevin DiVincenzo: Thanks for your help @Sijie Guo
ok_hand : Sijie Guo
----
2019-01-08 03:07:50 UTC - Sijie Guo: yes. it is per topic.
----
2019-01-08 03:08:09 UTC - Sijie Guo: I think there is a setting for the total receiver buffer as well
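A sketch of the two consumer settings being discussed, assuming the Java client (names and values are illustrative, not from this thread):
```java
import java.util.regex.Pattern;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class BoundedConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")                // placeholder broker URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://my-tenant/my-ns/assets_.*"))
                .subscriptionName("assets-sub")                        // placeholder subscription name
                .receiverQueueSize(10)                                 // per-topic (per-partition) prefetch buffer
                .maxTotalReceiverQueueSizeAcrossPartitions(1000)       // overall cap across all matched topics/partitions
                .subscribe();

        // ... consume as usual ...
        consumer.close();
        client.close();
    }
}
```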
----
2019-01-08 03:08:57 UTC - Kevin DiVincenzo: Actually - one more question for sanity's sake if you don't mind.
----
2019-01-08 03:09:10 UTC - Chris Chapman: @Chris Chapman has joined the channel
----
2019-01-08 03:11:23 UTC - Kevin DiVincenzo: From testing, it seems like with the default message retention policy and backlog policy, messages are actually *not ever* deleted from the topic. I'm able to later on start a consumer (with `.subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`) and read the entire history of all messages sent to this topic (this is what I want). Is this the actual/correct behavior?
----
2019-01-08 03:11:39 UTC - Sijie Guo: > will 2 of the 3 nodes back up and store copies of the data to handle failures, is that right?

BookKeeper has replication to handle failures
----
2019-01-08 03:13:40 UTC - Sijie Guo: > it seems like with the default message retention policy and backlog policy

I think the default message retention policy is to delete messages after all subscriptions have consumed them.
If the messages are not deleted, there might be some subscriptions that are not acknowledging them. I would recommend you use “pulsar-admin topics stats <topic>” to see whether any subscriptions on that topic are not acknowledging messages.
----
2019-01-08 03:14:21 UTC - Kevin DiVincenzo: Is there a way to tell pulsar "don't delete messages ever" then?
----
2019-01-08 03:15:11 UTC - Kevin DiVincenzo: (was planning on using the awesome Bookkeeper tiered storage feature)
----
2019-01-08 03:15:21 UTC - bossbaby: so with 2 bookies in 2 VMs it will replicate?
----
2019-01-08 03:15:46 UTC - Sijie Guo: yes
----
2019-01-08 03:18:06 UTC - Sijie Guo: @Kevin DiVincenzo - currently you can configure the retention policy (by setting the retention time to -1) to keep the data forever. It is on a per-namespace basis. <http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/>
----
2019-01-08 03:19:56 UTC - Kevin DiVincenzo: i.e. `pulsar-admin namespaces set-retention my-tenant/my-ns \
  --size -1 \
  --time -1`
+1 : Sijie Guo
----
2019-01-08 03:20:28 UTC - Sijie Guo: > messages are actually *not ever* deleted from the topic.

actually I take my previous comment back. It is probably related to how Pulsar garbage-collects data. Pulsar garbage-collects data by segments, so at least one segment is kept even after all your consumers have consumed the messages. That might explain why you receive all the data after restarting from `earliest`.

anyway, in general, you can use “topics stats” and “topics stats-internal” to see more details about the topic
----
2019-01-08 03:20:34 UTC - Sijie Guo: yes
----
2019-01-08 03:20:49 UTC - Kevin DiVincenzo: Yup - that's what I needed. Thanks again.
----
2019-01-08 03:22:40 UTC - Sijie Guo: cool
----
2019-01-08 03:27:21 UTC - bossbaby: great, I thought I would have to set up replication myself, but now I'll set up a broker on every node and BookKeeper will replicate the data to every bookie in the cluster
----