Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/16 09:11:05 UTC

Slack digest for #general - 2020-06-16

2020-06-15 09:51:41 UTC - Tushar Sonawane: @Tushar Sonawane has joined the channel
----
2020-06-15 11:00:39 UTC - Dave Winstone: @Dave Winstone has joined the channel
----
2020-06-15 11:10:21 UTC - Grzegorz: @Grzegorz has joined the channel
----
2020-06-15 11:18:07 UTC - Grzegorz: Hi guys. I have a question regarding topics in Pulsar. I don't understand how partitioned and non-partitioned topics are related to persistence.
According to the docs (#1), both non-partitioned and partitioned topics must be explicitly created. But here (#2) I can see that topics are created automatically.
How are these two categories connected? Are auto-created topics non-partitioned by default? If so, why does #1 say non-partitioned topics must be created explicitly? Can I have a persistent and partitioned topic?

#1 <https://pulsar.apache.org/docs/en/2.5.0/admin-api-non-partitioned-topics/>
#2 <https://pulsar.apache.org/docs/en/2.5.0/concepts-messaging/>
----
2020-06-15 11:21:24 UTC - Ankur Jain: New to Pulsar (have worked with Kafka in production) and have been going through docs/talks on the internet around it. We are evaluating it to replace a bunch of Kafka deployments with a multi-tenant messaging platform, and we are impressed with the Kafka pain points it can potentially solve. Throughput requirements for such a platform are ~5M messages/sec cumulative across topics, including log/metric ingestion, data streaming, etc. We are yet to run benchmarks. I had a few concerns around Pulsar's setup and would love to get some pointers from the community:
1. Reads from a partition in Pulsar are tied to a broker, similar to Kafka. We have ordering requirements, so scaling out consumption in Kafka means adding new partitions, but old partitions continue to remain a bottleneck. I believe this is the same with Pulsar? Any guidelines on how to configure partitions for a topic and increase them for ordered streams? Fanouts are not always known to us up front, and a high-throughput topic can become popular later.
2. Lagging consumers compete for disk I/O on Kafka brokers, which impacts produce throughput as well as consume throughput. Since brokers are stateless in Pulsar, that would mean a network hop to a bookie (old messages will not be in the broker cache) and random disk I/O on the bookie to read the fragment. Is my understanding correct, and how does Pulsar ensure optimal performance (better read latency/throughput than Kafka?) for catch-up consumers? We unfortunately have systems where catch-up consumers are a reality.
3. We have machines with 1 HDD in our datacenter. SSD machines are available but costlier and lower capacity, which implies aggressive retention that we may not want. Recommended bookie configurations suggest 1 disk (SSD?) for the WAL and 1 disk for the ledger mounted on the same instance. If we don't have such instance types available, how much impact can we expect on reads/writes through BookKeeper compared to the Kafka brokers we run today on these single-HDD instances? I understand that Kafka does not fsync, and we are OK with the same level of durability that disabling fsync in BookKeeper would achieve. Keeping that in mind, can the instance types we have make the cut for bookies, and how can that be tuned? The official docs do not mention anything around this.
----
2020-06-15 11:41:56 UTC - Sankararao Routhu: yes @Sijie Guo they are in the same VPC
----
2020-06-15 13:25:14 UTC - Konstantinos Papalias: Re 1: one strategy, on both Kafka and Pulsar, is to always over-provision partitions for a topic. Did you have any restrictions preventing you from doing that on Kafka, e.g. a large number of partitions across all topics?
----
2020-06-15 14:17:03 UTC - Raphael Enns: Hi. I'm getting a lot of WARN log lines in my pulsar-broker.log file. It always shows up wrapped in a set of log lines like the one below (the 2nd line):
```10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:50580] Subscribing on topic <persistent://public/default/4736a6614c9a4e29aa73871956180ef1> / reader-02257d519f
10:13:07.025 [pulsar-io-22-1] WARN  org.apache.pulsar.broker.service.BrokerService - No autoTopicCreateOverride policy found for <persistent://public/default/4736a6614c9a4e29aa73871956180ef1>
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>][reader-02257d519f] Creating non-durable subscription at msg id 13144:1:-1:0
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [public/default/persistent/4736a6614c9a4e29aa73871956180ef1-reader-02257d519f] Rewind from 79:0 to 79:0
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>] There are no replicated subscriptions on the topic
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.persistent.PersistentTopic - [<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>][reader-02257d519f] Created new subscription for 2077315
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:50580] Created subscription on topic <persistent://public/default/4736a6614c9a4e29aa73871956180ef1> / reader-02257d519f```
Is something wrong happening here? Is there anything I can do to fix it?
Thanks
----
2020-06-15 14:27:11 UTC - Ankur Jain: Thanks for the response! On one of the bigger Kafka clusters we have, there are ~4K topics. Over-provisioning partitions means increased ZK load on fetch requests, longer broker startup times, etc. So while we do over-provision, there is always an element of caution attached. Seeing how Pulsar uses ZK for bookkeeping as well, I assumed similar concerns would apply here too.
Hoping to get some response on the remaining concerns also :slightly_smiling_face:
ok_hand : Konstantinos Papalias
----
2020-06-15 15:25:01 UTC - Marcio Martins: Thanks @Sijie Guo
----
2020-06-15 15:32:23 UTC - Sankararao Routhu: when a single broker is given as brokerServiceUrl, the proxy-to-broker lookup works, but when an NLB is given, the proxy-to-broker lookup does not work
----
2020-06-15 17:13:31 UTC - Addison Higham: Question for y'all: what are your key metrics for seeing how close a pulsar cluster is to capacity? I know that it is a simple question with a complicated answer, but just curious what sort of top-line metrics you pay the most attention to (or better yet, what you alert on to indicate it may be time to scale up)
+1 : Anup Ghatage
----
2020-06-15 17:43:01 UTC - Asaf Mesika: Very interesting design choice
----
2020-06-15 17:43:17 UTC - Asaf Mesika: Wondering the motivation to go this way?
----
2020-06-15 17:43:55 UTC - Asaf Mesika: So this code will only “live” as a Pulsar function, or will you also have library code running in your microservice?
----
2020-06-15 17:45:55 UTC - Asaf Mesika: Also, another question: how did you solve the double-submission issue? Say you have a job (e.g. Send Mail) and you launch it with parameters like emailAddress and body. Now presume you submitted it twice with the same parameters (say you did something and didn't commit, the app got killed, restarted, and did it again). How do you protect against running the same job twice (or even, because of this, in parallel, since now you have two messages representing the same execution)?
----
2020-06-15 17:49:02 UTC - Gilles Barbier: The intention is mainly to minimize the need for any infrastructure beyond Pulsar itself. I know from experience that scaling distributed systems is hard and I prefer to rely on Pulsar for that.
----
2020-06-15 17:52:12 UTC - Gilles Barbier: Regarding your question, each client producing a command includes a unique id with it. If a "Send Email" is sent twice with the same id, the second one will be discarded by the engine function that maintains the state of this task.
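A minimal sketch of that id-based discarding, assuming the command's unique id travels as the message key and the engine is a stateful Pulsar Function using the Functions state API (all names below are illustrative, not the actual implementation):
```
import java.nio.ByteBuffer;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical engine function: drops a command if its unique id was already seen.
// Assumes function state (stream storage) is enabled on the cluster.
public class DedupEngineFunction implements Function<String, Void> {
    @Override
    public Void process(String command, Context context) throws Exception {
        // Assumption: the producer put the command's unique id in the message key.
        String commandId = context.getCurrentRecord().getKey().orElse(null);
        if (commandId == null) {
            return null; // no id, nothing to deduplicate on
        }
        if (context.getState(commandId) != null) {
            return null; // duplicate submission, discard it
        }
        context.putState(commandId, ByteBuffer.wrap(new byte[]{1}));
        // ... hand the command over to the actual task processing here ...
        return null;
    }
}
```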
----
2020-06-15 17:52:49 UTC - Asaf Mesika: ok. What happens if you have 1M unique IDs every min?
----
2020-06-15 17:53:07 UTC - Asaf Mesika: How is the state managed? In-memory in a single machine?
----
2020-06-15 17:56:20 UTC - Gilles Barbier: Yeah, at that scale Pulsar is still missing a feature, but it should be a relatively easy one to add: a key-shared subscription for functions. With it, and using the id as the key, you should be able to scale your engine functions horizontally and even keep state in memory if needed.
----
2020-06-15 17:58:50 UTC - Gilles Barbier: I have entered an issue for that <https://github.com/apache/pulsar/issues/6527>
----
2020-06-15 18:03:45 UTC - Gilles Barbier: (note that even today it's possible to run the engine on a key-shared subscription as a "normal" consumer, but then you need to add your own  storage for states)
----
2020-06-15 18:07:44 UTC - Asaf Mesika: So the equivalent in my case, which is only library code, is to have shared state like a persisted KV store
----
2020-06-15 18:08:55 UTC - Asaf Mesika: I really wish Pulsar would allow dedup like they do, but based on my own key
----
2020-06-15 18:09:13 UTC - Asaf Mesika: And not an always-increasing key
----
2020-06-15 18:20:43 UTC - Gilles Barbier: Are you also aware that Pulsar has a message deduplication feature? <http://pulsar.apache.org/docs/en/cookbooks-deduplication/>
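For reference, a rough sketch of how that feature is typically wired up, which also shows why it keys on an always-increasing sequence id rather than an arbitrary business key. The admin/client calls (`setDeduplicationStatus`, `producerName`, `sequenceId`) are assumed to behave as described in the cookbook; URLs, names, and values are illustrative:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DedupExample {
    public static void main(String[] args) throws Exception {
        // Enable deduplication for a namespace (it can also be enabled cluster-wide
        // in broker.conf via brokerDeduplicationEnabled=true).
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // illustrative URL
                .build();
        admin.namespaces().setDeduplicationStatus("public/default", true);
        admin.close();

        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build();
        // Dedup is keyed on (producerName, sequenceId), so the producer name must be
        // stable and the sequence id monotonically increasing -- not an arbitrary key.
        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/jobs")
                .producerName("job-submitter-1")
                .create();
        long sequenceId = 42L; // e.g. restored from the application's own checkpoint
        producer.newMessage()
                .sequenceId(sequenceId)
                .value("send-email".getBytes())
                .send();
        producer.close();
        client.close();
    }
}
```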
----
2020-06-15 18:21:34 UTC - Gilles Barbier: oh I see, it's what you mean by "like they do but based on my own key"
----
2020-06-15 18:22:18 UTC - Manjunath Ghargi: For question #2: In Pulsar there is a separate read cache vs. write cache, hence even if there are millions of messages in the backlog, producer throughput is not affected. Consumers can consume at their own pace.
----
2020-06-15 18:36:20 UTC - Manjunath Ghargi: Hi,
How do we evenly balance the load across all the brokers?
We have 3 brokers configured with 9 topics * 3 partitions each. The load is getting divided evenly across partitions but not across brokers; sometimes, for a given topic, all 3 partitions are served by a single broker, which causes CPU to go as high as 80% on one broker while the other 2 brokers are below 20% CPU.
On the producer client end, we are setting MessageRoutingMode=RoundRobinPartition.
Can someone suggest the right configuration for balancing the load across brokers evenly?
----
2020-06-15 18:54:34 UTC - Manjunath Ghargi: @Sijie Guo: Can you please suggest the right configuration?
----
2020-06-15 18:55:16 UTC - Asaf Mesika: Btw: can you live without the transactions (TX) coming in 2.7.0? Your function can produce a message but then fail to ack.
----
2020-06-15 18:59:56 UTC - Alexandre DUVAL: Hi, is there a way to tune the max HTTP header content length/size? :D
----
2020-06-15 19:14:56 UTC - Gilles Barbier: As far as I can tell, the worst case is duplicate task processing
----
2020-06-15 19:22:05 UTC - Addison Higham: are all these topics in the same namespace @Manjunath Ghargi?
----
2020-06-15 19:24:13 UTC - Addison Higham: topics get grouped into "bundles", which is just a hash ring. Bundles are what get mapped to brokers. By default, a namespace has 4 bundles. Bundles should eventually split naturally, but it can take a while, so if all the topics are in the same namespace, it may make a lot of sense to turn those 4 bundles into something like 12, giving you more "units" that can be distributed across the cluster
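For illustration only (tenant/namespace names and the bundle count are made up): the default of 4 comes from `defaultNumberOfNamespaceBundles` in broker.conf, a namespace can be created with more bundles up front via the admin API, and an existing namespace's bundles can be split with `pulsar-admin namespaces split-bundle`. A minimal admin-API sketch:
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class BundleExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // illustrative URL
                .build();
        // Create a namespace with 12 bundles instead of the default 4, so there are
        // more units of load that can be spread across the brokers.
        // Assumes the tenant "my-tenant" already exists.
        admin.namespaces().createNamespace("my-tenant/my-namespace", 12);
        admin.close();
    }
}
```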
----
2020-06-15 20:06:05 UTC - Alexandre DUVAL: (on proxy)
----
2020-06-15 20:41:04 UTC - Sree Vaddi: <https://pulsar-summit.org/schedule/first-day>
----
2020-06-15 20:48:30 UTC - David Lanouette: Will this be recorded?  There's lots of interesting talks, but I've gotta work during those times. :disappointed:
----
2020-06-15 20:55:13 UTC - Ankur Jain: @Manjunath Ghargi Won't the write cache need to be flushed to the entry log disk while reads are happening from the same disk for catch-up consumers, causing contention similar to Kafka? Maybe I am missing something here?
----
2020-06-15 21:31:33 UTC - Addison Higham: is there a way to force major compaction on a bookie?
----
2020-06-15 22:24:44 UTC - Anup Ghatage: You can set config param `isForceGCAllowWhenNoSpace=true`  if you want to force GC when you’re running out of space.
Other than that, I’ve not done this myself, but there are configuration options we can play around with for this.
viz. `majorCompactionInterval`, `majorCompactionThreshold`  and `gcWaitTime`

Even if you do set these, you’d have to restart bookies with these config options set to whatever values you want.
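For reference, a bookkeeper.conf sketch with the knobs mentioned above; the values are illustrative, not a recommendation:
```
# bookkeeper.conf -- illustrative values only
# How often the garbage collector thread runs, in milliseconds
gcWaitTime=900000
# Interval between major compactions, in seconds
majorCompactionInterval=86400
# Entry logs with less than this fraction of live data are compacted
majorCompactionThreshold=0.8
# Allow a forced GC even when the bookie is running out of disk space
isForceGCAllowWhenNoSpace=true
```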
----
2020-06-15 23:18:21 UTC - Sijie Guo: I think there is also an HTTP endpoint to do so.
----
2020-06-15 23:19:07 UTC - Sijie Guo: It will be recorded and uploaded to YouTube after the summit. Follow @PulsarSummit for updates :slightly_smiling_face:
----
2020-06-15 23:20:30 UTC - Sijie Guo: 1. Disk space
2. Storage size & msg backlog
3. Latency
----
2020-06-15 23:21:33 UTC - Sijie Guo: I think the warning logs were introduced by a new feature. They are annoying. @Penghui Li @jia zhai we should consider reducing the logging level.
----
2020-06-15 23:49:47 UTC - Matteo Merli: You mean to impose a max size?
----
2020-06-15 23:51:54 UTC - Alexandre DUVAL: there is currently a max size and i'd like to configure it
----
2020-06-15 23:52:11 UTC - Alexandre DUVAL: ```Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 [pulsar-external-web-5-6] DEBUG org.eclipse.jetty.client.HttpSender - Request shutdown output HttpRequest[GET /admin/v2/namespaces/user_7684cfc9-f54e-4e09-848c-1953af6e3e89/pulsar_c61f4274-4725-4e9f-8196-3cd7edcd77e5/retention HTTP/1.1]@1965e4b7
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 [pulsar-external-web-5-6] DEBUG org.eclipse.jetty.client.HttpSender - Request failure HttpRequest[GET /admin/v2/namespaces/user_7684cfc9-f54e-4e09-848c-1953af6e3e89/pulsar_c61f4274-4725-4e9f-8196-3cd7edcd77e5/retention HTTP/1.1]@1965e4b7 HttpExchange@6c710cea req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: Request header too large@6e4cc94a res=PENDING/null@null on HttpChannelOverHTTP@47ba053f(exchange=HttpExchange@6c710cea req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: Request header too large@6e4cc94a res=PENDING/null@null)[send=HttpSenderOverHTTP@7545f435(req=FAILURE,snd=FAILED,failure=org.eclipse.jetty.http.BadMessageException: 500: Request header too large)[HttpGenerator@571d274{s=END}],recv=HttpReceiverOverHTTP@1c455293(rsp=IDLE,failure=null)[HttpParser{s=START,0 of -1}]]: {}
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 [pulsar-external-web-5-2] DEBUG org.eclipse.jetty.io.ManagedSelector - Created SocketChannelEndPoint@3ac82237{yo-pulsar-c1-n2/192.168.10.11:2005<->/192.168.10.14:55540,OPEN,fill=FI,flush=-,to=1/30000}{io=1/1,kio=1,kro=8}->SslConnection@2d469d74{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=INTERESTED,flush=IDLE}~>DecryptedEndPoint@56e4c405{yo-pulsar-c1-n2/192.168.10.11:2005<->/192.168.10.14:55540,OPEN,fill=FI,flush=-,to=2/30000}=>HttpConnectionOverHTTP@492f6c59(l:/192.168.10.14:55540 <-> r:yo-pulsar-c1-n2/192.168.10.11:2005,closed=false)=>HttpChannelOverHTTP@47ba053f(exchange=HttpExchange@6c710cea req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: Request header too large@6e4cc94a res=PENDING/null@null)[send=HttpSenderOverHTTP@7545f435(req=FAILURE,snd=FAILED,failure=org.eclipse.jetty.http.BadMessageException: 500: Request header too large)[HttpGenerator@571d274{s=END}],recv=HttpReceiverOverHTTP@1c455293(rsp=IDLE,failure=null)[HttpParser{s=START,0 of -1}]]
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 [pulsar-external-web-5-6] DEBUG org.eclipse.jetty.client.HttpExchange - Terminated request for HttpExchange@6c710cea req=TERMINATED/org.eclipse.jetty.http.BadMessageException: 500: Request header too large@6e4cc94a res=PENDING/null@null, result: null```
----
2020-06-15 23:52:17 UTC - Alexandre DUVAL: @Matteo Merli ^
----
2020-06-15 23:52:56 UTC - Alexandre DUVAL: so there is a limitation on the request from proxy to broker, right?
----
2020-06-15 23:53:03 UTC - Matteo Merli: I think there should be one that we configure on jetty 
----
2020-06-15 23:53:51 UTC - Alexandre DUVAL: currently i think it comes from jetty defaults
----
2020-06-15 23:53:54 UTC - Alexandre DUVAL: ```        value = config.getInitParameter("requestBufferSize");
        if (value != null)
            client.setRequestBufferSize(Integer.parseInt(value));```
----
2020-06-15 23:54:44 UTC - Alexandre DUVAL: (if it's the request from proxy to broker that throws this)
----
2020-06-15 23:56:01 UTC - Matteo Merli: In the exception above, is it thrown by broker or proxy?
----
2020-06-15 23:56:43 UTC - Alexandre DUVAL: proxy
----
2020-06-15 23:57:14 UTC - Alexandre DUVAL: i tried to run the request without proxy, it works well and doesn't throw this
----
2020-06-15 23:57:34 UTC - Alexandre DUVAL: so I assumed it happens during the proxy-to-broker request
----
2020-06-15 23:59:11 UTC - Alexandre DUVAL: so i think the only way is to put `client.setRequestBufferSize(32000);` on AdminProxyHandler httpclient
----
2020-06-15 23:59:18 UTC - Alexandre DUVAL: as quickfix
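For illustration, the quickfix would amount to something like this on the Jetty client the proxy uses. This is a sketch only; where AdminProxyHandler actually builds its HttpClient is not shown here, and the 32 KiB value is just an example:
```
import org.eclipse.jetty.client.HttpClient;

public class ProxyClientQuickfix {
    // Sketch of the proposed quickfix: enlarge Jetty's request buffer so larger
    // request headers fit (Jetty's default request buffer is 4 KiB).
    static HttpClient buildClient() throws Exception {
        HttpClient client = new HttpClient();
        client.setRequestBufferSize(32 * 1024);
        client.start();
        return client;
    }
}
```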
----
2020-06-16 00:14:17 UTC - jia zhai: sure
----
2020-06-16 00:21:11 UTC - Greg Methvin: I think we’re talking about the subscription backlog, which is `msgBacklog` in the topic stats. My understanding is that this is actually the number of message _batches_ rather than the actual number of messages. Is that still the case?
----
2020-06-16 00:30:16 UTC - Penghui Li: This is fixed by <https://github.com/apache/pulsar/pull/7080> and will be released in 2.6.0
----
2020-06-16 00:48:54 UTC - David Lanouette: Great!  Thanks for the response.
----
2020-06-16 00:54:28 UTC - Matteo Merli: Got it. We should then probably make that configurable.
----
2020-06-16 00:56:25 UTC - Matteo Merli: @Ankur Jain

&gt; We have ordering requirements, so scaling out consumption in Kafka means adding new partitions, but old partitions continue to remain a bottleneck. I believe this is the same with Pulsar?
You can use the KeyShared subscription type to scale consumers within a single partition while retaining ordering (per key)
----
2020-06-16 00:56:43 UTC - Matteo Merli: So, no need to over-provision partitions
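A minimal consumer sketch of that suggestion (topic, subscription name, and URL are illustrative):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // illustrative URL
                .build();
        // Several consumers can attach to the same Key_Shared subscription;
        // messages with the same key always go to the same consumer, so per-key
        // ordering is kept while consumption scales beyond one consumer per partition.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/orders")
                .subscriptionName("order-processors")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();
        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);
        client.close();
    }
}
```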
----
2020-06-16 01:01:22 UTC - Matteo Merli: &gt; Won't the write cache need to be flushed to the entry log disk while reads are happening from the same disk for catch-up consumers, causing contention similar to Kafka? Maybe I am missing something here?
The way bookies write to disk is very different from Kafka. All writes the bookie does are sequential.

The critical writes are on the journal, which is dedicated to writes, so there is no interference from reads.

Flushing the write cache happens in the background, so it does not impact request latency. It's also a bulk write to one single big file (or a few files), so it's about as efficient as writing to disk can get.

In Kafka, you're basically relying on the page cache to do background flushes, but it's writing to multiple files at the same time (e.g. 5 files per partition, between data and indices), so it's really generating a random-IO write workload, which doesn't work very well even for SSDs. When the page cache is under pressure (since the reads will pull in old files), this *will* definitely impact your write latency.
----
2020-06-16 01:03:50 UTC - Alexandre DUVAL: indeed
----
2020-06-16 01:59:18 UTC - Rich Adams: @Rich Adams has joined the channel
----
2020-06-16 05:28:54 UTC - Hiroyuki Yamada: @Penghui Li @Matteo Merli I reported it on the GitHub issue page.
<https://github.com/apache/pulsar/issues/5819#issuecomment-644510797>

As far as I have checked, the issue is still happening, but it doesn't happen if consistent hashing is used.
Please take a look.
----
2020-06-16 05:42:16 UTC - Narayan: @Narayan has joined the channel
----
2020-06-16 05:44:01 UTC - Penghui Li: Thanks for your update. Maybe there are some problems with the auto-split mechanism. I have reopened <https://github.com/apache/pulsar/issues/6554|#6554> to track this issue.
man-bowing : Hiroyuki Yamada
+1 : Hiroyuki Yamada
----
2020-06-16 06:27:04 UTC - Isaiah Rairdon: Looking for some guidance with a liveness probe. I was trying to add a liveness probe to our Helm chart. I haven't been able to get the OOTB probe to work, as we are using mTLS for auth. I have working curl calls for basic admin calls like
curl <https://pulsar-broker:8443/admin/clusters> --key server.key --cacert issue_ca.crt --cert server.crt
but doing the same call using the route for status.html gives me a 404. Am I doing something wrong with the route or the call here?
curl <https://pulsar-broker:8443/status.html> --key server.key --cacert issue_ca.crt --cert server.crt
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /status.html. Reason:
<pre>    Not Found</pre></p><hr><a href="<http://eclipse.org/jetty>">Powered by Jetty:// 9.4.20.v20190813</a><hr/>

</body>
</html>
----
2020-06-16 07:17:21 UTC - Sankararao Routhu: <!here> Is there a plugin or any other way to add a whitelisting solution in pulsar-proxy? We want to allow connections only from trusted/known networks/accounts.
----