Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/11/19 09:11:03 UTC

Slack digest for #general - 2018-11-19

2018-11-18 12:22:25 UTC - Samuel Sun: Hi, does anyone know when the publish time is written in Pulsar? Is it the client send time or the broker write time?
----
2018-11-18 13:12:08 UTC - Nicolas Ha: here you go :slightly_smiling_face: <https://github.com/apache/pulsar/issues/3009>
----
2018-11-18 15:43:45 UTC - Matteo Merli: @Samuel Sun it's set in the client library right before the message is sent.
----
2018-11-19 01:32:56 UTC - zy: @zy has joined the channel
----
2018-11-19 02:01:36 UTC - aylen: @aylen has joined the channel
----
2018-11-19 02:09:38 UTC - Sijie Guo: thank you
----
2018-11-19 02:26:38 UTC - Samuel Sun: noted, thanks. Is there any way to know the time when the broker writes data to storage?
----
2018-11-19 02:34:54 UTC - zy: hello, does anyone know how to set up two different subscriptions on one topic? Can you provide sample code?
----
2018-11-19 02:35:40 UTC - Sijie Guo: you can just use two different subscription names to subscribe.
----
2018-11-19 02:42:32 UTC - zy: 
----
2018-11-19 02:42:52 UTC - zy: 
----
2018-11-19 02:43:10 UTC - zy: One of them can't receive the messages
----
2018-11-19 02:44:32 UTC - Sijie Guo: what do you mean they can’t receive? can you describe your test sequence?

(fyi, for a new subscription, consumption starts from the latest messages by default)
----
2018-11-19 02:47:53 UTC - zy: How do I set it to start from a specific position?
----
2018-11-19 02:49:21 UTC - Sijie Guo: you can use `subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)` when building the consumer
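To make the two answers above concrete (distinct subscription names, and where a new subscription starts), here is a minimal in-memory sketch in plain Java. It is not the Pulsar client API: `TopicModel`, `subscribe`, `receive` and the `InitialPosition` enum are hypothetical names. It only assumes what the thread states: each subscription has its own cursor on the same topic, so every subscription sees every message, and a new cursor starts at the latest message unless told to start at the earliest. In the 2.x Java client the corresponding builder option is `subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, NOT the Pulsar client API: one shared message log,
// one independent cursor (index of the next message to read) per subscription name.
class TopicModel {
    enum InitialPosition { LATEST, EARLIEST }

    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    void publish(String message) {
        log.add(message);
    }

    // Creating a subscription only places its cursor; an existing subscription keeps its position.
    void subscribe(String subscription, InitialPosition position) {
        cursors.putIfAbsent(subscription,
                position == InitialPosition.EARLIEST ? 0 : log.size());
    }

    // Returns every message after this subscription's cursor and advances only that cursor.
    List<String> receive(String subscription) {
        int from = cursors.get(subscription);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        cursors.put(subscription, log.size());
        return batch;
    }

    public static void main(String[] args) {
        TopicModel topic = new TopicModel();
        topic.publish("m1");                                  // published before anyone subscribes
        topic.subscribe("hoho", InitialPosition.LATEST);      // default behaviour: skips m1
        topic.subscribe("haha", InitialPosition.EARLIEST);    // starts at the oldest message
        topic.publish("m2");
        System.out.println("hoho: " + topic.receive("hoho")); // hoho: [m2]
        System.out.println("haha: " + topic.receive("haha")); // haha: [m1, m2]
    }
}
```

The point of the design is that reading on one subscription never moves another subscription's cursor, which is why two subscription names are enough to get two full copies of the stream.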
----
2018-11-19 03:42:21 UTC - zy: How do I know which position each subscription is at?
----
2018-11-19 03:44:16 UTC - Sijie Guo: @zy what are you trying to do here?
----
2018-11-19 03:49:09 UTC - Sijie Guo: typically you don't need to care about the location (offset) of a subscription. The offset is managed entirely by the brokers and is not exposed directly to consumers. Consumers use acks to confirm which messages they have consumed.

you can use `bin/pulsar-admin` to query the stats and get the last consumed message id, and use commands such as `reset-cursor` to move the cursor if you want the subscription to rewind
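A hypothetical model of what Sijie describes: the consumer only acknowledges, the broker owns the cursor, and an admin operation can move it back. `CursorModel` and its methods are illustrative names, not Pulsar classes. The sketch assumes consecutive integer message ids and cumulative acks; real Pulsar message ids are richer and individual acks also exist, and the options of the real `pulsar-admin` commands are not modelled.

```java
// Hypothetical, simplified model of a broker-managed subscription cursor (not Pulsar code).
// Assumptions: message ids are consecutive longs starting at 0 and acks are cumulative.
class CursorModel {
    private long lastPublished = -1; // id of the newest message on the topic
    private long markDelete = -1;    // every id <= markDelete has been acknowledged

    long publish() {
        return ++lastPublished;
    }

    // Consumers only acknowledge; the broker decides where the cursor sits.
    void ack(long messageId) {
        if (messageId < 0 || messageId > lastPublished) {
            throw new IllegalArgumentException("unknown message id " + messageId);
        }
        markDelete = Math.max(markDelete, messageId);
    }

    // The kind of numbers an admin stats query exposes.
    long lastAcked() {
        return markDelete;
    }

    long backlog() {
        return lastPublished - markDelete;
    }

    // Rewind (or skip ahead) so that delivery resumes at startId.
    void resetCursor(long startId) {
        if (startId < 0 || startId > lastPublished + 1) {
            throw new IllegalArgumentException("position out of range: " + startId);
        }
        markDelete = startId - 1;
    }
}
```

After ten messages are published and all acknowledged, `backlog()` is 0; calling `resetCursor(5)` makes the last five messages count as unconsumed again, which is the effect a rewind is meant to have.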
----
2018-11-19 03:53:20 UTC - zy: my English is not very good, can you speak Chinese?
----
2018-11-19 03:53:56 UTC - Sijie Guo: we have a #china channel. you can join there and ask questions in Chinese :slightly_smiling_face:
----
2018-11-19 03:54:05 UTC - Tony: From the Framing documentation (<https://pulsar.apache.org/docs/en/develop-binary-protocol/>), it states ```The maximum allowable size of a single frame is 5 MB.``` How many frames would be needed to send the payload if I have a message larger than, say, 20MB?
----
2018-11-19 03:56:31 UTC - Sijie Guo: @Zuyu Zhang currently it doesn't support sending a message larger than the maximum size of a single frame. The setting is configurable, so you can increase it to 20MB. However, I don't think that is a very optimal solution.

so my question is, what are you trying to send?
----
2018-11-19 03:57:12 UTC - Tony: a compressed huge XML document
----
2018-11-19 04:00:03 UTC - zy: ```
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://my-tenant/my-namespace/my-topic")
        .consumerName("test")
        .subscriptionName("hoho")
        .subscribe();
```
----
2018-11-19 04:00:11 UTC - zy: ```
Consumer<byte[]> consumer = client.newConsumer()
        .topic("persistent://my-tenant/my-namespace/my-topic")
        .consumerName("test2")
        .subscriptionName("haha")
        .subscribe();
```
----
2018-11-19 04:00:49 UTC - zy: I started these two main programs at the same time, but only one of them received the message
----
2018-11-19 04:01:25 UTC - Tony: Since the size setting is configurable, can you tell me where to configure it? Do I need to bounce all Pulsar brokers after the change is applied?
----
2018-11-19 04:01:26 UTC - Sijie Guo: @Tony okay, I see. A temporary workaround is to increase the setting.
----
2018-11-19 04:02:12 UTC - Sijie Guo: the setting is in the bookies. You need to bounce the bookies after the configuration is applied.
----
2018-11-19 04:04:27 UTC - Sijie Guo: ```
# The maximum netty frame size in bytes. Any message received larger than this will be rejected. Default value is 5MB.
# nettyMaxFrameSizeBytes=5242880
```

just add this setting to `bookkeeper.conf` and bounce the bookies
----
2018-11-19 04:04:39 UTC - Sijie Guo: uncomment `nettyMaxFrameSizeBytes`
----
2018-11-19 04:05:20 UTC - Tony: You also mentioned it's not a very optimal solution. Do you mean only for this single large frame, or would the performance of all the other small frames also be affected because the limit is changed to a bigger number?
----
2018-11-19 04:11:16 UTC - Sijie Guo: > do you refer to only this “single frame with big size”

yes. I mean that currently one message is sent in only one frame, which means brokers and bookies need to allocate the same amount of memory to transfer the “huge” message to disk.

ideally, pulsar should handle this better by breaking a huge message down into small frames; that would be optimal for network, memory, and disk IO.
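Per the thread, at this point each message travels in a single frame, so there is no "number of frames" for a 20MB message. The idea of breaking a huge message into small frames can be sketched as follows. This is a hypothetical illustration of the general chunking technique, not Pulsar's wire format: `Chunker`, `Chunk`, `split` and `reassemble` are made-up names, and per-chunk headers and metadata are ignored. Under that simplification, a 20 MiB payload with the 5 MiB default limit would need ceil(20 / 5) = 4 chunks.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of splitting one large payload into frame-sized chunks.
// NOT Pulsar's protocol: per-chunk headers and metadata are ignored.
class Chunker {
    // One piece of the original payload plus what the receiver needs to reassemble it.
    static final class Chunk {
        final int index;
        final int total;
        final byte[] data;

        Chunk(int index, int total, byte[] data) {
            this.index = index;
            this.total = total;
            this.data = data;
        }
    }

    // Number of chunks needed: ceil(payloadLength / maxChunkBytes).
    static int chunkCount(long payloadLength, long maxChunkBytes) {
        return (int) ((payloadLength + maxChunkBytes - 1) / maxChunkBytes);
    }

    static List<Chunk> split(byte[] payload, int maxChunkBytes) {
        int total = chunkCount(payload.length, maxChunkBytes);
        List<Chunk> chunks = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            int from = i * maxChunkBytes;
            int to = Math.min(from + maxChunkBytes, payload.length);
            chunks.add(new Chunk(i, total, Arrays.copyOfRange(payload, from, to)));
        }
        return chunks;
    }

    // Sorts by index, so arrival order does not matter; rejects an incomplete set.
    static byte[] reassemble(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort((a, b) -> Integer.compare(a.index, b.index));
        if (sorted.isEmpty() || sorted.size() != sorted.get(0).total) {
            throw new IllegalStateException("incomplete chunk set");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : sorted) {
            out.write(c.data, 0, c.data.length);
        }
        return out.toByteArray();
    }
}
```

The benefit Sijie points to is that no component ever has to hold a frame larger than `maxChunkBytes`; only the final consumer reassembles the whole payload.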
----
2018-11-19 04:14:16 UTC - Tony: yeah, agreed. Hopefully a future version will automatically break down oversized messages into optimally sized ones.
----
2018-11-19 04:14:33 UTC - Sijie Guo: yes. that is on our roadmap
----
2018-11-19 04:22:24 UTC - Tony: just curious, DL/BookKeeper doesn't seem to require ZooKeeper to run alongside it, so why does Pulsar need ZooKeeper? Or am I misunderstanding something here… Thanks for clarifying
----
2018-11-19 04:26:22 UTC - Sijie Guo: @Tony: BK requires ZooKeeper to run, but BK has abstracted all the metadata operations behind an interface, and there is currently an ongoing effort in BK to support etcd as the metadata store. Pulsar also has similar metadata interfaces, so ideally it should be able to use any type of metadata store. In BK there is also an ongoing effort to provide a built-in table service, which Pulsar can use as a metadata store. We will start moving some of the metadata stores (e.g. the schema metadata store) to this built-in table service, so eventually (in a few releases) Pulsar will reduce its need for ZooKeeper (and maybe remove it)
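The pluggable-metadata idea Sijie describes can be illustrated with a small interface. Everything here is hypothetical: `MetadataStore`, `InMemoryMetadataStore` and `SchemaRegistry` are not BookKeeper's or Pulsar's actual interfaces. The sketch only shows the design choice: code written against the interface does not care whether ZooKeeper, etcd, or a built-in table service sits behind it, and an in-memory implementation stands in for all of them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical metadata abstraction; the real BK/Pulsar interfaces differ.
interface MetadataStore {
    void put(String path, byte[] value);
    Optional<byte[]> get(String path);
    boolean delete(String path);
}

// Stand-in backend; a ZooKeeper- or etcd-backed class would implement the same interface.
class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    public void put(String path, byte[] value) {
        data.put(path, value.clone()); // defensive copy so callers cannot mutate stored state
    }

    public Optional<byte[]> get(String path) {
        return Optional.ofNullable(data.get(path)).map(byte[]::clone);
    }

    public boolean delete(String path) {
        return data.remove(path) != null;
    }
}

// Hypothetical consumer of the abstraction (modelled on the "schema metadata store" example):
// it never references a concrete backend.
class SchemaRegistry {
    private final MetadataStore store;

    SchemaRegistry(MetadataStore store) {
        this.store = store;
    }

    void register(String topic, String schema) {
        store.put("/schemas/" + topic, schema.getBytes(StandardCharsets.UTF_8));
    }

    Optional<String> lookup(String topic) {
        return store.get("/schemas/" + topic).map(b -> new String(b, StandardCharsets.UTF_8));
    }
}
```

Swapping the metadata backend then means passing a different `MetadataStore` to `SchemaRegistry`, with no change to the registry itself.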
----
2018-11-19 04:34:29 UTC - Tony: How soon would Pulsar clear out all messages if I change the TTL to 1ms? In Kafka, if I change the retention time to something like this, it takes a few minutes to clear out all messages from a topic.
----
2018-11-19 04:39:50 UTC - Sijie Guo: @Tony:

there are a couple of things here:

1) TTL means “auto-consume” after a given time period. It doesn't mean data will be removed.

If you change the TTL to 1ms, it will take a couple of seconds to apply the policy, and then it will mark the messages that have not been consumed as “consumed”.

2) data will be marked “removed” only when all subscriptions have consumed the messages and the retention period has passed.

3) the actual deletion happens when the ledger rolls over: pulsar deletes the ledgers that are no longer retained.

4) the data of the ledger is removed lazily by a garbage collection thread running on the bookies.

steps 1) - 3) only change metadata and don't delete the actual data. Once 3) happens, the actual data is deleted within the GC interval configured on the bookies.

Hope this explains
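Steps 1) to 3) above can be condensed into a simplified, hypothetical model: TTL only advances each subscription's cursor past expired messages, and a closed ledger becomes eligible for deletion only once every subscription has moved past it and it is older than the retention period. All names are illustrative. Assumptions: message ids are list indexes, only time-based retention is modelled, a ledger's age is taken from the publish time of its last message, and step 4) (bookie GC) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of TTL vs. retention (not broker code).
class RetentionModel {
    private final List<Long> publishTimes = new ArrayList<>();    // index = message id
    private final Map<String, Long> cursors = new HashMap<>();    // subscription -> next unconsumed id
    private final List<long[]> closedLedgers = new ArrayList<>(); // {firstId, lastId}

    long publish(long publishTimeMillis) {
        publishTimes.add(publishTimeMillis);
        return publishTimes.size() - 1;
    }

    // New subscriptions start at the latest position.
    void subscribe(String subscription) {
        cursors.putIfAbsent(subscription, (long) publishTimes.size());
    }

    void closeLedger(long firstId, long lastId) {
        closedLedgers.add(new long[] {firstId, lastId});
    }

    // Step 1: TTL "auto-consumes" expired messages by moving cursors; nothing is deleted.
    void applyTtl(long nowMillis, long ttlMillis) {
        for (Map.Entry<String, Long> e : cursors.entrySet()) {
            long next = e.getValue();
            while (next < publishTimes.size() && nowMillis - publishTimes.get((int) next) >= ttlMillis) {
                next++;
            }
            e.setValue(next);
        }
    }

    long backlog(String subscription) {
        return publishTimes.size() - cursors.get(subscription);
    }

    // Steps 2-3: a closed ledger may be dropped from metadata once every subscription
    // has consumed it AND it is older than the retention period.
    List<long[]> deletableLedgers(long nowMillis, long retentionMillis) {
        List<long[]> result = new ArrayList<>();
        for (long[] ledger : closedLedgers) {
            long lastId = ledger[1];
            boolean consumedByAll = cursors.values().stream().allMatch(c -> c > lastId);
            boolean pastRetention = nowMillis - publishTimes.get((int) lastId) >= retentionMillis;
            if (consumedByAll && pastRetention) {
                result.add(ledger);
            }
        }
        return result;
    }
}
```

For example, with one idle subscription, 100 messages and a 1ms TTL, `applyTtl` drops the backlog to 0, yet `deletableLedgers` still returns nothing until the retention period has also passed, matching the distinction between "consumed" and "removed" in the answer above.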
----
2018-11-19 04:51:45 UTC - Tony: 1) so how do I know once all messages are marked in ‘not-able-to-consumed’ when TTL set to 1ms, for example, so that I can change the TTL back to the original retention time for longer period?
2) how frequent the GC kicks in from bookies? is it configurable somewhere?
----
2018-11-19 04:57:00 UTC - Sijie Guo: > how do I know once all messages are marked in ‘not-able-to-consumed’ when TTL set to 1ms,

you can verify it with a simple test:

1) disconnect your consumer
2) produce 100 messages.
3) wait a few minutes for the TTL to kick in (the interval for expiring messages is configured by <https://github.com/apache/pulsar/blob/master/conf/broker.conf#L83>)

> for example, so that I can change the TTL back to the original retention time for longer period?

you can change the TTL back, but the cursor will not be rewound; the new TTL will apply to new messages only.

if you want to rewind your subscription, use pulsar-admin reset-cursor.

> how frequent the GC kicks in from bookies? is it configurable somewhere?

you can configure the GC interval: <https://github.com/apache/pulsar/blob/master/conf/bookkeeper.conf#L133>
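The split between "metadata deleted now" and "disk space reclaimed later by a periodic GC thread" can be sketched as below. This is a hypothetical model, not BookKeeper code: `LazyGcModel` and its methods are made-up names, and the scheduler interval merely stands in for the bookie setting behind the link above (its property name is not assumed here).

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "delete metadata now, reclaim disk later"; not BookKeeper code.
class LazyGcModel {
    private final Set<Long> liveLedgers = ConcurrentHashMap.newKeySet();         // metadata
    private final Map<Long, byte[]> storedLedgers = new ConcurrentHashMap<>();   // stands in for disk

    void write(long ledgerId, byte[] data) {
        liveLedgers.add(ledgerId);
        storedLedgers.put(ledgerId, data);
    }

    // Metadata-only deletion (steps 1-3 in the thread): no bytes are freed yet.
    void deleteLedgerMetadata(long ledgerId) {
        liveLedgers.remove(ledgerId);
    }

    long bytesOnDisk() {
        return storedLedgers.values().stream().mapToLong(d -> d.length).sum();
    }

    // One GC pass (step 4): reclaim data for ledgers that no longer exist in metadata.
    int gcOnce() {
        int reclaimed = 0;
        for (Long id : storedLedgers.keySet()) {
            if (!liveLedgers.contains(id) && storedLedgers.remove(id) != null) {
                reclaimed++;
            }
        }
        return reclaimed;
    }

    // Runs gcOnce periodically; intervalMillis plays the role of the configured GC interval.
    ScheduledFuture<?> startGc(ScheduledExecutorService scheduler, long intervalMillis) {
        return scheduler.scheduleWithFixedDelay(this::gcOnce, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }
}
```

This is why disk usage on the bookies may not drop immediately after a ledger is deleted: the space is only reclaimed on the next GC pass.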
----
2018-11-19 05:03:27 UTC - Tony: Thank you so much for the info, much appreciated!
btw, I noticed that in the ```Run Pulsar Standalone in Docker``` section (<https://pulsar.apache.org/docs/en/standalone/>), the command `docker run -it -p 80:80 -p 8080:8080 -p 6650:6650 apachepulsar/pulsar-standalone` doesn't seem to work. Either the `apachepulsar/pulsar-standalone` repo isn't publicly available, or the instructions are missing something.
----
2018-11-19 05:04:35 UTC - Sijie Guo: oh I see. pulsar-standalone is a new image added in 2.2, so it was probably not published when 2.2 was released.

you can change it to use apachepulsar/pulsar or apachepulsar/pulsar-all
----
2018-11-19 05:04:59 UTC - Sijie Guo: do you mind filing an issue for us?
----
2018-11-19 05:05:24 UTC - Tony: sure, I'd be glad to. Where should I file it?
----
2018-11-19 05:06:07 UTC - Sijie Guo: we use GitHub to manage issues, so if you can create a GitHub issue, that would be great: <https://github.com/apache/pulsar/issues>
----
2018-11-19 05:06:25 UTC - Tony: awesome, will do!
----
2018-11-19 05:06:32 UTC - Sijie Guo: cool thank you
----