Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/02/23 17:19:41 UTC

Slack digest for #general - 2018-02-23

2018-02-22 19:16:18 UTC - Sijie Guo: @Masakazu Kitajo: I think @Matteo Merli set up a Jenkins job for the Slack digest, so if you have Jenkins access, you can access it
----
2018-02-22 19:17:09 UTC - Matteo Merli: Yes, I replied on the mailing list; it was my mistake in the cron schedule…
----
2018-02-22 19:17:20 UTC - Matteo Merli: it went off every min for 1h
----
2018-02-22 19:18:27 UTC - Sijie Guo: oh i see
----
2018-02-22 19:25:14 UTC - Sijie Guo: @SansWord Huang sorry for the late response, I just saw your replies.

- you don’t need large capacity for the journal disk, but it is critical to latency: since it is doing fsyncs, you probably want an HDD with a battery backup unit, or an SSD. Ledger disks are where the data is eventually stored, so you need to calculate their capacity based on how much data you are going to store.

&gt; data first written into journal disc then “flush” into ledger storage?

yes, data is first written to the journal disk, and the write is acknowledged once the data is fsynced there. The data is then asynchronously indexed and flushed to ledger storage.
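As a toy sketch of that write path (not BookKeeper’s actual implementation; all names here are made up), the journal-then-flush flow looks like:

```python
import os


class ToyBookie:
    """Toy model of the bookie write path: an entry is appended to the
    journal and fsynced before the write is acknowledged; a separate
    step later flushes journaled entries into ledger storage."""

    def __init__(self, journal_dir: str, ledger_dir: str):
        self.journal_path = os.path.join(journal_dir, "journal.log")
        self.ledger_path = os.path.join(ledger_dir, "ledger.log")
        self.pending = []  # entries journaled but not yet flushed

    def add_entry(self, entry: bytes) -> None:
        # The write is only acknowledged after the journal fsync returns.
        with open(self.journal_path, "ab") as journal:
            journal.write(entry + b"\n")
            journal.flush()
            os.fsync(journal.fileno())
        self.pending.append(entry)

    def flush_to_ledger(self) -> None:
        # In BookKeeper this happens asynchronously in the background.
        with open(self.ledger_path, "ab") as ledger:
            for entry in self.pending:
                ledger.write(entry + b"\n")
        self.pending.clear()
```

This is why the journal device dominates write latency (every acknowledged write pays for an fsync) while ledger disks only see batched background writes.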

&gt; when will data be rebalanced?

if you are using bookkeeper for a log/messaging workload, you typically don’t need data rebalancing: as new ledgers are created, old ledgers are deleted because their data has expired due to retention.

however, if you are using bookkeeper for long-term storage, you might need some sort of data rebalancing. There was a BP (BookKeeper Proposal) for that purpose, but you can also use autorecovery to rebalance the data manually.

so this depends on what you are using pulsar (bookkeeper) for.

&gt; how do I know my data is already replicated?

if a ledger is under-replicated, it will be listed in zookeeper under an `underreplicated` znode. there is a bookkeeper CLI command for that, and metrics as well.
----
2018-02-22 19:57:02 UTC - Karthik Palanivelu: @Sijie Guo I cannot use the deployment model directly from the source code. An ASG is an auto-scaling group, which brings up a new instance in case of a node failure. In the ZK case, if a node fails, a new ZK node comes up with a different IP, which then needs to be updated in Pulsar by replacing the old IP. To avoid this, AWS allows us to allocate static IPs; we can use these for ZK so that we can hard-code them in Pulsar, and if a ZK node fails, the new node comes up with an assigned IP. I am checking whether there is a better way to handle this scenario?
----
2018-02-22 20:00:55 UTC - Matteo Merli: @Karthikeyan Palanivelu If you’re using ASG with ZK nodes, you could also assign DNS names to each ZK server. That way there’s no need to change the configuration in other ZK ensemble members when a node is replaced with one with a different IP
----
2018-02-22 20:57:02 UTC - Karthik Palanivelu: @Matteo Merli we have a separate system to manage DNS, which adds one more point of failure.
----
2018-02-22 22:54:41 UTC - Matteo Merli: Sure, I was thinking more of the AWS managed DNS
----
2018-02-23 06:14:58 UTC - SansWord Huang: @Sijie Guo Thanks for all the answers, they help me a lot in understanding how Pulsar works.
----
2018-02-23 06:20:43 UTC - SansWord Huang: @SansWord Huang uploaded a file: <https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error> and commented: In an experiment today I put too many messages into Pulsar, and the bookie nodes shut down.
After extending the storage they use, I tried to restart all bookies.
Two problems came up:
1. I skipped all messages using pulsar-admin; when will the disk space be released?
2. One of my bookie nodes cannot restart, with the following error message. What can I do?
----
2018-02-23 07:00:43 UTC - SansWord Huang: @SansWord Huang commented on @SansWord Huang’s file <https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error>: On the first question: once I produced more messages, the old ledgers were deleted and the disk space was released.
----
2018-02-23 07:01:28 UTC - SansWord Huang: @SansWord Huang commented on @SansWord Huang’s file <https://apache-pulsar.slack.com/files/U9CDBEH1P/F9D3G828H/bookie_restart_error|bookie_restart_error>: On the second, I still don’t know why, but I decided to delete this node’s journal and ledgers and start again.
----
2018-02-23 07:18:56 UTC - Sijie Guo: @SansWord Huang sorry for the late response, just saw the message now.
----
2018-02-23 07:19:55 UTC - Sijie Guo: for the first question 1) ledgers are deleted when new ledgers are rolled, and new ledgers are rolled based on time or size. so if you produce new messages, that will trigger ledger rolling, which will then delete the ledgers that were already skipped.
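As a toy sketch of that rolling/trimming behavior (not BookKeeper’s actual code; all names here are made up):

```python
class ToyLedgerLog:
    """Toy model of retention by ledger rollover: entries go into the
    current (open) ledger; when it reaches max_entries a new ledger is
    rolled, and only then can closed, already-skipped ledgers be deleted."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.ledgers = [[]]   # list of ledgers; the last one is open
        self.skipped = set()  # indices of ledgers whose data was skipped

    def append(self, entry) -> None:
        current = self.ledgers[-1]
        current.append(entry)
        if len(current) >= self.max_entries:
            self.ledgers.append([])  # roll a new ledger...
            self._trim()             # ...which triggers deletion

    def skip_ledger(self, idx: int) -> None:
        # e.g. messages skipped via pulsar-admin
        self.skipped.add(idx)

    def _trim(self) -> None:
        # Delete closed ledgers whose contents were already skipped.
        for idx in list(self.skipped):
            if idx < len(self.ledgers) - 1 and self.ledgers[idx]:
                self.ledgers[idx] = None  # disk space released
                self.skipped.discard(idx)
```

The point of the sketch: skipping alone frees nothing; the deletion only happens at the next rollover, which is why producing new messages released the space.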
----
2018-02-23 07:21:18 UTC - Sijie Guo: for the second question 2) it seems that while replaying the journal, it encountered issues inserting the entries. I am wondering whether your disks were full at that time?
----
2018-02-23 08:11:50 UTC - SansWord Huang: yes, I’ve noticed even I’ve expanded the disc, it’s not enough for journal to replay.
so the quickest way is to delete data and restart this book keeper node.

but lesson I learned is that 
1. I should really separate disc for journal and ledgers.
2. if not doing so, I should save some space for ledgers to be able to playback while journal is growing.
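For reference, that separation is configured in `bookkeeper.conf`; a minimal fragment could look like this (the paths are placeholders):

```properties
# Journal on its own fast, fsync-heavy device (SSD or HDD with BBU)
journalDirectory=/mnt/journal-ssd/bookkeeper/journal

# Ledger storage can span one or more larger-capacity disks
ledgerDirectories=/mnt/ledger-disk1/bookkeeper/ledgers,/mnt/ledger-disk2/bookkeeper/ledgers
```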
----
2018-02-23 08:18:47 UTC - Sijie Guo: yeah i see
----
2018-02-23 10:43:31 UTC - Till Rathschlag: @Till Rathschlag has joined the channel
----
2018-02-23 10:55:27 UTC - Till Rathschlag: Hello everybody, I'm currently evaluating Pulsar and trying to understand if it fits the following use case: I'd like to use Pulsar (among other things) as a task queue. I want my task producer to generate only as many jobs as the consumers can work on, so I need some kind of communication consumers -&gt; producer. I tried to build this with acknowledgements but noticed that these are only propagated to Pulsar and not back to the producer. So my question is, how would I do this? I thought about the following:
- Provide another topic for job acknowledgements
- Monitor the ack ratio from the producer service
Is Pulsar the right tool for this? I would be glad if someone could share their experience. Thanks in advance!
----
2018-02-23 16:58:33 UTC - Matteo Merli: @Till Rathschlag The primary function of a messaging system is to decouple producers and consumers, and that’s why we don’t correlate consumer acks back to the producer :slightly_smiling_face:

However, if you don’t require exact precision, you can try using a backlog quota to stop the producer.
You can configure a very low quota (e.g. 10MB or 1MB…); the default action is to block the producers when the consumers accumulate that amount of “backlog” in the queue.

I’m saying it’s not precise because the quota check is only done periodically in the background (every 1 min by default, I think) for efficiency reasons, so a user can go a bit over quota before getting stopped.
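For reference, the broker-side defaults live in `broker.conf`; the values below are just examples:

```properties
# Default backlog quota for namespaces without an explicit quota
# (per-namespace quotas can be set via pulsar-admin)
backlogQuotaDefaultLimitGB=1

# How often the broker checks quotas; this periodic check is why
# enforcement is not exact
backlogQuotaCheckIntervalInSeconds=60
```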

If you need finer control, you could use a 2nd topic. For example:
 * Consumer gets a message, process it
 * Consumer sends confirmation on the 2nd topic (referring to a particular msgId for 1st topic)
 * Consumer acks the message

The producer can then implement a kind of “semaphore” limiting the number of “in-processing” messages, by waiting for confirmations on the 2nd topic. This works even with multiple producers, because each producer can ignore msg IDs that were published by the others
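A minimal sketch of that semaphore idea (plain Python threading, no Pulsar wiring; all names are made up):

```python
import threading


class InFlightLimiter:
    """Flow control for the producer side of the 2-topic pattern:
    acquire a permit before publishing, and release one when a
    confirmation for our own msgId arrives on the 2nd topic."""

    def __init__(self, max_in_flight: int, producer_name: str):
        self._permits = threading.Semaphore(max_in_flight)
        self.producer_name = producer_name
        self.in_flight = set()

    def before_send(self, msg_id: str) -> None:
        # Blocks once max_in_flight messages are awaiting confirmation.
        self._permits.acquire()
        self.in_flight.add(msg_id)

    def on_confirmation(self, producer_name: str, msg_id: str) -> None:
        # Ignore confirmations for messages published by other producers.
        if producer_name != self.producer_name:
            return
        if msg_id in self.in_flight:
            self.in_flight.discard(msg_id)
            self._permits.release()
```

In a real setup, `before_send` would wrap the publish on the 1st topic and `on_confirmation` would be driven by a consumer listening on the 2nd topic.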
----