You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/01/03 09:11:02 UTC

Slack digest for #general - 2019-01-03

2019-01-02 12:55:24 UTC - Vincent Ngan: Any examples for me to write my own schemas?
----
2019-01-02 15:50:53 UTC - Sijie Guo: <https://github.com/apache/pulsar/tree/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/schema>

all the existing schemas are available under this directory.
----
2019-01-02 16:32:15 UTC - Emma Pollum: Can someone explain the difference between an unacked message and the backlog in pulsar?
----
2019-01-02 16:38:30 UTC - Sijie Guo: backlog is measured using the number of unacked messages
----
2019-01-02 16:54:31 UTC - Emma Pollum: Interesting. I have a function worker that has created a topic _compaction under public/functions that is showing 2 unacked messages but no backlog for the subscription.
----
2019-01-02 17:04:59 UTC - Shalin: I ran some performance testing using `pulsar-perf produce my-topic`, it filled up my disk with ledgers - obviously(or not). I am trying to delete the ledgers using `pulsar-admin namespaces clear-backlog public/local/default`, which deletes the topic, but the journal and ledgers still remain. Is there anything else that prevents the ledgers from being deleted?
----
2019-01-02 18:11:08 UTC - Sijie Guo: can you share me the details how you have learned “2 unacked messages but no backlog”?
----
2019-01-02 18:43:51 UTC - Sijie Guo: I think a few things to be noted here:

- journal and ledgers are garbage collected lazily in the background. so it might not be reflected immediately.

to understand how data deletion is working, brokers deletes data by segments. only when the segments are deleted, the bookies will garbage collected their data.

so when you clear-backlog, it most likely only move cursors to latest. after moving the cursors, it will start deleting segments, the deletion will be honoring to the retention policy.

in order to troubleshoot this, I would suggest you using `pulsar-admin topics stats-internal` to list the internal stats of your `my-topic`.
----
2019-01-02 18:50:12 UTC - Shalin: Gotcha. But it says `my-topic` doesn't exist anymore after I ran `clear-backlog`
----
2019-01-02 18:51:10 UTC - Sijie Guo: okay. can you check the logs of bookies to see if it is doing garbage collection?
----
2019-01-02 18:57:15 UTC - Shalin: Nope. Nothing there.
----
2019-01-02 18:57:45 UTC - Sijie Guo: can you show me the latest few lines?
----
2019-01-02 18:59:58 UTC - Shalin: ```
Jan  2 18:59:25 ip-10-0-1-244 pulsar: at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-original-2.2.0.jar:2.2.0]
Jan  2 18:59:25 ip-10-0-1-244 pulsar: at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2]
```
----
2019-01-02 19:01:27 UTC - Shalin: ```
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 [GarbageCollectorThread-17-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Enter minor compaction, suspendMinor false
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 [GarbageCollectorThread-17-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Do compaction to compact those files lower than 0.2
Jan  2 18:27:26 ip-10-0-1-244 pulsar: 18:27:26.547 [GarbageCollectorThread-17-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Compaction: entry log usage buckets[10% 20% 30% 40% 50% 60% 70% 80% 90% 100%] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
----
2019-01-02 19:02:01 UTC - Shalin: ^ Thats the last Garbage collector lines.
----
2019-01-02 19:02:22 UTC - Sijie Guo: entry log usage buckets[10% 20% 30% 40% 50% 60% 70% 80% 90% 100%] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

it seems that the compaction (garbage collection) is happening. is the logging before your test?
----
2019-01-02 19:02:40 UTC - Shalin: No after
----
2019-01-02 19:03:13 UTC - Sijie Guo: then the logs should have been garbage collected. did the disk usage went down?
----
2019-01-02 19:03:47 UTC - Shalin: No, its constant.
----
2019-01-02 19:04:18 UTC - Sijie Guo: can you

`ls data/bookkeeper/ledgers/current` and `ls data/bookkeeper/journal/current`
----
2019-01-02 19:05:10 UTC - Shalin: ```
total 974M
drwxr-xr-x. 4 root root  109 Jan  2 17:12 .
drwxr-xr-x. 3 root root   21 Jan  2 14:57 ..
-rw-r--r--. 1 root root 974M Jan  2 17:12 3.log
-rw-r--r--. 1 root root    0 Jan  2 15:53 4.log
-rw-r--r--. 1 root root    2 Jan  2 15:53 lastId
-rw-r--r--. 1 root root   16 Jan  2 17:12 lastMark
drwxr-xr-x. 2 root root  123 Jan  2 14:57 ledgers
drwxr-xr-x. 2 root root 4.0K Jan  2 18:02 locations
-rw-r--r--. 1 root root  162 Jan  2 14:57 VERSION
```
----
2019-01-02 19:05:43 UTC - Shalin: ```
total 4.1G
drwxr-xr-x. 2 root root   90 Jan  2 16:47 .
drwxr-xr-x. 3 root root   21 Jan  2 14:57 ..
-rw-r--r--. 1 root root 2.1G Jan  2 15:50 1680f11ba0f.txn
-rw-r--r--. 1 root root 2.1G Jan  2 16:47 1680f11ba10.txn
-rw-r--r--. 1 root root  64M Jan  2 16:47 1680f11ba11.txn
-rw-r--r--. 1 root root  162 Jan  2 14:57 VERSION
```
----
2019-01-02 19:07:46 UTC - Sijie Guo: okay -

the latest few journal files will be kept - you can tune the setting `journalMaxBackups` : <https://github.com/apache/pulsar/blob/master/conf/bookkeeper.conf#L293>

if you don’t want to keep those journal files
----
2019-01-02 19:09:02 UTC - Sijie Guo: in the ledgers directory, it seems that `3.log` was deleted : -rw-r--r--. 1 root root 974M Jan  2 17:12 3.log

can you search the log of bookie to see if there any logging message regarding `3.log`?
----
2019-01-02 19:11:07 UTC - Shalin: The last one :
```
Jan  2 16:00:18 ip-10-0-1-185 pulsar: 16:00:18.209 [db-storage-15-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Flushing entry logger 3 back to filesystem, pending for syncing entry loggers : [BufferedChannel{logId=3, logFile=data/bookkeeper/ledgers/current/3.log}].
```
----
2019-01-02 19:12:52 UTC - Shalin: Ok, there seems to be something wrong with the functions I registered `ImportError: No module named _pulsar` - do you think that could throw off the garbage collection by the bookie. The bookie and broker are on the same node
----
2019-01-02 19:13:52 UTC - Shalin: Maybe signaling `bookie` that the ledgers can be deleted?
----
2019-01-02 19:14:03 UTC - Sijie Guo: well it should not be related. because these are different components. what bookie garbage collection cares is whether the ledgers are deleted or not.
----
2019-01-02 19:14:20 UTC - Sijie Guo: are you using one topic or multiple topics?
----
2019-01-02 19:14:29 UTC - Sijie Guo: can you make sure topics are deleted?
----
2019-01-02 19:15:03 UTC - Sijie Guo: by that mean, can you just run `bin/pulsar-admin namespaces topics &lt;namespace&gt;` to check if your topic is still there?
----
2019-01-02 19:16:16 UTC - Shalin: Yup. I ran `pulsar-perf` on `my-topic` and `public/default` is empty
----
2019-01-02 19:17:13 UTC - emersontres: @emersontres has joined the channel
----
2019-01-02 19:19:27 UTC - Sijie Guo: okay then can you restart the bookie? so the gc should be restarted in around 10 minutes, I would like to see if it garbage collects 3.log.
----
2019-01-02 19:22:11 UTC - Shalin: :thumbsup:
----
2019-01-02 19:28:32 UTC - Shalin: Is it just `bookieformat` and `initbookie`?
----
2019-01-02 19:29:20 UTC - Sijie Guo: no
----
2019-01-02 19:29:30 UTC - Sijie Guo: just restart the bookie process
----
2019-01-02 19:29:48 UTC - Sijie Guo: `bookieformat` will reformat the whole disk
----
2019-01-02 19:30:13 UTC - Shalin: Ah ok
----
2019-01-02 19:52:33 UTC - Shalin: I get an `InvalidCookieException` when restarting.
----
2019-01-02 19:52:41 UTC - Shalin: ```There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. Empty directories are [data/bookkeeper/journal/current, data/bookkeeper/ledgers/current]
```
----
2019-01-02 19:53:12 UTC - Sijie Guo: are you running inside docker?
----
2019-01-02 19:53:21 UTC - Sijie Guo: it seems that your `data` directory is gone?
----
2019-01-02 19:57:15 UTC - Shalin: It seems be there
----
2019-01-02 19:57:36 UTC - Shalin: Ah, nevermind.
----
2019-01-02 19:57:43 UTC - Shalin: It was a path issue.
----
2019-01-02 19:57:57 UTC - Sijie Guo: ok
----
2019-01-02 20:26:15 UTC - Shalin: Still there.
----
2019-01-02 21:13:26 UTC - Emma Pollum: I've been looking through the  prometheus metrics
----
2019-01-02 21:16:35 UTC - Sijie Guo: Can you share the bookie log file, so I can exam later?
----
2019-01-02 21:18:04 UTC - Sijie Guo: Can you try run “pulsar-admin topics stats-internal” to get details of that topic?
----
2019-01-02 22:55:55 UTC - Emma Pollum: I've already processed the messages, and have not had it reproduce. Is it a 1:1 relationship backlog:unacked ?
----
2019-01-03 02:38:58 UTC - Vincent Ngan: Thanks
----
2019-01-03 03:04:42 UTC - Sijie Guo: No. Unack is the messages have been delivered and no acked; the backlog is both unacked and undelivered.
----
2019-01-03 03:05:59 UTC - Sijie Guo: So typically if there are unacked messages, backlog should not be zero. Unless there something not accurate regarding metrics collection 
----