Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2020/06/19 09:11:05 UTC

Slack digest for #general - 2020-06-19

2020-06-18 09:14:48 UTC - Pedro Cardoso: Will the talk be recorded?
----
2020-06-18 09:15:09 UTC - Pushkar Sawant: The bookie still has 31468 ledgers on it
----
2020-06-18 09:51:58 UTC - Gilles Barbier: Hi! Apparently Pulsar-Manager uses a HerdDB database using Pulsar's BookKeeper and ZooKeeper. Does it mean that we could use this database and fill it with Pulsar's data using a JDBC sink connector right away with the right setup? Poke @Enrico Olivelli. Any link describing how to do it?
----
2020-06-18 10:31:16 UTC - Enrico Olivelli: Latest released version of PulsarManager comes with HerdDB and you can use it in standalone mode.
PulsarManager itself is not a distributed application, so there is no way to use HerdDB in embedded flavour + replication: you would have a replicated WAL on BookKeeper but only 1 HerdDB node.

You can still set up a HerdDB cluster that uses Pulsar Bookies and Pulsar ZooKeeper. Then you configure PulsarManager to connect to that HerdDB cluster.

For the next version of PulsarManager we will upgrade the HerdDB dependency to 0.16.0, which supports diskless-cluster mode, that is, storing the whole dataset (WAL + data pages) on BookKeeper.
This way the DB will be fully stored and replicated on existing Bookies and ZooKeeper servers, with no need for local persistent storage on the PulsarManager machine/pod, and you will get the benefits of high availability.

If you are interested in trying it out I will send a PR soon for the upgrade of PM to 0.16.
----
2020-06-18 10:34:59 UTC - Gilles Barbier: Thx @Enrico Olivelli, is there some documentation somewhere describing how to set up a HerdDB cluster using Pulsar BookKeeper/ZooKeeper?
----
2020-06-18 10:36:20 UTC - Gilles Barbier: (Great for the 0.16 )
----
2020-06-18 10:37:44 UTC - Enrico Olivelli: <https://medium.com/streamnative/how-to-use-apache-pulsar-manager-with-herddb-dd265c955ca4>
----
2020-06-18 10:37:53 UTC - Enrico Olivelli: Hope that helps.
----
2020-06-18 10:38:01 UTC - Enrico Olivelli: Are you running on k8s / docker ?
----
2020-06-18 10:39:23 UTC - Enrico Olivelli: It is better to try it out in dev/staging env before doing it in production :slightly_smiling_face:
----
2020-06-18 10:40:04 UTC - Gilles Barbier: We are developing on a docker standalone currently. Not yet in production 
----
2020-06-18 10:42:30 UTC - Enrico Olivelli: I see.
Unfortunately we still do not provide a docker image for HerdDB.
But we will be happy to accept contributions .
Btw if you are going to use diskless cluster mode you will only have the container with PulsarManager, with no need for additional containers for HerdDB
----
2020-06-18 11:03:40 UTC - Pushkar Sawant: I am slowly losing BookKeeper nodes. The number of underreplicated ledgers is going up. Right now down to 2 available nodes
----
2020-06-18 11:21:08 UTC - Pushkar Sawant: Only thing i see in other bookie logs is org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available.
----
2020-06-18 11:43:40 UTC - Pushkar Sawant: There are timeouts to 3 bookies; they have started but are in an inactive state
----
2020-06-18 12:07:53 UTC - Matej Šaravanja: Hi, does anyone have a problem with this? I'm deploying the default helm chart to a kubernetes cluster, pvcs are created just fine, but bookies are instantly crashing and restarting every 20-30 seconds due to this:
```ERROR org.apache.bookkeeper.bookie.Bookie - There are directories without a cookie, and this is neither a new environment, nor is storage expansion enabled. Empty directories are [/pulsar/data/bookkeeper/journal/current, /pulsar/data/bookkeeper/ledgers/current]```
btw, disk is almost empty
----
2020-06-18 12:44:13 UTC - Matej Šaravanja: And how is this possible?
```12:27:10.456 [LedgerDirsMonitorThread] WARN  org.apache.bookkeeper.bookie.LedgerDirsMonitor - LedgerDirsMonitor check process: All ledger directories are non writable
12:27:10.465 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 9223371978995400704, Used space fraction: 2.0 > threshold 0.95.```
----
2020-06-18 12:54:34 UTC - Enrico Olivelli: Hey @Gilles Barbier please take a look at this PR, I hope it will be delivered with the next version of Pulsar Manager
<https://github.com/apache/pulsar-manager/pull/303>
----
2020-06-18 12:54:56 UTC - Gilles Barbier: I will, thx @Enrico Olivelli
----
2020-06-18 13:09:58 UTC - Gilles Barbier: Indeed, the ability to run HerdDB on a stateless pod seems very appealing in this context :slightly_smiling_face:
----
2020-06-18 13:39:41 UTC - Enrico Olivelli: yep.
In my company (<http://emailsuccess.com>) it is not very useful, but in other contexts we saw it will greatly ease the deployment of simple applications like Pulsar Manager.
So we released this first version.
Any testing will be very appreciated :slightly_smiling_face:
Feel free to open issues on github <https://github.com/diennea/herddb>
for whatever you need or you find wrong
+1 : Gilles Barbier
----
2020-06-18 13:54:56 UTC - Penghui Li: Looks like your disk is almost full. You can check whether your topics have large backlogs or whether you have set data retention for some namespace.
----
2020-06-18 13:58:11 UTC - Matej Šaravanja: I've checked, my disk that bookkeeper is running on has usage of 1%
----
2020-06-18 13:58:28 UTC - Matej Šaravanja: and there are no topics, this is the first time Pulsar is deployed
----
2020-06-18 13:59:12 UTC - Penghui Li: interesting
----
2020-06-18 13:59:23 UTC - Penghui Li: `2.0 > threshold 0.95`
----
2020-06-18 13:59:49 UTC - Matej Šaravanja: yes, I'm aware of that, but don't understand how that could be true
----
2020-06-18 13:59:58 UTC - Matej Šaravanja: I've set 100gb for both ledgers and journal
----
2020-06-18 14:00:18 UTC - Matej Šaravanja: is there some hidden variable that causes problem when creating new ledger or smth like that?
----
2020-06-18 14:01:42 UTC - Penghui Li: ```float checkDiskFull(File dir) throws DiskOutOfSpaceException, DiskWarnThresholdException {
        if (null == dir) {
            return 0f;
        }
        if (dir.exists()) {
            long usableSpace = dir.getUsableSpace();
            long totalSpace = dir.getTotalSpace();
            float free = (float) usableSpace / (float) totalSpace;
            float used = 1f - free;
            if (used > diskUsageThreshold) {
                LOG.error("Space left on device {} : {}, Used space fraction: {} > threshold {}.",
                        dir, usableSpace, used, diskUsageThreshold);
                throw new DiskOutOfSpaceException("Space left on device "
                        + usableSpace + " Used space fraction:" + used + " > threshold " + diskUsageThreshold, used);
            }
            // Warn should be triggered only if disk usage threshold doesn't trigger first.
            if (used > diskUsageWarnThreshold) {
                LOG.warn("Space left on device {} : {}, Used space fraction: {} > WarnThreshold {}.",
                        dir, usableSpace, used, diskUsageWarnThreshold);
                throw new DiskWarnThresholdException("Space left on device:"
                        + usableSpace + " Used space fraction:" + used + " > WarnThreshold:" + diskUsageWarnThreshold,
                        used);
            }
            return used;
        } else {
            return checkDiskFull(dir.getParentFile());
        }
    }```
----
2020-06-18 14:03:18 UTC - Penghui Li: looks like `free` came out as `-1`
----
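A minimal sketch of the arithmetic behind the `2.0 > threshold 0.95` message, written in Python rather than Java. It mirrors only the `used = 1 - usable / total` step of `checkDiskFull` quoted above. The usable-space value is the one printed in the bookie log; the total-space value is not in the log, so the negative total used here is purely an assumption chosen to reproduce `free == -1`.

```python
def used_fraction(usable_space: int, total_space: int) -> float:
    """Same arithmetic as checkDiskFull: used = 1 - (usable / total)."""
    free = usable_space / total_space
    return 1.0 - free


# Usable space exactly as printed in the bookie log (just below 2**63).
usable = 9223371978995400704

# A healthy 100 GiB volume with 99 GiB free reports roughly 1% used.
healthy = used_fraction(99 * 2**30, 100 * 2**30)

# ASSUMPTION: the filesystem reported a bogus total equal to -usable.
# Any such value gives free == -1 and therefore used == 2.0, which is
# how a nearly empty disk can exceed the 0.95 threshold.
bogus = used_fraction(usable, -usable)

print(f"healthy volume used fraction: {healthy:.2f}")
print(f"bogus volume used fraction:   {bogus}")
```

The point is only that a used fraction above 1.0 cannot come from a real disk, so the sizes reported by the volume are the thing to check, not the amount of data on it.
----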
2020-06-18 14:04:03 UTC - Matej Šaravanja: I'll check persistent volume claims once more
----
2020-06-18 14:04:04 UTC - Penghui Li: Which java version are you using?
----
2020-06-18 14:04:41 UTC - Penghui Li: Ok
----
2020-06-18 14:05:35 UTC - Matej Šaravanja: Funny thing happened. While we were discussing this, I've deployed it to another namespace and now everything seems to work fine
+1 : Penghui Li
----
2020-06-18 14:05:49 UTC - Matej Šaravanja: thanks for the help, I'll try this and keep you posted :slightly_smiling_face:
----
2020-06-18 14:07:15 UTC - Alex Yaroslavsky: Hi,
----
2020-06-18 14:08:44 UTC - Alex Yaroslavsky: A question, running Pulsar 2.5.2 in EKS, with two function workers (StatefulSet). After restarting the workers, 95% of functions are running on the first one. Shouldn't there be some balancing of functions between workers?
----
2020-06-18 14:11:35 UTC - Matej Šaravanja: And about java version:

openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
----
2020-06-18 14:20:29 UTC - Penghui Li: 2.6.0 released. There are several fixes and enhancements on the KeyShared subscription. @Ankur Jain you can check the release notes here <http://pulsar.apache.org/release-notes/#2.6.0> and download from <http://pulsar.apache.org/en/download/>
tada : Konstantinos Papalias
+1 : Ankur Jain
----
2020-06-18 14:34:44 UTC - Konstantinos Papalias: nice one, thanks for sharing @Penghui Li, it may be worth sharing on the main channel as well!
----
2020-06-18 15:14:12 UTC - David Kjerrumgaard: Error 1 looks like a DNS problem, where the Pulsar connector cannot resolve the URL of the Kinesis source
+1 : Renault
----
2020-06-18 15:44:09 UTC - Alexander Ursu: Hi, I was just taking a look at the REST API (<http://pulsar.apache.org/admin-rest-api/?version=2.5.2&apiversion=v2#operation/grantPermissionOnNamespace>)
for granting permissions on a namespace specifically, how should the desired actions be passed in?
for reference I'm doing this in Python 3
```r = requests.post(base_url + f"namespaces/{ns}/permissions/my-role", data="consume", headers=headers)```
this doesn't seem to have any effect
is there also any chance that the documentation can be improved to cover what is passed in the request body, and how?
----
2020-06-18 15:46:54 UTC - Matt Mitchell: Are there any test utilities available for Java? Specifically, I’d like to test interactions with PulsarClient + consumers/producers. I was going down the path of just mocking everything via Mockito, but hoping there’s something already pre-baked or an alternative similar to ZooKeeper’s testing/embedded server?
----
2020-06-18 15:47:48 UTC - Pushkar Sawant: The entire cluster failed. Had to rebuild the cluster
----
2020-06-18 15:48:28 UTC - Pushkar Sawant: Second time this has happened. When one of the bookies has an issue and is in recovery, the rest of the cluster crashes
----
2020-06-18 16:01:27 UTC - Ebere Abanonu: @Sijie Guo @Penghui Li why is it `public void getMessageByID`? I'm asking about the void in particular: isn't it meant to return a message? The REST API has a couple of other methods with void.
----
2020-06-18 16:12:06 UTC - Sijie Guo: There is a setting in broker to delete inactive topic. If there are no producers, consumers and subscriptions, broker will treat a topic as inactive and delete it after a certain time. 
----
2020-06-18 16:12:43 UTC - Jesse Anderson: yes
----
2020-06-18 16:12:49 UTC - Sijie Guo: In case B, there is a subscription created by the function. So the topic is not inactive 
----
2020-06-18 16:13:29 UTC - Pavels Sisojevs: In case B, there is a producer created by the function. Can I stop it somehow?
----
2020-06-18 16:13:44 UTC - Sijie Guo: Are you reusing any existing pvcs?
----
2020-06-18 16:37:42 UTC - Prashanth Tirupachur Vasanthakrishnan: This helped me get a response using Python requests:

```import json
import requests
ns = "public/default"
r = requests.post(base_url + f"/admin/v2/namespaces/{ns}/permissions/my-role", data=json.dumps(["consume"]), headers={'content-type': 'application/json'})```

----
2020-06-18 16:40:23 UTC - Prashanth Tirupachur Vasanthakrishnan: You can use the status code (`r.status_code`) and the response body (`r.text`) to look into it further.
----
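A hedged sketch of the same grant-permissions call using only the Python standard library, so the request can be inspected before anything is sent. It follows the shape of the working snippet above: a POST to `/admin/v2/namespaces/{namespace}/permissions/{role}` with a JSON array of actions as the body. The helper name `build_grant_request`, the `http://localhost:8080` base URL and the `my-role` role are assumptions for illustration only.

```python
import json
import urllib.request


def build_grant_request(base_url, namespace, role, actions):
    """Build (but do not send) a grant-permission request for a namespace.

    The body is a JSON array such as ["consume"], not a bare string,
    and the Content-Type header marks it as JSON.
    """
    url = f"{base_url}/admin/v2/namespaces/{namespace}/permissions/{role}"
    body = json.dumps(list(actions)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical broker address and role name, for illustration only.
req = build_grant_request(
    "http://localhost:8080", "public/default", "my-role", ["consume"]
)

print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it against a running broker:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Compared with the first attempt in the thread, the differences are the JSON-encoded list in the body and the explicit content type.
----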
2020-06-18 16:44:07 UTC - Matteo Merli: Yes
----
2020-06-18 16:46:36 UTC - Matteo Merli: @Sankararao Routhu Take a look at <https://github.com/apache/pulsar/blob/master/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/ProxyConnection.java#L123>

That's where the connection gets accepted. That's where it should get closed if it's not allowed.
----
2020-06-18 16:48:37 UTC - Sijie Guo: producer will be stopped if the function is stopped.
----
2020-06-18 16:53:39 UTC - Logan B: The link for track 3 on the pulsar website is incorrect.

The link is for meeting # 8521331335, but it should be 85213313359
+1 : Julius S
----
2020-06-18 16:54:57 UTC - Alexander Ursu: I also saw the raw swagger.json which generates the docs, and I don't think there even is a definition for the request body, so now I doubt this endpoint even works and accepts it
----
2020-06-18 16:58:38 UTC - Ebere Abanonu: You can find tools to generate a client for your language using the swagger file. Look at the function used to create tenants, and you will learn how to add a body to your request.
----
2020-06-18 17:01:33 UTC - Alexander Ursu: Ah I see. It's also more about just knowing what key/values are accepted for the namespace permissions endpoint. This could just be a lack of documentation
----
2020-06-18 17:08:31 UTC - Endre Karlson: Hi guys, is the PulsarCon recorded to Youtube?
----
2020-06-18 17:14:47 UTC - Oleg Kozlov: Perfect, thank you , that should work for us
----
2020-06-18 17:18:29 UTC - Pavels Sisojevs: exactly. In my case I have a router function which receives a message, fans it out to a few other topics, and never produces any messages to those topics again. Is there a way to force removal of the producer in the function? Of course, alternatively I could create a Pulsar client inside the function (once, on initialisation), then create a producer for every message and close it myself. Not sure it’s an efficient way of doing this though
----
2020-06-18 17:28:32 UTC - Jim Smith: @Jim Smith has joined the channel
----
2020-06-18 17:32:56 UTC - Sijie Guo: It is fixed now.
+1 : Logan B
----
2020-06-18 17:33:24 UTC - Sijie Guo: Yes it is recorded and will be uploaded to Youtube after the summit.
----
2020-06-18 17:34:21 UTC - Sijie Guo: which one are you mentioning here?
----
2020-06-18 17:34:49 UTC - anbutech17: Could someone please help me set up a Pulsar project in PyCharm? I'm looking for a Python + Pulsar dev environment to learn the Pulsar APIs. I can see a lot of Java Maven project setups with the IntelliJ IDEA IDE. Please share some suggestions.
heavy_plus_sign : Caito Scherr
----
2020-06-18 17:35:52 UTC - Sijie Guo: Are most of the functions using parallelism 1?
----
2020-06-18 17:38:07 UTC - Sijie Guo: I see. Can you create a github issue for it? We can see how to add the support.
+1 : Tamer
----
2020-06-18 17:38:18 UTC - Pavels Sisojevs: ok, will do
----
2020-06-18 17:41:01 UTC - Alex Yaroslavsky: @Sijie Guo yes, most are at the moment 
----
2020-06-18 18:00:13 UTC - Ebere Abanonu: `public void getMessageByID`
----
2020-06-18 18:00:48 UTC - Ebere Abanonu: What happens when I call that method?
----
2020-06-18 18:06:52 UTC - Sijie Guo: I mean in which class?
----
2020-06-18 18:06:58 UTC - Sijie Guo: sorry which library?
----
2020-06-18 18:25:28 UTC - Ebere Abanonu: `broker\admin\v2\PersistentTopics.java`
----
2020-06-18 18:25:56 UTC - Ebere Abanonu: ```
    @GET
    @Path("/{tenant}/{namespace}/{topic}/ledger/{ledgerId}/entry/{entryId}")
    @ApiOperation(value = "Get message by its messageId.")
    @ApiResponses(value = {
            @ApiResponse(code = 307, message = "Current broker doesn't serve the namespace of this topic"),
            @ApiResponse(code = 401, message = "Don't have permission to administrate resources on this tenant or" +
                    "subscriber is not authorized to access this operation"),
            @ApiResponse(code = 403, message = "Don't have admin permission"),
            @ApiResponse(code = 404, message = "Topic, subscription or the message position does not exist"),
            @ApiResponse(code = 405, message = "Skipping messages on a non-persistent topic is not allowed"),
            @ApiResponse(code = 412, message = "Topic name is not valid"),
            @ApiResponse(code = 500, message = "Internal server error"),
            @ApiResponse(code = 503, message = "Failed to validate global cluster configuration")})
    public void getMessageById(
            @Suspended final AsyncResponse asyncResponse,
            @ApiParam(value = "Specify the tenant", required = true)
            @PathParam("tenant") String tenant,
            @ApiParam(value = "Specify the namespace", required = true)
            @PathParam("namespace") String namespace,
            @ApiParam(value = "Specify topic name", required = true)
            @PathParam("topic") @Encoded String encodedTopic,
            @ApiParam(value = "The ledger id", required = true)
            @PathParam("ledgerId") long ledgerId,
            @ApiParam(value = "The entry id", required = true)
            @PathParam("entryId") long entryId,
            @ApiParam(value = "Is authentication required to perform this operation")
            @QueryParam("authoritative") @DefaultValue("false") boolean authoritative)```
----
2020-06-18 18:42:20 UTC - Sijie Guo: ```AsyncResponse is used for sending the response.```
----
2020-06-18 19:09:41 UTC - Pedro Cardoso: please share the link when it's available!
----
2020-06-18 19:29:08 UTC - Jesse Anderson: I think they'll be sending out an email once they're up on YouTube
----
2020-06-18 19:33:09 UTC - Daniel Kopeinig: @Daniel Kopeinig has joined the channel
----
2020-06-18 19:41:12 UTC - Addison Higham: huh, random question that I realize I don't know the answer to: how do `-Xmx` and `-XX:MaxDirectMemorySize` interact? I know that if you don't set MaxDirectMemorySize it is a fraction of your heap, but if you set MaxDirectMemorySize, is the total process size heap + direct memory? Or does it carve that chunk out of the heap?
----
2020-06-18 19:43:44 UTC - Matteo Merli: If `-XX:MaxDirectMemorySize` is not set, the JVM will set it to the same as `-Xmx`.

The 2 limits are independent. So, if you have `-Xmx2G -XX:MaxDirectMemorySize=1G`, your process can take up to 3G. Well, actually that doesn't account for JVM internal memory usage (eg: compiler, class cache, GC state, etc.)
----
2020-06-18 19:45:48 UTC - Addison Higham: okay that is what I thought
----
2020-06-18 19:45:54 UTC - Addison Higham: but was struggling to find direct confirmation
----
2020-06-18 19:46:44 UTC - Addison Higham: which BTW, the old helm charts were really confusing about, they would have 16 GB of heap AND 16 GB of direct memory, but only asked for 16gb of memory from k8s, which is why I was suddenly questioning
----
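A small sketch of the check implied by this exchange, assuming (as stated above) that the heap limit and the direct-memory limit are independent and add up. It compares their sum with a container memory request. The function names are made up for illustration, and JVM-internal memory is deliberately left out, so the real footprint can be higher than what is computed here.

```python
GIB = 1024 ** 3


def heap_plus_direct(heap_bytes, direct_bytes):
    """Sum of the two independent limits (JVM-internal memory not included)."""
    return heap_bytes + direct_bytes


def fits_container_request(heap_bytes, direct_bytes, request_bytes):
    """True if heap + direct memory fits inside the container's memory request."""
    return heap_plus_direct(heap_bytes, direct_bytes) <= request_bytes


# The 2G heap / 1G direct example from the thread.
small_total_gib = heap_plus_direct(2 * GIB, 1 * GIB) / GIB

# The old helm chart values mentioned in the thread:
# 16 GB heap and 16 GB direct memory, but only 16 GB requested from k8s.
old_chart_fits = fits_container_request(16 * GIB, 16 * GIB, 16 * GIB)

print(f"heap + direct for the small example: {small_total_gib:.0f} GiB")
print(f"old chart fits its request: {old_chart_fits}")
```

Under this model the old chart settings could grow to twice the requested memory, which matches the confusion described above.
----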
2020-06-18 19:48:28 UTC - Addison Higham: @Matteo Merli do you have more guidance on how much Pulsar should actually use in direct memory? I assume it is just netty? what about bookkeeper?
----
2020-06-18 19:52:19 UTC - Matteo Merli: Brokers --> netty pooling. If you have very high throughput, or many connections, a larger amount of direct mem will help

Bookies --> BK is not relying on page cache, rather we prefer to allocate a "real" chunk of mem so that it's guaranteed to be mem and not get blocked on disk IO.

For that, BK uses direct for Netty IO, plus Write-Cache and ReadCache.

By default, these caches are set relative to the direct mem size (eg: 25% each)
----
2020-06-18 19:58:34 UTC - Markus Steininger: @Markus Steininger has joined the channel
----
2020-06-18 20:00:15 UTC - Matteo Merli: Again, if your bookie has a high IO rate, use sensible values for direct mem.

eg: at 300 MB/s of writes, you might want to have a write cache big enough to buffer for at least several seconds.

eg: for ~5 secs  that would be 2 GB. That would mean a direct mem size of 8G at least.
----
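The sizing rule of thumb above can be sketched as a short calculation. It assumes the write cache takes 25% of direct memory, as in the "eg: 25% each" remark in the thread; the function name and the use of decimal MB are choices made for this illustration.

```python
def direct_memory_for_write_cache(write_mb_per_s, buffer_seconds, cache_fraction=0.25):
    """Return (write_cache_mb, direct_memory_mb) for a bookie.

    write_cache_mb  : how much data arrives during buffer_seconds
    direct_memory_mb: direct memory needed so that cache_fraction of it
                      covers the write cache
    """
    write_cache_mb = write_mb_per_s * buffer_seconds
    direct_memory_mb = write_cache_mb / cache_fraction
    return write_cache_mb, direct_memory_mb


# 300 MB/s of writes, buffered for about 5 seconds.
cache_mb, direct_mb = direct_memory_for_write_cache(300, 5)

print(f"write cache:   {cache_mb} MB")
print(f"direct memory: {direct_mb:.0f} MB")
```

The exact figures are 1500 MB of write cache and 6000 MB of direct memory; the thread rounds these up to 2 GB and 8 GB.
----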
2020-06-18 20:22:40 UTC - Jesse Anderson: 
tada : Shivji Kumar Jha, Sijie Guo, Caito Scherr
+1 : Karthik Ramasamy
----
2020-06-19 03:32:42 UTC - Addison Higham: Forgot to mention that was really helpful
----
2020-06-19 03:38:10 UTC - Addison Higham: Observed some interesting things today as I added a new workload: it wasn't a huge volume of messages compared to where we have peaked (about 30k msgs/sec) but it came from a lot more producers (~900). We routinely handle burst workloads of 25-30k msgs/sec but from only a handful of producers. This load from many producers was much, much more stressful on the cluster. Which I would expect, but I was just surprised by the degree
----
2020-06-19 03:38:57 UTC - Matteo Merli: Probably due to much less batching on the producers side
----
2020-06-19 03:39:09 UTC - Addison Higham: Which is just making me wonder why. Certainly there is servicing all those sockets (which I wonder if I need to tune)
----
2020-06-19 03:39:40 UTC - Addison Higham: I will need to look, maybe I might need to increase batch timeout a bit
----
2020-06-19 03:40:23 UTC - Matteo Merli: You can compare the message publish rate with bk entry write (for a given topic/namespace)
----
2020-06-19 03:40:46 UTC - Matteo Merli: the ratio between the 2 would be the avg number of messages per batch
----
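The ratio suggested above can be written out as a tiny calculation. The 30k msgs/sec publish rate comes from the thread; both BookKeeper entry write rates below are hypothetical numbers picked only to show how the ratio changes when batching gets worse.

```python
def avg_messages_per_batch(publish_msgs_per_s, bk_entries_per_s):
    """Average batch size = message publish rate / BK entry write rate."""
    if bk_entries_per_s <= 0:
        raise ValueError("entry write rate must be positive")
    return publish_msgs_per_s / bk_entries_per_s


# Same publish rate, two HYPOTHETICAL entry write rates.
few_producers = avg_messages_per_batch(30_000, 600)      # well batched
many_producers = avg_messages_per_batch(30_000, 15_000)  # poorly batched

print(f"few producers:  {few_producers:.0f} msgs per entry")
print(f"many producers: {many_producers:.0f} msgs per entry")
```

A ratio close to 1 would suggest that almost every message becomes its own entry, which is consistent with the "much less batching" explanation given above.
----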
2020-06-19 03:42:42 UTC - Addison Higham: Good idea, will compare the two. I think part of the problem was also that hashing on my 5 partitions seemed somewhat unlucky: it split all the way up to 96 bundles, and then I added more partitions and it smoothed out :shrug:
----
2020-06-19 04:06:41 UTC - Jeff Schneller: Are there binaries for the C++ client on Windows? I only see Linux and MacOS. Looking to save time by not having to build it myself.
----
2020-06-19 06:02:40 UTC - Joe Francis: @Matteo Merli it wouldn't be a bad idea to have a bulk API for addentry - if striping is not in use
----
2020-06-19 06:07:16 UTC - Matteo Merli: it's not that easy. Batching is very efficient because we offload all the work to the clients. brokers and bookies are not looking into it.

Having a bulk add-entry that broker triggers vs sending multiple RPC (pipelined) won't be saving a huge amount of CPU (my expectation)
----
2020-06-19 06:27:58 UTC - Joe Francis: Not batching - bulk add API for BK  all the way to device
----