Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/03/07 09:11:04 UTC
Slack digest for #general - 2019-03-07

2019-03-06 10:15:32 UTC - Vincent Ngan: :ok_hand:
----
2019-03-06 10:31:28 UTC - Byron: Ah ok, so clusters in an “instance” are assumed to share/replicate data. Understood thanks.
----
2019-03-06 12:34:09 UTC - Byron: Hi folks, I had a question about this statement in docs about increasing the number of partitions:
> Already created partitioned producers and consumers can’t see newly created partitions and it requires to recreate them at application so, newly created producers and consumers can connect to newly added partitions as well. Therefore, it can violate partition ordering at producers until all producers are restarted at application.

This statement seems to imply that downtime is required (disconnect producers and consumers) before/during a re-partitioning? So app is running, disconnect clients, update the number of partitions, reconnect clients.. to guarantee ordering. Is this correct? A related question is whether existing messages are re-partitioned when this happens? In other words, if a consumer was created to read from one partition (per this thread <https://github.com/apache/pulsar/issues/3098>) then the consumer would need to change the topic name to the new partition to consume from. I suppose this wouldn’t work in the case of a custom partitioning function? I am not suggesting I would do this on the consumer side, but I am curious of the behavior and edge cases that one could run into.
----
2019-03-06 12:55:57 UTC - Sébastien de Melo: It's solved.  The command args must be:
                    mkdir logs &&
                    bin/apply-config-from-env.py conf/broker.conf &&
                    bin/apply-config-from-env.py conf/client.conf &&
                    bin/gen-yml-from-env.py conf/functions_worker.yml &&
                    bin/apply-config-from-env.py conf/pulsar_env.sh &&
                    bin/pulsar broker
----
2019-03-06 12:56:06 UTC - Sébastien de Melo: I am not sure for client.conf
----
2019-03-06 12:56:10 UTC - Darragh: hi, we've managed to get pulsar running on ec2 instances and are seeing some nice mean latencies when doing a pulsar-perf with n=10000, but the tail latencies spike quite often.  Any ideas as to what we could tweak ?
----
2019-03-06 13:11:22 UTC - Maarten Tielemans: Some information about the setup:
- The ledgers and journal run on separate NVMe
- We are using XFS as filesystem for the NVMe
- The NVMe are EBS, type io1, size 128GB, 6400 iops
- journalDataSync=true (but we also see the spikes when set to false)
- Bookkeeper and the broker run on the same instances
- We tried with multiple settings of ensemble, quorum and ack, I believe we currently use 3 2 2
- For Zookeeper, pulsar_env.sh was set to 2GB of memory. For Bookkeeper and broker (same instance) it was set to 12GB
- We also see the spikes when we use a non-persistent topic (200+ms 99.9% latency)
----
2019-03-06 13:26:15 UTC - Byron: Based on that issue, it appears that partitions are just internal topics? So if there is a requirement for message keys to be sticky to a topic, it seems that managing this explicitly is a better strategy?
----
2019-03-06 13:26:35 UTC - Maarten Tielemans: ```
13:23:47.474 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:   9999.9  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.313 ms - med:   0.309 - 95pct:   0.378 - 99pct:   0.420 - 99.9pct:   0.495 - 99.99pct:   1.512 - Max:   1.558
13:23:57.479 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:   9999.8  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.313 ms - med:   0.309 - 95pct:   0.378 - 99pct:   0.418 - 99.9pct:   0.483 - 99.99pct:   0.821 - Max:   1.512
13:24:07.484 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.2  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.312 ms - med:   0.308 - 95pct:   0.378 - 99pct:   0.422 - 99.9pct:   0.539 - 99.99pct:   0.697 - Max:   1.564
13:24:17.489 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.5  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.316 ms - med:   0.310 - 95pct:   0.385 - 99pct:   0.441 - 99.9pct:   0.777 - 99.99pct:   1.004 - Max:   1.049
13:24:27.509 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.5  msg/s ---     78.1 Mbit/s --- Latency: mean:   2.142 ms - med:   0.319 - 95pct:   0.419 - 99pct: 126.272 - 99.9pct: 203.900 - 99.99pct: 206.503 - Max: 207.122
13:24:37.518 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.1  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.306 ms - med:   0.296 - 95pct:   0.381 - 99pct:   0.429 - 99.9pct:   0.565 - 99.99pct:   0.745 - Max:   1.491
13:24:47.523 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.1  msg/s ---     78.1 Mbit/s --- Latency: mean:   0.310 ms - med:   0.303 - 95pct:   0.385 - 99pct:   0.423 - 99.9pct:   0.570 - 99.99pct:   0.781 - Max:   0.794
```
(This is non-persistent, producer latency)
----
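Sketch (not from the thread): one way to pick out the spike intervals in `pulsar-perf` producer output like the lines above is to parse the latency fields and flag any interval whose 99th percentile exceeds a threshold. The function names and the 10 ms threshold are assumptions chosen for illustration.

```python
import re

# Matches the latency part of a PerformanceProducer log line as shown above.
LATENCY_RE = re.compile(
    r"mean:\s*(?P<mean>[\d.]+) ms - med:\s*(?P<med>[\d.]+)"
    r" - 95pct:\s*(?P<p95>[\d.]+) - 99pct:\s*(?P<p99>[\d.]+)"
    r" - 99\.9pct:\s*(?P<p999>[\d.]+) - 99\.99pct:\s*(?P<p9999>[\d.]+)"
    r" - Max:\s*(?P<max>[\d.]+)"
)


def parse_latencies(line):
    """Return the latency fields of one log line as floats (ms), or None if absent."""
    match = LATENCY_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


def is_spike(line, p99_threshold_ms=10.0):
    """True if the line reports a 99th percentile above the (hypothetical) threshold."""
    stats = parse_latencies(line)
    return stats is not None and stats["p99"] > p99_threshold_ms
```

Applied to the non-persistent run above, only the 13:24:27 interval (99pct: 126.272) would be flagged with this threshold.
----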
2019-03-06 13:40:14 UTC - Maarten Tielemans: ```
13:39:09.016 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.5  msg/s ---     78.1 Mbit/s --- Latency: mean:  20.742 ms - med:   5.251 - 95pct: 152.269 - 99pct: 247.212 - 99.9pct: 286.991 - 99.99pct: 294.061 - Max: 295.105
13:39:19.031 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.6  msg/s ---     78.1 Mbit/s --- Latency: mean:   9.640 ms - med:   5.184 - 95pct:  31.784 - 99pct: 112.054 - 99.9pct: 278.791 - 99.99pct: 285.463 - Max: 286.473
13:39:29.043 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:   9999.9  msg/s ---     78.1 Mbit/s --- Latency: mean:  23.945 ms - med:   5.237 - 95pct: 169.349 - 99pct: 198.288 - 99.9pct: 323.295 - 99.99pct: 323.377 - Max: 323.435
13:39:39.059 [main] INFO  org.apache.pulsar.testclient.PerformanceProducer - Throughput produced:  10000.5  msg/s ---     78.1 Mbit/s --- Latency: mean:  25.457 ms - med:   5.263 - 95pct: 168.663 - 99pct: 205.695 - 99.9pct: 259.992 - 99.99pct: 265.065 - Max: 317.673
```
(This is persistent, producer latency)
----
2019-03-06 13:55:33 UTC - Wang Jinhong: @Wang Jinhong has joined the channel
----
2019-03-06 13:55:56 UTC - Valery: @Valery has joined the channel
----
2019-03-06 14:35:05 UTC - Matteo Merli: Are the NVMe disks locally attached?
----
2019-03-06 14:35:56 UTC - Matteo Merli: The latency numbers are way off for writes going to NVMe disks
----
2019-03-06 14:36:54 UTC - Matteo Merli: In any case, to reduce tail latency, the preferred config would be 3 / 3 / 2
----
2019-03-06 14:37:14 UTC - Matteo Merli: That is, write to 3 bookies and wait for 2 acks
----
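Sketch (not from the thread): a toy simulation of why writing to 3 bookies and waiting for 2 acks can cut tail latency compared with writing to 2 and waiting for both. The numbers (0.3 ms normal write, 200 ms spike, 1% spike probability) are made up for illustration and only loosely mirror the logs above; this is not a model of BookKeeper itself.

```python
import random


def simulate_p99(write_quorum, ack_quorum, n=100_000, spike_prob=0.01, seed=42):
    """Return the simulated 99th percentile add latency in ms.

    Each bookie write takes 0.3 ms, or 200 ms with probability spike_prob.
    An entry is acknowledged once ack_quorum of the write_quorum bookies have
    responded, so its latency is the ack_quorum-th smallest per-bookie latency.
    """
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        per_bookie = sorted(
            200.0 if rng.random() < spike_prob else 0.3
            for _ in range(write_quorum)
        )
        latencies.append(per_bookie[ack_quorum - 1])
    latencies.sort()
    return latencies[int(0.99 * n)]


# Write quorum 2 / ack quorum 2: one slow bookie is enough to delay the entry.
p99_2_2 = simulate_p99(write_quorum=2, ack_quorum=2)
# Write quorum 3 / ack quorum 2: a single slow bookie is masked by the other two.
p99_3_2 = simulate_p99(write_quorum=3, ack_quorum=2)
```

In this toy model a spike reaches the client with probability of roughly 2% in the first case and roughly 0.03% in the second, which is why the simulated p99 differs so sharply.
----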
2019-03-06 14:38:06 UTC - Chris DiGiovanni: After trying to set up Tiered Storage to an internal Ceph Rados Gateway (S3 API) I ran into what I thought would be the issue: certs. After waiting for the offload status to come back, I get this error from `pulsar-admin topics offload-status`

```
null

Reason: Error offloading: org.apache.bookkeeper.mledger.ManagedLedgerException: java.util.concurrent.CompletionException: org.jclouds.http.HttpResponseException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target connecting to POST https://**-***.***.**/***-dev-pulsar-topic-offload/3243c6cf-b3d0-4fff-8e40-912c61793a64-ledger-10?uploads HTTP/1.1
```
----
2019-03-06 14:38:31 UTC - Matteo Merli: Can you verify with `iostat` that the writes are indeed going to the expected disks?
----
2019-03-06 14:40:38 UTC - Chris DiGiovanni: My first inclination is to present my own Java keystore with our internal CA certs, though I'm not sure how to add these options to the startup. Currently deploying via Kubernetes.
----
2019-03-06 14:43:47 UTC - Darragh: we can confirm that the writes are going to the nvme's with iostat
----
2019-03-06 14:44:31 UTC - Maarten Tielemans: ```
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.43    0.00    3.51    4.86    0.81   88.38

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
nvme0n1           0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdf              0.00     0.00    0.00 1059.00     0.00    12.17    23.54     0.92    0.88   0.85  90.00
xvdg              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
```
----
2019-03-06 14:44:44 UTC - Darragh: ```[ec2-user@ip-10-0-2-47 ~]$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0    0 884.8G  0 disk 
xvda    202:0    0     8G  0 disk 
└─xvda1 202:1    0     8G  0 part /
xvdf    202:80   0   128G  0 disk /mnt/journal
xvdg    202:96   0   128G  0 disk /mnt/storage```
----
2019-03-06 14:46:16 UTC - Darragh: xvdf and xvdg are nvme ebs volumes
----
2019-03-06 14:52:10 UTC - Matteo Merli: So, from the above, xvdf looks like 90% busy
----
2019-03-06 14:53:25 UTC - Matteo Merli: That’s strange for 12 MB/s and 1K iops 
----
2019-03-06 14:54:28 UTC - Matteo Merli: Is the throughput tied to the volume size? It usually is on EC2
----
2019-03-06 14:55:46 UTC - Maarten Tielemans: Not sure. This is the configuration of the EBS volumes. I can change this if needed.
```
Size - 128 GiB
Encrypted - Not Encrypted
Volume type - io1
IOPS - 6400
```
----
2019-03-06 14:56:32 UTC - Maarten Tielemans: When we set `journalDataSync=false`, we still see similar tail latency numbers. The disk IO is different in those cases, and seems to happen in bursts of ~100-200MB (every 10 seconds).
----
2019-03-06 14:59:43 UTC - Maarten Tielemans: Related to the throughput, the IOPS of the EBS volumes was limited to 50x the size in GB (6400 = 50 * 128)
----
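Sketch (not from the thread): the size-to-IOPS arithmetic in the message above, written out. The 50:1 ratio is the one quoted in the message; the function names are hypothetical.

```python
import math

IOPS_PER_GIB = 50  # ratio quoted in the message above


def max_provisioned_iops(size_gib, ratio=IOPS_PER_GIB):
    """Highest IOPS that can be provisioned for a volume of the given size."""
    return size_gib * ratio


def min_size_for_iops(target_iops, ratio=IOPS_PER_GIB):
    """Smallest volume size (GiB) that allows the target IOPS at this ratio."""
    return math.ceil(target_iops / ratio)
```

With these helpers, `max_provisioned_iops(128)` gives 6400, matching the volume configuration shown earlier.
----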
2019-03-06 15:13:28 UTC - Chris Bartholomew: Hi Folks. It seems that when de-duplication is enabled the storageSize for the topic never goes to 0 even if all the messages have been acked. In fact, even if I delete all the subscriptions I am still seeing a non-zero value for storage. Here are the stats for my topic:  ```{
    "msgRateIn": 0,
    "msgThroughputIn": 0,
    "msgRateOut": 0,
    "msgThroughputOut": 0,
    "averageMsgSize": 0,
    "storageSize": 55313,
    "publishers": [],
    "subscriptions": {},
    "replication": {},
    "deduplicationStatus": "Enabled"
}``` This doesn't happen if I have de-duplication disabled. Looking at the internal stats for the topic, I noticed that there is a cursor for deduplication: ``` "cursors": {
        "pulsar.dedup": {
            "markDeletePosition": "532:8999",
            "readPosition": "532:9986",
            "waitingReadOp": false,
            "pendingReadOps": 0,
            "messagesConsumedCounter": -986,
            "cursorLedger": -1,
            "cursorLedgerLastEntry": -1,
            "individuallyDeletedMessages": "[]",
            "lastLedgerSwitchTimestamp": "2019-03-06T14:20:37.876Z",
            "state": "NoLedger",
            "numberOfEntriesSinceFirstNotAckedMessage": 987,
            "totalNonContiguousDeletedMessagesRange": 0,
            "properties": {
                "useast1-gcp-31-1": 9011
            }
        }
    }``` Are messages stored for de-duplication purposes? I wouldn't expect that to be necessary (IDs yes, messages no).
----
2019-03-06 15:37:31 UTC - Sijie Guo: can you show the full stats and stats-internal output?
----
2019-03-06 15:38:31 UTC - Sijie Guo: /cc @jia zhai or @Ivan Kelly. they might have a quick idea on this.
----
2019-03-06 15:44:03 UTC - Sijie Guo: > This statement seems to imply that downtime is required (disconnect producers and consumers) before/during a re-partitioning?

the behavior has been changed in 2.3.0. partitions are automatically updated in producers and consumers when the number of partitions is changed. the documentation might need to be updated. (/cc @jia zhai for updating the documentation)

> A related question is whether existing messages are re-partitioned when this happens?

currently pulsar doesn’t handle this, since “repartitioning” is related to how messages are routed to partitions. in order to support something like per-key ordering, additional information might be required (e.g. the hashing rule and the number of partitions before rehashing), and consumers would be required to consume messages in a certain order.
----
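Sketch (not from the thread): why a key can land on a different partition once the partition count changes, assuming keys are routed by hashing modulo the number of partitions. `zlib.crc32` is a stand-in here and is not claimed to be the hash Pulsar uses; the function names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Route a key to a partition by hash modulo partition count (stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys, old_partitions, new_partitions):
    """Keys whose partition differs between the old and new partition counts."""
    return [
        key
        for key in keys
        if partition_for(key, old_partitions) != partition_for(key, new_partitions)
    ]


keys = [f"key-{i}" for i in range(1000)]
moved = moved_keys(keys, old_partitions=4, new_partitions=8)
```

Messages published for a moved key before the change stay in the old partition, which is why per-key ordering across the change needs the extra information mentioned above.
----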
2019-03-06 15:45:42 UTC - Chris Bartholomew: Sure. What I previously posted was the full stats output. I had used the topic to send messages, but deleted all subscriptions, so I was expecting to see zeroes across the board:  ```{
    "msgRateIn": 0,
    "msgThroughputIn": 0,
    "msgRateOut": 0,
    "msgThroughputOut": 0,
    "averageMsgSize": 0,
    "storageSize": 55313,
    "publishers": [],
    "subscriptions": {},
    "replication": {},
    "deduplicationStatus": "Enabled"
}``` And internalStats: ```{
    "entriesAddedCounter": 0,
    "numberOfEntries": 9986,
    "totalSize": 559313,
    "currentLedgerEntries": 0,
    "currentLedgerSize": 0,
    "lastLedgerCreatedTimestamp": "2019-03-06T14:20:37.875Z",
    "waitingCursorsCount": 0,
    "pendingAddEntriesCount": 0,
    "lastConfirmedEntry": "532:9985",
    "state": "LedgerOpened",
    "ledgers": [
        {
            "ledgerId": 532,
            "entries": 9986,
            "size": 559313,
            "offloaded": false
        },
        {
            "ledgerId": 635,
            "entries": 0,
            "size": 0,
            "offloaded": false
        }
    ],
    "cursors": {
        "pulsar.dedup": {
            "markDeletePosition": "532:8999",
            "readPosition": "532:9986",
            "waitingReadOp": false,
            "pendingReadOps": 0,
            "messagesConsumedCounter": -986,
            "cursorLedger": -1,
            "cursorLedgerLastEntry": -1,
            "individuallyDeletedMessages": "[]",
            "lastLedgerSwitchTimestamp": "2019-03-06T14:20:37.876Z",
            "state": "NoLedger",
            "numberOfEntriesSinceFirstNotAckedMessage": 987,
            "totalNonContiguousDeletedMessagesRange": 0,
            "properties": {
                "useast1-gcp-31-1": 9011
            }
        }
    }
}``` I probably should have mentioned that I have retention set on the namespace (2 days). But from what I can see, that doesn't usually affect the storageSize, which only tracks unacked messages in the subscription backlog.
----
2019-03-06 15:52:36 UTC - Sijie Guo: so from internal stats, there are 2 ledgers, the size of the first ledger is 559313. the dedup cursor’s mark delete position is 532:8999 and the last position is 532:9986, that means the dedup cursor is holding about 987 entries, which is the `55313` bytes shown in `stats`. this would explain why the storage size is not zero.

then the question is simpler now, why dedup cursor is holding those 987 entries?
----
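Sketch (not from the thread): the entry count above can be reproduced from the two cursor positions in the internal stats, which are printed as `ledgerId:entryId`. The helper names are hypothetical, and the sketch only handles the case where both positions are in the same ledger.

```python
def parse_position(position):
    """Split a 'ledgerId:entryId' string into two integers."""
    ledger_id, entry_id = position.split(":")
    return int(ledger_id), int(entry_id)


def entries_held(mark_delete_position, read_position):
    """Number of entries between the mark-delete and read positions (same ledger only)."""
    md_ledger, md_entry = parse_position(mark_delete_position)
    rd_ledger, rd_entry = parse_position(read_position)
    if md_ledger != rd_ledger:
        raise ValueError("this sketch only handles positions within one ledger")
    return rd_entry - md_entry


held = entries_held("532:8999", "532:9986")
```

For the positions shown in the stats this gives 987, the same value as `numberOfEntriesSinceFirstNotAckedMessage`.
----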
2019-03-06 15:59:02 UTC - Chris Bartholomew: If I remember correctly, that is the number of messages I published to the topic. Then I deleted the subscription. However, I am pretty sure I see this even if I ack all the messages with a client (ie the subscription backlog is 0 on all subscriptions)
----
2019-03-06 16:01:29 UTC - Sijie Guo: >  why dedup cursor is holding those 987 entries?

to understand this, you might need to understand a bit about how dedup works. I will try to explain it in short. you can think about it like this: for messages before the cursor, the last sequence id seen for each producer is snapshotted into a persistent map; the sequence ids of messages after the cursor are kept in memory. only when a new snapshot is taken will the cursor be advanced (this is to guarantee durability and no state lost).

currently the snapshot mechanism is based on messages size (basically snapshotting every x messages). so if the new snapshot was not taken, those messages are kept.
----
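Sketch (not from the thread): a toy model of the behaviour described above, in which the dedup cursor only advances when a snapshot is taken and a snapshot is only taken after a fixed number of new entries. This is not Pulsar's implementation; the class, its fields and the interval of 1000 are assumptions for illustration.

```python
class DedupSketch:
    """Toy model: sequence ids in memory, cursor advanced only on snapshot."""

    def __init__(self, snapshot_interval):
        self.snapshot_interval = snapshot_interval  # hypothetical: entries between snapshots
        self.highest_seq = {}     # in memory: producer name -> highest sequence id seen
        self.snapshot = {}        # stands in for the persisted map
        self.entries_written = 0  # entries appended to the topic so far
        self.cursor = 0           # entries before this point are covered by the snapshot

    def publish(self, producer, sequence_id):
        """Return True if the message is accepted, False if it is a duplicate."""
        if sequence_id <= self.highest_seq.get(producer, -1):
            return False
        self.highest_seq[producer] = sequence_id
        self.entries_written += 1
        if self.entries_written - self.cursor >= self.snapshot_interval:
            # Persist the map, then let the cursor move past the covered entries.
            self.snapshot = dict(self.highest_seq)
            self.cursor = self.entries_written
        return True

    def entries_held(self):
        """Entries the cursor is still holding because no snapshot covers them."""
        return self.entries_written - self.cursor


dedup = DedupSketch(snapshot_interval=1000)
for seq in range(987):
    dedup.publish("producer-a", seq)
```

After 987 messages on an otherwise idle topic the model still holds all 987 entries, because nothing triggers a snapshot until more messages arrive.
----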
2019-03-06 16:05:21 UTC - Alexandre DUVAL: Hi, when I try to list topics on tenants/namespaces as a "normal user" I get:

```➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics list yo/logs                                                                                                                                                                                                    
Don't have permission to administrate resources on this tenant

Reason: Don't have permission to administrate resources on this tenant
```
It's normal, but when I try with super, I have the following error:
```
➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics list yo/logs
HTTP 500 Server Error

Reason: HTTP 500 Server Error
```
When I check logs from my brokers I have 401 Authentication required. Interpreted as 500 by proxies I guess.
----
2019-03-06 16:06:28 UTC - Alexandre DUVAL: On top of that, I get random 401s. I ran these commands with the same configuration: ```➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics offload-status persistent://yo/logs/full-partition-2
Offload has not been run for persistent://yo/logs/full-partition-2 since broker startup
➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics offload-status persistent://yo/logs/full-partition-2
HTTP 401 Unauthorized

Reason: HTTP 401 Unauthorized
➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics offload-status persistent://yo/logs/full-partition-2
Offload has not been run for persistent://yo/logs/full-partition-2 since broker startup
➜ kannar@pond  ~/pulsar/logstash-output-pulsar/pulsar/conf git:(master) ✗ ../bin/pulsar-admin topics offload-status persistent://yo/logs/full-partition-2
HTTP 401 Unauthorized

Reason: HTTP 401 Unauthorized
```
----
2019-03-06 16:07:07 UTC - Alexandre DUVAL: The configurations are the same for all proxies and for all brokers.
----
2019-03-06 16:07:33 UTC - Matteo Merli: Regarding journalDataSync=false, the write spikes are happening when the OS is flushing the page cache
----
2019-03-06 16:08:29 UTC - Matteo Merli: In any case, I’d try with a bigger EBS to get more throughput 
----
2019-03-06 16:09:47 UTC - Chris DiGiovanni: I actually just got this working...  I had to create a keystore with my CAs in it.  Created a config map to present the cacerts file I created to the brokers.  I then needed to add this option to the broker.config

```
PULSAR_EXTRA_OPTS: '"-Djavax.net.ssl.trustStore=/certs/cacerts"'
```
----
2019-03-06 16:09:59 UTC - Alexandre DUVAL: (I use JWT authentication).
----
2019-03-06 16:10:02 UTC - Chris DiGiovanni: After this, everything seems to be working smoothly...
+1 : Sijie Guo, jia zhai
slightly_smiling_face : Sijie Guo
----
2019-03-06 16:10:11 UTC - Alexandre DUVAL: @Matteo Merli you can't imagine the impatience behind the *merlimat is typing* :stuck_out_tongue:.
rolling_on_the_floor_laughing : David Kjerrumgaard, Sébastien de Melo, Laurent Chriqui
----
2019-03-06 16:13:11 UTC - Chris Bartholomew: OK, I get why these messages are still in storage. They haven't been "snapshotted" yet and the only way to do that is to send more messages to the topic. There is no timer to run the snapshot even if the topic is idle. The confusing part (to me, anyway) is that my topic doesn't look empty even though there are no unacked messages in it.
----
2019-03-06 16:15:05 UTC - Maarten Tielemans: Do you have any recommendation for the EBS type? Should I use the same for journal/ledger?
----
2019-03-06 16:16:20 UTC - Byron: > partitions are automatically updated in producer and consumer when the number of partitions are changed.
So to be clear, any client connections that are established (producer or consumer) will get this info transparently? So if I am consuming a topic (so all partitions), it will be fully transparent? Likewise, a producer that published a message with a key that went to, say, partition 1 before the change may see it routed to partition 4 after the partition update?
----
2019-03-06 16:17:37 UTC - Chris Bartholomew: I am guessing I can calculate the unacked storage by subtracting the amount from the dedup cursor. However, it only looks like I can get the message count, not the total size of those messages.
----
2019-03-06 16:19:14 UTC - Darragh: additionally are there any other commands you could think of that could shed some extra light on what is causing these tail latencies ?
----
2019-03-06 16:26:16 UTC - Sijie Guo: “storageSize”: 55313 is the size.
----
2019-03-06 16:30:57 UTC - Matteo Merli: You could use io1 for journal, with a higher size and st1 for ledgers (if you want to have more storage capacity per cost)
----
2019-03-06 16:31:54 UTC - Matteo Merli: To understand more about the latency, the bookies are exporting a number of stats
----
2019-03-06 16:32:43 UTC - Matteo Merli: That includes the number of flushes in journal, the fsync latency and more
----
2019-03-06 16:32:51 UTC - Chris Bartholomew: @Sijie Guo Thanks for your help in explaining this. Much appreciated.
----
2019-03-06 16:36:06 UTC - Maarten Tielemans: These are the Prometheus stats? We could set that up. Any particular stats to track?
----
2019-03-06 16:59:09 UTC - Matteo Merli: * `bookkeeper_server_ADD_ENTRY_count` for write rate entries/s
 * `bookie_WRITE_BYTES` for MB/s rates
 * `bookkeeper_server_ADD_ENTRY_REQUEST` for rate and latencies
 * `bookie_journal_JOURNAL_SYNC_count` for sync rate
 * `bookie_journal_JOURNAL_SYNC` for fsync latencies
----
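Sketch (not from the thread): once the bookie metrics are available in the Prometheus text format, the metrics listed above can be picked out with a simple prefix filter. The metric names are the ones from the message; the function name is hypothetical, and how the text is fetched is left out.

```python
WATCHED_PREFIXES = (
    "bookkeeper_server_ADD_ENTRY_count",
    "bookie_WRITE_BYTES",
    "bookkeeper_server_ADD_ENTRY_REQUEST",
    "bookie_journal_JOURNAL_SYNC_count",
    "bookie_journal_JOURNAL_SYNC",
)


def filter_metrics(exposition_text, prefixes=WATCHED_PREFIXES):
    """Return the sample lines whose metric name starts with one of the prefixes."""
    selected = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefixes):
            selected.append(line)
    return selected
```

Because this matches on prefixes, `bookie_journal_JOURNAL_SYNC` also selects any series whose name extends it, such as the `_count` variant.
----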
2019-03-06 17:00:05 UTC - Matteo Merli: :smile:
----
2019-03-06 17:02:24 UTC - Matteo Merli: is there any difference if you just hit the brokers instead of going through proxy?
----
2019-03-06 17:38:16 UTC - Vikas: hey @David Kjerrumgaard, hope you're having a great day!
The question is regarding providing connection settings in the StandardRestrictedSSLContextService for connecting the NiFi to SSL enabled pulsar.

The Pulsar is installed on the VMs, and the operations guy has provided me the `ca.cert.pem`. I am not sure what all to do with this file.

I need to provide the Keystore and Truststore's : Filename, Password and Type in the StandardRestrictedSSLContextService service in NiFi
----
2019-03-06 17:40:22 UTC - Vikas: 
----
2019-03-06 17:41:49 UTC - Grant Wu: @Matteo Merli Is there a way to subscribe to a topic with RFC 3986 Reserved characters through the Websocket API?
----
2019-03-06 17:43:07 UTC - Matteo Merli: You have to URL-encode the topic name
----
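Sketch (not from the thread): percent-encoding a topic name that contains RFC 3986 reserved characters before placing it in a WebSocket URL path. The topic name `testing1[]` is used as an example; the function names are hypothetical. This only shows the client-side encoding and says nothing about how the server decodes it.

```python
from urllib.parse import quote, unquote


def encode_topic(topic_name):
    """Percent-encode every reserved character, including '/'."""
    return quote(topic_name, safe="")


def decode_topic(encoded_name):
    """Reverse of encode_topic."""
    return unquote(encoded_name)


encoded = encode_topic("testing1[]")
```

Here `encoded` is `testing1%5B%5D`, and `decode_topic(encoded)` returns the original name.
----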
2019-03-06 17:44:10 UTC - Grant Wu: Hrm… I tried that and it seemed like the URL-encoded topic was getting subscribed to instead
----
2019-03-06 17:45:00 UTC - Grant Wu: Yeah, that appears to be the behavior I’m getting :confused:
----
2019-03-06 17:46:31 UTC - Vikas: hey @Matteo Merli, hope you're having a great day!
The question is regarding providing connection settings in the StandardRestrictedSSLContextService for connecting the NiFi to SSL enabled pulsar.

The Pulsar is installed on the VMs, and the operations guy has provided me the `ca.cert.pem`. I am not sure what all to do with this file to connect to the Pulsar hosts.

I need to provide the Keystore and Truststore's : Filename, Password and Type in the StandardRestrictedSSLContextService service in NiFi as below.
----
2019-03-06 17:47:17 UTC - Vikas: 
----
2019-03-06 17:48:00 UTC - Matteo Merli: I’m really not familiar with the NiFi side of things :confused:
----
2019-03-06 17:48:40 UTC - Vikas: no worries, thanks. I'll check with David K when he is online, thanks :slightly_smiling_face:
----
2019-03-06 17:48:43 UTC - Matteo Merli: Uhm, there should be some example or tests through the code that use that
----
2019-03-06 17:48:48 UTC - David Kjerrumgaard: @Vikas You will first need to obtain a copy of BOTH the keystore and truststore files and copy them onto the VM running NiFi.
----
2019-03-06 17:49:00 UTC - Matteo Merli: let me see if I can find that
----
2019-03-06 17:49:35 UTC - David Kjerrumgaard: Then you can configure the "filename" properties to point to those files on the local filesystem (VM's filesystem)
----
2019-03-06 17:50:12 UTC - David Kjerrumgaard: This blog post walks through the process in greater detail.
----
2019-03-06 17:50:13 UTC - David Kjerrumgaard: <http://www.treselle.com/blog/apache-nifi-data-crawling-from-https-websites/>
----
2019-03-06 17:52:19 UTC - Vikas: wonderful, thanks so much @David Kjerrumgaard
----
2019-03-06 17:52:22 UTC - David Kjerrumgaard: This blog post walks through setting up the SSL_Context_Service as well.
----
2019-03-06 17:52:23 UTC - David Kjerrumgaard: <https://bryanbende.com/development/2017/10/13/apache-nifi-tls-with-apache-solr>
----
2019-03-06 17:52:52 UTC - David Kjerrumgaard: Bottom line, I think you need to go back to your admin and get the proper files first
----
2019-03-06 17:53:18 UTC - Vikas: sure, I am struggling with this since yesterday. I was following this webpage:
<https://pulsar.apache.org/docs/en/security-tls-authentication/>
----
2019-03-06 17:53:31 UTC - Vikas: "Creating client certificates"
----
2019-03-06 17:56:24 UTC - Grant Wu: So I fired up Chrome Inspector - this is the URL I’m using for the websocket -
```
ws://pulsar-broker.petuum-system:8080/ws/v2/consumer/persistent/public/default/testing1%5B%5D/2d60de3c-2202-46e3-9229-0f214fb9ca75
```
----
2019-03-06 17:57:27 UTC - Alexandre DUVAL: I tried, now I think it's about authentication between brokers.
----
2019-03-06 17:57:39 UTC - Matteo Merli: yes, looks correct..
----
2019-03-06 17:57:57 UTC - Grant Wu: After running `bin/pulsar-admin topics list public/default` I get `persistent://public/default/testing1%5B%5D` as the new topic in the list
----
2019-03-06 17:58:53 UTC - Grant Wu: I haven’t verified that the `ws` module I’m using doesn’t do URLencoding on its own.  But I really doubt it, because I was causing exceptions in Pulsar when I didn’t URLencode by myself
----
2019-03-06 17:59:25 UTC - Alexandre DUVAL: It is. I disabled auth on the brokers, and now it works. I will enable it again and try to work out which conf field I got wrong.
----
2019-03-06 18:00:20 UTC - Grant Wu: Hrm.  They are using the URL constructor…
----
2019-03-06 18:01:32 UTC - Grant Wu: Let me try setting a breakpoint in the `ws` module internals
----
2019-03-06 18:02:53 UTC - Alexandre DUVAL: Does a custom role exist for broker-to-broker authorization?
----
2019-03-06 18:03:06 UTC - David Kjerrumgaard: @Vikas I would suggest following the steps in the second blog post I posted. It uses the nifi-toolkit to generate all the files you need in a single command, including the cert which you can then use to secure Pulsar as well
----
2019-03-06 18:05:16 UTC - David Kjerrumgaard: by updating the following property in `proxy.conf` file:
----
2019-03-06 18:05:17 UTC - David Kjerrumgaard: brokerClientAuthenticationParameters=tlsCertFile:/path/to/proxy.cert.pem,tlsKeyFile:/path/to/proxy.key-pk8.pem
----
2019-03-06 18:05:32 UTC - Matteo Merli: :+1:
----
2019-03-06 18:11:31 UTC - Matteo Merli: The broker will use this plugin and these credentials

```
brokerClientAuthenticationPlugin=
brokerClientAuthenticationParameters=
```
----
2019-03-06 18:11:38 UTC - Matteo Merli: (when talking to other brokers)
----
2019-03-06 18:12:21 UTC - Alexandre DUVAL: Yes, but if I'm using JWT, do I need to put a super-user role token here?
----
2019-03-06 18:12:32 UTC - Alexandre DUVAL: Or does another role exist for this?
----
2019-03-06 18:13:18 UTC - Matteo Merli: There’s no pre-defined role. But broker should be using a token whose “subject” is listed as one of the “super-user” roles
----
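Sketch (not from the thread): to see which subject a JWT carries, and so which role name would have to appear among the super-user roles, the payload segment can be decoded locally. This does not verify the signature. The example token is built inline with a made-up subject (`broker-admin`) and a placeholder signature; the function names are hypothetical.

```python
import base64
import json


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWTs omit."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def token_subject(token):
    """Return the 'sub' claim of a compact JWT without checking its signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    payload = json.loads(b64url_decode(payload_b64))
    return payload.get("sub")


# Illustrative token only: made-up subject, placeholder instead of a real signature.
example_token = ".".join([
    b64url_encode(b'{"alg":"HS256"}'),
    b64url_encode(b'{"sub":"broker-admin"}'),
    "signature-placeholder",
])
subject = token_subject(example_token)
```

The value in `subject` is what would need to match one of the configured super-user role names.
----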
2019-03-06 18:16:31 UTC - Vikas: sure @David Kjerrumgaard, I have created certificates using the NiFi toolkit
----
2019-03-06 18:16:37 UTC - Vikas: ```bash-3.2$ ls -al
total 32
drwx------  7 vsingh  2074273240   224 Mar  6 11:07 .
drwxr-xr-x@ 9 vsingh  2074273240   288 Mar  6 11:07 ..
-rw-------  1 vsingh  2074273240  3437 Mar  6 11:07 CN=bbende_OU=NIFI.p12
-rw-------  1 vsingh  2074273240    43 Mar  6 11:07 CN=bbende_OU=NIFI.password
drwx------  5 vsingh  2074273240   160 Mar  6 11:07 localhost
-rw-------  1 vsingh  2074273240  1200 Mar  6 11:07 nifi-cert.pem
-rw-------  1 vsingh  2074273240  1675 Mar  6 11:07 nifi-key.key
bash-3.2$ cd localhost/
bash-3.2$ ls -al
total 40
drwx------  5 vsingh  2074273240    160 Mar  6 11:07 .
drwx------  7 vsingh  2074273240    224 Mar  6 11:07 ..
-rw-------  1 vsingh  2074273240   3076 Mar  6 11:07 keystore.jks
-rw-------  1 vsingh  2074273240  11283 Mar  6 11:07 nifi.properties
-rw-------  1 vsingh  2074273240    911 Mar  6 11:07 truststore.jks```
----
2019-03-06 18:19:06 UTC - David Kjerrumgaard: Great, Now you need to use the nifi-cert.pem as the certificate on your Pulsar proxy, e.g. `brokerClientAuthenticationParameters=tlsCertFile:/path/to/nifi-cert.pem` and restart Pulsar
----
2019-03-06 18:19:41 UTC - Vikas: but where can I find the Keystore and Truststore passwords? And where do I need to provide the `ca.cert.pem` which I got from the Pulsar admin? Sorry for all the lame questions as I am doing and learning this for the first time :neutral_face:
----
2019-03-06 18:20:29 UTC - David Kjerrumgaard: No worries. Is this a test Pulsar cluster that you can access and modify as I am suggesting?
----
2019-03-06 18:20:59 UTC - Vikas: I can't access the Pulsar cluster
----
2019-03-06 18:23:03 UTC - David Kjerrumgaard: Ok, in THAT case you will need to contact the person in charge of securing that cluster, and ask them for the client keystore and truststore files in addition to the certificate file they already provided you.
----
2019-03-06 18:29:46 UTC - Vikas: oh ok sure :+1:
----
2019-03-06 18:30:09 UTC - Alexandre DUVAL: Should I URL-encode the token? I defined `brokerClientAuthenticationParameters=file:///home/pulsar/apache-pulsar-2.3.0/conf/keys/broker.to.broker.token` but I got `Caused by: java.lang.IllegalArgumentException: Illegal character(s) in message header value: Bearer <TOKEN_VALUE>`.
----
2019-03-06 18:31:30 UTC - Matteo Merli: The token should already be in base64
----
2019-03-06 18:34:39 UTC - Alexandre DUVAL: `broker.to.broker.token` contains the exact output of its creation with `pulsar tokens create`.
----
2019-03-06 18:37:36 UTC - Alexandre DUVAL: `Illegal character(s) in message header value: Bearer eyJhbGczafzaUzI1NiJ9.kjozeajohfoaZAFAF.iKiHoo7J1Ge_G8JYau_4hUmBzSErTqhe3pye8BUrPg0 ` (I randomly modified the chars in the token, except `.` and `_`).
----
2019-03-06 18:38:57 UTC - Matteo Merli: and where is the exception being thrown?
----
2019-03-06 18:39:21 UTC - Alexandre DUVAL: `java.util.concurrent.ExecutionException: org.apache.pulsar.client.admin.PulsarAdminException:`
----
2019-03-06 18:39:52 UTC - Alexandre DUVAL: Do you want all the stack?
----
2019-03-06 18:40:35 UTC - Matteo Merli: yes, that would help
----
2019-03-06 18:41:41 UTC - Alexandre DUVAL: @Matteo Merli more readable here.
----
2019-03-06 18:43:06 UTC - Matteo Merli: The strange thing is the `_` characters :slightly_smiling_face:
----
2019-03-06 18:43:18 UTC - Matteo Merli: I haven’t seen them when generating tokens
----
2019-03-06 18:43:51 UTC - Matteo Merli: Can you try removing them (just for the sake of seeing if they are the problem here)?
----
2019-03-06 18:44:35 UTC - Alexandre DUVAL: They appear when you use `.` in your subject.
----
2019-03-06 18:44:38 UTC - Alexandre DUVAL: I'm trying.
----
2019-03-06 18:48:12 UTC - Alexandre DUVAL: Same issue without.
----
2019-03-06 18:49:35 UTC - Matteo Merli: Ok, but client passing these tokens works, right?
----
2019-03-06 18:49:44 UTC - Alexandre DUVAL: Yes.
----
2019-03-06 18:49:50 UTC - Matteo Merli: Can you get a tcpdump of both cases?
----
2019-03-06 18:50:10 UTC - Matteo Merli: tcpdump -i any -w /tmp/test.pcap -s 0 port 6650 -v
----
2019-03-06 18:50:24 UTC - Alexandre DUVAL: On the broker?
----
2019-03-06 18:52:18 UTC - Matteo Merli: Yes
----
2019-03-06 18:54:46 UTC - Alexandre DUVAL: Hum, I'm not a tcpdump master but I'm using wireguard so the dump will be encrypted :confused:.
----
2019-03-06 18:55:06 UTC - Alexandre DUVAL: I'll run it from the broker.
----
2019-03-06 18:55:54 UTC - Matteo Merli: Oh, is it going through TLS?
----
2019-03-06 18:56:42 UTC - Alexandre DUVAL: It is, yes.
----
2019-03-06 18:57:02 UTC - Matteo Merli: ok.. that makes it difficult then
----
2019-03-06 19:01:11 UTC - Alexandre DUVAL: You agree that the bearer token should be passed without the prefix `token:`, right?
----
2019-03-06 19:01:42 UTC - Matteo Merli: Correct
----
2019-03-06 19:02:58 UTC - Matteo Merli: it should be `Authorization: Bearer xxxx.aaaaa.zzzzzz`
----
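The thread does not establish what made the header value illegal. As a general HTTP point, a header value cannot contain raw line breaks, so one thing that could be checked (an assumption, not something confirmed above) is whether the text read from the token file carries surrounding whitespace such as a trailing newline. A minimal Python sketch; `bearer_header` is a hypothetical helper, not part of Pulsar, and the token is the placeholder from the message above:

```python
import re

# Compact JWT form: three non-empty base64url segments separated by dots.
JWT_SHAPE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def bearer_header(raw_token: str) -> str:
    """Build an Authorization header value from raw token text.

    Hypothetical helper for illustration only.
    """
    token = raw_token.strip()  # drop surrounding whitespace, e.g. a trailing newline
    if not JWT_SHAPE.fullmatch(token):
        raise ValueError("token is not three base64url segments")
    return "Bearer " + token


# Placeholder token text with a trailing newline, as a file read might return it.
print("Authorization: " + bearer_header("xxxx.aaaaa.zzzzzz\n"))
```

The helper rejects anything containing spaces or other characters outside the base64url alphabet instead of passing it on to the HTTP client.
----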
2019-03-06 19:06:30 UTC - Alexandre DUVAL: Sure.
----
2019-03-06 19:08:58 UTC - Alexandre DUVAL: What do you think? Did I get something wrong in the token generation, which would be very sad :confused:? Or is it something on the Pulsar side?
----
2019-03-06 19:11:39 UTC - Matteo Merli: Not sure. I’d try to see that without TLS to check the HTTP request
----
2019-03-06 19:12:02 UTC - Matteo Merli: in both cases, and understand the difference
----
2019-03-06 19:13:00 UTC - Grant Wu: It seems that `ws` isn’t doing any additional escaping
----
2019-03-06 19:13:11 UTC - Grant Wu: It’s just passing the URL literally to node’s `http.get`
----
2019-03-06 19:16:25 UTC - Grant Wu: Yeah, @Matteo Merli, I don’t think it’s working, this seems like a smoking gun to me:

```
19:14:58.299 [pulsar-client-io-53-6] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [<persistent://public/default/pusheennstormy%5B%5D>][3ab32f35-6c3f-4e8a-a120-409091bb3cea] Subscribing to topic on cnx [id: 0x190c12dc, L:/10.244.1.53:37080 - R:10.244.2.32/10.244.2.32:6650]
19:14:58.358 [pulsar-client-io-53-6] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [<persistent://public/default/pusheennstormy%5B%5D>][3ab32f35-6c3f-4e8a-a120-409091bb3cea] Subscribed to topic on 10.244.2.32/10.244.2.32:6650 -- consumer: 3
19:14:58.359 [pulsar-web-30-25] INFO  org.eclipse.jetty.server.RequestLog - 10.244.0.61 - - [06/Mar/2019:19:14:58 +0000] "GET //pulsar-broker.petuum-system:8080/ws/v2/consumer/persistent/public/default/pusheennstormy%5B%5D/3ab32f35-6c3f-4e8a-a120-409091bb3cea HTTP/1.1" 101 0 "-" "-"  70
19:14:58.359 [pulsar-web-30-25] INFO  org.apache.pulsar.websocket.AbstractWebSocketHandler - [/10.244.0.61:46224] New WebSocket session on topic <persistent://public/default/pusheennstormy%5B%5D>
```
----
2019-03-06 19:20:16 UTC - Matteo Merli: Ok. I don’t know if there’s an easy fix for that
----
2019-03-06 19:20:56 UTC - Matteo Merli: Possibly in the websocket handler to ensure URLencoded names are decoded
----
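The logs above show the topic name reaching the broker with `%5B%5D` still in it. Percent-decoding that name gives the `[]` form the client presumably intended. A small Python sketch of the transformation using only the standard library; the broker's WebSocket handler is Java, so this illustrates the decoding step and is not the actual fix:

```python
from urllib.parse import unquote

# Topic name exactly as it appears in the broker log above.
encoded_topic = "pusheennstormy%5B%5D"

# Percent-decoding turns %5B into '[' and %5D into ']'.
decoded_topic = unquote(encoded_topic)

print(decoded_topic)
```

This prints `pusheennstormy[]`.
----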
2019-03-06 19:22:05 UTC - Grant Wu: :disappointed:
----
2019-03-06 19:22:10 UTC - Grant Wu: I guess I should file a bug for this
----
2019-03-06 19:23:51 UTC - Grant Wu: <https://github.com/apache/pulsar/issues/3768> wait, is there a known deadlock issue with Pulsar Functions?
----
2019-03-06 19:23:58 UTC - Grant Wu: @Rajan Dhabalia?
----
2019-03-06 19:23:58 UTC - Alexandre DUVAL: I removed TLS to test, but wireguard is still doing its job using UDP packets, so I can't provide the capture.
----
2019-03-06 19:42:35 UTC - Matteo Merli: UDP ?
----
2019-03-06 19:43:28 UTC - Matteo Merli: got it. But if you do the capture on broker host, you’ll have the clear text tcp stream
----
2019-03-06 19:47:52 UTC - Alexandre DUVAL: My bad, you are right, I was on the wrong port.
----
2019-03-06 19:47:54 UTC - Alexandre DUVAL:

```
GET /admin/v2/non-persistent/yo/logs HTTP/1.1
Authorization:.BearerieyJhbGrgargiJIUzI1NiJ9.eyJzdWgagJzdXBlciJ9.sqdsdf-70QuYZtvncbYY4M7oL0
User-Agent: Jersey/2.27 (HttpUrlConnection 1.8.0_192)                                           )
Host: c1-pulsar-yo-customers.services.yo.com:2000
Accept: application/json
Via: http/1.1 yo-pulsar-c1-n4
X-Forwarded-For: 10.2.0.1
X-Forwarded-Proto: https
X-Forwarded-Host: c1-pulsar-yo-customers.services.yo.com:2000
X-Forwarded-Server: 10.2.1.4
X-Original-Principal: super
```
----
2019-03-06 19:49:17 UTC - Alexandre DUVAL: It is not the same token Oo.
----
2019-03-06 19:49:36 UTC - Matteo Merli: This is through proxy though
----
2019-03-06 19:50:07 UTC - Matteo Merli: also the header looks messed up
----
2019-03-06 19:51:00 UTC - Alexandre DUVAL: Yes, I'll change my client.conf to hit the broker and retry.
----
2019-03-06 20:55:42 UTC - Ali Ahmed: @Grant Wu this should fix the issue
<https://github.com/apache/pulsar/pull/3772>
----
2019-03-06 20:56:20 UTC - Grant Wu: But is there an underlying deadlock in function-worker?
----
2019-03-06 20:56:35 UTC - Grant Wu: I’m wondering if it could be related to <https://github.com/apache/pulsar/issues/3715>
----
2019-03-06 20:57:43 UTC - Ali Ahmed: I don’t think that’s related
----
2019-03-06 21:04:08 UTC - Jerry Peng: @Grant Wu that only happens when running a function via ThreadRuntime, which is not the default. It's not a deadlock per se. It just occasionally takes extra long to stop a function instance running as a thread in the worker process, since in Java there isn't a great way to just kill a thread.
----
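The constraint described above, that a thread can only be asked to stop and the request is noticed when the thread next checks for it, can be illustrated with a short sketch. This is Python rather than Java and is not Pulsar's ThreadRuntime code; the `worker` function and its timings are made up for the example:

```python
import threading
import time

stop_requested = threading.Event()


def worker() -> None:
    """Stand-in for a function instance; it notices a stop request only between work units."""
    while not stop_requested.is_set():
        time.sleep(0.05)  # one unit of work; the flag is not checked while this runs


t = threading.Thread(target=worker)
t.start()

stop_requested.set()  # ask the thread to stop; it cannot be killed from outside
t.join(timeout=2.0)   # wait until it reaches its next check and returns

print("stopped:", not t.is_alive())
```

How long the `join` takes depends on how long one unit of work runs, which matches the "takes extra long occasionally" behaviour rather than a true deadlock.
----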
2019-03-06 23:39:03 UTC - jia zhai: will update the doc by issue #3773
----
2019-03-07 00:34:34 UTC - Ali Ahmed: I am experimenting with supporting Windows builds for Pulsar and have made some progress
<https://github.com/aahmed-se/incubator-pulsar/blob/win1/.appveyor.yml>
----
2019-03-07 00:34:49 UTC - Ali Ahmed: <https://ci.appveyor.com/project/aahmed-se/incubator-pulsar>
----
2019-03-07 00:35:23 UTC - Ali Ahmed: I need to find a boost-python package for Windows; I am not able to locate one in MSYS2
----
2019-03-07 00:35:32 UTC - Ali Ahmed: if anyone knows of one, let me know
----