Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/01/08 17:31:10 UTC

Slack digest for #general - 2018-01-08

2018-01-07 18:53:40 UTC - Daniel Ferreira Jorge: Hello, I am trying to deploy the kubernetes manifests following the exact instructions on the documentation and everything goes as expected, except that "pulsar-perf produce" produces 0 messages per second... and if I go to the broker logs I see thousands of messages like: Write did not succeed to 10.142.0.19:3181, bookieIndex 1, but we have already fixed it.
----
2018-01-07 18:55:06 UTC - Matteo Merli: it looks like all writes are failing on the bookie
----
2018-01-07 18:55:34 UTC - Matteo Merli: or that it’s not reachable by the broker (though it’s registered as available in ZK)
----
2018-01-07 18:56:16 UTC - Daniel Ferreira Jorge: I can ping any bookie from any broker
----
2018-01-07 18:56:39 UTC - Matteo Merli: are the bookie logs telling anything?
----
2018-01-07 18:56:54 UTC - Daniel Ferreira Jorge: nope... just initialized
----
2018-01-07 18:57:43 UTC - Matteo Merli: any other error/warn message in broker log? It should say something on the reason why the write failed in the first place
----
2018-01-07 18:57:43 UTC - Daniel Ferreira Jorge: nothing is happening
----
2018-01-07 18:59:33 UTC - Matteo Merli: Can it be a problem with the IP that’s being advertised by the bookies?
----
2018-01-07 18:59:45 UTC - Matteo Merli: Are you using StatefulSet or DaemonSet?
----
2018-01-07 18:59:59 UTC - Daniel Ferreira Jorge: daemon
----
2018-01-07 19:00:12 UTC - Daniel Ferreira Jorge: the exact deployment from the repository
----
2018-01-07 19:00:13 UTC - Matteo Merli: are you setting the `advertisedAddress` in bookie config?
----
2018-01-07 19:00:17 UTC - Daniel Ferreira Jorge: absolutely nothing changed
----
2018-01-07 19:00:20 UTC - Matteo Merli: ok
----
2018-01-07 19:01:41 UTC - Daniel Ferreira Jorge: the advertisedAddress is status.hostIP
----
2018-01-07 19:02:27 UTC - Daniel Ferreira Jorge: and the ip 10.142.0.19:3181 is the ip of a bookie node
----
2018-01-07 19:02:36 UTC - Daniel Ferreira Jorge: so the broker knows where it is
----
2018-01-07 19:02:44 UTC - Matteo Merli: ok, telnetting there from broker works?
----
2018-01-07 19:03:13 UTC - Daniel Ferreira Jorge: pinging "bookie" works
----
2018-01-07 19:03:49 UTC - Matteo Merli: my concern is that the Pod is not being exposed on the host network
----
2018-01-07 19:04:22 UTC - Daniel Ferreira Jorge: I tried pinging everything from everywhere
----
2018-01-07 19:04:24 UTC - Matteo Merli: broker might be able to “ping” but the Pod still needs to be bound on 3181 in the bookie host machine
----
2018-01-07 19:04:38 UTC - Matteo Merli: try telnet instead of ping
----
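A ping only shows that the host answers ICMP; it says nothing about whether anything accepts TCP connections on port 3181. Below is a minimal sketch of the same check telnet performs, written in Python. It assumes a Python 3 interpreter is available wherever it is run (the broker image may not ship one), and `can_connect` is a hypothetical helper name, not part of Pulsar or BookKeeper.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This is roughly what `telnet host port` verifies: that something is
    actually listening on the port, not just that the host is reachable.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Run from a broker Pod, `can_connect("10.142.0.19", 3181)` returning False while ping succeeds would point at the port not being exposed on the host.
----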
2018-01-07 19:04:45 UTC - Daniel Ferreira Jorge: ok
----
2018-01-07 19:04:49 UTC - Daniel Ferreira Jorge: let me redeploy
----
2018-01-07 19:04:56 UTC - Daniel Ferreira Jorge: I will report in 5 min
----
2018-01-07 19:05:05 UTC - Matteo Merli: :+1:
----
2018-01-07 19:06:03 UTC - Matteo Merli: uhm, just looking at the bookie.yaml file
----
2018-01-07 19:06:48 UTC - Matteo Merli: I think the problem is indeed the IP exposed
----
2018-01-07 19:06:54 UTC - Matteo Merli: In this change: <https://github.com/apache/incubator-pulsar/pull/764>
----
2018-01-07 19:07:26 UTC - Matteo Merli: I had put the hostIP.. but this was missing the change to bind the bookie on `hostNetwork`
----
2018-01-07 19:07:28 UTC - Matteo Merli: :confused:
----
2018-01-07 19:10:34 UTC - Daniel Ferreira Jorge: so, I have to remove the advertised address?
----
2018-01-07 19:10:55 UTC - Matteo Merli: try this:
----
2018-01-07 19:10:55 UTC - Matteo Merli: <https://gist.github.com/merlimat/dad357c1cccde8e0b634a9639e1fcb16>
----
2018-01-07 19:11:36 UTC - Matteo Merli: enabling `hostNetwork` tells Kubernetes to expose 3181 in the host network (rather than just bind it on the Pod IP)
----
2018-01-07 19:12:32 UTC - Daniel Ferreira Jorge: great! trying now... will report back soon! thank you @Matteo Merli
----
2018-01-07 19:25:25 UTC - Daniel Ferreira Jorge: @Matteo Merli if I try to deploy with hostNetwork, the bookies fail to start because they cannot find zookeeper anymore: "zk-0.zookeeper: Name or service not known"
----
2018-01-07 19:26:08 UTC - Matteo Merli: ok, let me try it as well
----
2018-01-07 19:26:56 UTC - Daniel Ferreira Jorge: ok, im using kube 1.8.4 on GKE
----
2018-01-07 19:28:38 UTC - Sijie Guo: I think you need to expose hostPort?
----
2018-01-07 19:29:31 UTC - Matteo Merli: yes, I got confused with hostNetwork but that’s going too far
----
2018-01-07 19:29:35 UTC - Sijie Guo:
ports:
  - name: client
    containerPort: 3181
    # we are using `status.hostIP` for the bookie's advertised address. export 3181 as the hostPort,
    # so that the containers are able to access the host port
    hostPort: 3181
----
2018-01-07 19:38:49 UTC - Daniel Ferreira Jorge: exposing the hostPort works... but isn't it against best practices?
----
2018-01-07 19:39:15 UTC - Matteo Merli: the restriction with host port is that you can only have 1 pod per host
----
2018-01-07 19:39:32 UTC - Matteo Merli: but that’s anyway implied by using DaemonSet
----
2018-01-07 19:42:02 UTC - Sijie Guo: BookKeeper needs a reliable ID for bookie advertisement. Unfortunately, in a DaemonSet, the host IP is the only way to achieve that, because the pod IP can change when the pod is restarted.
----
2018-01-07 19:43:19 UTC - Matteo Merli: yes, StatefulSet is better for that because it preserves the Pod’s network identity (a stable hostname), but the support for local volumes is still a bit green
----
2018-01-07 19:43:57 UTC - Daniel Ferreira Jorge: yes, I'm only trying to deploy the manifests supplied, because I'm getting an error with the helm chart I'm making... In my chart, the brokers cannot find zookeeper... I'm trying to debug that but with no success so far... the bookies are deployed as statefulsets with useHostNameAsBookieID: "true"
----
2018-01-07 19:45:04 UTC - Daniel Ferreira Jorge: but when the brokers are deployed, they do not start because they cannot find zookeeper... the bookies found zookeeper
----
2018-01-07 19:45:52 UTC - Matteo Merli: that is strange, do they use the same zk connection string?
----
2018-01-07 19:46:13 UTC - Daniel Ferreira Jorge: the exact same
----
2018-01-07 19:46:44 UTC - Daniel Ferreira Jorge: I spent 7 hours trying to find something wrong...
----
2018-01-07 19:47:19 UTC - Matteo Merli: <https://github.com/apache/incubator-pulsar/pull/1035>
----
2018-01-07 19:49:31 UTC - Daniel Ferreira Jorge: also, with the manifests from the repo, nothing is shown in grafana
----
2018-01-07 19:49:45 UTC - Daniel Ferreira Jorge: the pulsar dashboard works
----
2018-01-07 19:49:54 UTC - Daniel Ferreira Jorge: maybe there is some config missing
----
2018-01-07 19:53:58 UTC - Daniel Ferreira Jorge: @Daniel Ferreira Jorge uploaded a file: <https://apache-pulsar.slack.com/files/U8E1J0DHS/F8PCK1JF7/stack.txt|stack.txt> and commented: The error I'm getting when initializing a broker with my chart is this
----
2018-01-07 19:55:51 UTC - Daniel Ferreira Jorge: the bookies can access the "alpha-pulsar-zookeeper-X.alpha-pulsar-zookeeper", and the metadata is already initialized in zookeeper
----
2018-01-07 20:39:10 UTC - Matteo Merli: does the container restart at that point?
----
2018-01-07 20:39:20 UTC - Daniel Ferreira Jorge: yes, many times
----
2018-01-07 20:40:22 UTC - Matteo Merli: there should be no difference with what the bookies are doing then
----
2018-01-07 20:40:42 UTC - Daniel Ferreira Jorge: I even tried putting an init container on the broker to wait for like 5min to make sure everything else is already up
----
2018-01-07 20:40:53 UTC - Matteo Merli: and while the DNS error might happen before the ZK pods are active, it should resolve after that
----
2018-01-07 20:41:48 UTC - Matteo Merli: one thing catches the eye:
----
2018-01-07 20:41:50 UTC - Matteo Merli: `alpha-pulsar-zookeeper-0.alpha-pulsar-zookeeper, alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper, alpha-pulsar-zookeeper-2.alpha-pulsar-zookeeper`
----
2018-01-07 20:41:57 UTC - Matteo Merli: there’s a space after the `,`
----
2018-01-07 20:42:19 UTC - Matteo Merli: looks like the DNS name it’s trying to use is ` alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper`
----
2018-01-07 20:42:40 UTC - Matteo Merli: so, picking zk-0 works but zk-1 and zk-2 won’t
----
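The diagnosis above can be reproduced with plain string handling. The sketch below is not broker code; it only assumes that a client splitting the connection string on commas, without trimming, would end up with host names that start with a space. `find_padded_servers` is a hypothetical helper name.

```python
def find_padded_servers(connection_string):
    """Return the entries that carry leading or trailing whitespace.

    Splits on commas only, without trimming, so the padding survives
    just as it would for a client that uses the string as-is.
    """
    entries = connection_string.split(",")
    return [entry for entry in entries if entry != entry.strip()]


# The connection string quoted in the conversation above.
ZK_SERVERS = (
    "alpha-pulsar-zookeeper-0.alpha-pulsar-zookeeper, "
    "alpha-pulsar-zookeeper-1.alpha-pulsar-zookeeper, "
    "alpha-pulsar-zookeeper-2.alpha-pulsar-zookeeper"
)

for entry in find_padded_servers(ZK_SERVERS):
    # repr() makes the leading space visible in the output.
    print(repr(entry))
```

Only the second and third entries are reported, which matches the observation that zk-0 resolves while zk-1 and zk-2 do not.
----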
2018-01-07 20:43:26 UTC - Daniel Ferreira Jorge: well, that may be the issue... I will try to change that, the bookkeeper pods have the same string with spaces
----
2018-01-07 20:43:34 UTC - Daniel Ferreira Jorge: and it works there
----
2018-01-07 20:43:52 UTC - Matteo Merli: the ZK client picks one random server to connect to
----
2018-01-07 20:44:14 UTC - Matteo Merli: if it picks the first in the list, it won’t have the extra space
----
2018-01-07 20:44:40 UTC - Matteo Merli: can you check the BK log for which ZK server it actually connected to ?
----
2018-01-07 20:44:59 UTC - Daniel Ferreira Jorge: sure
----
2018-01-07 20:45:07 UTC - Daniel Ferreira Jorge: I will report back in 2 min
+1 : Matteo Merli
----
2018-01-07 20:49:58 UTC - Daniel Ferreira Jorge: @Daniel Ferreira Jorge uploaded a file: <https://apache-pulsar.slack.com/files/U8E1J0DHS/F8PCTFMV3/stack.txt|stack.txt> and commented: This is the logs from BK-0 pod... seems it connected with the ZK-2...
----
2018-01-07 20:50:44 UTC - Daniel Ferreira Jorge: BK-1 and BK-2 connected to ZK-1
----
2018-01-07 20:51:00 UTC - Matteo Merli: Uhm, interesting
----
2018-01-07 20:51:19 UTC - Matteo Merli: It might be related to how the property file is loaded
----
2018-01-07 20:51:42 UTC - Daniel Ferreira Jorge: maybe BK strips the string? and the broker does not?
----
2018-01-07 20:53:15 UTC - Matteo Merli: I think in broker, we’re just picking that as a String
----
2018-01-07 20:53:27 UTC - Matteo Merli: in BK, it’s reading the property as a list:
----
2018-01-07 20:53:28 UTC - Matteo Merli: <https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/AbstractConfiguration.java#L157>
----
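One way to picture the difference between the two ways of loading the property is sketched below. This is an illustration only, not the actual broker or BookKeeper code: `read_as_string` and `read_as_list` are hypothetical names, and the assumption that list-style loading trims each element is inferred from the behavior observed in this thread rather than confirmed from the linked source.

```python
def read_as_string(value):
    """Return the property value untouched, padding included."""
    return value


def read_as_list(value):
    """Split on commas and trim each element, dropping empty ones."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Shortened host names, used here only to keep the example readable.
raw = "zk-0.zookeeper, zk-1.zookeeper, zk-2.zookeeper"

as_string = read_as_string(raw)   # spaces after the commas are still there
as_list = read_as_list(raw)       # each host name is clean
rejoined = ",".join(as_list)      # a connection string without padding

print(repr(as_string))
print(repr(rejoined))
```

Under that assumption, a component that loads the value as a list tolerates `", "` separators, while one that passes the raw string through does not, so writing the connection string without spaces is the safe choice for both.
----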
2018-01-07 20:56:02 UTC - Daniel Ferreira Jorge: it was the spaces... 7 hours on this man... 7!
----
2018-01-07 20:56:14 UTC - Daniel Ferreira Jorge: I removed them and it works now
----
2018-01-07 20:56:57 UTC - Daniel Ferreira Jorge: unbelievable
----
2018-01-07 20:57:17 UTC - Daniel Ferreira Jorge: thanks for the help @Matteo Merli
----
2018-01-07 20:57:55 UTC - Matteo Merli: :grinning:
----
2018-01-07 21:10:17 UTC - Matteo Merli: Sorry for that
----
2018-01-07 21:11:03 UTC - Daniel Ferreira Jorge: sorry for what??
----