Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2018/06/13 09:11:02 UTC

Slack digest for #general - 2018-06-13

2018-06-12 14:04:45 UTC - Karthik Palanivelu: Hi All, This is regarding Zookeeper in a K8s cluster. I am using the script generate-zookeeper-config.sh without DOMAIN, based on @Sijie Guo's comment on the issue I raised a couple of days back. But now I am getting the below exception for the zookeeper.conf entries. Please help me here.
```
server.1=zookeeper-0:2888:3888
server.2=zookeeper-1:2888:3888
server.3=zookeeper-2:2888:3888
``` 
```
09:46:34.875 [WorkerSender[myid=3]] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 1 at election address zookeeper-0:3888
java.net.UnknownHostException: zookeeper-0
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[?:1.8.0_161]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_161]
	at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_161]
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
09:46:34.876 [WorkerSender[myid=3]] WARN  org.apache.zookeeper.server.quorum.QuorumPeer - Failed to resolve address: zookeeper-0
java.net.UnknownHostException: zookeeper-0
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[?:1.8.0_161]
	at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_161]
	at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_161]
	at java.net.InetAddress.getByName(InetAddress.java:1076) ~[?:1.8.0_161]
	at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:595) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:538) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:452) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:433) [org.apache.pulsar-pulsar-zookeeper-2.0.0-rc1-incubating.jar:2.0.0-rc1-incubating]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
```
----
2018-06-12 14:06:53 UTC - Ivan Kelly: don't know specifically for k8s, but it can't resolve the zookeeper-0 to an IP
----
2018-06-12 14:20:58 UTC - David Asher: Is it possible to publish messages > 5MB on a pulsar topic?
----
2018-06-12 14:22:01 UTC - Karthik Palanivelu: Yes @Ivan Kelly but not sure how to get the hostname of it. I am new to k8s
----
2018-06-12 14:22:23 UTC - Ivan Kelly: @David Asher i think the 5MB limit is hard coded
----
2018-06-12 14:22:59 UTC - Ivan Kelly: @Karthikeyan Palanivelu how are you starting things in k8s?
----
2018-06-12 14:24:49 UTC - David Asher: @Ivan Kelly thx... so the only way around it is storing the payload somewhere else and referencing the external id?
----
2018-06-12 14:25:41 UTC - Ivan Kelly: Ya, that's one option. there was a bit of a discussion about unlimited message size a while back, but I'm not sure what happened with it
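The "store it elsewhere and reference the external id" option can be sketched as follows. This is only an illustration: `InMemoryBlobStore`, the `inline:`/`ref:` prefixes, and the function names are hypothetical and not part of Pulsar; a real setup would swap the in-memory store for object storage or a database.

```python
import uuid

MAX_MESSAGE_BYTES = 5 * 1024 * 1024  # the 5MB limit mentioned in the thread
INLINE, REF = b"inline:", b"ref:"    # hypothetical envelope prefixes


class InMemoryBlobStore:
    """Hypothetical stand-in for an external store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        blob_id = uuid.uuid4().hex
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


def to_message(payload: bytes, store: InMemoryBlobStore) -> bytes:
    """Return the bytes to publish: the payload itself, or a small reference to it."""
    if len(INLINE) + len(payload) <= MAX_MESSAGE_BYTES:
        return INLINE + payload
    return REF + store.put(payload).encode("ascii")


def from_message(message: bytes, store: InMemoryBlobStore) -> bytes:
    """Recover the original payload on the consumer side."""
    if message.startswith(INLINE):
        return message[len(INLINE):]
    if message.startswith(REF):
        return store.get(message[len(REF):].decode("ascii"))
    raise ValueError("unrecognised envelope")
```

The producer publishes whatever `to_message` returns; the consumer calls `from_message` on the received bytes, so both sides need access to the same store.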
----
2018-06-12 14:26:35 UTC - Ivan Kelly: effectively, it would be transparently chunking the message into multiple messages, and sending a "commit" message at the end.
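A rough sketch of that chunk-then-commit idea, under stated assumptions: the envelope tuple `(kind, message_id, sequence, body)` and the names `split` and `reassemble` are made up for illustration and do not describe any Pulsar wire format or API.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (kind, message_id, sequence, body): a hypothetical envelope for illustration only
Envelope = Tuple[str, str, int, bytes]


def split(message_id: str, payload: bytes, chunk_size: int) -> Iterator[Envelope]:
    """Yield the payload as numbered chunks, followed by one commit envelope."""
    count = 0
    for offset in range(0, len(payload), chunk_size):
        yield ("chunk", message_id, count, payload[offset:offset + chunk_size])
        count += 1
    # The commit carries the chunk count so the receiver can check completeness.
    yield ("commit", message_id, count, b"")


def reassemble(envelopes: Iterable[Envelope]) -> Iterator[Tuple[str, bytes]]:
    """Buffer chunks per message id and emit a payload only when its commit arrives."""
    pending: Dict[str, Dict[int, bytes]] = {}
    for kind, message_id, sequence, body in envelopes:
        if kind == "chunk":
            pending.setdefault(message_id, {})[sequence] = body
        elif kind == "commit":
            chunks = pending.pop(message_id, {})
            if set(chunks) != set(range(sequence)):
                raise ValueError("missing chunks for " + message_id)
            yield message_id, b"".join(chunks[i] for i in range(sequence))
```

Because nothing is emitted before the commit, a consumer never sees a partially delivered payload; the cost is that chunks must be buffered until then.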
----
2018-06-12 14:32:51 UTC - Karthik Palanivelu: @Ivan Kelly We have a K8s cluster which I used to deploy the Zookeeper containers based on my image with similar yaml that is in repo along with the script. I use `./pulsar zookeeper` cmd within my image.
----
2018-06-12 14:36:26 UTC - Ivan Kelly: how do you set up the zookeeper.conf?
----
2018-06-12 14:40:29 UTC - Karthik Palanivelu: Using the below script:
----
2018-06-12 14:40:36 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FB63M16P4/-.sh|Untitled>
----
2018-06-12 14:40:46 UTC - Karthik Palanivelu: Same from the repo; only change is I removed domain which is not resolving
----
2018-06-12 14:41:21 UTC - Ivan Kelly: @Karthikeyan Palanivelu are you just running "./pulsar zookeeper" on container start? nothing else?
----
2018-06-12 14:41:30 UTC - Karthik Palanivelu: Yes
----
2018-06-12 14:42:34 UTC - Karthik Palanivelu: I believe I am making mistake at the below config:
```
server.1=zookeeper-0:2888:3888
server.2=zookeeper-1:2888:3888
server.3=zookeeper-2:2888:3888
```
----
2018-06-12 14:43:24 UTC - Ivan Kelly: is your statefulset called zookeeper?
----
2018-06-12 14:43:46 UTC - Ivan Kelly: will you post your k8s deployment yaml?
----
2018-06-12 14:48:49 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FB5CPJTJL/-.php|Untitled>
----
2018-06-12 14:53:08 UTC - Ivan Kelly: strange. It should be named zookeeper-0, etc in the dns
----
2018-06-12 14:53:19 UTC - Ivan Kelly: could you post the contents of /etc/hosts from one of the pods?
----
2018-06-12 14:56:17 UTC - Karthik Palanivelu: Are you referring to container or POD? If POD, Can you please let me know how to get it?
----
2018-06-12 14:56:31 UTC - Ivan Kelly: container
----
2018-06-12 14:58:07 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FB5CXM15E/-.txt|Untitled>
----
2018-06-12 15:00:43 UTC - Ivan Kelly: can you ping zookeeper-0 from the same container?
----
2018-06-12 15:02:10 UTC - Karthik Palanivelu: Within my container I do not have ping, on host it is UnknownHost
----
2018-06-12 15:02:51 UTC - Ivan Kelly: nslookup? nc?
----
2018-06-12 15:03:38 UTC - Karthik Palanivelu: @Karthik Palanivelu uploaded a file: <https://apache-pulsar.slack.com/files/U7VRE0Q1G/FB647HWUA/-.m|Untitled>
----
2018-06-12 15:04:35 UTC - Ivan Kelly: cat /etc/resolv.conf & /etc/nsswitch.conf?
----
2018-06-12 15:08:56 UTC - Ivan Kelly: does nslookup zookeeper-0.zookeeper work?
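For containers that ship neither ping nor nslookup, the same name check can be done through a language runtime's resolver. A minimal sketch, assuming Python happens to be available in the container (the thread does not say it is); `resolve_all` is a hypothetical helper name.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_all(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its IPv4 address, or None if it does not resolve."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = None
    return results
```

Calling `resolve_all(["zookeeper-0", "zookeeper-0.zookeeper"])` inside the pod would show which of the two forms the cluster DNS answers for.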
----
2018-06-12 15:09:44 UTC - Karthik Palanivelu: yes
----
2018-06-12 15:10:12 UTC - Ivan Kelly: ok, strange, k8s docs say just the statefulset-<number> should work
----
2018-06-12 15:10:45 UTC - Ivan Kelly: but anyhow, you need to change the ZOOKEEPER_SERVERS env variable in the deployment spec
----
2018-06-12 15:11:03 UTC - Ivan Kelly: to be zookeeper-0.zookeeper,zookeeper-1.zookeeper,zookeeper-2.zookeeper
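To illustrate how such a comma-separated value maps onto the `server.N` entries shown earlier, here is a small sketch. It is not the repo's generate-zookeeper-config.sh; `server_lines` is a hypothetical helper, and the 2888/3888 ports are simply the ones from the zookeeper.conf excerpt above.

```python
from typing import List


def server_lines(zookeeper_servers: str, peer_port: int = 2888, election_port: int = 3888) -> List[str]:
    """Turn 'host-a,host-b,...' into zookeeper.conf 'server.N=host:peer:election' lines."""
    hosts = [h.strip() for h in zookeeper_servers.split(",") if h.strip()]
    return [
        f"server.{index}={host}:{peer_port}:{election_port}"
        for index, host in enumerate(hosts, start=1)
    ]
```

With the value suggested above, the first generated line would be `server.1=zookeeper-0.zookeeper:2888:3888`, which matches a name that resolved in the nslookup check.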
----
2018-06-12 15:13:10 UTC - Karthik Palanivelu: I tried that Option as well and it did not work. Let me try one more time.
----
2018-06-12 15:13:58 UTC - Ivan Kelly: how does the zookeeper.conf look after you do that?
----
2018-06-12 15:22:59 UTC - Karthik Palanivelu: I think it worked, Thank You @Ivan Kelly. BTW can you please let me know what would be the host name for Broker to be used within Initialize Cluster Data
----
2018-06-12 15:24:32 UTC - Ivan Kelly: broker-0.broker I would guess, but can't be sure, it's not a stateful set, so rules may be different
----
2018-06-12 15:34:34 UTC - Karthik Palanivelu: Bookie also got started...Working on Broker. BTW why do we need to have bookie as Daemon Set, Initialization and autoRecovery; It was not working for me and I just used Deployment option
----
2018-06-12 16:50:53 UTC - Matteo Merli: @Karthikeyan Palanivelu the broker URL will be the DNS name associated with the brokers service. either `broker` or similar
----
2018-06-12 16:59:06 UTC - Karthik Palanivelu: Thanks @Matteo Merli Let me try that
----
2018-06-12 16:59:26 UTC - Guillaume LECROC: @David Asher <https://github.com/apache/incubator-pulsar/issues/523>
I think the max message size is configurable in bookkeeper and/or pulsar
----
2018-06-12 17:54:36 UTC - William Fry: What would it take to use Spark’s structured streaming with Pulsar via PySpark?
----
2018-06-12 17:58:06 UTC - Sijie Guo: @William Fry I think we need a python based pulsar input sources implementing spark data frames or streaming datasets interface.
----
2018-06-12 18:07:58 UTC - William Fry: Gotcha, how would I push for that to happen? A ticket on Github?
----
2018-06-12 18:09:13 UTC - William Fry: Would it be very difficult to implement?
----
2018-06-12 18:13:15 UTC - Matteo Merli: The Pulsar py API should be straightforward to use. If you're familiar with PySpark, it should be easy to create an adaptor
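A minimal sketch of the Pulsar side only, assuming the `pulsar-client` Python package is installed. The helper names `receive_batch` and `open_consumer` are hypothetical, the service URL, topic, and subscription are placeholders, and the PySpark wiring is deliberately not shown.

```python
from typing import Any, List, Tuple


def receive_batch(consumer: Any, max_messages: int) -> List[bytes]:
    """Receive and acknowledge up to max_messages, returning their payloads."""
    payloads: List[bytes] = []
    for _ in range(max_messages):
        msg = consumer.receive()   # blocks until a message is available
        payloads.append(msg.data())
        consumer.acknowledge(msg)
    return payloads


def open_consumer(service_url: str, topic: str, subscription: str) -> Tuple[Any, Any]:
    """Create a client and consumer; the caller should close the client when done."""
    import pulsar  # imported lazily so receive_batch can be exercised without the package

    client = pulsar.Client(service_url)
    return client, client.subscribe(topic, subscription)
```

For example, `client, consumer = open_consumer("pulsar://localhost:6650", "my-topic", "my-subscription")` followed by `receive_batch(consumer, 100)` yields a list of payloads that an adaptor could hand to Spark, and `client.close()` releases the connection afterwards.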
----
2018-06-12 18:19:07 UTC - Alex Bradbury: @Alex Bradbury has joined the channel
----
2018-06-12 18:26:36 UTC - Alex Bradbury: Hi @durga, how did you get on with your prototype? Were you able to send large binary messages? I'm considering something similar. Thanks!
----
2018-06-12 19:28:59 UTC - Sijie Guo: @William Fry I think it should be fairly simple. I’ve checked pyspark and there are already multiple input sources there, for example kafka.py <https://github.com/apache/spark/blob/master/python/pyspark/streaming/kafka.py>

I think one interesting question is where to host the pulsar python source code: in pulsar, in spark, or in some 3rd party repo. My feeling is it might be better to contribute the python one back to spark, since it might be easier to manage pyspark dependencies. Although I am not a python expert, @Matteo Merli or @Sanjeev Kulkarni might have a better thought on this
----
2018-06-12 19:33:19 UTC - Sanjeev Kulkarni: Where are spark connectors usually based?
----
2018-06-12 20:07:55 UTC - Ali Ahmed: @William Fry @Matteo Merli @Sanjeev Kulkarni I don’t think it’s simple. My understanding is that pyspark connectors are wrappers over java code, so you need to write both at the same time
----
2018-06-12 20:09:20 UTC - William Fry: Interesting, I believe there’s already a Spark connector for Pulsar written in Java
----
2018-06-12 20:09:29 UTC - William Fry: just nothing for PySpark
----
2018-06-12 20:10:25 UTC - Ali Ahmed: basically the java code is called via py4j
----
2018-06-12 20:12:01 UTC - Matteo Merli: I don't think it has anything to do with java; spark itself is already bridging python from Java. There's no need to go back to Java if you can use a Py library like pulsar client
----
2018-06-12 20:15:08 UTC - Ali Ahmed: I remember you had to implicitly call java methods from python like so
<https://github.com/radanalyticsio/streaming-amqp/blob/master/python/amqp.py#L30>
----
2018-06-13 06:40:59 UTC - Idan: @Idan uploaded a file: <https://apache-pulsar.slack.com/files/UALJD8929/FB61UDBPB/-.java|Untitled>
----
2018-06-13 06:41:23 UTC - Idan: also would be great to know how to stop the standalone gracefully. Seems like when I Ctrl+C the process (mac) it always has issues coming back again
----
2018-06-13 06:43:17 UTC - Sijie Guo: @Idan which version of this? and can you describe your command sequence?
----
2018-06-13 06:44:13 UTC - Idan: apache-pulsar-2.0.0-rc1-incubating
----
2018-06-13 06:44:23 UTC - Idan: doing via /bin/pulsar standalone
----
2018-06-13 06:44:34 UTC - Idan: then it comes up with the end of the log I just sent ya
----
2018-06-13 06:44:59 UTC - Idan: to shut down I usually just Ctrl+C: ^Z
[1]+  Stopped                 ./pulsar standalone
ip-10-8-0-10:bin idanfridman$
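The `^Z` and `Stopped` in the output above are what a shell prints when a job is suspended rather than terminated, and a suspended process keeps its listening sockets open. A quick way to see whether something still holds a port, as a sketch: `port_in_use` is a hypothetical helper, and 6650 and 8080 are assumed here as the usual Pulsar broker and HTTP defaults, which may differ from the local configuration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds, i.e. something is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

Running `[p for p in (6650, 8080) if port_in_use(p)]` before restarting the standalone lists any of those ports that are still taken.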
----
2018-06-13 06:45:47 UTC - Sijie Guo: let me reproduce
----
2018-06-13 06:47:06 UTC - Idan: pretty naive sequence
----
2018-06-13 06:48:43 UTC - Sijie Guo: @Idan I tried downloading the tarball and running the sequence. I didn’t see the problem though.

is it a new tarball downloaded or have you run with older versions before?
----
2018-06-13 06:49:00 UTC - Idan: actually it’s the first one I used
----
2018-06-13 06:49:23 UTC - Idan: perhaps via logs we can nail what’s wrong?
----
2018-06-13 06:49:38 UTC - Idan: something with: 09:48:42.396 [main] INFO  org.apache.bookkeeper.proto.BookieNettyServer - Shutting down BookieNettyServer
----
2018-06-13 06:50:10 UTC - Idan: I shut everything down. Perhaps you can spot an unreleased port here:
----
2018-06-13 06:50:19 UTC - Idan: @Idan uploaded a file: <https://apache-pulsar.slack.com/files/UALJD8929/FB6HZFYLA/-.java|Untitled>
----
2018-06-13 06:53:46 UTC - Sijie Guo: @Idan: the ports look good. The logging basically says it can’t find /ledgers/cookies when starting up. It is strange that this would happen on a standalone instance, unless the data directory is on a tmpfs directory.

can you do following:

1) in the pulsar directory: move the data aside as a backup: `mv data data_back`
2) run `bin/pulsar standalone` again to see if standalone can come up
----
2018-06-13 06:55:54 UTC - Idan: OK, I've left my computer for an hour; I'll do that and show results
----
2018-06-13 06:59:10 UTC - Sijie Guo: sure. ping me when you have results
----