Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/05/17 09:11:02 UTC

Slack digest for #general - 2019-05-17

2019-05-16 09:47:18 UTC - jia zhai: @Shivji Kumar Jha  Here is the link for S3 config: <https://pulsar.apache.org/docs/en/cookbooks-tiered-storage/#configuring-the-offload-driver>

seems role is not supported yet.
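
For reference, the key-based setup from that cookbook looks roughly like this in broker.conf (the bucket and region below are just placeholders):
```
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=my-offload-bucket
s3ManagedLedgerOffloadRegion=us-west-2
# Credentials are picked up from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# or ~/.aws/credentials, not from an instance/IAM role at this point.
```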
----
2019-05-16 09:49:34 UTC - jia zhai: @eric.olympe Is there a detailed requirement for the “fanout” pattern?
----
2019-05-16 09:57:44 UTC - Shivji Kumar Jha: @jia zhai the documentation does not say so explicitly, so I assumed it does… keys are actually considered bad security practice, so we prefer roles. Do you know if this is a Pulsar limitation or one of the jclouds library that Pulsar uses for offloading?
----
2019-05-16 10:18:06 UTC - jia zhai: right, currently we leverage jclouds and follow the jclouds way
----
2019-05-16 11:01:08 UTC - eric.olympe: @jia zhai I mean broadcasting messages to all consumers of a topic: all consumers receive the same messages, no load balancing.
----
2019-05-16 11:26:51 UTC - jia zhai: @eric.olympe then you could create several subscriptions, each with a consumer connected
----
2019-05-16 11:28:19 UTC - jia zhai: a topic can have multiple subscriptions, and each subscription will receive all the messages in the topic
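
A minimal sketch of this fanout pattern with the Java client (the service URL, topic, and subscription names are placeholders): each consumer uses its own subscription, so each one receives every message.
```
import org.apache.pulsar.client.api.*;

public class FanoutExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Two separate subscriptions on the same topic: both receive every message.
        Consumer<String> consumerA = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/events")
                .subscriptionName("subscription-a")
                .subscribe();
        Consumer<String> consumerB = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/events")
                .subscriptionName("subscription-b")
                .subscribe();

        Message<String> msgA = consumerA.receive();
        Message<String> msgB = consumerB.receive();   // the same payload arrives on both subscriptions
        consumerA.acknowledge(msgA);
        consumerB.acknowledge(msgB);
        client.close();
    }
}
```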
----
2019-05-16 12:08:27 UTC - eric.olympe: @jia zhai Pulsar has push or pull mode ?
----
2019-05-16 12:21:03 UTC - jia zhai: @eric.olympe It is push + pull.
----
2019-05-16 12:23:09 UTC - eric.olympe: you mean push for producing, pull for consuming ?
----
2019-05-16 12:24:50 UTC - jia zhai: once a consumer is connected, it sends a CommandFlow to the broker. CommandFlow contains the number of messages the consumer can cache in its receive queue, and the broker responds to this command by pushing up to that many messages to the consumer.
----
2019-05-16 12:26:19 UTC - jia zhai: But if the broker currently has no available message, then when the consumer calls consumer.receive(), it waits to pull a message from the broker
----
2019-05-16 12:26:22 UTC - jia zhai: @eric.olympe
----
2019-05-16 12:29:40 UTC - jia zhai: So, if there are enough messages on the broker side and the consumer has queue space to receive and cache messages, the messages are pushed from the broker to the consumer.
----
2019-05-16 12:32:00 UTC - jia zhai: otherwise, the consumer pulls one message at a time from the broker.
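
A minimal sketch of how this looks from the client side (topic, subscription, and sizes are placeholders): the receiver queue size controls how many messages the broker may push ahead of receive() calls.
```
import org.apache.pulsar.client.api.*;
import java.util.concurrent.TimeUnit;

public class ReceiverQueueExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("persistent://public/default/events")
                .subscriptionName("my-subscription")
                .receiverQueueSize(1000)   // how many messages the client asks the broker to push ahead of time
                .subscribe();

        // If messages are already buffered in the receiver queue, this returns immediately (push path);
        // otherwise it blocks until the broker has a message to deliver (pull-like behavior).
        Message<String> msg = consumer.receive(5, TimeUnit.SECONDS);
        if (msg != null) {
            consumer.acknowledge(msg);
        }
        client.close();
    }
}
```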
----
2019-05-16 12:34:48 UTC - eric.olympe: @jia zhai Thanks a lot.
----
2019-05-16 12:37:42 UTC - jia zhai: welcome
----
2019-05-16 12:38:00 UTC - Brian Doran: Question re: Avro: is it possible to use an avro.GenericRecord in a producer.send()? I am having an issue whereby, having debugged to a point, I am seeing the following in `org.apache.pulsar.shade.org.apache.avro.reflect.ReflectData`:

```
private FieldAccessor[] createAccessorsFor(Schema schema) {
    List<org.apache.pulsar.shade.org.apache.avro.Schema.Field> avroFields = schema.getFields();
    FieldAccessor[] result = new FieldAccessor[avroFields.size()];
    org.apache.pulsar.shade.org.apache.avro.Schema.Field avroField;
    for(Iterator i$ = schema.getFields().iterator(); i$.hasNext(); result[avroField.pos()] = (FieldAccessor)this.byName.get(avroField.name())) {
        avroField = (org.apache.pulsar.shade.org.apache.avro.Schema.Field)i$.next();
    }
    return result;
}
```

even though the fields exist in the schema, I am getting a FieldAccessor[] of null values.
----
2019-05-16 12:55:51 UTC - tuteng: You can refer to this code <https://github.com/apache/pulsar/blob/78502a3cfae0d789cea667c4829830487517b7ea/pulsar-client/src/test/java/org/apache/pulsar/client/impl/schema/SchemaBuilderTest.java>
----
2019-05-16 12:58:06 UTC - tuteng: It seems you can't use avro.GenericRecord directly; Pulsar wraps the avro.GenericRecord class.
----
2019-05-16 12:59:06 UTC - Brian Doran: That's what I thought @tuteng Thanks for taking a look.
----
2019-05-16 13:08:18 UTC - Brian Doran: Actually @tuteng that is more specifically for generating the Avro schema; this can be done from an Avro schema via
```
Schema.AVRO(
        SchemaDefinition.builder()
                .withJsonDef(schema)
                .build());
```
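
For reference, a minimal sketch of going from a schema definition to a typed producer (the POJO, topic, and URL are placeholders); the JSON-definition route above should plug in the same way via withJsonDef.
```
import org.apache.pulsar.client.api.*;
import org.apache.pulsar.client.api.schema.SchemaDefinition;

public class AvroSchemaExample {
    public static class SensorReading {
        public String sensorId;
        public double value;
    }

    public static void main(String[] args) throws Exception {
        // Build an Avro schema from a POJO (withJsonDef(schemaJson) is the JSON-definition alternative).
        Schema<SensorReading> schema = Schema.AVRO(
                SchemaDefinition.<SensorReading>builder()
                        .withPojo(SensorReading.class)
                        .build());

        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();
        Producer<SensorReading> producer = client.newProducer(schema)
                .topic("persistent://public/default/readings")
                .create();

        SensorReading reading = new SensorReading();
        reading.sensorId = "sensor-1";
        reading.value = 42.0;
        producer.send(reading);
        client.close();
    }
}
```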
----
2019-05-16 14:58:58 UTC - Addison Higham: hrm... that's unfortunate... just digging into jclouds, and they don't even wrap the AWS SDKs, where usually you could just rely on the credential provider chain even if it wasn't plumbed all the way through :confused:
----
2019-05-16 16:36:26 UTC - Byron: At the Kafka Summit there was a key slide in one of the keynotes: “overprovision or outage: pick one.” That argument seems moot in the Pulsar context, based on my understanding. If storage is an issue, more bookies are added. If CPU is an issue, more brokers are added. Throughput depends on the topic structure and semantics (i.e. is it partitioned and are there corresponding consumers of those partitions).
white_check_mark : Ali Ahmed, Joe Francis
----
2019-05-16 16:44:05 UTC - David Kjerrumgaard: @Byron I am not familiar with the exact issue the presenter was addressing, but in Pulsar each layer (storage & serving) can scale independently. This design decision was specifically made to address the scenario you mention.
----
2019-05-16 16:45:27 UTC - Byron: Thanks. There wasn’t an issue per se, but rather the ability to support elastic scaling. As you call out, the scaling would happen at different layers.
----
2019-05-16 16:46:31 UTC - Byron: This has been a main attraction of Pulsar
----
2019-05-16 16:46:57 UTC - David Kjerrumgaard: @Byron The other key feature to scalability is the stateless nature of the Pulsar Brokers and the Pulsar Proxy, which allows newly added Broker nodes to serve old data.
----
2019-05-16 16:48:04 UTC - David Kjerrumgaard: In Kafka, if you lose a serving node, then only one of the two remaining replicas can serve data reads. A newly added Kafka node can't serve them until it has replicated the data.
----
2019-05-16 16:48:46 UTC - Byron: Right
----
2019-05-16 16:53:08 UTC - Byron: I was an early Kafka “fanboy”, but after I learned about its architecture I started questioning it from an operational standpoint. Then I learned about Pulsar, and its architecture is much better suited for elastic changes.
----
2019-05-16 16:57:07 UTC - Byron: Whenever there is Kafka news and heavy marketing, there is a need to step back and re-evaluate what is actually true about the tech. Unfortunately the marketing wins out more often than not. I am not at all suggesting Kafka is not good software, but there are fundamental architectural decisions that are limiting for operators.
----
2019-05-16 17:02:55 UTC - Byron: In any case, thanks for entertaining my question
----
2019-05-16 17:04:12 UTC - David Kjerrumgaard: @Byron We are in the same boat in that regard. I did a lot of consulting for a big Kafka provider and implemented a lot of streaming solutions around Kafka   :smiley:
+1 : Byron
----
2019-05-16 17:07:42 UTC - David Kjerrumgaard: @Byron Have you considered writing a blog post about some of your concerns with Kafka?  We need all the community support we can get :smiley:
----
2019-05-16 17:11:34 UTC - Byron: No, but I should
----
2019-05-16 17:13:26 UTC - Byron: This is a bit of an odd use case, but my initial attraction to it was the “infinite storage” support: tiered storage combined with a retention policy that never expires data, for event sourcing use cases.
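
As a sketch of that setup (the namespace and admin URL are placeholders), infinite retention can be configured through the Java admin client, with tiered storage then handling the older ledgers:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class InfiniteRetentionExample {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin URL
                .build();

        // -1 for both time and size means messages are retained indefinitely,
        // even after they have been acknowledged.
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(-1, -1));
        admin.close();
    }
}
```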
----
2019-05-16 17:15:11 UTC - Byron: Building an application layer around that to support causal writes with a single produce to a given topic, for an actor-based layer to parallelize writes for different entities across multiple producers (and/or partitions).
----
2019-05-16 17:15:44 UTC - Byron: From there I have had the need for supporting queues and actual streaming use cases like metrics from user actions on the client side
----
2019-05-16 17:17:12 UTC - David Kjerrumgaard: Those requirements align well with Pulsar.
----
2019-05-16 17:18:06 UTC - David Kjerrumgaard: If you need help during your evaluation, you can always post your questions here!  It's a great community of users ready to help
----
2019-05-16 17:20:45 UTC - Byron: Thanks. I have posted here several times and helped with the initial Go client (last year?). This was more of a reflection kind of comment :wink:
----
2019-05-16 17:23:04 UTC - Byron: But I should write a blog post
+1 : Matteo Merli, Karthik Ramasamy
----
2019-05-16 17:28:38 UTC - David Kjerrumgaard: FWIW, we are working on an improved Go client, due to increased demand.  So stay tuned for that
----
2019-05-16 17:35:50 UTC - David Kjerrumgaard: <https://github.com/apache/pulsar-client-go/pull/1>
100 : Byron
----
2019-05-16 17:39:38 UTC - Addison Higham: I imagine some streamlio people are here... I filled out the form but haven't had anyone reach out. We are in the evaluation phase with Pulsar and would be curious to hear about the hosted offering
----
2019-05-16 17:40:55 UTC - Addison Higham: @David Kjerrumgaard Not sure if you can put me in touch with someone?
----
2019-05-16 17:42:51 UTC - Addison Higham: and the few other things I am getting questions about from my org:
- a ruby client
- exactly-once flink consumer
- bulk reads of segments out of tiered storage (AFAIK, there is an API for seeing where the segments are stored and relevant order information, but curious what the longer term ideas around that are)
----
2019-05-16 17:43:02 UTC - Addison Higham: wrong addison (doesn't happen very often!) but cool!
----
2019-05-16 17:43:31 UTC - David Kjerrumgaard: @Addison Higham We are finalizing the service offering now and conducting some security testing, etc before we roll it out to the general public.
----
2019-05-16 17:43:42 UTC - David Kjerrumgaard: Sorry, auto-complete error  :smiley:
----
2019-05-16 17:43:43 UTC - Jon Bock: Hi @Addison Higham, we e-mailed you a couple of times last week but maybe those e-mails didn’t reach you.  Can you DM me the best address to use?
----
2019-05-16 17:44:23 UTC - Jon Bock: (Sorry Addison Bair, I also used the wrong Addison).
----
2019-05-16 18:42:14 UTC - Kevin Brown: I have a question about the Pulsar Functions API. Is there a way to publish to a topic and specify a key? I don’t see a method in the Context interface that will accomplish this. Is there some other way or workaround?
----
2019-05-16 18:54:27 UTC - Devin G. Bost: What do you mean by key? That could mean a lot of things. What's your use case?
----
2019-05-16 18:56:29 UTC - Devin G. Bost: I keep getting:
`Reason: javax.ws.rs.ProcessingException: Connection refused: localhost/127.0.0.1:8080`
when trying to connect to Pulsar. It's happening with the Pulsar docker containers, and after I added the Kafka connectors to my local installation, I started getting it with my local Kafka setup as well.
----
2019-05-16 18:56:33 UTC - Devin G. Bost: Any ideas?
----
2019-05-16 19:00:40 UTC - David Kjerrumgaard: @Devin G. Bost Assuming you have done the basic diagnostics to ensure the process is up and listening on the port, etc
----
2019-05-16 19:02:02 UTC - David Kjerrumgaard: @Devin G. Bost And you installed them via these steps?  <http://pulsar.apache.org/docs/en/io-quickstart/#installing-builtin-connectors>
----
2019-05-16 19:10:01 UTC - Devin G. Bost: @David Kjerrumgaard The last command never responds. (I edited to put the message in a snippet.)
----
2019-05-16 19:10:13 UTC - Devin G. Bost: That's what I'm getting.
----
2019-05-16 19:13:57 UTC - Devin G. Bost: That's using `docker run -it   -p 6650:6650   -p 8080:8080 -p 2181:2181 -v $PWD/data:/pulsar/data   -v /data/provisioning:/data/provisioning   apachepulsar/pulsar-all   bin/pulsar standalone`
----
2019-05-16 19:14:23 UTC - David Kjerrumgaard: I had to ask the obvious first....  :smiley:
----
2019-05-16 19:14:27 UTC - Devin G. Bost: :slightly_smiling_face:
----
2019-05-16 19:14:44 UTC - David Kjerrumgaard: Which tag on the docker image?  latest?
----
2019-05-16 19:14:51 UTC - Devin G. Bost: I assume it's latest.
----
2019-05-16 19:14:54 UTC - Devin G. Bost: I could try 2.3.1
----
2019-05-16 19:15:21 UTC - David Kjerrumgaard: does telnet also hang when connecting to port 8080?
----
2019-05-16 19:15:39 UTC - Devin G. Bost: I need to install telnet to check. One moment.
----
2019-05-16 19:18:44 UTC - Devin G. Bost: @David Kjerrumgaard It's just hanging at:

```
OCPC-LM31977:bin dbost$ telnet localhost 8080
Trying ::1...
Connected to localhost.
Escape character is '^]'.```

and

```
OCPC-LM31977:bin dbost$ telnet localhost 6650
Trying ::1...
Connected to localhost.
Escape character is '^]'.
```
----
2019-05-16 19:19:53 UTC - David Kjerrumgaard: Telnet hangs for both ports?
----
2019-05-16 19:20:05 UTC - David Kjerrumgaard: or just 8080?
----
2019-05-16 19:21:29 UTC - Devin G. Bost: Both.
----
2019-05-16 19:23:50 UTC - David Kjerrumgaard: I noticed that both of those ports have connections stuck in CLOSE_WAIT status, which means the remote side has closed its end (sent a FIN) but the local process never closed the socket, leaving unread data in the stream
----
2019-05-16 19:24:10 UTC - David Kjerrumgaard: not sure if that is blocking it from establishing new connections or not
----
2019-05-16 19:25:59 UTC - David Kjerrumgaard: Did you hard kill a client that was connected to the docker container?
----
2019-05-16 19:27:04 UTC - Devin G. Bost: Very good observation. I probably did.
----
2019-05-16 19:29:57 UTC - Devin G. Bost: Hmm, but I think I hard killed a client that was connected to a previous container. I don't think I hard killed a client with this one. If I did, I think it was already after I was getting this error.
----
2019-05-16 19:32:10 UTC - David Kjerrumgaard: ok, can you see if the CLOSE_WAIT connections are still in the netstat output?
----
2019-05-16 19:41:23 UTC - Devin G. Bost: Sure thing.
----
2019-05-16 19:42:24 UTC - Devin G. Bost: ```
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 8080
tcp6       9      0  ::1.8080               ::1.64330              CLOSE_WAIT  407795 146808  29851      0 0x0122 0x0000010c
tcp6       0      0  ::1.64330              ::1.8080               FIN_WAIT_2  260992 146808  44876      0 0x2131 0x00000100
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 6650
tcp6       0      0  ::1.6650               ::1.64338              CLOSE_WAIT  407795 146808  29851      0 0x0122 0x0000010c
tcp6       0      0  ::1.64338              ::1.6650               FIN_WAIT_2  260992 146808  44890      0 0x2131 0x00000100
OCPC-LM31977:bin dbost$ netstat -vanp tcp | grep 2181
tcp4       0      0  127.0.0.1.2181         127.0.0.1.64489        CLOSE_WAIT  408251 146988  29851      0 0x0122 0x0000010c
tcp4       0      0  127.0.0.1.64489        127.0.0.1.2181         FIN_WAIT_2  261312 146988  30490      0 0x2131 0x00000000
```

That's after I killed my docker container.
----
2019-05-16 19:49:17 UTC - Kevin Brown: For example, org.apache.pulsar.client.api.Producer has a message builder that publishes a key along with the message:

Producer<String> producer = ...;

producer.newMessage().key("mykey").value("myvalue").send();

In Pulsar Functions we use Context to publish, which does not have a way to specify a key. See the current API documentation below; it is not possible through Context.

Java Context class
<https://pulsar.apache.org/api/pulsar-functions/>
Python Context class
<http://pulsar.apache.org/api/python/functions/context.m.html>
----
2019-05-16 19:49:46 UTC - David Kjerrumgaard: very interesting
----
2019-05-16 19:50:03 UTC - Devin G. Bost: It gets more interesting. I'll post more of what I found.
grinning : David Kjerrumgaard
----
2019-05-16 19:50:59 UTC - Devin G. Bost: ```
OCPC-LM31977:bin dbost$ lsof -t -i :6650
29851
OCPC-LM31977:bin dbost$ lsof -t -i :8080
29851
OCPC-LM31977:bin dbost$ lsof -t -i :2181
29851
OCPC-LM31977:bin dbost$ ps -ef | grep 29851
  502 29851 29849   0 12:02PM ??         0:03.12 com.docker.vpnkit --ethernet fd:3 --port vpnkit.port.sock --port hyperkit://:62373/./vms/0 --diagnostics fd:4 --pcap fd:5 --vsock-path vms/0/connect --host-names host.docker.internal,docker.for.mac.host.internal,docker.for.mac.localhost --gateway-names gateway.docker.internal,docker.for.mac.gateway.internal,docker.for.mac.http.internal --vm-names docker-for-desktop --listen-backlog 32 --mtu 1500 --allowed-bind-addresses 0.0.0.0 --http /Users/dbost/Library/Group Containers/group.com.docker/http_proxy.json --dhcp /Users/dbost/Library/Group Containers/group.com.docker/dhcp.json --port-max-idle-time 300 --max-connections 2000 --gateway-ip 192.168.65.1 --host-ip 192.168.65.2 --lowest-ip 192.168.65.3 --highest-ip 192.168.65.254 --log-destination asl --udpv4-forwards 123:127.0.0.1:55622 --gc-compact-interval 1800
  502 29855 29851   0 12:02PM ??         0:00.00 (uname)
  502 46540  2358   0  1:50PM ttys000    0:00.00 grep 29851
```
----
2019-05-16 19:51:04 UTC - Devin G. Bost: Evil docker.
smile : David Kjerrumgaard
----
2019-05-16 19:52:14 UTC - David Kjerrumgaard: So Docker is holding on to those connections and blocking new clients....
----
2019-05-16 19:52:35 UTC - David Kjerrumgaard: What happens if you do a `docker rm $(docker ps -aq)` ?
----
2019-05-16 19:55:41 UTC - Devin G. Bost: `docker ps -aq`
gives me:

`Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?`

I needed to kill the process manually.
----
2019-05-16 19:56:18 UTC - Devin G. Bost: (Note that I killed the process before running `docker ps -aq`, but when I ran `docker ps --all` before killing the process, none of the docker containers were running.)
----
2019-05-16 19:58:34 UTC - David Kjerrumgaard: wow.
----
2019-05-16 19:58:45 UTC - Devin G. Bost: Yeah.
----
2019-05-16 19:58:54 UTC - David Kjerrumgaard: Let's try this again with a fresh Docker daemon and container  :smiley:
----
2019-05-16 19:59:29 UTC - Devin G. Bost: I'm afraid.
----
2019-05-16 19:59:32 UTC - Devin G. Bost: haha
----
2019-05-16 19:59:48 UTC - David Kjerrumgaard: fear leads to the dark side
----
2019-05-16 20:00:43 UTC - Devin G. Bost: haha good point.

I think I'm going to continue developing on my local Pulsar for a while because I have some pressure for a deadline. I'll revisit this afterwards and message you when I'm ready to continue.
----
2019-05-16 20:01:11 UTC - Devin G. Bost: If you don't mind.
----
2019-05-16 20:01:53 UTC - David Kjerrumgaard: sounds good.
----
2019-05-16 20:01:56 UTC - David Kjerrumgaard: good luck
----
2019-05-16 20:02:21 UTC - Devin G. Bost: Thanks.
----
2019-05-16 21:31:19 UTC - Sanjeev Kulkarni: 2.4 has a new method of publishing from context that should support this use case 
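
Roughly along these lines, assuming the new Context#newOutputMessage builder in 2.4 (the topic name and key below are placeholders):
```
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class KeyedPublishFunction implements Function<String, Void> {
    @Override
    public Void process(String input, Context context) throws Exception {
        // newOutputMessage returns a TypedMessageBuilder, so a key can be set before sending.
        context.newOutputMessage("persistent://public/default/output", Schema.STRING)
                .key("mykey")
                .value(input)
                .sendAsync();
        return null;
    }
}
```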
----
2019-05-16 23:12:26 UTC - Kevin Brown: @Sanjeev do you know when 2.4 will be released?
----
2019-05-17 00:00:34 UTC - Sanjeev Kulkarni: @Matteo Merli might know the answer for that 
----