Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/08/07 09:11:08 UTC

Slack digest for #general - 2019-08-07

2019-08-06 10:20:52 UTC - Richard Sherman: This was when creating a new Consumer via a pulsar-proxy. Restarting the proxy cleared the problem.
----
2019-08-06 13:30:39 UTC - Chris Bartholomew: @Addison Higham I built a Docker image for the broker that is 2.3.1 plus that PR. I have been using it for a while now. If you are using k8s, it would be easy to try it to see if it fixes your Datadog issues: ```broker:
    repository: kafkaesqueio/pulsar-all
    pullPolicy: IfNotPresent
    tag: 2.3.1_kesque_1
```
----
2019-08-06 13:36:54 UTC - Alexandre DUVAL: Sure, but it is not implemented in the pulsar-admin functions command, right?
----
2019-08-06 18:15:56 UTC - Zhenhao Li: hi there, has anyone used Pulsar SQL?
----
2019-08-06 18:16:44 UTC - Jerry Peng: Yes
----
2019-08-06 18:16:50 UTC - Zhenhao Li: I read this page <https://pulsar.apache.org/docs/en/sql-overview/> and have a few questions.
are Presto workers part of Pulsar?
----
2019-08-06 18:17:26 UTC - Zhenhao Li: and do you have performance benchmarking results?
----
2019-08-06 18:18:17 UTC - Jerry Peng: @Zhenhao Li Presto is packaged with Pulsar for ease of use, so you can use `./bin/pulsar` to start a Presto cluster. Alternatively, you can deploy your own Presto cluster from a Presto distribution
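For example, once a SQL worker is running you can query it from Java through the Presto JDBC driver (a rough sketch, assuming a worker on localhost:8081, the presto-jdbc dependency on the classpath, and a hypothetical topic `my-topic` in `public/default`):
```
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PulsarSqlExample {
    public static void main(String[] args) throws Exception {
        // Assumed: Pulsar SQL (Presto) worker listening on localhost:8081.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:presto://localhost:8081", "test-user", null);
             Statement stmt = conn.createStatement();
             // Fully qualified table: catalog "pulsar", schema = namespace, table = topic.
             ResultSet rs = stmt.executeQuery(
                     "SELECT __key__, __publish_time__ FROM pulsar.\"public/default\".\"my-topic\" LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("__key__") + " @ " + rs.getTimestamp("__publish_time__"));
            }
        }
    }
}
```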
----
2019-08-06 18:19:11 UTC - Zhenhao Li: I see. Is it possible to use Spark to query data in Pulsar via a Spark connector?
----
2019-08-06 18:19:38 UTC - Zhenhao Li: if so what is the difference from Presto?
----
2019-08-06 18:20:59 UTC - Jerry Peng: The Presto connector reads data directly from the storage layer to maximize throughput, while the Spark connector, I believe, reads data with the consumer/reader API, which is optimized for latency.
----
2019-08-06 18:22:01 UTC - Jerry Peng: We could do a similar implementation for the Spark connector to read directly from the storage layer, but we have yet to do that.
----
2019-08-06 18:22:25 UTC - Zhenhao Li: ok. thanks!
----
2019-08-06 18:22:55 UTC - Zhenhao Li: I actually have a real use case in mind.
how fast can it be to replay all historical events for a given key?
----
2019-08-06 18:23:29 UTC - Zhenhao Li: I am thinking of using Pulsar as the persistence layer for event sourcing
----
2019-08-06 18:25:17 UTC - Zhenhao Li: in Kafka there is no bound, because each partition can hold a growing number of keys
----
2019-08-06 18:25:43 UTC - Jerry Peng: Because Pulsar SQL (presto connector) reads data directly from the storage layer, it can read the data within topics in parallel from multiple replicas.  The more replicas you have of the data, the higher the potential read throughput
----
2019-08-06 18:25:56 UTC - Zhenhao Li: I wonder if Pulsar can provide a time bound on queries per key
----
2019-08-06 18:27:29 UTC - Jerry Peng: A hard time bound cannot be guaranteed, but throughput will be much higher than querying Kafka with Presto, since in that case data is read out through a consumer and limited by the number of partitions
----
2019-08-06 18:28:40 UTC - Zhenhao Li: I see
----
2019-08-06 18:29:50 UTC - Jerry Peng: Messages of a key will only be in a particular partition, so if you know which partition the messages for a key reside in, you can just query that partition
----
2019-08-06 18:30:03 UTC - Jerry Peng: to minimize the amount of data you will need to filter
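For instance, you can compute that partition up front (a minimal sketch, assuming the Java client's default JavaStringHash routing and a 16-partition topic; Pulsar SQL exposes a matching `__partition__` column you can filter on):
```
public class PartitionForKey {
    // Default JavaStringHash routing: String.hashCode() masked to non-negative,
    // then modulo the topic's partition count.
    static int partitionForKey(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int partition = partitionForKey("customer-42", 16); // 16 partitions assumed
        // A Pulsar SQL query can then be restricted to that one partition:
        System.out.println("... WHERE __partition__ = " + partition);
    }
}
```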
----
2019-08-06 18:30:46 UTC - Jerry Peng: There is no HARD time bound, but as a soft bound the query time is proportional to the number of messages in the topic
----
2019-08-06 18:31:39 UTC - Jerry Peng: Pulsar, like any big data system, is not a hard real-time system and thus does not give a hard guarantee on completion time
----
2019-08-06 18:31:42 UTC - Zhenhao Li: it makes sense to use a hashing function to map keys to partitions, so both producers and query makers know the mapping
----
2019-08-06 18:33:46 UTC - Jerry Peng: It's also interesting to note that read throughput is not bounded by the number of partitions in Pulsar SQL. Users can simply increase the number of write replicas
----
2019-08-06 18:33:50 UTC - Zhenhao Li: but it would be great if there were some benchmarking results, say against HBase, Cassandra, etc
----
2019-08-06 18:35:00 UTC - Jerry Peng: Though Pulsar and Pulsar SQL have different use cases compared to HBase and Cassandra, which are NoSQL KV stores
----
2019-08-06 18:35:08 UTC - Zhenhao Li: sorry, do you mean read replicas?
----
2019-08-06 18:35:33 UTC - Jerry Peng: Pulsar SQL also works with tiered storage and is able to query data offloaded to object stores like S3
----
2019-08-06 18:40:15 UTC - Jerry Peng: In Pulsar terminology, when a message is published there is a write quorum and an ack quorum.  For example, if my cluster is set up with a write quorum of 3 and an ack quorum of 2, this means that for every message written to the storage layer nodes, 2 out of 3 acks need to be received from the storage layer nodes before the message is deemed successfully published.  However, data can be read from all 3 replicas.  So when I say write replicas, I mean the write quorum
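As a concrete sketch, these quorums can be set per namespace through the Java admin client (the 3/3/2 numbers mirror the example above; the admin URL and namespace are assumptions):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class SetQuorums {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // assumed admin endpoint
                .build();
        // ensemble=3, writeQuorum=3, ackQuorum=2: each entry goes to 3 bookies and
        // is acked once 2 confirm. An ensemble larger than the write quorum would
        // stripe entries across bookies instead of fully replicating to each one.
        admin.namespaces().setPersistence("public/default",
                new PersistencePolicies(3, 3, 2, 0.0));
        admin.close();
    }
}
```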
----
2019-08-06 18:43:58 UTC - Zhenhao Li: I know it is not a fair question. Sorry, I was thinking in my own context. The "problem" I face is basically fine-grained recovery in an event-sourcing or streaming system.
Stream processing frameworks like Flink have only stream-level replay on failures, which means the whole topic will be replayed from the last checkpoint.
Akka Actors offers actor-level recovery, but has to use something like Cassandra for the persistence layer
----
2019-08-06 18:45:40 UTC - Zhenhao Li: I see. so the data replication happens in the background, asynchronously?
----
2019-08-06 18:46:33 UTC - Zhenhao Li: to increase read throughput, can I set a write quorum of 10 and an ack quorum of 2?
----
2019-08-06 18:48:25 UTC - Zhenhao Li: here is a crazier idea. is it possible to have one partition per key and dynamically grow partitions? surely it is not possible with Kafka
----
2019-08-06 18:52:20 UTC - Ambud Sharma: thanks @David Kjerrumgaard
----
2019-08-06 20:03:07 UTC - Vahid Hashemian: @Vahid Hashemian has joined the channel
----
2019-08-06 20:09:40 UTC - Yu Yang: @Yu Yang has joined the channel
----
2019-08-06 21:22:08 UTC - Jerry Peng: replicating happens in the background
----
2019-08-06 21:27:33 UTC - Jerry Peng: > to increase read throughput, can I set a write quorum of 10 and an ack quorum of 2?

correct
----
2019-08-06 21:27:51 UTC - Jerry Peng: data can also be striped across multiple nodes
----
2019-08-06 21:28:06 UTC - Jerry Peng: in that configuration you wouldn’t even need to increase the write quorum
----
2019-08-06 21:29:07 UTC - Jerry Peng: Depends on how many keys.  Pulsar can have millions of topics
----
2019-08-06 21:29:15 UTC - Jerry Peng: partitions can be increased on the fly
----
2019-08-06 21:29:44 UTC - Jerry Peng: > is it possible to have one partition per key.
It's possible, but not a typical architecture
----
2019-08-06 22:07:14 UTC - Addison Higham: hrm... so the `/metrics` endpoint in the broker is not authenticated so any client can hit it, but in the proxy, it is authenticated, is that intended?
----
2019-08-06 23:30:32 UTC - Devin G. Bost: For a Pulsar function (like `public String process(String input, Context context) throws Exception { . . . `)
is there a way to skip a message?
There are certain cases for one of our functions that we want to ignore (and not send downstream).
----
2019-08-06 23:33:24 UTC - Victor Siu: I usually just return null from my function but I’m not sure if that’s the recommended way.
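Returning null works because the framework emits no output message for a null result, so nothing is sent downstream. A minimal sketch (the skip condition here is hypothetical):
```
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class FilteringFunction implements Function<String, String> {
    @Override
    public String process(String input, Context context) {
        // Hypothetical skip condition: ignore blank payloads entirely.
        if (input == null || input.trim().isEmpty()) {
            return null; // null result -> no message is sent downstream
        }
        return input;
    }
}
```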
----
2019-08-06 23:42:10 UTC - Victor Li: Is there a document describing all topic-level metrics in detail? Things like rate_in/out and throughput_in/out? I am not sure what they mean specifically. Thanks!
----
2019-08-06 23:45:48 UTC - Devin G. Bost: haha thanks Victor.
----
2019-08-07 02:36:24 UTC - Sijie Guo: for pulsar + spark integration, check out this spark connector: <https://github.com/streamnative/pulsar>

it supports Spark Structured Streaming and Spark SQL.

@yijie also wrote a great blog post about it: <https://medium.com/streamnative/apache-pulsar-as-one-storage-455222c59017>
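A rough sketch of reading a topic with that connector through Structured Streaming (option names taken from the connector's README; the URLs and topic name are assumptions for a local cluster):
```
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PulsarSparkRead {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("pulsar-structured-streaming")
                .getOrCreate();
        // The connector needs both the binary and the HTTP admin endpoints.
        Dataset<Row> df = spark.readStream()
                .format("pulsar")
                .option("service.url", "pulsar://localhost:6650")
                .option("admin.url", "http://localhost:8080")
                .option("topic", "persistent://public/default/my-topic")
                .load();
        df.writeStream().format("console").start().awaitTermination();
    }
}
```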
----
2019-08-07 02:37:56 UTC - Sijie Guo: I am working on a page for it. a pull request will be out today
----
2019-08-07 02:39:01 UTC - Sijie Guo: I don’t think it is intended. I guess the authentication was unfortunately applied because the http endpoints in the proxy just forward requests to the broker.
----
2019-08-07 02:39:43 UTC - Sijie Guo: @Chris Bartholomew can you rebase that pull request? I will pick up the review for it.
----
2019-08-07 02:47:40 UTC - yijie: <https://github.com/streamnative/pulsar-spark>
----
2019-08-07 03:08:24 UTC - Sijie Guo: (sorry I pasted the wrong link)
----
2019-08-07 03:22:41 UTC - Rui Fu: hi all, how can we set the function runtime’s python version in localrun mode?
----
2019-08-07 03:23:52 UTC - Rui Fu: for example, we have "python" -> python 2.7 and "python3" -> python 3.x; we want python3 to run the function, but pulsar in localrun mode always uses "python"
----
2019-08-07 03:27:07 UTC - Ali Ahmed: @Rui Fu you can use venv
```mkdir test
cd test
python3 -m venv env
source env/bin/activate
python -V
```
----
2019-08-07 03:27:49 UTC - Rui Fu: @Ali Ahmed i will try, thanks :wink:
----
2019-08-07 03:31:34 UTC - Ali Ahmed: you will also need ```pip install pulsar-client``` before you run the function in local mode
----
2019-08-07 06:30:16 UTC - divyasree: @Sijie Guo I am trying to enable authentication using TLS in two clusters (each cluster in a different DC) with a proxy; geo-replication is enabled in this setup.
I am trying TLS to make authentication and authorization work with geo-replication.
----
2019-08-07 06:30:35 UTC - divyasree: I have configured the instance as below
----
2019-08-07 06:31:05 UTC - divyasree: ``` Broker Configuration

brokerServicePortTls=6651

webServicePortTls=8443

tlsEnabled=true

tlsCertificateFilePath=/opt/apache-pulsar-2.3.2/broker.cert.pem

tlsKeyFilePath=/opt/apache-pulsar-2.3.2/broker.key-pk8.pem

tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem

tlsProtocols=TLSv1.2,TLSv1.1

tlsCiphers=TLS_DH_RSA_WITH_AES_256_GCM_SHA384,TLS_DH_RSA_WITH_AES_256_CBC_SHA

authenticationEnabled=true

authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls

brokerClientTlsEnabled=true

brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls

brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem

brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem



Proxy Configuration

brokerServiceURLTLS=<pulsar+ssl://pulsar.ttc.ole.prd.target.com:6651>

brokerWebServiceURLTLS=<https://pulsar.ttc.ole.prd.target.com:8443>



authenticationEnabled=true

authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderTls

brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls

brokerClientAuthenticationParameters=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem

brokerClientTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem

tlsEnabledWithBroker=true



Client Configuration

webServiceUrl=<https://localhost:8443/>

brokerServiceUrl=<pulsar+ssl://localhost:6651/>

authPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls

authParams=tlsCertFile:/opt/apache-pulsar-2.3.2/broker.cert.pem,tlsKeyFile:/opt/apache-pulsar-2.3.2/broker.key-pk8.pem

tlsTrustCertsFilePath=/opt/apache-pulsar-2.3.2/ca.cert.pem

tlsAllowInsecureConnection=false. ```
----
2019-08-07 06:31:30 UTC - divyasree: the broker is not getting started with the above changes..
----
2019-08-07 06:31:46 UTC - divyasree: I don't see any error logs in the log file..
----
2019-08-07 06:32:17 UTC - divyasree: the broker is getting stopped after a few mins, I guess.
----
2019-08-07 06:32:25 UTC - divyasree: don't know what I am missing here
----
2019-08-07 06:32:33 UTC - divyasree: Can you help me out on this?
----
2019-08-07 06:38:59 UTC - Sijie Guo: how did you start the pulsar broker, in the foreground or background (via pulsar-daemon)?
----
2019-08-07 06:45:27 UTC - divyasree: via pulsar-daemon
----
2019-08-07 06:46:47 UTC - divyasree: I have a doubt about the broker.cert.pem file... When creating this file I gave the common name as the wildcard "pulsar-prd-*.target.com"
----
2019-08-07 06:47:47 UTC - divyasree: I have enabled the proxy, which has the DNS name "pulsar.ttc........."
----
2019-08-07 06:48:12 UTC - Sijie Guo: change conf/log4j2.yaml to set immediateFlush to true, then try to start the broker again and you will see the logs

```
    RollingFile:
      name: RollingFile
      fileName: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}"
      filePattern: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}-%d{MM-dd-yyyy}-%i.log.gz"
      immediateFlush: false
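      # ^ change this to true (per the advice above) so startup logs are flushed immediately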
```
----
2019-08-07 06:48:46 UTC - divyasree: so do I need to create separate cert.pem files and configure the broker with broker.cert.pem and the proxy with proxy.cert.pem?
----
2019-08-07 06:52:44 UTC - divyasree: I am getting an error: `Caused by: java.lang.IllegalArgumentException: unsupported cipher suite: TLS_DH_RSA_WITH_AES_256_GCM_SHA384(DH-RSA-AES256-GCM-SHA384)`
----
2019-08-07 06:53:03 UTC - divyasree: meaning the cipher mentioned in the conf file is not supported, right?
----
2019-08-07 06:53:28 UTC - divyasree: I hope that if I leave it blank, it will support all types of ciphers
----
2019-08-07 07:05:51 UTC - Sijie Guo: correct, you can remove the supported ciphers and leave it blank
----
2019-08-07 07:12:20 UTC - divyasree: ok, it works; the broker is running now..
----
2019-08-07 07:15:18 UTC - divyasree: but when producing messages through the client, I am getting the below error ``` Search domain query failed. Original hostname: 'pulsar-prd-bk1.target.com' failed to resolve 'pulsar-prd-bk1.target.com.stores.target.com' after 7 queries 
	at org.apache.pulsar.shade.io.netty.resolver.dns.DnsResolveContext.finishResolve(DnsResolveContext.java:848) ```
----
2019-08-07 07:16:06 UTC - divyasree: "stores.target.com" is getting appended at the end of the hostname
----
2019-08-07 07:16:11 UTC - divyasree: what does that mean?
----
2019-08-07 07:18:14 UTC - Kim Christian Gaarder: Q: In the Java client, will a call like `consumer.receive(0, TimeUnit.SECONDS)` always return a message if one is available in the topic, or will it time out if it cannot deliver the message immediately? i.e. I want to know if I can rely on this method to know for sure whether I have found the last message in the topic.
----
2019-08-07 07:18:31 UTC - Sijie Guo: I think it attempts to query your DNS server, and your DNS server returns the name 'pulsar-prd-bk1.target.com.stores.target.com'
----
2019-08-07 07:18:46 UTC - Sijie Guo: how did you configure the DNS and LB?
----
2019-08-07 07:19:06 UTC - divyasree: we have VMs in OpenStack
----
2019-08-07 07:19:19 UTC - divyasree: so I configured it via OpenStack
----
2019-08-07 07:19:37 UTC - divyasree: and added the brokers to the pool of the LB listener
----
2019-08-07 07:20:10 UTC - divyasree: but the same configuration works with token authentication for me
----
2019-08-07 07:20:36 UTC - divyasree: I have configured TCP for token authentication with port 6650
----
2019-08-07 07:20:51 UTC - divyasree: do I need to change it to HTTPS with 6651?
----
2019-08-07 07:25:10 UTC - Sijie Guo: if there is a message already prefetched in the consumer queue, it will return the message. However, that doesn’t tell you whether you are at the end of the topic.

if you want to do so, use hasMessageAvailable.
----
2019-08-07 07:31:07 UTC - Kim Christian Gaarder: There is no such method hasMessageAvailable
----
2019-08-07 07:31:19 UTC - Kim Christian Gaarder: Not in the “Consumer” interface anyway
----
2019-08-07 07:34:41 UTC - Sijie Guo: I see. Oh, it is only available in the `Reader` interface for now. But it is available in the consumer implementation, because the fundamentals are the same. You can consider casting to ConsumerImpl for now; we can look into exposing it on the consumer as well.
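With the `Reader` interface that check looks roughly like this (a sketch; the service URL and topic name are assumptions):
```
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Reader;

public class DrainTopic {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/my-topic")
                .startMessageId(MessageId.earliest)
                .create();
        // hasMessageAvailable() turns false once the reader has caught up
        // with the last message published to the topic.
        while (reader.hasMessageAvailable()) {
            System.out.println(new String(reader.readNext().getData()));
        }
        client.close();
    }
}
```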
----
2019-08-07 07:34:58 UTC - Kim Christian Gaarder: ok, thanks :slightly_smiling_face:
----
2019-08-07 07:35:04 UTC - Kim Christian Gaarder: I will attempt the cast
----
2019-08-07 07:36:19 UTC - Kim Christian Gaarder: My problem right now is that the only reliable way I have to determine whether I have reached the last message is by producing a marker message and consuming until I get that message, which is obviously not ideal as I have to write to the topic in order to reliably know whether I have read all messages.
----
2019-08-07 07:36:59 UTC - Sijie Guo: @divyasree hmm I am not sure whether it is related to the common name or not.
----
2019-08-07 07:37:07 UTC - Kim Christian Gaarder: but hasMessagesAvailable could potentially solve that problem!
+1 : Sijie Guo
----
2019-08-07 07:37:42 UTC - Sijie Guo: did you try connecting with the IP?
----
2019-08-07 07:40:05 UTC - divyasree: ok.. any suggestions on this part?
``` I have a doubt about the broker.cert.pem file... When creating this file I gave the common name as the wildcard "pulsar-prd-*.target.com"

so do I need to create a separate cert.pem file and configure the broker with broker.cert.pem and the proxy with proxy.cert.pem?

And also I saw the documentation mentioning a cert named "my-roles.cert.pem"; how can we define the role in the pem?

I am confused about this part ```
----
2019-08-07 07:41:09 UTC - divyasree: trying with the IP or the proxy name didn't work.. I guess, since I gave a wildcard hostname that matches the broker hostnames alone, it didn't work
----
2019-08-07 07:41:44 UTC - divyasree: that's why I am asking: do we need to generate a separate cert for the proxy?
----
2019-08-07 07:42:07 UTC - divyasree: if so, which one should I use where? And what should I use in the client?
----
2019-08-07 07:42:48 UTC - divyasree: And also, which key should I use to create the authorization token, and how do I specify the role in it?
----
2019-08-07 07:44:43 UTC - divyasree: Can you provide detailed docs on TLS with the proxy?
----
2019-08-07 07:45:07 UTC - Sijie Guo: since you are configuring the proxy to connect to brokers with a TLS cert, I would suggest you configure two proxy cert pairs:

- a client/server cert pair for configuring the broker and the `brokerClient` part of the proxy.
- a client/server cert pair for configuring the proxy and the client, because your client is talking to the proxy
----
2019-08-07 08:12:56 UTC - Zhenhao Li: thanks!
----
2019-08-07 08:16:33 UTC - Zhenhao Li: what do you mean by "striped across multiple nodes"?
----
2019-08-07 08:16:58 UTC - Zhenhao Li: isn't the write quorum across multiple nodes?
----
2019-08-07 08:32:35 UTC - Kim Christian Gaarder: Q: Assuming we don’t know any pulsar message-ids externally, is there any way to ask Pulsar for the content of the last message currently published on a topic without scanning through all messages from the beginning?

Even if I can get the message-id of the last message in the topic, I have not found a way to read that message without scanning from the start (or by resuming from a previous subscription position). This is why the hasMessageAvailable method was helpful, but it still does not provide a good way to read the last message without having to first scan through all messages or use a separate subscription to track this.
----
2019-08-07 08:34:28 UTC - boldrick: @boldrick has joined the channel
----
2019-08-07 08:34:48 UTC - Sijie Guo: I think there was a similar request before to return the last message (id). The mechanism is actually ready and is being used for `hasMessageAvailable`. @jia zhai was creating an issue for this new API request.
----
2019-08-07 08:35:12 UTC - Kim Christian Gaarder: As I understand it, if we knew the message-id of the message before the last message, then we could seek to that and then we would be reading the last message next.
----
2019-08-07 08:35:37 UTC - Kim Christian Gaarder: but seek will not allow you to read the message with the message-id that you seek to, only the next message.
----
2019-08-07 08:36:13 UTC - Matteo Merli: Take a look at `ReaderBuilder.startMessageIdInclusive(true)`
----
2019-08-07 08:36:44 UTC - Matteo Merli: creating a reader starting on `MessageId.Latest` inclusive
----
2019-08-07 08:36:57 UTC - Kim Christian Gaarder: ahh … that will solve it! thank you!
----
2019-08-07 08:37:55 UTC - Kim Christian Gaarder: @Matteo Merli can that only be done when creating a new reader, or is there a way to re-use an existing reader to repeat this operation ?
----
2019-08-07 08:39:02 UTC - Kim Christian Gaarder: I suppose in my use-case it’s good enough to only do this when creating a new reader, as I only need this for continuation after failure (or intentional stoppage) which is rare anyway.
----
2019-08-07 08:39:28 UTC - Matteo Merli: Yes, it’s done on the reader creation only
----
2019-08-07 08:39:51 UTC - boldrick: hi guys, I wanted to ask if there is an easy way to build a local docker image of pulsar. I found the Dockerfile that builds the C++ client, but I wanted to build pulsar-all; I thought it was pulsar/docker/pulsar/Dockerfile, but it wants to move everything from /apache-pulsar and there is no such directory in the project
----
2019-08-07 08:42:36 UTC - Kim Christian Gaarder: @Matteo Merli Actually the reader interface javadoc says: “This configuration option also applies for any cursor reset operation like Reader.seek(MessageId).” which means this will also apply to seek :slightly_smiling_face:
----
2019-08-07 08:46:49 UTC - Matteo Merli: Ok, then it should work :slightly_smiling_face:
----
2019-08-07 08:52:10 UTC - Kim Christian Gaarder: This is very good news, I felt like there was some functionality missing here in order to build some basic applications on top of Pulsar, this really does solve all of that!
+1 : Sijie Guo, Matteo Merli
----
2019-08-07 09:06:04 UTC - Sijie Guo: <https://github.com/apache/pulsar/pull/4910>
----
2019-08-07 09:06:07 UTC - Sijie Guo: here is the pull request
----
2019-08-07 09:08:18 UTC - Matteo Merli: @boldrick You can build the docker image (and everything that’s required) from scratch with `mvn install -Pdocker -DskipTests`
----