You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/07/03 09:11:04 UTC

Slack digest for #general - 2019-07-03

2019-07-02 13:50:37 UTC - dipali: now getting the  dashboard using proxy URL
----
2019-07-02 13:51:08 UTC - dipali: now trying to access using ELB
----
2019-07-02 13:51:16 UTC - dipali: its not hapenning yet
----
2019-07-02 14:05:26 UTC - Ryan Samo: Is there a way to increase the receiverQueueSize on a Pulsar function via the pulsar-admin cli or its yml config file?
----
2019-07-02 14:17:42 UTC - dipali: anybody tried to access the ELB where 8081 port is expose outside to access from docker?
----
2019-07-02 14:25:02 UTC - Alexandre DUVAL: Hi, I produce using string schema but it's json formatted (more simple because of the rust pulsar client used), there a way to consume message in pulsar functions directly as json, or there is a json library already included in pulsar that i can use from my pulsar function to convert string to java object then to json object ? Or maybe an example? :slightly_smiling_face:
----
2019-07-02 14:25:23 UTC - Alexandre DUVAL: (i use pulsar function in java)
----
2019-07-02 14:52:45 UTC - Matteo Merli: Yes, it can be specified in the `inputSpecs` section:

<https://github.com/apache/pulsar/blob/42c3bf94920f1d177a2403e06650500509f94aaa/pulsar-common/src/main/java/org/apache/pulsar/common/functions/FunctionConfig.java#L68>
----
2019-07-02 14:55:53 UTC - Matteo Merli: eg:

```
...

inputSpecs:
  my-topic:
    receiverQueueSize: 5000

```
----
2019-07-02 14:57:31 UTC - Ryan Samo: Nice! So from a yml file for the function, the inputSpecs would be topic for the string and consumerConfig for the object correct?
----
2019-07-02 14:57:37 UTC - Matteo Merli: If the topic schema is set to bytes, the function will then get bytes.

If you were publishing from Java, you could use the AUTO_PRODUCE schema type to pass a string JSON and have that automatically checked
----
2019-07-02 14:57:49 UTC - Ryan Samo: Never mind cool!
----
2019-07-02 14:58:27 UTC - Ryan Samo: Thanks @Matteo Merli !
----
2019-07-02 14:58:35 UTC - Matteo Merli: yes, the inputSpec is a map for the topics the function will consume from. The values of the map are the configs for the consumers
----
2019-07-02 15:03:40 UTC - Ryan Samo: Works perfect!
----
2019-07-02 15:17:57 UTC - Alexandre DUVAL: Yes, it's set to STRING by producer, so now from pulsar function process which json library can I use? I mean pulsar has already one?
----
2019-07-02 15:18:23 UTC - Alexandre DUVAL: I mean I will process string, but just to not add another library for parsing
----
2019-07-02 15:19:28 UTC - Matteo Merli: The JSON library is internally shaded though
----
2019-07-02 15:19:46 UTC - Matteo Merli: precisely to avoid conflicts with a user supplied library
----
2019-07-02 15:28:28 UTC - Alexandre DUVAL: okay :slightly_smiling_face:
----
2019-07-02 15:48:01 UTC - Devin G. Bost: @Jerry Peng Do you know who might have any ideas about this?
----
2019-07-02 17:29:25 UTC - Edison Chindrawaly: @Edison Chindrawaly has joined the channel
----
2019-07-02 18:41:38 UTC - GoX: @GoX has joined the channel
----
2019-07-02 20:41:23 UTC - Devin G. Bost: Maybe it's a secondary issue. It starts with an error like this:
```
14:39:40.200 [pulsar-web-69-6] ERROR o.a.p.f.w.rest.api.ComponentImpl - Invalid register Source request @ /osp/campaigns/campaign-kafka-source
java.lang.IllegalArgumentException: Source Package is not provided
	at org.apache.pulsar.functions.utils.SourceConfigUtils.validate(SourceConfigUtils.java:248)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.validateUpdateRequestParams(ComponentImpl.java:1474)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.validateUpdateRequestParams(ComponentImpl.java:1338)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.registerFunction(ComponentImpl.java:370)
	at org.apache.pulsar.broker.admin.impl.SourceBase.registerSource(SourceBase.java:80)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at . . . 
```
----
2019-07-02 20:57:33 UTC - Devin G. Bost: @Ali Ahmed, when I attempt to use the Pulsar in-process test (<https://github.com/streamlio/pulsar-embedded-tutorial/blob/master/src/test/java/org/apache/pulsar/PulsarEmbeddedTest.java>), it's saying that the PulsarService() constructor cannot be resolved. Has the constructor definition changed?
----
2019-07-02 21:05:11 UTC - David Kjerrumgaard: @Santiago Del Campo What do you mean by "bookie pods are re deployed" ? Does this mean you deploy a whole new set of bookie pods?
----
2019-07-02 21:30:59 UTC - Jerry Peng: How are you passing inn the source package?
----
2019-07-02 21:50:35 UTC - Ali Ahmed: @Devin G. Bost what version of pulsar are trying with ?
----
2019-07-02 21:53:55 UTC - Devin G. Bost: @Ali Ahmed 2.3.2
----
2019-07-02 22:01:09 UTC - Devin G. Bost: @Jerry Peng Here are the configs on the SourceConfig that we're passing into `pulsarAdmin.source().createSource()`:
----
2019-07-02 22:01:49 UTC - Devin G. Bost: 
----
2019-07-02 22:03:13 UTC - Devin G. Bost: The call looks like this:
`pulsarAdmin.source().createSource( this.getSourceConfig(),          this.getArtifactPathOrUrl());`

&gt; this.getArtifactPathOrUrl()
returns null in this case because it's using a built-in kafka source.
----
2019-07-02 22:04:09 UTC - Ali Ahmed: @Devin G. Bost I don’t see the issue you but I did see another one so I have the updated the code to use 2.3.2 I see it passing with
```gradle test```
----
2019-07-02 22:07:01 UTC - Devin G. Bost: @Ali Ahmed Thanks. I see what was happening now.
----
2019-07-02 22:07:22 UTC - Devin G. Bost: I copied over some of your code but didn't realize there were other dependencies in your project.
----
2019-07-02 22:20:51 UTC - Jerry Peng: do you have built-in connectors configured.  I have never tried running those tests with built-in connectors
----
2019-07-02 22:22:23 UTC - Jerry Peng: btw almost done implementing:
<https://github.com/apache/pulsar/issues/4277>
so you can use PulsarAdmin within a function
----
2019-07-02 22:47:31 UTC - Santiago Del Campo: That's right... for example, if i want to do some maintenance over a server that's being used as a bookeeper, i stop it and start it again.... and that produces a new redeployment of pods

After that, several error may occur.. the most common ones are related to problems to the reading of ledgers, invalid CookieExceptions and the ones previously mentioned about directories that are not "empty".

I've been tweaking the yamls trying to erase any kind of persistence in each new redeployment of pods... and expecting that the services recover by themselves, and for some errors, i've been successful, even if that means that my custom topics are erased in the process.. my main goal for now is trying to deploy with a configuration that can remain stable when some components are turned off

Another important point to note is that... even though in the configmaps of the broker deployments, the minimal quorums are set to one.. i cannot still have two bookie pods running in different machines, turn off one of them and expect to the information be replicated and keep consistency in the whole cluster, and thus, avoid crash :confused:
----
2019-07-02 23:02:33 UTC - David Kjerrumgaard: You need to modify the bookie’s configuration to make sure that it points to the right Zookeeper cluster before starting up each bookie. Since you are trying to add them to an existing BK cluster, you need to make sure that the ZK settings point to the existing ZK pod(s).
----
2019-07-02 23:13:03 UTC - Devin G. Bost: @Jerry Peng I have not yet configured built-in connectors for those tests. I could send a PR to the Pulsar source code if I can get it working.
Do you know what I'd need to do to add the built-in connectors?
----
2019-07-02 23:13:16 UTC - Devin G. Bost: BTW, you are awesome. Thank you for all of your help.
----
2019-07-02 23:19:17 UTC - Santiago Del Campo: Mmmm well, the bookies always know about the ubication of the ZKs because in the yamls several Enviromental variables are executed...

I was able to confirm this because i also tweaked the way that bookies are deployed for debugging purposes to make sure the pods are not changed just yet and throw some commands inside the Bookie pods....

Without problems the new bookie pods cant reach the ZKs servers :thinking_face:
----
2019-07-02 23:21:02 UTC - Santiago Del Campo: But still if i let the bookies continue with the deployment of the Bookeeper service, this errors that i've been talking about still occur.
----
2019-07-02 23:22:12 UTC - Devin G. Bost: When I try running that EndToEnd test on just a function, here's what I get:
----
2019-07-02 23:22:42 UTC - Devin G. Bost: `org.apache.pulsar.client.admin.PulsarAdminException: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:16356 after 60000 ms`
----
2019-07-02 23:22:46 UTC - Devin G. Bost: Here's the stacktrace:
----
2019-07-02 23:23:08 UTC - Devin G. Bost: 
----
2019-07-02 23:27:01 UTC - Jerry Peng: Not sure why.  Just a timeout exception.  Are there any other errors in the log?
----
2019-07-02 23:28:15 UTC - Jerry Peng: are you able to run the vanilla version of <https://github.com/apache/pulsar/blob/master/pulsar-broker/src/test/java/org/apache/pulsar/io/PulsarFunctionE2ETest.java> successfully?
----
2019-07-03 08:07:49 UTC - Kim Christian Gaarder: Use-case: Pulsar (with tiered storage) as a permanent stream storage. All messages in the stream have an externally managed identifier. I need to be able to position a consumer at a given starting point in the stream, where that starting point is known through the external identifier. Also, I would like to avoid having an external (non-pulsar) index of pulsar message-ids by external message ids.

Is there a good way to do this in pulsar?

I can probably use pulsar functions and use the state storage to build this index by storing the message-id keyed by the external-id? Is this scalable and reliable? If so, Is it possible to run another one-of function to read back the pulsar message-id from a given external-id? Or is there a way to interact with the state storage directly from java clients without having to run pulsar functions?

Also, I would like to be able to read or peek the very last message in a topic. What is the best way to do this? Can the admin-client be relied on for this, or is there a way to position a subscription at say last-message minus 1, so that the very last message is immediately returned to the consumer. Or is the only way to do this a full stream scan, i.e. to start at the beginning of the stream and just consuming them all until I get a timeout on the receive call signifying that no more messages are available? Or maybe I can use Pulsar functions and state-storage here as well to simply just store the last and second last message-ids, allowing easy start positioning of a consumer subscription?
----