Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/05/16 09:11:04 UTC

Slack digest for #general - 2019-05-16

2019-05-15 09:20:32 UTC - jia zhai: <https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#docsNav>
----
2019-05-15 09:20:43 UTC - jia zhai: @bhagesharora try to read this page
----
2019-05-15 09:21:05 UTC - jia zhai: It is controlled by retention policy
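For reference, retention is configured per namespace with pulsar-admin; a minimal example (the namespace and limits here are illustrative):
```
bin/pulsar-admin namespaces set-retention public/default \
  --size 10G \
  --time 3h
```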
----
2019-05-15 11:31:57 UTC - Brian Doran: I am having an issue sending an Avro message. I've seen a lot of messages about allowing nulls, but I am not sure what the outcome is.

Any comments appreciated... Thanks.

My error is as follows:
```
2019-05-15 11:22:23.168Z WARN  [Export-Pipeline-Queue-2] c.n.n.c.p.BlockingQueueBufferContainer - Exception encountered: org.apache.pulsar.client.api.SchemaSerializationException: java.lang.NullPointerException: in persistent_public_default_SUP null of persistent_public_default_SUP
org.apache.pulsar.client.api.SchemaSerializationException: java.lang.NullPointerException: in persistent_public_default_SUP null of persistent_public_default_SUP
	at org.apache.pulsar.client.impl.schema.AvroSchema.encode(AvroSchema.java:99) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.client.impl.TypedMessageBuilderImpl.value(TypedMessageBuilderImpl.java:80) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at com.pulsar.PulsarExportProducer.send(PulsarExportProducer.java:189) ~[export-core-0.2-SNAPSHOT.jar!/:0.2-SNAPSHOT]
	at com.pipeline.Pipeline.processRecord(Pipeline.java:245) ~[export-core-0.2-SNAPSHOT.jar!/:0.2-SNAPSHOT]
	at com.pipeline.Pipeline.input(Pipeline.java:132) ~[export-core-0.2-SNAPSHOT.jar!/:0.2-SNAPSHOT]
	at com.pipeline.PipelineQueue$1.onNewItem(PipelineQueue.java:96) ~[export-core-0.2-SNAPSHOT.jar!/:0.2-SNAPSHOT]
	at com.pipeline.PipelineQueue$1.onNewItem(PipelineQueue.java:1) ~[export-core-0.2-SNAPSHOT.jar!/:0.2-SNAPSHOT]
	at com.common.pipeline.AbstractPipelineItemHandler.processNewItem(AbstractPipelineItemHandler.java:11) ~[bda-common-api-1.0.7-SNAPSHOT.jar!/:na]
	at com.common.pipeline.BlockingQueueBufferContainer.run(BlockingQueueBufferContainer.java:188) ~[bda-common-api-1.0.7-SNAPSHOT.jar!/:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
Caused by: java.lang.NullPointerException: in persistent_public_default_SUP null of persistent_public_default_SUP
	at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:161) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.client.impl.schema.AvroSchema.encode(AvroSchema.java:95) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	... 11 common frames omitted
Caused by: java.lang.NullPointerException: null
	at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectData.getField(ReflectData.java:158) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:164) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:191) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75) ~[pulsar-client-2.3.1.jar!/:2.3.1]
	at org.apache.pulsar.shade.org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:159) ~[pulsar-client-2.3.1.jar!/:2.3.1]
```
I do not have any null fields in my GenericRecord and am wondering what else could cause this?
----
2019-05-15 11:34:55 UTC - Brian Doran: A sample schema is:

```
{
   "type":"record",
   "name":"persistent_public_default_SUP",
   "fields":[
      {
         "name":"cal_timestamp_time",
         "type":"string"
      },
      {
         "name":"d_name",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"dip_address",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"sip_address",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"sid",
         "type":[
            "null",
            "long"
         ],
         "default":null
      },
      {
         "name":"sub_m",
         "type":[
            "null",
            "long"
         ],
         "default":null
      },
      {
         "name":"sub_name",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"sub_seg_name",
         "type":[
            "null",
            "string"
         ],
         "default":null
      }
  ]
}
```
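For context, a minimal sketch of building a record against a schema like this with the plain (unshaded) Avro GenericData API; the field values are made up. Note that `cal_timestamp_time` is the only non-nullable field, so leaving it unset would fail at serialization time:
```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

// Parse the registered JSON schema definition
Schema schema = new Schema.Parser().parse(avroJSONSchema);

// Build a record; the nullable union fields may be left as null
GenericData.Record record = new GenericData.Record(schema);
record.put("cal_timestamp_time", "2019-05-15T11:22:23Z"); // required: plain "string"
record.put("d_name", null);                               // optional: ["null","string"]
```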
----
2019-05-15 11:38:33 UTC - Brian Doran: Example of send:
```
try {
  GenericRecord record =
      GenericRecordGenerator.generate(avroSchemaManager.getSchemaForTopic(topic), data);
  producer.newMessage().value(record).sendAsync();
} catch (RecordAccessException e) {
  LOGGER.error("Error creating/sending Avro Record for topic {}", topic);
}
```
----
2019-05-15 11:41:34 UTC - Brian Doran: Producer setup:

```
producer =
  pulsarClient
    .newProducer(
      Schema.AVRO(SchemaDefinition.builder()
        .withJsonDef(avroJSONSchema)
        .withAlwaysAllowNull(true)
        .build()))
    .compressionType(CompressionType.valueOf(compression.toUpperCase()))
    .producerName(clientId)
    .topic(topic)
    .create();
```
----
2019-05-15 12:04:33 UTC - Brian Doran: @Sijie Guo I know you have been working in the schema registry / avro area so any insight would be great. Thanks
----
2019-05-15 13:07:05 UTC - Christophe Bornet: Hi. Is there a way to use Pulsar Functions as a library, like we do with Kafka Streams (outside of a Pulsar instance)? I think this approach is better since it lets you embed the stream processing in the framework of your choice (e.g. Spring) and it integrates better with existing infrastructure for the app lifecycle (e.g. Kubernetes).
----
2019-05-15 13:25:42 UTC - Addison Higham: @Addison Higham has joined the channel
----
2019-05-15 15:32:05 UTC - Sanjeev Kulkarni: @Christophe Bornet please read about localrun, which aligns more with what you are looking for
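For reference, a minimal localrun invocation might look like the following (the jar, class, and topic names are placeholders):
```
bin/pulsar-admin functions localrun \
  --jar my-function.jar \
  --classname com.example.MyFunction \
  --inputs persistent://public/default/in \
  --output persistent://public/default/out
```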
----
2019-05-15 15:42:55 UTC - Christophe Bornet: @Sanjeev Kulkarni localrun still uses a "Pulsar" context inside which you load a jar. What I would want is a "Spring-Boot" context where a Pulsar function is just a library call. This way you keep the power of your favorite framework, which is generally already set up for your microservice infra.
----
2019-05-15 15:52:55 UTC - Sanjeev Kulkarni: That’s correct. Although a Runner interface and implementation based on local runner should be trivial to add 
----
2019-05-15 15:53:08 UTC - Christophe Bornet: Also, debugging a jar is not developer-friendly, and having to package it each time is a slow process
----
2019-05-15 16:42:27 UTC - Sanjeev Kulkarni: Can you please file an issue so that I can take a look?
----
2019-05-15 16:42:35 UTC - Nathan Linebarger: yes! when you have a many-but-small values use case, it doesn't sound as crazy :slightly_smiling_face:
----
2019-05-15 17:27:45 UTC - Addison Higham: Hey, a quick question: I can't find any tickets tracking the work for the Reader API to work on partitioned topics. I imagine that is likely just client changes; is that tracked anywhere, or is it on any roadmap?
----
2019-05-15 17:43:36 UTC - Matteo Merli: The issue is that a Reader allows for precise positioning on a specific MessageId. If it were to read from multiple partitions, it would have to know the message ids for each partitions.

You can still create 1 reader per each partition, using this method: `PulsarClient.getPartitionsForTopic()`
----
2019-05-15 17:43:37 UTC - Matteo Merli: <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/PulsarClient.html#getPartitionsForTopic-java.lang.String->
----
2019-05-15 17:43:49 UTC - Matteo Merli: And then create the readers directly on the partitions
----
2019-05-15 20:14:46 UTC - Roan Brasil: @Roan Brasil has joined the channel
----
2019-05-15 20:20:32 UTC - Roan Brasil: Hi people, I would like to contribute to the project.
----
2019-05-15 20:20:37 UTC - Roan Brasil: any tips?
----
2019-05-15 20:20:38 UTC - Addison Higham: @Matteo Merli ah perfect! that is a nice workaround
----
2019-05-15 20:23:07 UTC - Jagannath S Bilgi: @Jagannath S Bilgi has joined the channel
----
2019-05-15 20:29:19 UTC - Jagannath S Bilgi: Hi Team,
Trying to explore the Pulsar connector option with Kafka. Would you please advise how to proceed on this? I was able to find a few links by searching for "Pulsar connector", but they use Pulsar as the stream.
The objective is to read content from a Kafka topic, then use a Pulsar connector to process it and ingest it into Cassandra.
----
2019-05-15 20:42:54 UTC - Joe Francis: Curious... why would you want to do this? Doesn't Kafka have a connector for Cassandra?
----
2019-05-15 21:05:53 UTC - Jagannath S Bilgi: I'm a newbie to stream processing, so there may be a gap in my understanding.
The objective is to read stream data and persist it in Cassandra.

Below are the possible options:
1) stream data -> Kafka connector -> Kafka -> Kafka connector -> persistence

2) stream data -> Pulsar connector -> Pulsar -> Pulsar connector -> persistence

Query: is it possible to interchange the connectors?
----
2019-05-15 21:08:11 UTC - Jagannath S Bilgi: Also, please suggest the current industry practice.
----
2019-05-15 21:10:54 UTC - Ali Ahmed: @Jagannath S Bilgi You can review
<https://pulsar.apache.org/docs/en/io-overview/>
----
2019-05-15 21:11:06 UTC - Ali Ahmed: <https://streaml.io/blog/introducing-pulsar-io>
----
2019-05-15 21:22:48 UTC - Thor Sigurjonsson: Quick question: when working with a single logical `instance` of multi-site Pulsar, where you can specify the "cluster" in the API, can you deploy with the admin tool to any cluster as long as you have permissions to do so (and specify the target cluster)? [just validating assumptions for some designs]
----
2019-05-16 01:59:22 UTC - Devin G. Bost: With the Java Admin API, when I create a Kafka source, I'm getting a 400 error with the message "Source Package is not provided" because it seems to be expecting me to provide a path or URL to a Jar file. I didn't think a Jar was required for the built-in Kafka source type.
Is there a workaround?
----
2019-05-16 01:59:44 UTC - Devin G. Bost: I suppose that I could point it at a meaningless file as a hack.
----
2019-05-16 02:05:06 UTC - Sanjeev Kulkarni: What does `sources list` return?
----
2019-05-16 02:05:51 UTC - Devin G. Bost: What do you mean?
----
2019-05-16 02:06:00 UTC - Devin G. Bost: If I pass a dummy jar file here:
```
pulsarAdmin.source().createSource(
                            this.getSourceConfig(),
                            dummyJarPath);
```
then I get:

`org.apache.pulsar.client.admin.PulsarAdminException: /var/folders/zg/mg9pbwns01d3hnc5p2y0hxnr0000gp/T/pulsar-nar/functions1230099721191039489.tmp-unpacked/META-INF/MANIFEST.MF (No such file or directory)`

I feel like I've seen this before.
----
2019-05-16 02:06:11 UTC - Devin G. Bost: I don't remember how I got around it.
----
2019-05-16 02:07:50 UTC - Devin G. Bost: When I list sources for the tenant and namespace, I get null (an empty array).
Is that what's causing this?
----
2019-05-16 02:09:28 UTC - Sanjeev Kulkarni: Yes
----
2019-05-16 02:09:34 UTC - Devin G. Bost: Thanks.
----
2019-05-16 02:09:55 UTC - Sanjeev Kulkarni: There is a concept of built-in sources and sinks
----
2019-05-16 02:10:15 UTC - Sanjeev Kulkarni: These are jars/nars that need to be present on the brokers
----
2019-05-16 02:12:09 UTC - Devin G. Bost: Wait, I wasn't thinking. I listed sources for the tenant and namespace that I was attempting to create my source under. So, I wouldn't expect to see anything there. What tenant and namespace do you want me to check?
----
2019-05-16 02:12:59 UTC - Devin G. Bost: I ran `./pulsar-admin source list --tenant public --namespace default` and got an empty array as well.
----
2019-05-16 02:16:13 UTC - Sanjeev Kulkarni: Built-in sources are not instances of running sources. Thus they are not associated with any tenant or namespace
----
2019-05-16 02:16:35 UTC - Sanjeev Kulkarni: They are preloaded jars / nars 
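A sketch of how the Java admin API might reference a built-in connector via the `builtin://` archive scheme, so that no jar upload is needed (the config values here are placeholders):
```
import org.apache.pulsar.common.io.SourceConfig;

SourceConfig sourceConfig = SourceConfig.builder()
        .tenant("public")
        .namespace("default")
        .name("kafka-source")
        .archive("builtin://kafka") // use the preloaded Kafka nar instead of a jar path
        .topicName("persistent://public/default/ingest")
        .build();

// No package file is needed when the archive points at a built-in connector
pulsarAdmin.source().createSource(sourceConfig, null);
```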
----
2019-05-16 02:17:07 UTC - Devin G. Bost: So just `./pulsar-admin source list` ?
----
2019-05-16 02:17:26 UTC - Devin G. Bost: I'm not sure how to list them.
----
2019-05-16 02:47:07 UTC - penny: @penny has joined the channel
----
2019-05-16 02:48:43 UTC - Sanjeev Kulkarni: Yes
----
2019-05-16 02:49:07 UTC - Sanjeev Kulkarni: That will list the prebuilt sources 
----
2019-05-16 03:19:11 UTC - Zhi Wang: @Zhi Wang has joined the channel
----
2019-05-16 04:36:15 UTC - Jason Gu: hi guys
----
2019-05-16 04:40:30 UTC - bhagesharora: @jia zhai We have read through the retention/expiry cookbook and tried it out. The documentation is mainly for consumers, and it is working as expected on the consumer client.
However, we want to understand how much data, and for how long, is available to a Reader client and to Pulsar SQL (on a schema-based topic), because we noticed data is available to these clients even for messages that have been acked by the consumer client.
It would be good if you could point to documentation where this information is available, and also to the settings where we can change the duration and expiry for Reader and Pulsar SQL clients.
----
2019-05-16 04:42:55 UTC - Karthik Ramasamy: @Jason Gu - hi
----
2019-05-16 04:54:24 UTC - Jason Gu: I have been trying to configure S3 offload for a few hours and couldn't make it work.
----
2019-05-16 04:57:00 UTC - Jason Gu: followed <https://pulsar.apache.org/docs/latest/cookbooks/tiered-storage/#WhenshouldIuseTieredStorage?-tw9oki>
----
2019-05-16 04:57:25 UTC - Jason Gu: and it turns out the offloaders don't come with the standard .tar.gz distribution
----
2019-05-16 04:57:48 UTC - Jason Gu: got that working
----
2019-05-16 04:58:20 UTC - Jason Gu: ```
Error in offload
null

Reason: Error offloading: org.apache.bookkeeper.mledger.ManagedLedgerException: java.util.concurrent.CompletionException: java.lang.UnsupportedOperationException
```
----
2019-05-16 04:58:42 UTC - Jason Gu: Manually triggered the offload and ended up with the above error
----
2019-05-16 05:05:58 UTC - Jason Gu: ah, i figured it out.....
+1 : jia zhai, Karthik Ramasamy
----
2019-05-16 05:17:45 UTC - jia zhai: retention is for topic
----
2019-05-16 05:18:07 UTC - jia zhai: reader is a wrapper of consumer
----
2019-05-16 07:23:16 UTC - eric.olympe: @eric.olympe has joined the channel
----
2019-05-16 07:26:21 UTC - Shivji Kumar Jha: I am trying to set up offloaders (AWS S3). Instead of using an access key, we prefer to use AWS roles. Is this supported?

Here is what I did:
1. Attached a role to the EC2 instance which allows it to write to the S3 bucket configured for offload. I have verified that my EC2 instance can write to the S3 bucket.
2. Started a manual offload for a topic, and it said the offload had started.
3. On checking the status of the offload, there is an error which still seems to look for an AWS access key ID.
```
$ pulsar-admin topics offload --size-threshold 2G public/default/perf-test
Offload triggered for persistent://public/default/perf-test for messages before 17:0:-1
$ pulsar-admin topics offload-status public/default/perf-test
Error in offload
null

Reason: Error offloading: org.apache.bookkeeper.mledger.ManagedLedgerException: java.util.concurrent.CompletionException: org.jclouds.rest.AuthorizationException: The AWS Access Key Id you provided does not exist in our records.
$
```
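For reference, the basic S3 offload settings in broker.conf from the tiered-storage cookbook look like this (the bucket and region are illustrative); whether the jclouds-based offloader can pick up EC2 instance-profile credentials instead of explicit keys may depend on the Pulsar version:
```
# broker.conf
managedLedgerOffloadDriver=aws-s3
s3ManagedLedgerOffloadBucket=my-offload-bucket
s3ManagedLedgerOffloadRegion=eu-west-1
```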
----
2019-05-16 08:07:21 UTC - eric.olympe: Hi, I am new to Pulsar. I would like to know how it is possible to implement a fanout pattern. I have read the subscriptions documentation, and I am not sure how to do it. Thanks in advance.
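A minimal sketch of fanout under the documented subscription model: each subscription receives every message, so giving each downstream service its own subscription name fans the topic out (the names are illustrative, and `client` is an existing PulsarClient):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.SubscriptionType;

// Each distinct subscription gets its own copy of every message
Consumer<byte[]> serviceA = client.newConsumer()
        .topic("persistent://public/default/events")
        .subscriptionName("service-a")
        .subscriptionType(SubscriptionType.Exclusive)
        .subscribe();

Consumer<byte[]> serviceB = client.newConsumer()
        .topic("persistent://public/default/events")
        .subscriptionName("service-b")
        .subscriptionType(SubscriptionType.Exclusive)
        .subscribe();
```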
----
2019-05-16 08:20:07 UTC - Christophe Bornet: Yes, sure
----