You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Apache Pulsar Slack <ap...@gmail.com> on 2019/12/16 09:11:03 UTC

Slack digest for #general - 2019-12-16

2019-12-15 11:12:50 UTC - Costi Ciudatu: @Costi Ciudatu has joined the channel
----
2019-12-15 12:14:12 UTC - Viji: We created a source connector using PulsarSourceBuilder to consume data from Pulsar partitioned topic in our Flink Application. The parallelism of the job in Flink is different from the number of partitions set for the pulsar topic. In the case of kafka, if the number of flink source instances are greater than the number of partitions, some of the source instances do not receive data. Similarly if the number of source instances are less, then more than one partition is assigned to the same source instance. It is not mandatory to set parallelism of flink source operator same as the number of partitions. Can you confirm if the behavior is same in Pulsar also? Is it mandatory to set parallelism of pulsar flink source operator same as number of partitions ?
----
2019-12-15 12:36:09 UTC - Viji: We are using Event time based windowing functions in Flink. Partitioned Pulsar topic is the source but we do not use pulsar event time as the timestamp field is decided by the processing function in flink. The event time is extracted from the message by attaching a TimeStampExtractor to the stream which extracts time from the message. We noticed the window is not triggered even though watermark is crossed. Attached here is the sample code. As many of our apps use event time based tumbling and sliding windows, it is important for us to get this functionality working.
----
2019-12-15 13:04:04 UTC - Subbu Ramanathan: @Subbu Ramanathan has joined the channel
----
2019-12-15 16:56:31 UTC - Sijie Guo: No. The default pulsar flink connector has the similar behavior as kafka one. you can configure the subscription mode to go beyond the number of partitions (means you can have more parallelism than the number of partitions).

reference: <http://pulsar.apache.org/docs/en/concepts-messaging/#subscription-modes>
----
2019-12-15 16:58:24 UTC - Sijie Guo: A better way for helping this is a github issue.
----
2019-12-15 16:59:13 UTC - Sijie Guo: Can you create a github issue for this? Hence we are able to triage and help with it. This would also benefit all the other people in the community as well.
----
2019-12-15 21:04:31 UTC - Jared Mackey: How do sequence IDs work? I am building a client library and I’m running into an issue where every time I start a producer the server always gives me back -1 as the last sequence ID. I am sending the sequence ID on the messages every time and it increments correctly while the producer is alive, but the sequence ID does not survive a restart of the client code. 
+1 : Ryan
----
2019-12-15 21:51:46 UTC - Ryan: Is JBOD recommended?
----
2019-12-15 23:18:02 UTC - Zoran: @Zoran has joined the channel
----
2019-12-16 01:15:42 UTC - xue: In pulsar version 2.4.1,sql module,If there are duplicate messages in pulsar topic, querying topic through Presto will throw exceptions.
----
2019-12-16 01:53:21 UTC - Penghui Li: @xue Can you provide some reproduce steps?
----
2019-12-16 01:56:08 UTC - xue: Use pulsar function to publish the application. In the process of processing, pulsar sent repeated messages, and the broker did not enable the de duplication configuration.
----
2019-12-16 01:56:14 UTC - xue: Use pulsar function to publish the application. In the process of processing, pulsar sent repeated messages, and the broker did not enable the de duplication configuration.
----
2019-12-16 02:03:06 UTC - xue: Duplicate messages appear during production
----
2019-12-16 02:09:55 UTC - Penghui Li: I have test publish some messages with same key and then query data by Pulsar SQL
```presto&gt; select * from pulsar."public/default"."test";
 __value__ | __partition__ | __event_time__ |    __publish_time__     | __message_id__ | __sequence_id__ | __producer_name__ | __key__ | __properties__
-----------+---------------+----------------+-------------------------+----------------+-----------------+-------------------+---------+----------------
 msg-5     |            -1 | NULL           | 2019-12-16 10:07:17.338 | (29,5,0)       |               5 | standalone-1-3    | key     | {}
 msg-6     |            -1 | NULL           | 2019-12-16 10:07:17.341 | (29,6,0)       |               6 | standalone-1-3    | key     | {}
 msg-7     |            -1 | NULL           | 2019-12-16 10:07:17.344 | (29,7,0)       |               7 | standalone-1-3    | key     | {}
 msg-8     |            -1 | NULL           | 2019-12-16 10:07:17.347 | (29,8,0)       |               8 | standalone-1-3    | key     | {}
 msg-0     |            -1 | NULL           | 2019-12-16 10:07:17.252 | (29,0,0)       |               0 | standalone-1-3    | key     | {}
 msg-1     |            -1 | NULL           | 2019-12-16 10:07:17.326 | (29,1,0)       |               1 | standalone-1-3    | key     | {}
 msg-2     |            -1 | NULL           | 2019-12-16 10:07:17.329 | (29,2,0)       |               2 | standalone-1-3    | key     | {}
 msg-3     |            -1 | NULL           | 2019-12-16 10:07:17.332 | (29,3,0)       |               3 | standalone-1-3    | key     | {}
 msg-4     |            -1 | NULL           | 2019-12-16 10:07:17.335 | (29,4,0)       |               4 | standalone-1-3    | key     | {}```
----
2019-12-16 02:11:07 UTC - Penghui Li: And i found the stack trace shows the Duplicate key is a topic name, are you using topic name as partition key of messages?
----
2019-12-16 02:12:20 UTC - xue: Topic has 3 partitions
----
2019-12-16 02:22:45 UTC - Penghui Li: Ok, can you reproduce it? it’s better to add an issue at github and discribe some details there. I create a partitioned topic with 3 partitions, but can’t reproduce it.

BTW, i think you can list your topics and check if the topic name is duplicated.
----
2019-12-16 02:24:26 UTC - xue: OK, I'll check later
----
2019-12-16 02:26:30 UTC - xue: Chinese?
beers : jia zhai
----
2019-12-16 02:27:44 UTC - jia zhai: you could discuss in the # china channel with Chinese
----
2019-12-16 03:59:44 UTC - Viji: Thanks @Sijie Guo. Our App uses Failover subscription mode as the order is guaranteed only with Failover mode. Interestingly, the window in Flink is triggered when the parallelism is same as the number of partitions, so wanted a confirmation on the behavior.
----
2019-12-16 04:01:13 UTC - Viji: Sure, we will create a github issue as this functionality is needed to migrate from Kafka to Pulsar.
----
2019-12-16 04:31:10 UTC - Sijie Guo: ok
----
2019-12-16 05:22:41 UTC - Abhishek Kumar: @Abhishek Kumar has joined the channel
----
2019-12-16 05:26:10 UTC - Abhishek Kumar: I am not able to list pulsar topics using pulsar-admin unless I create a Producer or Consumer for the same topic . Does anyone know why is this happening?
----
2019-12-16 05:30:00 UTC - Sijie Guo: what command did you run?
----
2019-12-16 06:00:40 UTC - Abhishek Kumar: • Command to create topic:  ./pulsar-admin topics create-partitioned-topic -p 1 <persistent://public/default/test-topic>
• Command to list topic: ./pulsar-admin topics list public/default
----
2019-12-16 06:05:31 UTC - Sijie Guo: oh for partitioned topic, you can use `pulsar-admin topics list-partitioned-topic``
----
2019-12-16 06:06:15 UTC - Sijie Guo: the partitions are not created when `create-partitioned-topic`. individual partitions are created when there are messages produced to them or consumers try to consume from them.
----
2019-12-16 06:13:47 UTC - Abhishek Kumar: 'list-partitioned-topic'  doesn't seem to be a valid command , It gave below error
----
2019-12-16 06:13:52 UTC - Abhishek Kumar: ./pulsar-admin topics list-partitioned-topic  public/default
Expected a command, got list-partitioned-topic

Exception in thread "main" com.beust.jcommander.ParameterException: Asking description for unknown command: null
        at com.beust.jcommander.JCommander.getCommandDescription(JCommander.java:1003)
        at com.beust.jcommander.JCommander.usage(JCommander.java:988)
        at com.beust.jcommander.JCommander.usage(JCommander.java:980)
----
2019-12-16 06:32:57 UTC - Sijie Guo: list-partitioned-topics
----
2019-12-16 06:33:02 UTC - Sijie Guo: sorry for the typo
----
2019-12-16 06:33:10 UTC - Sijie Guo: there is an ‘s’ at the end.
----
2019-12-16 07:03:22 UTC - Abhishek Kumar: ok i will try that
----
2019-12-16 07:04:05 UTC - Abhishek Kumar: thanks, that helps
----
2019-12-16 07:06:57 UTC - Abhishek Kumar: When using KoP (Kafka on Pulsar ) also I faced the same issue , as Kafka Client was not able to discover partitioned topic .. So by any chance are you aware of what command or API is used to discover topics in case of KoP?
----
2019-12-16 07:12:54 UTC - Sijie Guo: @jia zhai can help you on this topic (KoP).
----
2019-12-16 07:17:43 UTC - jia zhai: @Sijie Guo @Abhishek Kumar Yes, this is a know issue.
`create-partitioned-topic` will only create a zk node that contains the partition number, but real topic is not created until it is used.
I recall there was an issue tracking this. we could support this feature.
----
2019-12-16 07:21:03 UTC - jia zhai: <https://github.com/apache/pulsar/pull/5572>
----
2019-12-16 07:21:14 UTC - Abhishek Kumar: Thanks a lot
----
2019-12-16 07:21:14 UTC - jia zhai: Here is the PR that working to fix the issue.
----
2019-12-16 07:21:24 UTC - jia zhai: welcome
----
2019-12-16 07:25:41 UTC - Abhishek Kumar: So I assume after the fix , `pulsar-admin topics list ` command   should list partitioned topics as well, without creating any producer or consumer to that topic
----
2019-12-16 07:25:57 UTC - jia zhai: right
----
2019-12-16 07:26:23 UTC - Abhishek Kumar: and also Kafka Clients using KoP  should be able to discover the same ?
----
2019-12-16 07:26:51 UTC - jia zhai: yes
----
2019-12-16 07:27:10 UTC - Abhishek Kumar: ok thanks for the info @jia zhai
----
2019-12-16 07:29:13 UTC - jia zhai: welcome
----
2019-12-16 07:55:36 UTC - xingzhou: @xingzhou has joined the channel
----