Posted to by Apache Pulsar Slack <> on 2020/08/22 09:11:05 UTC

Slack digest for #general - 2020-08-22

2020-08-21 11:22:06 UTC - Takahiro Hozumi: Is it possible to use an Avro schema without code generation in Pulsar?
I have an Avro schema as a JSON file and want to create a Pulsar message with an `org.apache.avro.generic.GenericRecord` that uses the schema.
It seems that the Pulsar producer requires a POJO generated from the schema.

2020-08-21 11:25:33 UTC - Joshua Decosta: That seems like a standard process. It’s the same way you would produce or consume any message.
2020-08-21 12:13:12 UTC - Aaron: You can produce messages with a Producer of type GenericRecord
2020-08-21 13:41:01 UTC - Raghav: What does the command "./bookkeeper shell localconsistencycheck" do? In my cluster with 3 bookies and (E, Qw, Qa) set to (3, 3, 3), the simpletest works fine: "./bookkeeper shell simpletest -ensemble 3 -writeQuorum 3 -ackQuorum 3 -numEntries 100". But localconsistencycheck fails with an exception on all 3 boxes:

Exception in thread “main” Error open RocksDB database
        at org.apache.bookkeeper.bookie.BookieShell$LocalConsistencyCheck.runCmd(
        at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(
        at org.apache.bookkeeper.bookie.BookieShell.main(
Caused by: java.io.IOException: Error open RocksDB database
        at org.apache.bookkeeper.bookie.Bookie.mountLedgerStorageOffline(
        ... 4 more
Caused by: org.rocksdb.RocksDBException: While lock file: /var/pulsar/bookie/ledger1/data-1/current/ledgers/LOCK: Resource temporarily unavailable
        at Method)
        ... 13 more
2020-08-21 14:52:30 UTC - Addison Higham: The biggest one I can think of is schemas, if you aren't using schemas, then you wouldn't need to worry
2020-08-21 14:54:24 UTC - Addison Higham: That version was built manually and should have included rc1 in the tag, since rc1 passed it is pretty much the same, but just not the official version
2020-08-21 15:00:36 UTC - Lari Hotari: Thanks. It looks like the official 2.6.1 image is now available, so I'll use that one.
2020-08-21 15:32:27 UTC - Frank Kelly: FYI documentation says Python Client for 2.6.1 is available <>
But I see the following
```
$ pip3 install pulsar-client==2.6.1
ERROR: Could not find a version that satisfies the requirement pulsar-client==2.6.1 (from versions: 2.1.0, 2.1.1, 2.2.0, 2.2.1, 2.3.0, 2.3.0.post1, 2.3.1, 2.3.2, 2.4.0, 2.4.1, 2.4.1.post1, 2.4.2, 2.5.0, 2.5.1, 2.5.2, 2.6.0)
ERROR: No matching distribution found for pulsar-client==2.6.1
```
2020-08-21 15:36:48 UTC - Matt Mitchell: Anyone know what might cause this?
Caused by: org.apache.pulsar.client.api.PulsarClientException$IncompatibleSchemaException: Topic does not have schema to check
	at org.apache.pulsar.client.impl.ClientCnx.getPulsarClientException(
	at org.apache.pulsar.client.impl.ClientCnx.handleError(
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(
2020-08-21 15:48:37 UTC - Matt Mitchell: I have several services running, and I’m thinking one of them is using an older version of the client code, which contains the schema (protobuf format) - is that potentially a reason for causing this error?
2020-08-21 15:51:04 UTC - Addison Higham: do you have the full stacktrace?
2020-08-21 15:51:19 UTC - Addison Higham: err actually better yet if there are logs from the broker?
2020-08-21 15:54:10 UTC - Matt Mitchell: checking
2020-08-21 16:48:15 UTC - Nathan Mills: Just bumping this to see if anyone can provide some clarity for me: <>
2020-08-21 17:23:48 UTC - Addison Higham: does it allow you to set it up that way? You usually think of backlog quotas as being a "subset" of your retention policy. But with infinite retention it may make sense to still have a limit on the size of a subscription.

But stepping back a bit, it is important to remember the distinction between messages in and out of a subscription.

A retention policy applies to messages NOT in a subscription; backlog quotas and TTL only apply to messages IN a subscription. I like to think of subscriptions as a "view" over all the messages, with each subscription having its own view over the same underlying data. The backlog quota and TTL allow you to place some constraints on how long a message is visible in that view, but the retention policy is what is responsible for how long the data remains in the underlying storage.
2020-08-21 17:24:19 UTC - Addison Higham: so more concretely:
if messages are evicted from your subscription, they will no longer be visible in your subscription but they remain in the underlying storage
2020-08-21 17:28:48 UTC - Nathan Mills: ok, just to make sure I understand correctly: with `consumer_backlog_eviction` the messages still get written to the topic and are just removed from subscriptions that have exceeded the backlog quota, but if someone uses the `producer_exception` policy, then any subscription exceeding the backlog quota will cause the producer to disconnect?
2020-08-21 17:29:11 UTC - Addison Higham: yes
2020-08-21 17:29:20 UTC - Nathan Mills: :thumbsup:  thanks
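
The "view" analogy from this thread can be sketched as a toy model. This is not Pulsar code: the class `ToyTopic`, its fields, and the entry-count quota are hypothetical simplifications (real backlog quotas are measured in bytes, and the thread describes `producer_exception` as disconnecting the producer, which the sketch approximates by raising an error). It only illustrates that eviction moves a subscription's cursor while the underlying log stays intact.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Toy stand-in for a topic: an append-only log plus one cursor per subscription."""
    backlog_limit: int   # hypothetical quota: max unread entries per subscription
    policy: str          # "consumer_backlog_eviction" or "producer_exception"
    log: list = field(default_factory=list)
    cursors: dict = field(default_factory=dict)  # subscription -> index of next unread entry

    def subscribe(self, name: str) -> None:
        # A new subscription starts at the end of the log.
        self.cursors[name] = len(self.log)

    def backlog(self, name: str) -> int:
        # Entries written but not yet consumed by this subscription.
        return len(self.log) - self.cursors[name]

    def publish(self, msg: str) -> None:
        over = [s for s in self.cursors if self.backlog(s) >= self.backlog_limit]
        if over and self.policy == "producer_exception":
            # Any subscription at its quota blocks the producer.
            raise RuntimeError(f"backlog quota exceeded by {over}")
        self.log.append(msg)
        if self.policy == "consumer_backlog_eviction":
            for s in self.cursors:
                excess = self.backlog(s) - self.backlog_limit
                if excess > 0:
                    # Skip the cursor forward; the log itself is untouched.
                    self.cursors[s] += excess


evicting = ToyTopic(backlog_limit=2, policy="consumer_backlog_eviction")
evicting.subscribe("slow")
for m in ["a", "b", "c"]:
    evicting.publish(m)
print(evicting.cursors["slow"], evicting.log)  # cursor moved past "a"; "a" is still stored
```

With `policy="producer_exception"` the third `publish` call would raise instead of moving the cursor.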
2020-08-21 19:18:54 UTC - Nathan Mills: So here is what I'm trying to figure out: I'm investigating reports of missing messages. I created a function, stopped the function, and reset the cursor for the input topic to before the function was created. When I look at the internal stats, I initially get
```
    "canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
      "markDeletePosition": "8367455:-1",
      "readPosition": "8367455:0",
      "waitingReadOp": false,
      "pendingReadOps": 0,
      "messagesConsumedCounter": -16267653,
      "cursorLedger": 9274512,
      "cursorLedgerLastEntry": 1,
      "individuallyDeletedMessages": "[]",
      "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
      "state": "Open",
      "numberOfEntriesSinceFirstNotAckedMessage": 1,
      "totalNonContiguousDeletedMessagesRange": 0,
      "properties": {}
```
After a couple of minutes, with the function still stopped, the internal stats for the cursor look like:
```
  "canvas-cdc%2Ffiltered%2Fcdc-filter-96": {
    "markDeletePosition": "9216422:3654",
    "readPosition": "9216422:3655",
    "waitingReadOp": false,
    "pendingReadOps": 0,
    "messagesConsumedCounter": 2189250,
    "cursorLedger": 9274512,
    "cursorLedgerLastEntry": 2,
    "individuallyDeletedMessages": "[]",
    "lastLedgerSwitchTimestamp": "2020-08-21T19:10:31.355Z",
    "state": "Open",
    "numberOfEntriesSinceFirstNotAckedMessage": 1,
    "totalNonContiguousDeletedMessagesRange": 0,
    "properties": {}
```
The read position seems to jump forward without the function running.  Would this be caused by the backlog quota policy which is currently
  "destination_storage" : {
    "limit" : 5368709120,
    "policy" : "consumer_backlog_eviction"

2020-08-21 19:21:57 UTC - Addison Higham: yes, that is what would be expected
2020-08-21 19:23:06 UTC - Nathan Mills: Any recommended settings for the backlog policy since it lives at the namespace? Just increase the limit to an arbitrary large size?
2020-08-21 19:23:46 UTC - Addison Higham: @Nathan Mills sorry should keep that threaded, is there a default backlog quota set?
2020-08-21 19:23:54 UTC - Addison Higham: cluster wide one I mean
2020-08-21 19:24:05 UTC - Nathan Mills: I think so, need to validate that though
2020-08-21 19:27:00 UTC - Nathan Mills: yes, The one above is the default quota that was inherited
2020-08-21 19:27:14 UTC - Nathan Mills: I guess I could just set the limit to `-1`?
2020-08-21 19:27:21 UTC - Nathan Mills: for that namespace
2020-08-21 19:37:31 UTC - Addison Higham: IDK if -1 is a valid "unlimited" value. There is a call to `remove-backlog-quota`, but I think that will just set you back to the cluster default
2020-08-21 19:37:54 UTC - Nathan Mills: yeah I tried that, but it looks like it just set it back to the cluster default.
2020-08-21 19:38:07 UTC - Nathan Mills: I'll try a value larger than the topic size and see what happens
2020-08-21 19:38:18 UTC - Addison Higham: your best bet may be an arbitrarily large number, but it is pretty crappy UX, there *might* be an issue/bug for this
2020-08-21 19:38:59 UTC - Addison Higham: it is like 2 "empty" states, we need an "unset, use default" and an "unlimited"
2020-08-21 19:40:19 UTC - Nathan Mills: yeah that would be nice.
2020-08-21 19:40:47 UTC - Addison Higham: if you want to look and see if there is an issue for this or file one, that would be super great
2020-08-21 20:13:59 UTC - Nathan Mills: thanks for the help, looks like `-1` does work to disable the backlog quota for a namespace. But we have a default TTL as well, and you aren't able to disable it at the namespace level. So I set it back to its max of 68 years, and created <>
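
The "68 years" figure is what one would expect if the TTL is a number of seconds held in a signed 32-bit integer. That storage detail is an assumption here, inferred from the number itself rather than stated in the thread. The short calculation below checks it, and also converts the backlog quota limit quoted earlier in this thread into GiB.

```python
# Assumption: the TTL is stored as seconds in a signed 32-bit integer.
INT32_MAX = 2**31 - 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

ttl_years = INT32_MAX / SECONDS_PER_YEAR
print(f"max TTL: {INT32_MAX} s, about {ttl_years:.1f} years")  # about 68.1 years

# The inherited backlog quota limit quoted in the thread, in bytes.
quota_bytes = 5368709120
quota_gib = quota_bytes / 1024**3
print(f"backlog quota: {quota_gib:g} GiB")  # 5 GiB
```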
2020-08-21 20:24:26 UTC - Vil: I came across this on Twitter, written by the Confluent folks.

Any comments from us? The technical parts are a bit too deep for me. The only thing I can say is that it does look 'fair' to me, like they at least tried to make it fair. I am still a beginner, so I cannot say whether the statements are true or not
2020-08-21 20:27:33 UTC - Frank Kelly: Is there any documentation on the backwards compatibility strategy for Apache Pulsar? E.g. is the expectation that a minor version upgrade will be backwards compatible, e.g. 2.6 client -> 2.5 server OR 2.5 client -> 2.6 server? Thanks in advance
2020-08-21 20:39:31 UTC - Addison Higham: See <#C5Z1W2BDY|random> where there have been some discussions about it. There are some issues with the way they configure the bookkeeper disk that make the test not a very apples-to-apples comparison
+1 : Vil, Sijie Guo
2020-08-21 20:40:03 UTC - Addison Higham: oh okay, cool
2020-08-21 21:40:17 UTC - Vil: thanks for pointer
2020-08-21 22:09:20 UTC - Jorge Miralles: Hello, is there a way to delete messages that are acked or outside the retention limit?