Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/05/27 14:46:36 UTC

[GitHub] [pulsar] Saswatibhoi opened a new issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist

Saswatibhoi opened a new issue #7055:
URL: https://github.com/apache/pulsar/issues/7055


   **Describe the bug**
   While producing messages on a topic, the topic tries to reach a ledger that doesn't exist. The produce operation fails with the error message "No such ledger exists on Metadata Server -  ledger=1380467", and that is the specific ledger the topic is trying to reach while producing.
   
   I tried skipping a few messages and resetting the cursor, but neither helped. I have also tried restarting the services on the broker that is currently serving the topic.
   
   We are running Pulsar 2.5.1 on our brokers and bookies.
   
   **Error from Logs**
   10:01:00.311 [main] ERROR org.apache.pulsar.client.cli.PulsarClientTool - Error while producing messages
   10:01:00.311 [main] ERROR org.apache.pulsar.client.cli.PulsarClientTool - java.util.concurrent.ExecutionException: org.apache.pulsar.client.api.PulsarClientException: java.io.IOException: No such ledger exists on Metadata Server -  ledger=1380467 - operation=Failed to open ledger
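When scanning client or broker logs for this failure, it can help to pull the ledger ID out of the message programmatically. The following is a minimal sketch that assumes the error text always has the shape shown in the log above; the function name `parse_missing_ledger` is made up for illustration and is not part of any Pulsar tooling.

```python
import re

# Matches the error text quoted in the log above. The "operation=" suffix is
# treated as optional because the shorter form of the message omits it.
LEDGER_ERROR = re.compile(
    r"No such ledger exists on Metadata Server\s*-\s*ledger=(?P<ledger>\d+)"
    r"(?:\s*-\s*operation=(?P<operation>.+))?"
)


def parse_missing_ledger(line):
    """Return (ledger_id, operation) if the line reports a missing ledger, else None.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    match = LEDGER_ERROR.search(line)
    if match is None:
        return None
    operation = match.group("operation")
    return int(match.group("ledger")), (operation.strip() if operation else None)


if __name__ == "__main__":
    sample = (
        "java.io.IOException: No such ledger exists on Metadata Server -  "
        "ledger=1380467 - operation=Failed to open ledger"
    )
    print(parse_missing_ledger(sample))
```

Collecting the ledger IDs this way makes it easier to tell whether every affected topic is failing on the same ledger or on different ones.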
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Saswatibhoi commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Saswatibhoi commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-636529976


   > @Saswatibhoi I suspect the issues are related to your deployment. I need to understand how you deploy the cluster to understand the problem. From your [previous comment](https://github.com/apache/pulsar/issues/7055#issuecomment-635445303), I still don't know what your deployment did. Did you spin up a new cluster? Or did you migrate from an old cluster to a new one?
   
   In our organization, our AMI expires every three months, so every three months we are required to update our instances with a new AMI. In this deployment, we spun up a few new bookies with a new AMI and decommissioned the old bookies using the process I described earlier.





[GitHub] [pulsar] Saswatibhoi commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Saswatibhoi commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-635445303


   > @Saswatibhoi
   > 
   > How do you deploy your pulsar cluster? Did you happen to run any bookkeeper commands like `metaformat` to format the bookkeeper metadata?
   
   For the bookies, our deployment process is as follows:
   * Spin up new bookies in a new stack.
   * Set the old bookies to readOnly.
   * Wait for 24 hours, as our max ttl is 24 hours.
   * Decommission the old bookies.
   * Stop the services on the old bookies.
   * Remove the old stack.
   
   We don't use the metaformat or bookieformat commands under any circumstances.
   
   However, I was wondering: is it possible that the managed ledger cache eviction is not happening gracefully, and that the ledger the topic is trying to reach is still somewhere in the cache?





[GitHub] [pulsar] Ghatage edited a comment on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Ghatage edited a comment on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-641038722


   @Saswatibhoi 
   Do you check if there are any under-replicated ledgers before decommissioning?
   FWIW, I recently wrote a doc for the BookKeeper documentation on the safe decommission procedure we use. [You can check it out here](https://github.com/apache/bookkeeper/blob/2ca4025e4c699e49c39272a067d8ec6056ca4358/site/docs/latest/admin/decomission.md).
   
   cc @sijie 
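The check on under-replicated ledgers can be turned into a hard gate in a rollout script. The sketch below assumes the caller has already collected the under-replicated ledger IDs, for example from `bin/bookkeeper shell listunderreplicated`; parsing that command's output is deliberately left out because its format is not shown in this thread. The function name `check_safe_to_decommission` is hypothetical.

```python
def check_safe_to_decommission(underreplicated_ledger_ids):
    """Refuse to proceed while any ledger is still under-replicated.

    `underreplicated_ledger_ids` is any iterable of integer ledger IDs that
    the caller gathered beforehand (how they are gathered is an assumption
    outside this sketch). Returns True when the list is empty.
    """
    pending = sorted(set(underreplicated_ledger_ids))
    if pending:
        raise RuntimeError(
            f"{len(pending)} under-replicated ledger(s) remain: {pending}"
        )
    return True


if __name__ == "__main__":
    # An empty list means the gate passes and the rollout may continue.
    print(check_safe_to_decommission([]))
```

Raising an exception rather than returning False keeps an automated rollout from continuing silently past a failed check.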





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-641717746


   @Ghatage are you interested in adding a section to this page, https://pulsar.apache.org/docs/en/administration-zk-bk/#bookkeeper, that refers to the BookKeeper documentation or to the page you are writing?





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-636366274


   @Saswatibhoi I suspect the issues are related to your deployment. I need to understand how you deploy the cluster to understand the problem. From your [previous comment](https://github.com/apache/pulsar/issues/7055#issuecomment-635445303), I still don't know what your deployment did. Did you spin up a new cluster? Or did you migrate from an old cluster to a new one?





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-636242082


   @Saswatibhoi I don't think this is a clear way to *decommission* bookies.
   
   At which step was the error encountered?





[GitHub] [pulsar] Ghatage commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Ghatage commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-641738131


   Sure @sijie, let me write it up for the Pulsar documentation too. I will send a PR soon.





[GitHub] [pulsar] sijie closed issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie closed issue #7055:
URL: https://github.com/apache/pulsar/issues/7055


   





[GitHub] [pulsar] Ghatage commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Ghatage commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-641038722


   @Saswatibhoi 
   Do you check if there are any under-replicated ledgers before decommissioning?
   FWIW, I recently wrote a doc for the BookKeeper documentation on the safe decommission procedure we use. [You can check it out here](https://github.com/apache/bookkeeper/blob/2ca4025e4c699e49c39272a067d8ec6056ca4358/site/docs/latest/admin/decomission.md).





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-635041688


   @Saswatibhoi 
   
   How do you deploy your pulsar cluster? Did you happen to run any bookkeeper commands like `metaformat` to format the bookkeeper metadata?





[GitHub] [pulsar] Saswatibhoi commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Saswatibhoi commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-636349262


   > @Saswatibhoi I don't think this is a clear way to _decommission_ bookies.
   > 
   > At which step was the error encountered?
   
   @sijie Actually, we did not face any issues while deploying the new bookies. The deployment completed successfully three weeks ago and is running fine. No other topics are reporting ledger issues. The stuck-ledger issue was reported for just a couple of topics three days ago.





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-641738553


   @Ghatage thank you!





[GitHub] [pulsar] sijie commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
sijie commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-637183395


   > we spun up a few new bookies with a new AMI, and decommissioned the old bookies with the process I have mentioned before.
   
   Okay. Then you need to run the `bin/bookkeeper shell decommission` command to decommission bookies. "*Wait for 24 hours, as our max ttl is 24 hours." is not a safe way to decommission bookies.
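For scripted rollouts, the decommission step can be built as an explicit argument list rather than an interpolated shell string. The sketch below is an assumption-laden illustration: in the BookKeeper shell the subcommand is spelled `decommissionbookie` and takes `-bookieid`, which differs from the shorthand used in the comment above, so confirm the exact spelling with `bin/bookkeeper shell help` on your version. The helper name `decommission_command` and the example bookie address are made up.

```python
def decommission_command(bookie_id, bookkeeper_bin="bin/bookkeeper"):
    """Build the argv for decommissioning one bookie.

    Assumes the `decommissionbookie -bookieid <id>` form of the BookKeeper
    shell; verify it against your installed version before relying on it.
    The command is only constructed here, never executed.
    """
    if not bookie_id:
        raise ValueError("bookie_id must be a non-empty string")
    return [bookkeeper_bin, "shell", "decommissionbookie", "-bookieid", bookie_id]


if __name__ == "__main__":
    # Hypothetical bookie address, printed as a dry run.
    print(" ".join(decommission_command("old-bookie-1.example:3181")))
```

Returning a list keeps the bookie ID as a single argument if the result is later passed to `subprocess.run`, and printing it first gives a dry run that can be reviewed before anything touches the cluster.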





[GitHub] [pulsar] Saswatibhoi commented on issue #7055: Unable to Produce Messages: Topic is stuck on a ledger that doesn't exist: "No such ledger exists on Metadata Server"

Posted by GitBox <gi...@apache.org>.
Saswatibhoi commented on issue #7055:
URL: https://github.com/apache/pulsar/issues/7055#issuecomment-639844040


   We do run the bookie decommission command every time we decommission.
   
   It looks like the topic is stuck on that ledger and is unable to recover regardless of the options we have tried, such as resetting the cursor or skipping messages.
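One way to confirm that a topic still references the missing ledger is to inspect its internal stats, for example the JSON printed by `pulsar-admin topics stats-internal`. The sketch below assumes that JSON has already been parsed into a dictionary and that it carries a `ledgers` list whose entries have a `ledgerId` field; the helper name and the sample data are invented for illustration.

```python
def topic_references_ledger(internal_stats, ledger_id):
    """Return True if the topic's managed-ledger stats list the given ledger.

    `internal_stats` is the parsed internal-stats JSON for one topic; the
    `ledgers` / `ledgerId` layout is an assumption to verify on your version.
    """
    ledgers = internal_stats.get("ledgers", [])
    return any(entry.get("ledgerId") == ledger_id for entry in ledgers)


if __name__ == "__main__":
    # Made-up sample; only the ledger ID 1380467 comes from the report above.
    sample_stats = {
        "state": "LedgerOpened",
        "ledgers": [
            {"ledgerId": 1380467, "entries": 0, "size": 0},
            {"ledgerId": 1390012, "entries": 12, "size": 4096},
        ],
    }
    print(topic_references_ledger(sample_stats, 1380467))
```

If the topic lists a ledger that BookKeeper's metadata no longer contains, that points at the managed-ledger metadata rather than at a broker-side cache.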

