Posted to commits@brooklyn.apache.org by he...@apache.org on 2021/08/31 12:58:13 UTC
[brooklyn-docs] branch master updated: add troubleshooting for
startup and rebind issues
This is an automated email from the ASF dual-hosted git repository.
heneveld pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/brooklyn-docs.git
The following commit(s) were added to refs/heads/master by this push:
new f4f52c1 add troubleshooting for startup and rebind issues
new 634a88f Merge branch 'master' of https://gitbox.apache.org/repos/asf/brooklyn-docs
f4f52c1 is described below
commit f4f52c12c6a9f4cdf788fc1ff87a35d84ac169ca
Author: Alex Heneveld <al...@cloudsoftcorp.com>
AuthorDate: Tue Aug 31 13:43:53 2021 +0100
add troubleshooting for startup and rebind issues
---
guide/ops/persistence/index.md | 34 +--------
guide/ops/troubleshooting/fails-to-start.md | 81 ++++++++++++++++++++++
.../troubleshooting/going-deep-in-java-and-logs.md | 4 +-
guide/ops/troubleshooting/index.md | 1 +
4 files changed, 88 insertions(+), 32 deletions(-)
diff --git a/guide/ops/persistence/index.md b/guide/ops/persistence/index.md
index 5574a0c..231cfef 100644
--- a/guide/ops/persistence/index.md
+++ b/guide/ops/persistence/index.md
@@ -105,37 +105,9 @@ any registered policies.
## Handling Rebind Failures
-If rebind fails for any reason, details of the underlying failures will be reported
-in the [`brooklyn.debug.log`](../paths.html). This will include the entities, locations or policies which caused an issue, and in what
-way it failed. There are several approaches to resolving problems.
-
-1) Determine Underlying Cause
-
-Go through the log and identify the likely areas in the code from the error message.
-
-2) Seek Help
-
- Help can be found by contacting the Apache Brooklyn mailing list.
-
-3) Fix-up the State
-
-The state of each entity, location, policy and enricher is persisted in XML.
-It is thus human readable and editable.
-
-After first taking a backup of the state, it is possible to modify the state. For example,
-an offending entity could be removed, or references to that entity removed, or its XML
-could be fixed to remove the problem.
-
-
-4) Fixing with Groovy Scripts
-
-The final (powerful and dangerous!) tool is to execute Groovy code on the running Brooklyn
-instance. If authorized, the REST API allows arbitrary Groovy scripts to be passed in and
-executed. This allows the state of entities to be modified (and thus fixed) at runtime.
-
-If used, it is strongly recommended that Groovy scripts are run against a disconnected Brooklyn
-instance. After fixing the entities, locations and/or policies, the Brooklyn instance's
-new persisted state can be copied and used to fix the production instance.
+It is possible to confuse Apache Brooklyn such that it is unable to rebind to previously persisted
+state after a restart or when running from a different instance.
+Detailed steps to troubleshoot and correct these situations can be found [here](../troubleshooting/fails-to-start.md).
# Writing Persistable Code
diff --git a/guide/ops/troubleshooting/fails-to-start.md b/guide/ops/troubleshooting/fails-to-start.md
new file mode 100644
index 0000000..b4afa5c
--- /dev/null
+++ b/guide/ops/troubleshooting/fails-to-start.md
@@ -0,0 +1,81 @@
+---
+layout: website-normal
+title: "Brooklyn Fails to Start"
+toc: /guide/toc.json
+---
+
+If Apache Brooklyn does not start, or starts with errors, the problem is usually easy to resolve.
+The first place to look is the [logs](/guide/ops/logging.html): `grep` for the first `ERROR`,
+and if its cause is not clear, look backwards from there for preceding `WARN` messages.
+
+There are a handful of common causes.
+
+## Memory
+
+If there is not enough memory available, either on the system or allocated to the JVM, Apache Brooklyn may fail to start.
+This may manifest itself as the process being killed, e.g. if the OS does not have enough memory
+(and there will usually be a message in the system log, e.g. `/var/log/syslog`);
+or some modules failing to load with an `OutOfMemoryError` in the log.
+
+If either of these occurs, you can assign additional memory if available on your system
+by editing the settings in the files in `bin/`, such as `JAVA_MAX_MEM` in `setenv` (or `setenv.bat` on Windows),
+or by running Apache Brooklyn on a system with more memory.
+
+
+## Rebind Errors
+
+It is possible to get the persistent state into an incompatible state, where Apache Brooklyn
+cannot load its previous state. In this case it fails fast so as not to corrupt the state further.
+In addition, a backup of the persistent state will be written to the `backups/` folder in
+the persistent state directory.
+
+The log files contain detailed information about what is unable to be loaded and why;
+some causes include:
+
+* A type that is deployed is no longer available, e.g. because a `SNAPSHOT` bundle was installed,
+ say with a type `X`, the type `X` is used in an active deployment, and then the bundle
+ was either uninstalled or a new build was installed at the same version (possible for `SNAPSHOT` bundles or forced installs)
+ that did not contain the type in use (`X`)
+
+* A deployment did not correctly clean up and leaked resources;
+ this will happen only with Java entities or adjuncts that are incorrectly unmanaged
+
+* A dependency is unavailable, possibly because it was added via the `dropins/` folder or
+ is not installed in the Brooklyn instance being started
+
+There are some good practices which can help avoid these errors:
+
+* Avoid the use of `SNAPSHOT` bundles in production (and do not `force` install bundles)
+* If `SNAPSHOT` bundles are updated in an incompatible way in a dev environment (e.g. blueprint name change),
+ take care to remove pre-existing incompatible deployments
+* When upgrading or restarting Brooklyn, it is recommended to start a second instance as hot-standby first:
+ this will flag the issue that there is an existing deployment which cannot be re-read on a clean start,
+ and it can be removed from the primary Brooklyn
+
+If a rebind problem does occur, all is not lost. There are several ways that recovery can be achieved:
+
+* Delete the incompatible persisted state item files indicated in the logs
+ (or simply delete all the persisted state in a dev environment)
+* Restore to a previous backup state (automatically written to the `backups/` folder with a datestamp)
+* Tell Brooklyn to ignore a certain number of rebind errors with settings in `brooklyn.cfg`:
+ * `rebind.failureMode.danglingRefs.minRequiredHealthy`: takes `QuorumCheck` syntax, consisting
+ of points on a line, e.g. `[[0,0],[10,5],[20,14]]` to allow up to 1 failure for every 2 items up to 10 items
+ (5 needed when 10 items are persisted, per the second point), then subsequently 1 failure for every additional 10 items deployed
+ (14 needed when 20 items are persisted, per the third point)
+ * `rebind.failureMode.rebind`: either `FAIL_FAST`, `FAIL_AT_END`, or `CONTINUE`, for how to treat serious rebind problems
+ (default `FAIL_AT_END`)
+ * Further options available as per the JavaDoc on `RebindManagerImpl` config keys
+* When Brooklyn is stopped, remove the persisted state; then restart in a pristine environment, install any missing bundles,
+ then import the offending persistent state via the UI (About) or REST API;
+ alternatively in some cases it may be possible to add additional/missing bundles via the `dropins/` folder of Karaf
+ or using the `karaf` console (`bundle:install -s ...`)
+* If the broken persisted state is critical, it is possible to edit the persisted files: they are simply an XML model of the items
+ using many unique identifiers, designed so that references can be easily found using `grep`
+* Finally, if all else fails, open a support ticket: there are a number of other advanced techniques available,
+ such as specifying that types should be automatically renamed or migrated by new bundles ([see the Persistence section here](../upgrades/)).
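The `QuorumCheck` arithmetic described above for `rebind.failureMode.danglingRefs.minRequiredHealthy` can be sketched as a piecewise-linear interpolation between the `[items, required]` points. This is only an illustration of the reading given in the text, not Brooklyn's implementation: the function name `min_required_healthy` is hypothetical, and both the rounding up between points and the extension of the last segment beyond the final point are assumptions.

```python
def min_required_healthy(points, total):
    """Return the number of healthy items required when `total` items are persisted.

    `points` is a list of at least two [items, required] integer pairs, e.g.
    [[0, 0], [10, 5], [20, 14]]. Values between points are interpolated
    linearly and rounded up (an assumption made for this sketch).
    """
    pts = sorted(points)
    if total <= pts[0][0]:
        return pts[0][1]
    segments = list(zip(pts, pts[1:]))
    # Use the first segment whose right-hand point covers `total`;
    # past the last point, keep following the final segment (assumption).
    (x0, y0), (x1, y1) = next(
        (seg for seg in segments if total <= seg[1][0]), segments[-1]
    )
    extra = (total - x0) * (y1 - y0)
    # Integer ceiling division avoids floating-point rounding surprises.
    return y0 + -(-extra // (x1 - x0))


POINTS = [[0, 0], [10, 5], [20, 14]]

for items in (10, 20):
    needed = min_required_healthy(POINTS, items)
    print(f"{items} items persisted: {needed} must rebind, {items - needed} may fail")
```

With the example points this reproduces the two figures quoted above: 5 needed at 10 items and 14 needed at 20 items. Assuming the usual `key=value` properties form of `brooklyn.cfg`, the corresponding setting would be written as `rebind.failureMode.danglingRefs.minRequiredHealthy=[[0,0],[10,5],[20,14]]`.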
+
+It may also be useful to review the sections on [Persistence](../persistence/) and [HA](../high-availability/).
+
+
+
+
diff --git a/guide/ops/troubleshooting/going-deep-in-java-and-logs.md b/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
index 581a27b..3c072b4 100644
--- a/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
+++ b/guide/ops/troubleshooting/going-deep-in-java-and-logs.md
@@ -475,4 +475,6 @@ SEVERE: Cannot start server. Server instance is not configured.
{% endhighlight %}
-As expected, we can see here that the `unmatched-element` element has not been terminated in the `server.xml` file
+As expected, we can see here that the `unmatched-element` element has not been terminated in the `server.xml` file.
+
+
diff --git a/guide/ops/troubleshooting/index.md b/guide/ops/troubleshooting/index.md
index 331e267..7909648 100644
--- a/guide/ops/troubleshooting/index.md
+++ b/guide/ops/troubleshooting/index.md
@@ -3,6 +3,7 @@ title: Troubleshooting
layout: website-normal
children:
- { path: overview.md, title: Overview }
+- { path: fails-to-start.md }
- { path: web-console-issues.md, title: Web Console Issues }
- { path: deployment.md, title: Deployment }
- { path: connectivity.md, title: Server Connectivity }