Posted to commits@brooklyn.apache.org by al...@apache.org on 2015/07/28 17:45:21 UTC

[2/4] incubator-brooklyn git commit: rejig titles of troubleshooting sections

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md b/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
deleted file mode 100644
index 07874c0..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-connectivity.md
+++ /dev/null
@@ -1,143 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Server Connectivity Issues in the Cloud
-toc: /guide/toc.json
----
-
-A common problem when setting up an application in the cloud is getting the basic connectivity right - how
-do I get my service (e.g. a TCP host:port) publicly accessible over the internet?
-
-This varies a lot - e.g. is the VM public or in a private network? Is the service only accessible through
-a load balancer? Should the service be reachable globally or only from a particular CIDR?
-
-This guide gives some general tips for debugging connectivity issues, which are applicable to a 
-range of different service types. Choose those that are appropriate for your use-case.
-
-## VM reachable
-If the VM is supposed to be accessible directly (e.g. from the public internet, or if in a private network
-then from a jump host)...
-
-### ping
-Can you `ping` the VM from the machine you are trying to reach it from?
-
-However, ping uses ICMP. If the ping fails, it could be that the firewall forbids ICMP but still
-lets TCP traffic through.
-
-### telnet to TCP port
-You can check if a given TCP port is reachable and listening using `telnet <host> <port>`, such as
-`telnet www.google.com 80`, which gives output like:
-
-```
-    Trying 31.55.163.219...
-    Connected to www.google.com.
-    Escape character is '^]'.
-```
-
-If the connection attempt hangs for a long time before failing, a firewall may be blocking access.
-If it fails quickly (e.g. with "Connection refused"), the server is probably just not listening on that port.
-
-### DNS and routing
-If using a hostname rather than IP, then is it resolving to a sensible IP?
-
-Is the route to the server sensible? (e.g. one can hit problems with proxy servers in a corporate
-network, or ISPs returning a default result for unknown hosts).
-
-The following commands can be useful:
-
-* `host` is a DNS lookup utility. e.g. `host www.google.com`.
-* `dig` stands for "domain information groper". e.g. `dig www.google.com`.
-* `traceroute` prints the route that packets take to a network host. e.g. `traceroute www.google.com`.
-
-## Service is listening
-
-### Service responds
-Try connecting to the service from the VM itself. For example, `curl http://localhost:8080` for a
-web-service.
-
-On dev/test VMs, don't be afraid to install the utilities you need such as `curl`, `telnet`, `nc`,
-etc. Cloud VMs often have a very cut-down set of packages installed. For example, execute
-`sudo apt-get update; sudo apt-get install -y curl` or `sudo yum install -y curl`.
-
-### Listening on port
-Check that the service is listening on the port, and on the correct NIC(s).
-
-Execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
-`-anup` for UDP). You should expect to see something like the output below for a service.
-
-```
-Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
-tcp        0      0 :::8080                     :::*                        LISTEN      8276/java           
-```
-
-In this case a Java process with pid 8276 is listening on port 8080. The local address `:::8080`
-format means all NICs (in IPv6 address format). You may also see `0.0.0.0:8080` for IPv4 format.
-If it says 127.0.0.1:8080 then your service will most likely not be reachable externally.
-
-Use `ip addr show` (or the obsolete `ifconfig -a`) to see the network interfaces on your server.
-
-For `netstat`, run with `sudo` to see the pid for all listed ports.
-
-## Firewalls
-On Linux, check if `iptables` is preventing the remote connection. On Windows, check the Windows Firewall.
-
-If it is acceptable (e.g. it is not a server in production), try turning off the firewall temporarily,
-and testing connectivity again. Remember to re-enable it afterwards! On CentOS, this is `sudo service
-iptables stop`. On Ubuntu, use `sudo ufw disable`. On Windows, press the Windows key and type 'Windows
-Firewall with Advanced Security' to open the firewall tools, then click 'Windows Firewall Properties'
-and set the firewall state to 'Off' in the Domain, Public and Private profiles.
-
-If you cannot temporarily turn off the firewall, then look carefully at the firewall settings. For
-example, execute `sudo iptables -n --list` and `sudo iptables -t nat -n --list`.
-
-## Cloud firewalls
-Some clouds offer a firewall service, where ports need to be explicitly listed to be reachable.
-
-For example, [security groups for EC2-classic](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups)
-have rules for the protocols and ports to be reachable from specific CIDRs.
-
-Check these settings via the cloud provider's web-console (or API).
-
-## Quick test of a listener port
-It can be useful to start listening on a given port, and to then check if that port is reachable.
-This is useful for testing basic connectivity when your service is not yet running, or to a
-different port to compare behaviour, or to compare with another VM in the network.
-
-The `nc` netcat tool is useful for this. For example, `nc -l 0.0.0.0 8080` will listen on
-TCP port 8080 on all network interfaces. On another server, you can then run
-`echo hello from client | nc <hostname> 8080`. If all works well, this will send "hello from client"
-over TCP port 8080, and the `nc -l` process will write it out before exiting.
-
-Similarly for UDP, use `-lu`.
-
-You may first have to install `nc`, e.g. with `sudo yum install -y nc` or `sudo apt-get install netcat`.
-
-### Cloud load balancers
-For some use-cases, it is good practice to use the load balancer service offered by the cloud provider
-(e.g. [ELB in AWS](http://aws.amazon.com/elasticloadbalancing/) or the
-[Cloudstack Load Balancer](http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/latest/network_setup.html#management-server-load-balancing)).
-
-The VMs can all be isolated within a private network, with access only through the load balancer service.
-
-Debugging techniques here include ensuring connectivity from another jump server within the private
-network, and careful checking of the load-balancer configuration from the Cloud Provider's web-console.
-
-### DNAT
-Use of DNAT is appropriate for some use-cases, where a particular port on a particular VM is to be
-made available.
-
-Debugging connectivity issues here is similar to the steps for a cloud load balancer. Ensure
-connectivity from another jump server within the private network. Carefully check the NAT rules from
-the Cloud Provider's web-console.
-
-### Guest wifi
-It is common for guest wifi to restrict access to only specific ports (e.g. allowing 80 and 443 but
-blocking ssh over port 22).
-
-Normally your best bet is then to abandon the guest wifi (e.g. to tether to a mobile phone instead).
-
-There are some unconventional workarounds such as [configuring sshd to listen on port 80 so you can
-use an ssh tunnel](http://askubuntu.com/questions/107173/is-it-possible-to-ssh-through-port-80).
-However, the firewall may well inspect traffic so sending non-http traffic over port 80 may still fail.
-
-  
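
The telnet check described in the connectivity guide above can also be scripted. The following is a minimal Python sketch and is not part of Brooklyn; the function name `check_tcp_port` and its result strings are illustrative assumptions. It separates a timeout (often a firewall silently dropping packets) from a refused connection (typically nothing listening on that port).

```python
import socket


def check_tcp_port(host, port, timeout=5.0):
    """Try a TCP connect and classify the outcome.

    Returns one of "open", "refused", "timeout" or "error: <details>".
    The names and return values are illustrative, not a Brooklyn API.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Fast failure: the host answered, but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # Slow failure: packets are probably being dropped on the way.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS resolution failure or no route to host.
        return "error: %s" % exc
```

For example, `check_tcp_port("localhost", 8080)` run on the VM itself would be expected to return "open" when the web-service from the guide is listening there.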

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-deployment.md b/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
deleted file mode 100644
index c343762..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-deployment.md
+++ /dev/null
@@ -1,88 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Deployment
-toc: /guide/toc.json
----
-
-This guide describes common problems encountered when deploying applications.
-
-
-## YAML deployment errors
-
-The error `Invalid YAML: Plan not in acceptable format: Cannot convert ...` means that the text is not 
-valid YAML. Common reasons include that the indentation is incorrect, or that there are non-matching
-brackets.
-
-The error `Unrecognized application blueprint format: no services defined` means that the `services:`
-section is missing.
-
-An error like `Deployment plan item io.brooklyn.camp.spi.pdp.Service@23c159e2[name=<null>,description=<null>,serviceType=com.acme.Foo,characteristics=[],customAttributes={}] cannot be matched` means that the given entity type (in this case com.acme.Foo) is not in the catalog or on the classpath.
-
-An error like `Illegal parameter for 'location' (aws-ec3); not resolvable: java.util.NoSuchElementException: Unknown location 'aws-ec3': either this location is not recognised or there is a problem with location resolver configuration` means that the given location (in this case aws-ec3) 
-was unknown. This means it does not match any of the named locations in brooklyn.properties, nor any of the
-clouds enabled in the jclouds support, nor any of the locations added dynamically through the catalog API.
-
-
-## VM Provisioning Failures
-
-There are many stages at which VM provisioning can fail! An error `Failure running task provisioning` 
-means there was some problem obtaining or connecting to the machine.
-
-An error like `... Not authorized to access cloud ...` usually means the wrong identity/credential was used.
-
-An error like `Unable to match required VM template constraints` means that a matching image (e.g. AMI in AWS terminology) could not be found. This 
-could be because an incorrect explicit image id was supplied, or because the match-criteria could not
-be satisfied using the given images available in the given cloud. The first time this error is 
-encountered, a listing of all images in that cloud/region will be written to the debug log.
-
-Failure to form an ssh connection to the newly provisioned VM can be reported in several different ways, 
-depending on the nature of the error. This breaks down into failures at different points:
-
-* Failure to reach the ssh port (e.g. `... could not connect to any ip address port 22 on node ...`).
-* Failure to do the very initial ssh login (e.g. `... Exhausted available authentication methods ...`).
-* Failure to ssh using the newly created user.
-
-There are many possible reasons for this ssh failure, which include:
-
-* The VM was "dead on arrival" (DOA) - sometimes a cloud will return an unusable VM. One can work around
-  this using the `machineCreateAttempts` configuration option, to automatically retry with a new VM.
-* Local network restrictions. On some guest wifis, external access to port 22 is forbidden.
-  Check by manually trying to reach port 22 on a different machine that you have access to.
-* NAT rules not set up correctly. On some clouds that have only private IPs, Brooklyn can automatically
-  create NAT rules to provide access to port 22. If this NAT rule creation fails for some reason,
-  then Brooklyn will not be able to reach the VM. If NAT rules are being created for your cloud, then
-  check the logs for warnings or errors about the NAT rule creation.
-* ssh credentials incorrectly configured. The Brooklyn configuration is very flexible in how ssh
-  credentials can be configured. However, if a more advanced configuration is used incorrectly (e.g. 
-  the wrong login user, or invalid ssh keys) then this will fail.
-* Wrong login user. The initial login user to use when first logging into the new VM is inferred from 
-  the metadata provided by the cloud provider about that image. This can sometimes be incomplete, so
-  the wrong user may be used. This can be explicitly set using the `loginUser` configuration option.
-  An example of this is with some Ubuntu VMs, where the "ubuntu" user should be used. However, on some clouds
-  it defaults to trying to ssh as "root".
-* Bad choice of user. By default, Brooklyn will create a user with the same name as the user running the
-  Brooklyn process; the choice of user name is configurable. If this user already exists on the machine, 
-  then the user setup will not behave as expected. Subsequent attempts to ssh using this user could then fail.
-* Custom credentials on the VM. Most clouds will automatically set the ssh login details (e.g. in AWS using  
-  the key-pair, or in CloudStack by auto-generating a password). However, with some custom images the VM
-  will have hard-coded credentials that must be used. If Brooklyn's configuration does not match that,
-  then it will fail.
-* Guest customisation by the cloud. On some clouds (e.g. vCloud Air), the VM can be configured to do
-  guest customisation immediately after the VM starts. This can include changing the root password.
-  If Brooklyn is not configured with the expected changed password, then the VM provisioning may fail
-  (depending on whether Brooklyn connects before or after the password is changed!).
- 
-A very useful debug configuration is to set `destroyOnFailure` to false. This will allow ssh failures to
-be more easily investigated.
-
-
-## Timeout Waiting For Service-Up
-
-A common generic error message is that there was a timeout waiting for service-up.
-
-This just means that the entity did not get to service-up in the pre-defined time period (the default is 
-two minutes, and can be configured using the `start.timeout` config key; the timer begins after the 
-start tasks are completed).
-
-See the guide on [runtime errors](troubleshooting-runtime-errors.html) for where to find additional information, especially the section on
-"Entity's Error Status".
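
The error messages quoted in the deployment guide above lend themselves to a simple lookup table. The sketch below is only an illustration and not a Brooklyn feature: the names `KNOWN_ERRORS` and `diagnose` are hypothetical, the fragments are copied from the messages quoted in the guide, and the hints paraphrase its explanations.

```python
# (fragment of the error message, likely cause as described in the guide)
KNOWN_ERRORS = [
    ("Invalid YAML: Plan not in acceptable format",
     "The text is not valid YAML; check indentation and matching brackets."),
    ("no services defined",
     "The 'services:' section is missing from the blueprint."),
    ("cannot be matched",
     "The entity type is not in the catalog or on the classpath."),
    ("Unknown location",
     "The location is unknown: check brooklyn.properties, jclouds and the catalog."),
    ("Failure running task provisioning",
     "There was a problem obtaining or connecting to the machine."),
    ("Not authorized to access cloud",
     "The wrong identity/credential was probably used."),
    ("Unable to match required VM template constraints",
     "No matching image was found; check the image id and match criteria."),
    ("could not connect to any ip address port 22",
     "The ssh port on the new VM could not be reached."),
    ("Exhausted available authentication methods",
     "The initial ssh login failed; check loginUser and credentials."),
]


def diagnose(message):
    """Return the hints whose fragment occurs in the message (case-insensitive)."""
    lowered = message.lower()
    return [hint for fragment, hint in KNOWN_ERRORS
            if fragment.lower() in lowered]
```

Calling `diagnose` with text copied from the web-console or the log returns an empty list when no known fragment matches, so a caller can fall back to the debug log in that case.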

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md b/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
deleted file mode 100644
index 8b657fc..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-runtime-errors.md
+++ /dev/null
@@ -1,116 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting Runtime Errors
-toc: /guide/toc.json
----
-
-This guide describes sources of information for runtime errors.
-
-Whether you're customizing out-of-the-box blueprints, or developing your own custom blueprints, you will
-inevitably have to deal with entity failure. Thankfully Brooklyn provides plenty of information to help 
-you locate and resolve any issues you may encounter.
-
-
-## Web-console Runtime Error Information
- 
-### Entity Hierarchy
-
-The Brooklyn web-console includes a tree view of the entities within an application. Errors within the
-application are represented visually, showing a "fire" image on the entity.
-
-When an error causes an entire application to be unexpectedly down, the error is generally propagated to the
-top-level entity - i.e. marking it as "on fire". To find the underlying error, one should expand the entity
-hierarchy tree to find the specific entities that have actually failed.
-
-
-### Entity's Error Status
-
-Many entities have some common sensors (i.e. attributes) that give details of the error status:
-
-* `service.isUp` (often referred to as "service up") is a boolean, saying whether the service is up. For many 
-  software processes, this is inferred from whether the "service.notUp.indicators" is empty. It is also
-  possible for some entities to set this attribute directly.
-* `service.notUp.indicators` is a map of errors. This often gives much more information than the single 
-  `service.isUp` attribute. For example, there may be many health-check indicators for a component: 
-  is the root URL reachable, is the management API reporting healthy, is the process running, etc.
-* `service.problems` is a map of namespaced indicators of problems with a service.
-* `service.state` is the actual state of the service - e.g. CREATED, STARTING, RUNNING, STOPPING, STOPPED, 
-  DESTROYED and ON_FIRE.
-* `service.state.expected` indicates the state the service is expected to be in (and when it transitioned to that).
-  For example, is the service expected to be starting, running, stopping, etc.
-
-These sensor values are shown in the "sensors" tab - see below.
-
-
-### Sensors View
-
-The "Sensors" tab in the Brooklyn web-console shows the attribute values of a particular entity.
-This gives lots of runtime information, including about the health of the entity - the 
-set of attributes will vary between different entity types.
-
-[![Sensors view in the Brooklyn debug console.](images/jmx-sensors.png)](images/jmx-sensors-large.png)
-
-Note that null (or not set) sensors are hidden by default. You can click on the `Show/hide empty records` 
-icon (highlighted in yellow above) to see these sensors as well.
-
-The sensors view is also paginated. You can configure the number of sensors shown per page 
-(at the bottom). There is also a search bar (at the top) to filter the sensors shown.
-
-
-### Activity View
-
-The activity view shows the tasks executed by a given entity. The top-level tasks are the effectors
-(i.e. operations) invoked on that entity. This view allows one to drill into the task, to 
-see details of errors.
-
-Select the entity, and then click on the `Activities` tab.
-
-In the table showing the tasks, each row is a link - clicking on the row will drill into the details of that task, 
-including sub-tasks:
-
-[![Task failure error in the Brooklyn debug console.](images/failed-task.png)](images/failed-task-large.png)
-
-For ssh tasks, this allows one to drill down to see the env, stdin, stdout and stderr. That is, you can see the
-commands executed (stdin) and environment variables (env), and the output from executing that (stdout and stderr). 
-
-For tasks that did not fail, one can still drill into the tasks to see what was done.
-
-It's always worth looking at the Detailed Status section as sometimes that will give you the information you need.
-For example, it can show the exception stack trace in the thread that was executing the task that failed.
-
-
-## Log Files
-
-Brooklyn's logging is configurable, for the files created, the logging levels, etc. 
-See [Logging docs](/guide/ops/logging.html).
-
-With out-of-the-box logging, `brooklyn.info.log` and `brooklyn.debug.log` files are created. These are by default 
-rolling log files: when the log reaches a given size, it is compressed and a new log file is started.
-Therefore check the timestamps of the log files to ensure you are looking in the correct file for the 
-time of your error.
-
-With out-of-the-box logging, info, warnings and errors are written to the `brooklyn.info.log` file. This gives
-a summary of the important actions and errors. However, it does not contain full stacktraces for errors.
-
-To find the exception, we'll need to look in Brooklyn's debug log file. By default, the debug log file
-is named `brooklyn.debug.log`. You can use your favourite tools for viewing large text files. 
-
-One possible tool is `less`, e.g. `less brooklyn.debug.log`. You can quickly find the last exception 
-by navigating to the end of the log file (using `Shift-G`), then searching backwards by typing `?Exception` 
-and pressing `Enter`. Sometimes an error results in multiple exceptions being logged (e.g. first for the
-entity, then for the cluster, then for the app). If you know the text of the error message (e.g. copy-pasted
-from the Activities view of the web-console) then you can search explicitly for that text.
-
-The `grep` command is also extremely helpful. Useful things to grep for include:
-
-* The entity id (see the "summary" tab of the entity in the web-console for the id).
-* The entity type name (if there are only a small number of entities of that type). 
-* The VM IP address.
-* A particular error message (e.g. copy-pasted from the Activities view of the web-console).
-* The word WARN etc, such as `grep -E "WARN|ERROR" brooklyn.info.log`.
-
-Grep'ing for particular log messages is also useful. Some examples are shown below:
-
-* INFO: "Started application", "Stopping application" and "Stopped application"
-* INFO: "Creating VM "
-* DEBUG: "Finished VM "
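
The `less` and `grep` techniques from the runtime errors guide above can be approximated in a few lines of Python when a script is more convenient. This is a rough sketch under stated assumptions: the helper names `grep_lines` and `last_match` are made up for illustration, and only the file names `brooklyn.info.log` and `brooklyn.debug.log` come from the guide.

```python
import re


def grep_lines(lines, pattern):
    """Return (line_number, text) pairs for lines matching the regex,
    similar in spirit to `grep -nE pattern file`."""
    regex = re.compile(pattern)
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if regex.search(line)]


def last_match(lines, pattern):
    """Return the last matching (line_number, text) pair, or None.

    This mirrors jumping to the end of the file in `less` and
    searching backwards with `?pattern`.
    """
    matches = grep_lines(lines, pattern)
    return matches[-1] if matches else None
```

A file object can be passed directly as `lines`, for example `with open("brooklyn.debug.log") as f: print(last_match(f, "Exception"))`. Both helpers read the whole input, which is fine for a single rolled log file but not tuned for very large ones.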

http://git-wip-us.apache.org/repos/asf/incubator-brooklyn/blob/3d930107/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
----------------------------------------------------------------------
diff --git a/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md b/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
deleted file mode 100644
index a09f902..0000000
--- a/docs/guide/ops/troubleshooting/troubleshooting-softwareprocess.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-layout: website-normal
-title: Troubleshooting SoftwareProcess Entities
-toc: /guide/toc.json
----
-
-The [guide for troubleshooting runtime errors](troubleshooting-runtime-errors.html) in Brooklyn 
-describes where to find more information about errors.
-
-If that doesn't give enough information to diagnose, fix or work around the problem, then it may be necessary
-to log in to the machine to investigate further. This guide applies to entities that are types
-of "SoftwareProcess" in Brooklyn, or that follow those conventions.
-
-
-## VM connection details
-
-The ssh connection details for an entity are published to a sensor `host.sshAddress`. The login 
-credentials will depend on the Brooklyn configuration. The default is to use the `~/.ssh/id_rsa` 
-or `~/.ssh/id_dsa` on the Brooklyn host (uploading the associated `~/.ssh/id_rsa.pub` to the machine's 
-`authorized_keys`). However, this can be overridden (e.g. with specific passwords etc) in the 
-location's configuration.
-
-For Windows, there is a similar sensor with the name `host.winrmAddress`. (TODO sensor for password?) 
-
-
-## Install and Run Directories
-
-For ssh-based software processes, the install directory and the run directory are published as sensors
-`install.dir` and `run.dir` respectively.
-
-For some entities, files are unpacked into the install dir; configuration files are written to the
-run dir along with log files. For some other entities, these directories may be mostly empty - 
-e.g. if installing RPMs, and that software writes its logs to a different standard location.
-
-Most entities have a sensor `log.location`. It is generally worth checking this, along with other files
-in the run directory (such as console output).
-
-
-## Process and OS Health
-
-It is worth checking that the process is running, e.g. using `ps aux` to look for the desired process.
-Some entities also write the pid of the process to `pid.txt` in the run directory.
-
-It is also worth checking if the required port is accessible. This is discussed in the guide 
-"Troubleshooting Server Connectivity Issues in the Cloud", including listing the ports in use:
-execute `netstat -antp` (or on OS X `netstat -antp TCP`) to list the TCP ports in use (or use
-`-anup` for UDP).
-
-It is also worth checking the disk space on the server, e.g. using `df -m`, to check that there
-is sufficient space on each of the required partitions.
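
The process and disk checks from the SoftwareProcess guide above could be scripted on the VM roughly as follows. This is a sketch for POSIX systems and not Brooklyn code: the function names are hypothetical, and it assumes the entity writes its pid to `pid.txt` in the run directory, which the guide says only some entities do.

```python
import os
import shutil


def read_pid_file(run_dir):
    """Read the pid from <run_dir>/pid.txt (assumed to hold a single integer)."""
    with open(os.path.join(run_dir, "pid.txt")) as pid_file:
        return int(pid_file.read().strip())


def pid_is_running(pid):
    """Check whether a process with this pid exists (POSIX only).

    Signal 0 performs the existence and permission checks without
    actually delivering a signal to the process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def free_megabytes(path="/"):
    """Free space in MB on the partition containing path, comparable to `df -m`."""
    return shutil.disk_usage(path).free // (1024 * 1024)
```

Here the value of the `run.dir` sensor would be passed to `read_pid_file`, and `free_megabytes` would be called once for each partition the software needs.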