Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/01/14 06:12:28 UTC

[GitHub] [couchdb-documentation] jaydoane opened a new pull request #615: Port weatherreport README to cluster troubleshooting section

jaydoane opened a new pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615


   ## Overview
   
   This is a port of the `weatherreport` README to the cluster troubleshooting section of the documentation.
   
   ## Related Pull Requests
   
   https://github.com/apache/couchdb/pull/3312
   
   ## Checklist
   
   - [ ] Update [rebar.config.script](https://github.com/apache/couchdb/blob/master/rebar.config.script) with the commit hash once this PR is rebased and merged
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb-documentation] flimzy commented on pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#issuecomment-761619293


   LGTM!
   
   +1
   
   Sorry I didn't see everything on my first pass, and dragged this out longer :/





[GitHub] [couchdb-documentation] jaydoane commented on pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#issuecomment-809593343


   Unfortunately, I have not yet merged the `weatherreport` PR proper, and was planning to hold off for a while longer, so merging this may have been premature.





[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558170347



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,114 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+=============================================
+Troubleshooting CouchDB 3 with Weather Report
+=============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB version 3 node or cluster. It
+does not support version 4 or later.

Review comment:
       Should we explicitly mention (lack of) support for version 2 and earlier, as well?







[GitHub] [couchdb-documentation] wohali merged pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
wohali merged pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615


   





[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558809322



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,114 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+=============================================
+Troubleshooting CouchDB 3 with Weather Report
+=============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB version 3 node or cluster. It
+does not support version 4 or later.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,

Review comment:
       We're using `weatherreport` to refer to the actual command line executable, but you're right about the other inconsistency. I lean toward using WeatherReport as the name, but don't have a strong preference either.







[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558809037



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,114 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+=============================================
+Troubleshooting CouchDB 3 with Weather Report
+=============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB version 3 node or cluster. It
+does not support version 4 or later.

Review comment:
       We're [importing weatherreport into the 3.x branch](https://github.com/apache/couchdb/pull/3312), so it's not clear how someone could use it in version 2 without significant effort. But, if they _did_ make that effort, most of the checks would likely work. So while it could mostly be used with 2, I don't know if we really want to mention it?







[GitHub] [couchdb-documentation] jaydoane commented on pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#issuecomment-809840861


   @wohali Yeah, there's a chance it can land in the next ~2 weeks; I'll do my best to get it squashed and merged by then. If not 3.2, then 3.3 for sure.





[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r557408980



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,113 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+====================================
+Trouble Shooting with Weather Report

Review comment:
       Should be one word.
   
   ```suggestion
   Troubleshooting with Weather Report
   ```

##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,113 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+====================================
+Trouble Shooting with Weather Report
+====================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB node or cluster.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug
+    [debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
+    [debug] Starting distributed Erlang.
+    [debug] Connected to local cluster node 'node1@127.0.0.1'.
+    [debug] Local RPC: mem3:nodes([]) [5000]
+    [debug] Local RPC: os:getpid([]) [5000]
+    [debug] Running shell command: ps -o pmem,rss -p 73905
+    [debug] Shell command output:
+    %MEM    RSS
+    0.3  25116
+
+    [debug] Local RPC: erlang:nodes([]) [5000]
+    [debug] Local RPC: mem3:nodes([]) [5000]
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+    [info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.
+
+Most times you'll want to use the defaults, but any Syslog severity

Review comment:
       ```suggestion
   Most times you'll want to use the defaults, but any syslog severity
   ```







[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558845978



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,115 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+============================================
+Troubleshooting CouchDB 3 with WeatherReport
+============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+WeatherReport is an OTP application and set of tools that diagnoses
+common problems which could affect a CouchDB version 3 node or cluster
+(version 4 or later is not supported). It is accessed via the
+``weatherreport`` command line escript.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug

Review comment:
       The paragraph above mentions "the `--level` option", but the example shows `-d`.  I guess `-d` is the short form of `--level`?  Would it be good to explain that, to avoid confusion?







[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558971973



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,115 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+============================================
+Troubleshooting CouchDB 3 with WeatherReport
+============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+WeatherReport is an OTP application and set of tools that diagnoses
+common problems which could affect a CouchDB version 3 node or cluster
+(version 4 or later is not supported). It is accessed via the
+``weatherreport`` command line escript.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug

Review comment:
       The `--help` command (above) already shows both short and long argument flags, including the (admittedly odd):
   ```
         -d, --level               Minimum message severity level (default: notice)
   ```
   But I wonder if it's better to just avoid the short forms altogether in the docs? I've updated the text.
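
   As a rough sketch of what a long-form-only style could look like, the snippet below assembles a command line from the `--etc` and `--level` flags listed in the `--help` output, plus optional check names from `--list`. The helper name `weatherreport_cmd` is hypothetical and not part of `weatherreport`; it only prints the command and does not run it.

```shell
# Hypothetical helper (not part of weatherreport): assembles a command line
# that uses only the long-form flags shown in the --help output.
# Usage: weatherreport_cmd <etc_dir> <level> [check_name ...]
weatherreport_cmd() {
    etc_dir="$1"
    level="$2"
    shift 2
    cmd="weatherreport --etc $etc_dir --level $level"
    # Append any check names given after the first two arguments.
    for check in "$@"; do
        cmd="$cmd $check"
    done
    printf '%s\n' "$cmd"
}

# Long-form equivalent of `weatherreport --etc /path/to/etc -d debug`.
weatherreport_cmd /path/to/etc debug
# Same, restricted to two of the checks reported by --list.
weatherreport_cmd /path/to/etc debug nodes_connected memory_use
```

   Printing rather than executing keeps the sketch usable on a machine without CouchDB installed; the printed line can be pasted into a shell on a real node.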







[GitHub] [couchdb-documentation] wohali commented on pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
wohali commented on pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#issuecomment-809734084


   @jaydoane Oops, my bad. We can revert, but... is there a chance `weatherreport` would land for 3.2 (in the next ~2 weeks)? I will add it to the 3.2.0 milestone in the hopes it can get there.
   
   What support do you need for that to land?





[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558845978



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,115 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+============================================
+Troubleshooting CouchDB 3 with WeatherReport
+============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+WeatherReport is an OTP application and set of tools that diagnoses
+common problems which could affect a CouchDB version 3 node or cluster
+(version 4 or later is not supported). It is accessed via the
+``weatherreport`` command line escript.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug

Review comment:
       The paragraph above mentions "the `--level` option", but the example shows `-d`. I guess `-d` is the short form of `--level`? Would it be good to explain that, to avoid confusion?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558809322



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,114 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+=============================================
+Troubleshooting CouchDB 3 with Weather Report
+=============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB version 3 node or cluster. It
+does not support version 4 or later.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,

Review comment:
       We're using `weatherreport` to refer to the actual command line executable, but you're right about the other inconsistency. I lean toward using WeatherReport as the name, but don't have a strong preference either. I've pushed a fixup that I hope helps clarify things 🤞 







[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r557862096



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,113 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+====================================
+Trouble Shooting with Weather Report

Review comment:
       Thanks for the review! Fixed.

##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,113 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+====================================
+Trouble Shooting with Weather Report
+====================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB node or cluster.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug
+    [debug] Not connected to the local cluster node, trying to connect. alive:false connect_failed:undefined
+    [debug] Starting distributed Erlang.
+    [debug] Connected to local cluster node 'node1@127.0.0.1'.
+    [debug] Local RPC: mem3:nodes([]) [5000]
+    [debug] Local RPC: os:getpid([]) [5000]
+    [debug] Running shell command: ps -o pmem,rss -p 73905
+    [debug] Shell command output:
+    %MEM    RSS
+    0.3  25116
+
+    [debug] Local RPC: erlang:nodes([]) [5000]
+    [debug] Local RPC: mem3:nodes([]) [5000]
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+    [info] Process is using 0.3% of available RAM, totalling 25116 KB of real memory.
+
+Most times you'll want to use the defaults, but any Syslog severity

Review comment:
       Fixed.







[GitHub] [couchdb-documentation] flimzy commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
flimzy commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558172416



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,114 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+=============================================
+Troubleshooting CouchDB 3 with Weather Report
+=============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+``weatherreport`` is an escript and set of tools that diagnoses common
+problems which could affect a CouchDB version 3 node or cluster. It
+does not support version 4 or later.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,

Review comment:
       I notice we alternately refer to this as "Weather Report", "WeatherReport" and "``weatherreport``".  Do we want to standardize on one spelling in normal prose?







[GitHub] [couchdb-documentation] jaydoane commented on a change in pull request #615: Port weatherreport README to cluster troubleshooting section

Posted by GitBox <gi...@apache.org>.
jaydoane commented on a change in pull request #615:
URL: https://github.com/apache/couchdb-documentation/pull/615#discussion_r558971973



##########
File path: src/cluster/troubleshooting.rst
##########
@@ -0,0 +1,115 @@
+.. Licensed under the Apache License, Version 2.0 (the "License"); you may not
+.. use this file except in compliance with the License. You may obtain a copy of
+.. the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+.. License for the specific language governing permissions and limitations under
+.. the License.
+
+.. _cluster/troubleshooting:
+
+============================================
+Troubleshooting CouchDB 3 with WeatherReport
+============================================
+
+.. _cluster/troubleshooting/overview:
+
+Overview
+========
+
+WeatherReport is an OTP application and set of tools that diagnoses
+common problems which could affect a CouchDB version 3 node or cluster
+(version 4 or later is not supported). It is accessed via the
+``weatherreport`` command line escript.
+
+Here is a basic example of using ``weatherreport`` followed immediately
+by the command's output:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc
+    [warning] Cluster member node3@127.0.0.1 is not connected to this node. Please check whether it is down.
+
+.. _cluster/troubleshooting/usage:
+
+Usage
+=====
+
+For most cases, you can just run the ``weatherreport`` command as
+shown above.  However, sometimes you might want to know some extra
+detail, or run only specific checks. For that, there are command-line
+options. Execute ``weatherreport --help`` to learn more about these
+options:
+
+.. code-block:: bash
+
+    $ weatherreport --help
+    Usage: weatherreport [-c <path>] [-d <level>] [-e] [-h] [-l] [check_name ...]
+
+      -c, --etc                 Path to the CouchDB configuration directory
+      -d, --level               Minimum message severity level (default: notice)
+      -l, --list                Describe available diagnostic tasks
+      -e, --expert              Perform more detailed diagnostics
+      -h, --help                Display help/usage
+      check_name                A specific check to run
+
+To get an idea of what checks will be run, use the `--list` option:
+
+.. code-block:: bash
+
+    $ weatherreport --list
+    Available diagnostic checks:
+
+      custodian            Shard safety/liveness checks
+      disk                 Data directory permissions and atime
+      internal_replication Check the number of pending internal replication jobs
+      ioq                  Check the total number of active IOQ requests
+      mem3_sync            Check there is a registered mem3_sync process
+      membership           Cluster membership validity
+      memory_use           Measure memory usage
+      message_queues       Check for processes with large mailboxes
+      node_stats           Check useful erlang statistics for diagnostics
+      nodes_connected      Cluster node liveness
+      process_calls        Check for large numbers of processes with the same current/initial call
+      process_memory       Check for processes with high memory usage
+      safe_to_rebuild      Check whether the node can safely be taken out of service
+      search               Check the local search node is responsive
+      tcp_queues           Measure the length of tcp queues in the kernel
+
+If you want all the gory details about what WeatherReport is doing,
+you can run the checks at a more verbose logging level with
+the ``--level`` option:
+
+.. code-block:: bash
+
+    $ weatherreport --etc /path/to/etc -d debug

Review comment:
       The `--help` output (above) already shows both the short and long forms of each flag, including the (admittedly odd):
   ```
         -d, --level               Minimum message severity level (default: notice)
   ```
   But I wonder if it's better to just avoid the short forms altogether in the docs?
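
   For illustration only, and assuming `--level` accepts the same values as `-d` (the `--help` output above lists them as a pair), a long-form version of the debug example might read:

```shell
# Long-form spelling of the debug example; both flag names are taken
# from the --help output quoted above, and the path is a placeholder.
weatherreport --etc /path/to/etc --level debug
```

   That would make the command match the prose, which already refers to the `--level` option.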



