You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2021/03/12 16:02:39 UTC

[GitHub] [couchdb] tonysun83 opened a new pull request #3416: Prometheus endpoint

tonysun83 opened a new pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416


   ## Overview
   
   This implements a new `_prometheus` endpoint outlined by https://github.com/apache/couchdb/issues/3377
   
   Users can retrieve metrics info via `/_node/{node-name}/_prometheus`. 
   
   **Additional Capability not mentioned in the RFC**
   Some users may not want to use authentication
   for scraping metrics, so I've added a configurable option that allows scraping via a different port. 
   
   The output looks like this:
   ```
   # TYPE couchdb_couch_log_requests_total counter
   couchdb_couch_log_requests_total{level="alert"} 0
   couchdb_couch_log_requests_total{level="critical"} 0
   couchdb_couch_log_requests_total{level="debug"} 0
   couchdb_couch_log_requests_total{level="emergency"} 0
   couchdb_couch_log_requests_total{level="error"} 1
   couchdb_couch_log_requests_total{level="info"} 5
   couchdb_couch_log_requests_total{level="notice"} 10
   couchdb_couch_log_requests_total{level="warning"} 0
   # TYPE couchdb_couch_replicator_changes_manager_deaths_total counter
   couchdb_couch_replicator_changes_manager_deaths_total 0
   # TYPE couchdb_couch_replicator_changes_queue_deaths_total counter
   couchdb_couch_replicator_changes_queue_deaths_total 0
   # TYPE couchdb_couch_replicator_changes_read_failures_total counter
   couchdb_couch_replicator_changes_read_failures_total 0
   # TYPE couchdb_couch_replicator_changes_reader_deaths_total counter
   couchdb_couch_replicator_changes_reader_deaths_total 0
   # TYPE couchdb_couch_replicator_checkpoints_failure_total counter
   couchdb_couch_replicator_checkpoints_failure_total 0
   # TYPE couchdb_couch_replicator_checkpoints_total counter
   couchdb_couch_replicator_checkpoints_total 0
   # TYPE couchdb_couch_replicator_connection_acquires_total counter
   couchdb_couch_replicator_connection_acquires_total 0
   # TYPE couchdb_couch_replicator_connection_closes_total counter
   couchdb_couch_replicator_connection_closes_total 0
   # TYPE couchdb_couch_replicator_connection_creates_total counter
   couchdb_couch_replicator_connection_creates_total 0
   # TYPE couchdb_couch_replicator_connection_owner_crashes_total counter
   couchdb_couch_replicator_connection_owner_crashes_total 0
   # TYPE couchdb_couch_replicator_connection_releases_total counter
   couchdb_couch_replicator_connection_releases_total 0
   # TYPE couchdb_couch_replicator_connection_worker_crashes_total counter
   couchdb_couch_replicator_connection_worker_crashes_total 0
   # TYPE couchdb_couch_replicator_docs_completed_state_updates_total counter
   couchdb_couch_replicator_docs_completed_state_updates_total 0
   # TYPE couchdb_couch_replicator_docs_db_changes_total counter
   couchdb_couch_replicator_docs_db_changes_total 0
   # TYPE couchdb_couch_replicator_docs_dbs_created_total counter
   couchdb_couch_replicator_docs_dbs_created_total 0
   # TYPE couchdb_couch_replicator_docs_dbs_deleted_total counter
   couchdb_couch_replicator_docs_dbs_deleted_total 0
   # TYPE couchdb_couch_replicator_docs_failed_state_updates_total counter
   couchdb_couch_replicator_docs_failed_state_updates_total 0
   # TYPE couchdb_couch_replicator_failed_starts_total counter
   couchdb_couch_replicator_failed_starts_total 0
   # TYPE couchdb_couch_replicator_jobs_accepting gauge
   couchdb_couch_replicator_jobs_accepting 2
   # TYPE couchdb_couch_replicator_jobs_accepts_total counter
   couchdb_couch_replicator_jobs_accepts_total 16
   # TYPE couchdb_couch_replicator_jobs_adds_total counter
   couchdb_couch_replicator_jobs_adds_total 0
   # TYPE couchdb_couch_replicator_jobs_crashes_total counter
   couchdb_couch_replicator_jobs_crashes_total 0
   # TYPE couchdb_couch_replicator_jobs_pending gauge
   couchdb_couch_replicator_jobs_pending 0
   # TYPE couchdb_couch_replicator_jobs_removes_total counter
   couchdb_couch_replicator_jobs_removes_total 0
   # TYPE couchdb_couch_replicator_jobs_reschedules_total counter
   couchdb_couch_replicator_jobs_reschedules_total 2
   # TYPE couchdb_couch_replicator_jobs_running gauge
   couchdb_couch_replicator_jobs_running 0
   # TYPE couchdb_couch_replicator_jobs_starts_total counter
   couchdb_couch_replicator_jobs_starts_total 0
   # TYPE couchdb_couch_replicator_jobs_stops_total counter
   couchdb_couch_replicator_jobs_stops_total 0
   # TYPE couchdb_couch_replicator_requests_total counter
   couchdb_couch_replicator_requests_total 0
   # TYPE couchdb_couch_replicator_responses_failure_total counter
   couchdb_couch_replicator_responses_failure_total 0
   # TYPE couchdb_couch_replicator_responses_total counter
   couchdb_couch_replicator_responses_total 0
   # TYPE couchdb_couch_replicator_stream_responses_failure_total counter
   couchdb_couch_replicator_stream_responses_failure_total 0
   # TYPE couchdb_couch_replicator_stream_responses_total counter
   couchdb_couch_replicator_stream_responses_total 0
   # TYPE couchdb_couch_replicator_worker_deaths_total counter
   couchdb_couch_replicator_worker_deaths_total 0
   # TYPE couchdb_couch_replicator_workers_started_total counter
   couchdb_couch_replicator_workers_started_total 0
   # TYPE couchdb_auth_cache_requests_total counter
   couchdb_auth_cache_requests_total 0
   # TYPE couchdb_auth_cache_misses_total counter
   couchdb_auth_cache_misses_total 0
   # TYPE couchdb_collect_results_time_seconds summary
   couchdb_collect_results_time_seconds{quantile="0.5"} 0.0
   couchdb_collect_results_time_seconds{quantile="0.75"} 0.0
   couchdb_collect_results_time_seconds{quantile="0.9"} 0.0
   couchdb_collect_results_time_seconds{quantile="0.95"} 0.0
   couchdb_collect_results_time_seconds{quantile="0.99"} 0.0
   couchdb_collect_results_time_seconds{quantile="0.999"} 0.0
   couchdb_collect_results_time_seconds_sum 0.0
   couchdb_collect_results_time_seconds_count 0
   # TYPE couchdb_couch_server_lru_skip_total counter
   couchdb_couch_server_lru_skip_total 0
   # TYPE couchdb_database_purges_total counter
   couchdb_database_purges_total 0
   # TYPE couchdb_database_reads_total counter
   couchdb_database_reads_total 0
   # TYPE couchdb_database_writes_total counter
   couchdb_database_writes_total 0
   # TYPE couchdb_db_open_time_seconds summary
   couchdb_db_open_time_seconds{quantile="0.5"} 0.0
   couchdb_db_open_time_seconds{quantile="0.75"} 0.0
   couchdb_db_open_time_seconds{quantile="0.9"} 0.0
   couchdb_db_open_time_seconds{quantile="0.95"} 0.0
   couchdb_db_open_time_seconds{quantile="0.99"} 0.0
   couchdb_db_open_time_seconds{quantile="0.999"} 0.0
   couchdb_db_open_time_seconds_sum 0.0
   couchdb_db_open_time_seconds_count 0
   # TYPE couchdb_dbinfo_seconds summary
   couchdb_dbinfo_seconds{quantile="0.5"} 0.0
   couchdb_dbinfo_seconds{quantile="0.75"} 0.0
   couchdb_dbinfo_seconds{quantile="0.9"} 0.0
   couchdb_dbinfo_seconds{quantile="0.95"} 0.0
   couchdb_dbinfo_seconds{quantile="0.99"} 0.0
   couchdb_dbinfo_seconds{quantile="0.999"} 0.0
   couchdb_dbinfo_seconds_sum 0.0
   couchdb_dbinfo_seconds_count 0
   # TYPE couchdb_document_inserts_total counter
   couchdb_document_inserts_total 0
   # TYPE couchdb_document_purges_failure_total counter
   couchdb_document_purges_failure_total 0
   # TYPE couchdb_document_purges_success_total counter
   couchdb_document_purges_success_total 0
   .
   .
   .
   ```
   
   ## Testing recommendations
   To see metrics flowing, run:
   
   1)`brew install prometheus`
   2) Create a prometheus.yml file in a folder with:
   ```
   global:
     scrape_interval:     15s # By default, scrape targets every 15 seconds.
   
     # Attach these labels to any time series or alerts when communicating with
     # external systems (federation, remote storage, Alertmanager).
     external_labels:
       monitor: 'codelab-monitor'
   
   # A scrape configuration containing exactly one endpoint to scrape:
   # Here it's Prometheus itself.
   scrape_configs:
     # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
     - job_name: 'prometheus'
   
       # Override the global default and scrape targets from this job every 5 seconds.
       scrape_interval: 30s
   
       static_configs:
         - targets: ['localhost:15984']
       metrics_path: "/_node/_local/_prometheus"
       basic_auth:
         username: 'adm'
         password: 'pass'
   ```
   3) Launch your cluster with `./dev/run -a adm pass`
   4) Launch prometheus with `prometheus --config.file=prometheus.yml` 
   5) Open a browser to `http://localhost:9090/metrics` to view metrics data.
   For additional info check out https://prometheus.io/docs/introduction/first_steps/
   
   Additional Unit Tests will be added.
   
   ## Related Issues or Pull Requests
   
   <!-- If your changes affects multiple components in different
        repositories please put links to those issues or pull requests here.  -->
   
   ## Checklist
   
   - [ ] Code is written and works correctly
   - [ ] Changes are covered by tests
   - [ ] Any new configurable parameters are documented in `rel/overlay/etc/default.ini`
   - [ ] A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] bessbd commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
bessbd commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r612273213



##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,89 @@
+-module(couch_prometheus_http).

Review comment:
       License header seems to be missing here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-809768501


   @bessbd  I'd like to get some consensus on this before writing E2E tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] rnewson commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
rnewson commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r597809665



##########
File path: src/chttpd/src/chttpd_node.erl
##########
@@ -117,6 +117,12 @@ handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_stats">> | Path]}=
     chttpd:send_json(Req, EJSON1);
 handle_node_req(#httpd{path_parts=[_, _Node, <<"_stats">>]}=Req) ->
     send_method_not_allowed(Req, "GET");
+handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_prometheus">>]}=Req) ->
+    Metrics = call_node(Node, couch_prometheus_server, scrape, []),
+    Header = [{<<"Content-Type">>, <<"text/plain">>}],

Review comment:
       I think there's supposed to be a version parameter on this?

##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,79 @@
+-module(couch_prometheus_http).
+
+-compile(tuple_calls).
+
+-export([
+    start_link/0,
+    handle_request/1
+]).
+
+-include("couch_prometheus.hrl").
+
+start_link() ->
+    IP = case config:get("prometheus", "bind_address", "any") of
+        "any" -> any;
+        Else -> Else
+    end,
+    Port = config:get("prometheus", "port"),
+    ok = couch_httpd:validate_bind_address(IP),
+
+    Options = [
+        {name, ?MODULE},
+        {loop, fun ?MODULE:handle_request/1},
+        {ip, IP},
+        {port, Port}
+    ],
+    case mochiweb_http:start(Options) of
+        {ok, Pid} ->
+            {ok, Pid};
+        {error, Reason} ->
+            io:format("Failure to start Mochiweb: ~s~n", [Reason]),
+            {error, Reason}
+    end.
+
+handle_request(MochiReq) ->
+    RawUri = MochiReq:get(raw_path),
+    {"/" ++ Path, _, _} = mochiweb_util:urlsplit_path(RawUri),
+    PathParts =  string:tokens(Path, "/"),    try
+        case PathParts of
+            ["_node", Node, "_prometheus"] ->
+                send_prometheus(MochiReq, Node);
+            _ ->
+                send_resp(MochiReq, 404, [], <<>>)
+        end
+    catch T:R ->
+        Body = io_lib:format("~p:~p", [T, R]),
+        send_resp(MochiReq, 500, [], Body)

Review comment:
       we work hard elsewhere not to send ugly errors to the user, can we follow the chttpd:send_error pattern here?

##########
File path: src/couch_prometheus/src/couch_prometheus_sup.erl
##########
@@ -0,0 +1,39 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_sup).
+
+-behaviour(supervisor).
+
+-export([
+    start_link/0,
+    init/1
+]).
+
+-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
+
+start_link() ->
+    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
+
+init([]) ->
+    {ok, {
+        {one_for_one, 5, 10}, [

Review comment:
       how were these numbers chosen? 

##########
File path: src/couch_prometheus/src/couch_prometheus_server.erl
##########
@@ -0,0 +1,167 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_server).
+
+-behaviour(gen_server).
+
+-import(couch_prometheus_util, [
+    couch_to_prom/3,
+    to_prom/3,
+    to_prom_summary/2
+]).
+
+-export([
+    start_link/0,
+    init/1,
+    handle_call/3,
+    handle_cast/2,
+    handle_info/2,
+    code_change/3,
+    terminate/2,
+
+    scrape/0

Review comment:
       can we keep the gen_server callbacks in a separate export block to public or private functions pls? we do that ~everywhere else.

##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,79 @@
+-module(couch_prometheus_http).
+
+-compile(tuple_calls).
+
+-export([
+    start_link/0,
+    handle_request/1
+]).
+
+-include("couch_prometheus.hrl").
+
+start_link() ->
+    IP = case config:get("prometheus", "bind_address", "any") of
+        "any" -> any;
+        Else -> Else
+    end,
+    Port = config:get("prometheus", "port"),
+    ok = couch_httpd:validate_bind_address(IP),
+
+    Options = [
+        {name, ?MODULE},
+        {loop, fun ?MODULE:handle_request/1},
+        {ip, IP},
+        {port, Port}
+    ],
+    case mochiweb_http:start(Options) of
+        {ok, Pid} ->
+            {ok, Pid};
+        {error, Reason} ->
+            io:format("Failure to start Mochiweb: ~s~n", [Reason]),
+            {error, Reason}
+    end.
+
+handle_request(MochiReq) ->
+    RawUri = MochiReq:get(raw_path),
+    {"/" ++ Path, _, _} = mochiweb_util:urlsplit_path(RawUri),
+    PathParts =  string:tokens(Path, "/"),    try
+        case PathParts of
+            ["_node", Node, "_prometheus"] ->
+                send_prometheus(MochiReq, Node);
+            _ ->
+                send_resp(MochiReq, 404, [], <<>>)
+        end
+    catch T:R ->
+        Body = io_lib:format("~p:~p", [T, R]),
+        send_resp(MochiReq, 500, [], Body)
+    end.
+
+send_prometheus(MochiReq, Node) ->
+    Headers = couch_httpd:server_header() ++ [
+        {<<"Content-Type">>, <<"text/plain">>}

Review comment:
       version param?

##########
File path: src/couch_prometheus/src/couch_prometheus_server.erl
##########
@@ -0,0 +1,167 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_server).
+
+-behaviour(gen_server).
+
+-import(couch_prometheus_util, [
+    couch_to_prom/3,
+    to_prom/3,
+    to_prom_summary/2
+]).
+
+-export([
+    start_link/0,
+    init/1,
+    handle_call/3,
+    handle_cast/2,
+    handle_info/2,
+    code_change/3,
+    terminate/2,
+
+    scrape/0
+]).
+
+-include("couch_prometheus.hrl").
+
+start_link() ->
+    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
+
+-record(st, {
+    metrics,
+    refresh
+}).
+
+init([]) ->
+    Metrics = refresh_metrics(),
+    RT = update_refresh_timer(),
+    {ok, #st{metrics=Metrics, refresh=RT}}.
+
+scrape() ->
+    {ok, Metrics} = gen_server:call(?MODULE, scrape),

Review comment:
       will this ever take >5s?

##########
File path: src/couch_prometheus/src/couch_prometheus_sup.erl
##########
@@ -0,0 +1,39 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_sup).
+
+-behaviour(supervisor).
+
+-export([
+    start_link/0,
+    init/1
+]).
+
+-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
+
+start_link() ->
+    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
+
+init([]) ->
+    {ok, {
+        {one_for_one, 5, 10}, [

Review comment:
       if the couch_prometheus_server process crashing should the couch_prometheus_http be restarted? (i.e, rest_for_one).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r601992829



##########
File path: src/couch_prometheus/src/couch_prometheus_sup.erl
##########
@@ -0,0 +1,39 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_sup).
+
+-behaviour(supervisor).
+
+-export([
+    start_link/0,
+    init/1
+]).
+
+-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
+
+start_link() ->
+    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
+
+init([]) ->
+    {ok, {
+        {one_for_one, 5, 10}, [

Review comment:
       The values were copied over directly from couch_stats without much consideration. I think I can make this either configurable or have higher numbers? 
   
   You'e right in that doesn't make sense for  `couch_prometheus_http` to exist without `couch_prometheus_server`, but does restarting really enforce that? I guess if `couch_prometheus_server` keeps crashing until it reaches RestartLimit and won't restart anymore, then we also don't have a lingering `couch_prometheus_http` process.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r601821668



##########
File path: src/chttpd/src/chttpd_node.erl
##########
@@ -117,6 +117,12 @@ handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_stats">> | Path]}=
     chttpd:send_json(Req, EJSON1);
 handle_node_req(#httpd{path_parts=[_, _Node, <<"_stats">>]}=Req) ->
     send_method_not_allowed(Req, "GET");
+handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_prometheus">>]}=Req) ->
+    Metrics = call_node(Node, couch_prometheus_server, scrape, []),
+    Header = [{<<"Content-Type">>, <<"text/plain">>}],

Review comment:
       I don't think there is: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] bessbd commented on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
bessbd commented on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-812723352


   > @bessbd I'd like to get some consensus on this before writing E2E tests.
   
   Sure, that works too!
   For me personally it's easier to start with the tests when reviewing a PR. That's like a best description of what's happening in a change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r612636568



##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,89 @@
+-module(couch_prometheus_http).
+
+-compile(tuple_calls).
+
+-export([
+    start_link/0,
+    handle_request/1
+]).
+
+-include("couch_prometheus.hrl").
+-include_lib("couch/include/couch_db.hrl").
+
+start_link() ->
+    IP = case config:get("prometheus", "bind_address", "any") of
+        "any" -> any;
+        Else -> Else
+    end,
+    Port = config:get("prometheus", "port"),
+    ok = couch_httpd:validate_bind_address(IP),
+
+    Options = [
+        {name, ?MODULE},
+        {loop, fun ?MODULE:handle_request/1},
+        {ip, IP},
+        {port, Port}
+    ],
+    case mochiweb_http:start(Options) of
+        {ok, Pid} ->
+            {ok, Pid};
+        {error, Reason} ->
+            io:format("Failure to start Mochiweb: ~s~n", [Reason]),
+            {error, Reason}
+    end.
+
+handle_request(MochiReq) ->
+    RawUri = MochiReq:get(raw_path),
+    {"/" ++ Path, _, _} = mochiweb_util:urlsplit_path(RawUri),
+    PathParts =  string:tokens(Path, "/"),    try
+        case PathParts of
+            ["_node", Node, "_prometheus"] ->
+                send_prometheus(MochiReq, Node);
+            _ ->
+                send_error(MochiReq, 404, <<"not_found">>, <<>>)
+        end
+    catch T:R ->

Review comment:
       it's just a fail-safe to print out any unknown errors
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] bessbd commented on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
bessbd commented on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-809524183


   @tonysun83 : do we want to have this in `3.x` too?
   Other: I couldn't find E2E tests for this PR. Are they in progress or did I miss something?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] rnewson commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
rnewson commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r608075547



##########
File path: src/chttpd/src/chttpd_node.erl
##########
@@ -117,6 +117,12 @@ handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_stats">> | Path]}=
     chttpd:send_json(Req, EJSON1);
 handle_node_req(#httpd{path_parts=[_, _Node, <<"_stats">>]}=Req) ->
     send_method_not_allowed(Req, "GET");
+handle_node_req(#httpd{method='GET', path_parts=[_, Node, <<"_prometheus">>]}=Req) ->
+    Metrics = call_node(Node, couch_prometheus_server, scrape, []),
+    Header = [{<<"Content-Type">>, <<"text/plain">>}],

Review comment:
       I'm referring to https://prometheus.io/docs/instrumenting/exposition_formats/
   
   `HTTP Content-Type | text/plain; version=0.0.4 (A missing version value will lead to a fall-back to the most recent text format version.)`
   
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 edited a comment on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 edited a comment on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-809768501


   @bessbd  I'd like to get some consensus on this before writing E2E tests. I think the goal is probably just for 4.x for now, but let's see how the discussion develops


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] chewbranca commented on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
chewbranca commented on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-802988940


   Does this need to be a stateful application? Rather than having a dedicated app hierarchy here, can we just add a `_prometheus` endpoint to https://github.com/apache/couchdb/blob/main/src/chttpd/src/chttpd_node.erl#L121-L124 
   
   that then does the conversion over https://github.com/apache/couchdb/blob/main/src/chttpd/src/chttpd_node.erl#L206-L236
   
   We could avoid having the server that does the caching and just do it on demand when the request comes in.
   
   I'm also not a huge fan of introducing an additional http server to the mix. Is supporting unauthenticated requests to the Prometheus endpoint a necessary feature?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#issuecomment-807907979


   @chewbranca 
   
   Those were my exact thoughts when I was writing this up as having an app/gen_server was a bit of overkill. The bulk of the code is in the wrapper which does the conversion so it wouldn't make sense to add the extra supervision tree. The two reasons of adding it however were 1) We're going to add _active_task info as well later on and I didn't want to have a huge chunk of code inside chttpd_node to combine 3 different sources of stats info (stats, system, active_tasks) as it'll get messier.  2) The addition of the mochiweb_server was the deciding factor as now we needed a separate process to linger around. Even though it's optional, I'd rather not have a non-supervised mochiweb server lingering in the vm.
   
   The reason for the mochiweb_server was to bypass the need to have a username/password in certain situations. One of those situations that arose was the need to store credentials inside a Prometheus config file. It was an internal scrape but forcing the storage of credentials meant additional security parameters in place such that the credentials would not leak. It would be more secure in that scenario not to use credentials. Thus, it's configurable and OFF by default. Would love some additional thoughts on this cc @janl @wohali.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 merged pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 merged pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] bessbd commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
bessbd commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r612272131



##########
File path: rel/overlay/etc/default.ini
##########
@@ -884,3 +884,8 @@ compaction = false
 
 ; The interval in seconds of how often the expiration check runs.
 ;cache_expiration_check_sec = 10
+
+[prometheus]
+additional_port = true
+bind_address = 127.0.0.1
+port = {{prometheus_port}}

Review comment:
       Nit: newline seems missing here.

##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,89 @@
+-module(couch_prometheus_http).

Review comment:
       Nit: license header missing?

##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,89 @@
+-module(couch_prometheus_http).
+
+-compile(tuple_calls).
+
+-export([
+    start_link/0,
+    handle_request/1
+]).
+
+-include("couch_prometheus.hrl").
+-include_lib("couch/include/couch_db.hrl").
+
+start_link() ->
+    IP = case config:get("prometheus", "bind_address", "any") of
+        "any" -> any;
+        Else -> Else
+    end,
+    Port = config:get("prometheus", "port"),
+    ok = couch_httpd:validate_bind_address(IP),
+
+    Options = [
+        {name, ?MODULE},
+        {loop, fun ?MODULE:handle_request/1},
+        {ip, IP},
+        {port, Port}
+    ],
+    case mochiweb_http:start(Options) of
+        {ok, Pid} ->
+            {ok, Pid};
+        {error, Reason} ->
+            io:format("Failure to start Mochiweb: ~s~n", [Reason]),
+            {error, Reason}
+    end.
+
+handle_request(MochiReq) ->
+    RawUri = MochiReq:get(raw_path),
+    {"/" ++ Path, _, _} = mochiweb_util:urlsplit_path(RawUri),
+    PathParts =  string:tokens(Path, "/"),    try

Review comment:
       Is the `try` on the same line on purpose here?

##########
File path: src/couch_prometheus/test/eunit/couch_prometheus_e2e_tests.erl
##########
@@ -0,0 +1,152 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_e2e_tests).
+
+-include_lib("couch/include/couch_eunit.hrl").
+-include_lib("couch/include/couch_db.hrl").
+
+-define(USER, "prometheus_test_admin").
+-define(PASS, "pass").
+-define(AUTH, {basic_auth, {?USER, ?PASS}}).
+-define(PROM_PORT, "17986").
+-define(CONTENT_JSON, {"Content-Type", "application/json"}).
+
+start() ->
+    test_util:start_couch([chttpd, couch_prometheus]).
+
+setup() ->
+    Hashed = couch_passwords:hash_admin_password(?PASS),
+    ok = config:set("admins", ?USER, ?b2l(Hashed), _Persist=false),
+    ok = config:set_integer("stats", "interval", 2),
+    ok = config:set_integer("couch_prometheus", "interval", 1),
+    Port = mochiweb_socket_server:get(chttpd, port),
+    construct_url(Port).
+
+teardown(_) ->
+    ok.
+
+couch_prometheus_e2e_test_() ->
+    {
+        "Prometheus E2E Tests",
+        {
+            setup,
+            fun start/0, fun test_util:stop_couch/1,
+            {
+                foreach,
+                fun setup/0, fun teardown/1,
+                [
+                    fun node_call_chttpd/1,
+                    fun node_call_prometheus_http/1,
+                    fun deny_prometheus_http/1,
+                    fun node_see_updated_metrics/1
+                ]
+            }
+        }
+    }.
+
+% normal chttpd path via cluster port
+node_call_chttpd(Url) ->
+   {ok, RC1, _, _} = test_request:get(
+            Url,
+            [?CONTENT_JSON, ?AUTH],
+            []
+        ),
+    ?_assertEqual(200, RC1).
+
+% normal chttpd path via cluster port
+node_see_updated_metrics(Url) ->
+    TmpDb = ?tempdb(),
+    Addr = config:get("chttpd", "bind_address", "127.0.0.1"),
+    Port = mochiweb_socket_server:get(chttpd, port),
+    DbUrl = lists:concat(["http://", Addr, ":", Port, "/", ?b2l(TmpDb)]),
+    create_db(DbUrl),
+    [create_doc(DbUrl, "testdoc" ++ integer_to_binary(I)) || I <- lists:seq(1, 100)],
+    delete_db(DbUrl),
+    InitMetrics = wait_for_metrics(Url, "couchdb_httpd_requests_total 0", 5000),
+    UpdatedMetrics = wait_for_metrics(Url, "couchdb_httpd_requests_total", 10000),
+    % since the puts happen so fast, we can't have an exact
+    % total requests given the scraping interval. so we just want to acknowledge
+    % a change as occurred
+    ?_assertNotEqual(InitMetrics, UpdatedMetrics).
+
+% normal chttpd path via cluster port
+node_call_prometheus_http(_) ->
+    maybe_start_http_server("true"),
+    Url = construct_url(?PROM_PORT),
+    {ok, RC1, _, _} = test_request:get(
+            Url,
+            [?CONTENT_JSON, ?AUTH],
+            []
+        ),
+    % since this port doesn't require auth, this should work
+    {ok, RC2, _, _} = test_request:get(
+            Url,
+            [?CONTENT_JSON],
+            []
+        ),
+    delete_db(Url),
+    ?_assertEqual(200, RC2).
+
+% we don't start the http server
+deny_prometheus_http(_) ->
+    maybe_start_http_server("false"),
+    Url = construct_url(?PROM_PORT),
+    Response = test_request:get(
+            Url,
+            [?CONTENT_JSON, ?AUTH],
+            []
+        ),
+    ?_assertEqual({error,{conn_failed,{error,econnrefused}}}, Response).
+
+maybe_start_http_server(Additional) ->
+    test_util:stop_applications([couch_prometheus, chttpd]),
+    Hashed = couch_passwords:hash_admin_password(?PASS),
+    ok = config:set("admins", ?USER, ?b2l(Hashed), _Persist=false),
+    ok = config:set("prometheus", "additional_port", Additional),
+    ok = config:set("prometheus", "port", ?PROM_PORT),
+    test_util:start_applications([couch_prometheus, chttpd]).
+
+construct_url(Port) ->
+    Addr = config:get("chttpd", "bind_address", "127.0.0.1"),
+    lists:concat(["http://", Addr, ":", Port, "/_node/_local/_prometheus"]).
+
+create_db(Url) ->
+    Addr = config:get("chttpd", "bind_address", "127.0.0.1"),
+    Port = mochiweb_socket_server:get(chttpd, port),
+    {ok, Status, _, _} = test_request:put(Url, [?CONTENT_JSON, ?AUTH], "{}"),
+    case Status of
+        201 -> ok;
+        202 -> ok;
+        _ -> io:format(user, "~n HTTP Status Code: ~p~n", [Status])
+    end,

Review comment:
       These lines can probably be removed.

##########
File path: src/couch_prometheus/src/couch_prometheus_http.erl
##########
@@ -0,0 +1,89 @@
+-module(couch_prometheus_http).
+
+-compile(tuple_calls).
+
+-export([
+    start_link/0,
+    handle_request/1
+]).
+
+-include("couch_prometheus.hrl").
+-include_lib("couch/include/couch_db.hrl").
+
+start_link() ->
+    IP = case config:get("prometheus", "bind_address", "any") of
+        "any" -> any;
+        Else -> Else
+    end,
+    Port = config:get("prometheus", "port"),
+    ok = couch_httpd:validate_bind_address(IP),
+
+    Options = [
+        {name, ?MODULE},
+        {loop, fun ?MODULE:handle_request/1},
+        {ip, IP},
+        {port, Port}
+    ],
+    case mochiweb_http:start(Options) of
+        {ok, Pid} ->
+            {ok, Pid};
+        {error, Reason} ->
+            io:format("Failure to start Mochiweb: ~s~n", [Reason]),
+            {error, Reason}
+    end.
+
+handle_request(MochiReq) ->
+    RawUri = MochiReq:get(raw_path),
+    {"/" ++ Path, _, _} = mochiweb_util:urlsplit_path(RawUri),
+    PathParts =  string:tokens(Path, "/"),    try
+        case PathParts of
+            ["_node", Node, "_prometheus"] ->
+                send_prometheus(MochiReq, Node);
+            _ ->
+                send_error(MochiReq, 404, <<"not_found">>, <<>>)
+        end
+    catch T:R ->

Review comment:
       What kind of errors are possible here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r601823172



##########
File path: src/couch_prometheus/src/couch_prometheus_server.erl
##########
@@ -0,0 +1,167 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_server).
+
+-behaviour(gen_server).
+
+-import(couch_prometheus_util, [
+    couch_to_prom/3,
+    to_prom/3,
+    to_prom_summary/2
+]).
+
+-export([
+    start_link/0,
+    init/1,
+    handle_call/3,
+    handle_cast/2,
+    handle_info/2,
+    code_change/3,
+    terminate/2,
+
+    scrape/0
+]).
+
+-include("couch_prometheus.hrl").
+
+start_link() ->
+    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
+
+-record(st, {
+    metrics,
+    refresh
+}).
+
+init([]) ->
+    Metrics = refresh_metrics(),
+    RT = update_refresh_timer(),
+    {ok, #st{metrics=Metrics, refresh=RT}}.
+
+scrape() ->
+    {ok, Metrics} = gen_server:call(?MODULE, scrape),

Review comment:
       Only if the message queue is super backed up, but if that's the case, an asynchronous call wouldn't matter either given the queue. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [couchdb] tonysun83 commented on a change in pull request #3416: Prometheus endpoint

Posted by GitBox <gi...@apache.org>.
tonysun83 commented on a change in pull request #3416:
URL: https://github.com/apache/couchdb/pull/3416#discussion_r601992829



##########
File path: src/couch_prometheus/src/couch_prometheus_sup.erl
##########
@@ -0,0 +1,39 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(couch_prometheus_sup).
+
+-behaviour(supervisor).
+
+-export([
+    start_link/0,
+    init/1
+]).
+
+-define(CHILD(I, Type), {I, {I, start_link, []}, permanent, 5000, Type, [I]}).
+
+start_link() ->
+    supervisor:start_link({local, ?MODULE}, ?MODULE, []).
+
+init([]) ->
+    {ok, {
+        {one_for_one, 5, 10}, [

Review comment:
       The values were copied over directly from couch_stats without much consideration. I think I can make this either configurable or have higher numbers? 
   
   You'e right in that doesn't make sense for  `couch_prometheus_http` to exist without `couch_prometheus_server`, but does restarting really enforce that? I guess if `couch_prometheus_server` keeps crashing till it doesn't crash anymore, then we don't start `couch_prometheus_http`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org