You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by ke...@apache.org on 2021/08/12 02:51:25 UTC

[skywalking-python] branch master updated: Remove HTTP basic auth credentials from log, stacktrace, segment (#152)

This is an automated email from the ASF dual-hosted git repository.

kezhenxu94 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-python.git


The following commit(s) were added to refs/heads/master by this push:
     new 9abe90d  Remove HTTP basic auth credentials from log, stacktrace, segment (#152)
9abe90d is described below

commit 9abe90d80512075eb4db6ee3960f498dbdbd543b
Author: Yihao Chen <Su...@outlook.com>
AuthorDate: Thu Aug 12 10:51:18 2021 +0800

    Remove HTTP basic auth credentials from log, stacktrace, segment (#152)
---
 CHANGELOG.md                      |  8 ++++++-
 docs/EnvVars.md                   |  1 +
 docs/LogReporter.md               | 23 +++++++++++++-------
 skywalking/agent/__init__.py      |  8 +++----
 skywalking/config.py              | 16 +++++++-------
 skywalking/log/formatter.py       | 38 ++++++++++++++++++++++++++++++++
 skywalking/log/sw_logging.py      | 28 +++++++++++++-----------
 skywalking/plugins/sw_requests.py |  6 ++---
 skywalking/plugins/sw_urllib3.py  |  6 ++---
 skywalking/trace/span.py          |  4 ++--
 skywalking/utils/filter.py        | 46 +++++++++++++++++++++++++++++++++++++++
 11 files changed, 142 insertions(+), 42 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f7e2b27..ce8e32b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,9 +2,15 @@
 
 ### 0.7.0
 
-- New plugins
+- Feature:
+    - Support collecting and reporting logs to backend (#147)
+
+- New plugins:
     - Falcon Plugin (#146)
 
+- Fixes:
+    - Now properly removes HTTP basic auth credentials from segments and logs (#152)
+ 
 ### 0.6.0
 
 - Fixes:
diff --git a/docs/EnvVars.md b/docs/EnvVars.md
index ac78283..e128c9c 100644
--- a/docs/EnvVars.md
+++ b/docs/EnvVars.md
@@ -38,3 +38,4 @@ Environment Variable | Description | Default
 | `SW_AGENT_LOG_IGNORE_FILTER` | This config customizes whether to ignore the application-defined logger filters, if `True`, all logs are reported disregarding any filter rules. | `False` |
 | `SW_AGENT_LOG_REPORTER_FORMATTED` | If `True`, the log reporter will transmit the logs as formatted. Otherwise, puts logRecord.msg and logRecord.args into message content and tags(`argument.n`), respectively. Along with an `exception` tag if an exception was raised. | `True` |
 | `SW_AGENT_LOG_REPORTER_LAYOUT` | The log reporter formats the logRecord message based on the layout given. | `%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s` |
+| `SW_AGENT_CAUSE_EXCEPTION_DEPTH` | This config limits agent to report up to `limit` stacktrace, please refer to [Python traceback](https://docs.python.org/3/library/traceback.html#traceback.print_tb) for more explanations. | `5` | 
diff --git a/docs/LogReporter.md b/docs/LogReporter.md
index b8e7937..9fc0b97 100644
--- a/docs/LogReporter.md
+++ b/docs/LogReporter.md
@@ -9,13 +9,13 @@ To utilize this feature, you will need to add some new configurations to the age
 from skywalking import agent, config
 
 config.init(collector_address='127.0.0.1:11800', service_name='your awesome service',
-                log_grpc_reporter_active=True)
+                log_reporter_active=True)
 agent.start()
 ``` 
 
-`log_grpc_reporter_active=True` - Enables the log reporter.
+`log_reporter_active=True` - Enables the log reporter.
 
-`log_grpc_reporter_max_buffer_size` - The maximum queue backlog size for sending log data to backend, logs beyond this are silently dropped.
+`log_reporter_max_buffer_size` - The maximum queue backlog size for sending log data to backend, logs beyond this are silently dropped.
 
 Alternatively, you can pass configurations through environment variables. 
 Please refer to [EnvVars.md](EnvVars.md) for the list of environment variables associated with the log reporter.
@@ -24,7 +24,7 @@ Please refer to [EnvVars.md](EnvVars.md) for the list of environment variables a
 Only the logs with a level equal to or higher than the specified will be collected and reported. 
 In other words, the agent ignores some unwanted logs based on your level threshold.
 
-`log_grpc_reporter_level` - The string name of a logger level. 
+`log_reporter_level` - The string name of a logger level. 
 
 Note that it also works with your custom logger levels, simply specify its string name in the config.
 
@@ -40,7 +40,7 @@ class AppFilter(logging.Filter):
 
 logger.addFilter(AppFilter())
 ```
-However, if you do would like to report those filtered logs, set the `log_grpc_reporter_ignore_filter` to `True`.
+However, if you do would like to report those filtered logs, set the `log_reporter_ignore_filter` to `True`.
 
 
 ## Formatting
@@ -51,20 +51,27 @@ Note that regardless of the formatting, Python agent will always report the foll
 `logger` - the logger name  
 
 `thread` - the thread name
+
+### Limit stacktrace depth
+You can set the `cause_exception_depth` config entry to a desired level(defaults to 5), which limits the output depth of exception stacktrace in reporting.
+
+This config limits agent to report up to `limit` stacktrace, please refer to [Python traceback](https://docs.python.org/3/library/traceback.html#traceback.print_tb) for more explanations.
+
 ### Customize the reported log format
 You can choose to report collected logs in a custom layout.
 
-If not set, the agent uses the layout below by default, else the agent uses your custom layout set in `log_grpc_reporter_layout`.
+If not set, the agent uses the layout below by default, else the agent uses your custom layout set in `log_reporter_layout`.
 
 `'%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s'`
 
-If the layout is set to `None`, the reported log content will only contain the pre-formatted `LogRecord.message`(`msg % args`) without any additional styles, information or extra fields.
+If the layout is set to `None`, the reported log content will only contain 
+the pre-formatted `LogRecord.message`(`msg % args`) without any additional styles or extra fields, stacktrace will be attached if an exception was raised. 
 
 ### Transmit un-formatted logs
 You can also choose to report the log messages without any formatting.
 It separates the raw log msg `logRecord.msg` and `logRecord.args`, then puts them into message content and tags starting from `argument.0`, respectively, along with an `exception` tag if an exception was raised.
 
-Note when you set `log_grpc_reporter_formatted` to False, it ignores your custom layout introduced above.
+Note when you set `log_reporter_formatted` to False, it ignores your custom layout introduced above.
 
 As an example, the following code:
 ```python
diff --git a/skywalking/agent/__init__.py b/skywalking/agent/__init__.py
index 0703b10..97cd77c 100644
--- a/skywalking/agent/__init__.py
+++ b/skywalking/agent/__init__.py
@@ -99,8 +99,8 @@ def __init_threading():
     __report_thread.start()
     __command_dispatch_thread.start()
 
-    if config.log_grpc_reporter_active:
-        __log_queue = Queue(maxsize=config.log_grpc_reporter_max_buffer_size)
+    if config.log_reporter_active:
+        __log_queue = Queue(maxsize=config.log_reporter_max_buffer_size)
         __log_report_thread = Thread(name='LogReportThread', target=__report_log, daemon=True)
         __log_report_thread.start()
 
@@ -122,7 +122,7 @@ def __init():
         __protocol = KafkaProtocol()
 
     plugins.install()
-    if config.log_grpc_reporter_active:  # todo - Add support for printing traceID/ context in logs
+    if config.log_reporter_active:  # todo - Add support for printing traceID/ context in logs
         from skywalking import log
         log.install()
 
@@ -132,7 +132,7 @@ def __init():
 def __fini():
     __protocol.report(__queue, False)
     __queue.join()
-    if config.log_grpc_reporter_active:
+    if config.log_reporter_active:
         __protocol.report_log(__log_queue, False)
         __log_queue.join()
     __finished.set()
diff --git a/skywalking/config.py b/skywalking/config.py
index 88235e5..d8149e0 100644
--- a/skywalking/config.py
+++ b/skywalking/config.py
@@ -68,17 +68,17 @@ profile_active = True if os.getenv('SW_AGENT_PROFILE_ACTIVE') and \
 profile_task_query_interval = int(os.getenv('SW_PROFILE_TASK_QUERY_INTERVAL') or '20')
 
 # NOTE - Log reporting requires a separate channel, will merge in the future.
-log_grpc_reporter_active = True if os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') and \
-                         os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') == 'True' else False  # type: bool
-log_grpc_reporter_max_buffer_size = int(os.getenv('SW_AGENT_LOG_REPORTER_BUFFER_SIZE') or '10000')  # type: int
-log_grpc_reporter_level = os.getenv('SW_AGENT_LOG_REPORTER_LEVEL') or 'WARNING'  # type: str
-log_grpc_reporter_ignore_filter = True if os.getenv('SW_AGENT_LOG_IGNORE_FILTER') and \
+log_reporter_active = True if os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') and \
+                              os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') == 'True' else False  # type: bool
+log_reporter_max_buffer_size = int(os.getenv('SW_AGENT_LOG_REPORTER_BUFFER_SIZE') or '10000')  # type: int
+log_reporter_level = os.getenv('SW_AGENT_LOG_REPORTER_LEVEL') or 'WARNING'  # type: str
+log_reporter_ignore_filter = True if os.getenv('SW_AGENT_LOG_IGNORE_FILTER') and \
                          os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') == 'True' else False  # type: bool
-log_grpc_reporter_formatted = False if os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') and \
+log_reporter_formatted = False if os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') and \
                          os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') == 'False' else True  # type: bool
-log_grpc_reporter_layout = os.getenv('SW_AGENT_LOG_REPORTER_LAYOUT') or \
+log_reporter_layout = os.getenv('SW_AGENT_LOG_REPORTER_LAYOUT') or \
                         '%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s'  # type: str
-
+cause_exception_depth = int(os.getenv('SW_AGENT_CAUSE_EXCEPTION_DEPTH') or '5')  # type: int
 
 options = {key for key in globals() if key not in options}  # THIS MUST FOLLOW DIRECTLY AFTER LIST OF CONFIG OPTIONS!
 
diff --git a/skywalking/log/formatter.py b/skywalking/log/formatter.py
new file mode 100644
index 0000000..c67df1e
--- /dev/null
+++ b/skywalking/log/formatter.py
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import io
+import logging
+import traceback
+
+
+class SWFormatter(logging.Formatter):
+    """ A slightly modified formatter that allows traceback depth """
+
+    def __init__(self, fmt, tb_limit):
+        logging.Formatter.__init__(self, fmt)
+        self.tb_limit = tb_limit
+
+    def formatException(self, ei):
+        sio = io.StringIO()
+        tb = ei[2]
+        traceback.print_exception(ei[0], ei[1], tb, self.tb_limit, sio)
+        s = sio.getvalue()
+        sio.close()
+        if s[-1:] == "\n":
+            s = s[:-1]
+        return s
diff --git a/skywalking/log/sw_logging.py b/skywalking/log/sw_logging.py
index 3c8380b..dd87a57 100644
--- a/skywalking/log/sw_logging.py
+++ b/skywalking/log/sw_logging.py
@@ -22,17 +22,19 @@ from skywalking.protocol.logging.Logging_pb2 import LogData, LogDataBody, TraceC
 
 from skywalking import config, agent
 from skywalking.trace.context import get_context
+from skywalking.utils.filter import sw_traceback, sw_filter
 
 
 def install():
-    from logging import Logger, Formatter
+    from logging import Logger
 
-    layout = config.log_grpc_reporter_layout  # type: str
+    layout = config.log_reporter_layout  # type: str
     if layout:
-        formatter = Formatter(fmt=layout)
+        from skywalking.log.formatter import SWFormatter
+        formatter = SWFormatter(fmt=layout, tb_limit=config.cause_exception_depth)
 
     _handle = Logger.handle
-    log_reporter_level = logging.getLevelName(config.log_grpc_reporter_level)  # type: int
+    log_reporter_level = logging.getLevelName(config.log_reporter_level)  # type: int
 
     def _sw_handle(self, record):
         if record.name == "skywalking":  # Ignore SkyWalking internal logger
@@ -41,7 +43,7 @@ def install():
         if record.levelno < log_reporter_level:
             return _handle(self, record)
 
-        if not config.log_grpc_reporter_ignore_filter and not self.filter(record):  # ignore filtered logs
+        if not config.log_reporter_ignore_filter and not self.filter(record):  # ignore filtered logs
             return _handle(self, record)  # return handle to original if record is vetoed, just to be safe
 
         def build_log_tags() -> LogTags:
@@ -53,15 +55,16 @@ def install():
             l_tags = LogTags()
             l_tags.data.extend(core_tags)
 
-            if config.log_grpc_reporter_formatted:
+            if config.log_reporter_formatted:
                 return l_tags
 
             for i, arg in enumerate(record.args):
                 l_tags.data.append(KeyStringValuePair(key='argument.' + str(i), value=str(arg)))
 
             if record.exc_info:
-                l_tags.data.append(KeyStringValuePair(key='exception', value=str(record.exc_info)))
-
+                l_tags.data.append(KeyStringValuePair(key='exception',
+                                                      value=sw_traceback()
+                                                      ))  # \n doesn't work in tags for UI
             return l_tags
 
         context = get_context()
@@ -73,7 +76,7 @@ def install():
             body=LogDataBody(
                 type='text',
                 text=TextLog(
-                    text=transform(record)
+                    text=sw_filter(transform(record))
                 )
             ),
             traceContext=TraceContext(
@@ -90,9 +93,8 @@ def install():
     Logger.handle = _sw_handle
 
     def transform(record) -> str:
-        if config.log_grpc_reporter_formatted:
+        if config.log_reporter_formatted:
             if layout:
                 return formatter.format(record=record)
-            return record.getMessage()
-
-        return record.msg
+            return record.getMessage() + '\n' + sw_traceback()
+        return str(record.msg)  # convert possible exception to str
diff --git a/skywalking/plugins/sw_requests.py b/skywalking/plugins/sw_requests.py
index a6604f9..d44c837 100644
--- a/skywalking/plugins/sw_requests.py
+++ b/skywalking/plugins/sw_requests.py
@@ -31,8 +31,8 @@ def install():
                     auth=None, timeout=None, allow_redirects=True, proxies=None,
                     hooks=None, stream=None, verify=None, cert=None, json=None):
 
-        from urllib.parse import urlparse
-        url_param = urlparse(url)
+        from skywalking.utils.filter import sw_urlparse
+        url_param = sw_urlparse(url)
 
         # ignore trace skywalking self request
         if config.protocol == 'http' and config.collector_address.rstrip('/').endswith(url_param.netloc):
@@ -55,7 +55,7 @@ def install():
                 headers[item.key] = item.val
 
             span.tag(TagHttpMethod(method.upper()))
-            span.tag(TagHttpURL(url))
+            span.tag(TagHttpURL(url_param.geturl()))
 
             res = _request(this, method, url, params, data, headers, cookies, files, auth, timeout,
                            allow_redirects,
diff --git a/skywalking/plugins/sw_urllib3.py b/skywalking/plugins/sw_urllib3.py
index b9d671e..b7a94c4 100644
--- a/skywalking/plugins/sw_urllib3.py
+++ b/skywalking/plugins/sw_urllib3.py
@@ -27,9 +27,9 @@ def install():
     _request = RequestMethods.request
 
     def _sw_request(this: RequestMethods, method, url, fields=None, headers=None, **urlopen_kw):
-        from urllib.parse import urlparse
+        from skywalking.utils.filter import sw_urlparse
 
-        url_param = urlparse(url)
+        url_param = sw_urlparse(url)
 
         span = NoopSpan(NoopContext()) if config.ignore_http_method_check(method) \
             else get_context().new_exit_span(op=url_param.path or "/", peer=url_param.netloc,
@@ -45,7 +45,7 @@ def install():
                 headers[item.key] = item.val
 
             span.tag(TagHttpMethod(method.upper()))
-            span.tag(TagHttpURL(url))
+            span.tag(TagHttpURL(url_param.geturl()))
 
             res = _request(this, method, url, fields=fields, headers=headers, **urlopen_kw)
 
diff --git a/skywalking/trace/span.py b/skywalking/trace/span.py
index 4d56cd4..dd39036 100644
--- a/skywalking/trace/span.py
+++ b/skywalking/trace/span.py
@@ -16,7 +16,6 @@
 #
 
 import time
-import traceback
 from abc import ABC
 from collections import defaultdict
 from typing import List, Union, DefaultDict
@@ -85,9 +84,10 @@ class Span(ABC):
         return True
 
     def raised(self) -> 'Span':
+        from skywalking.utils.filter import sw_traceback
         self.error_occurred = True
         self.logs = [Log(items=[
-            LogItem(key='Traceback', val=traceback.format_exc()),
+            LogItem(key='Traceback', val=sw_traceback()),
         ])]
         return self
 
diff --git a/skywalking/utils/filter.py b/skywalking/utils/filter.py
new file mode 100644
index 0000000..7c4e5d4
--- /dev/null
+++ b/skywalking/utils/filter.py
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import re
+import traceback
+from urllib.parse import urlparse
+
+from skywalking import config
+
+
+def sw_urlparse(url):
+    # Removes basic auth credentials from netloc
+    url_param = urlparse(url)
+    safe_netloc = url_param.netloc
+    try:
+        safe_netloc = str(url_param.hostname) + (':' + str(url_param.port) if url_param.port else '')
+    except ValueError:  # illegal url, skip
+        pass
+
+    return url_param._replace(netloc=safe_netloc)
+
+
+def sw_filter(target: str):
+    # Remove user:pw from any valid full urls
+
+    return re.sub(r'://(.*?)@', r'://', target)
+
+
+def sw_traceback():
+    stack_trace = traceback.format_exc(limit=config.cause_exception_depth)
+
+    return sw_filter(target=stack_trace)