You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by ke...@apache.org on 2021/08/12 02:51:25 UTC
[skywalking-python] branch master updated: Remove HTTP basic auth
credentials from log, stacktrace, segment (#152)
This is an automated email from the ASF dual-hosted git repository.
kezhenxu94 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/skywalking-python.git
The following commit(s) were added to refs/heads/master by this push:
new 9abe90d Remove HTTP basic auth credentials from log, stacktrace, segment (#152)
9abe90d is described below
commit 9abe90d80512075eb4db6ee3960f498dbdbd543b
Author: Yihao Chen <Su...@outlook.com>
AuthorDate: Thu Aug 12 10:51:18 2021 +0800
Remove HTTP basic auth credentials from log, stacktrace, segment (#152)
---
CHANGELOG.md | 8 ++++++-
docs/EnvVars.md | 1 +
docs/LogReporter.md | 23 +++++++++++++-------
skywalking/agent/__init__.py | 8 +++----
skywalking/config.py | 16 +++++++-------
skywalking/log/formatter.py | 38 ++++++++++++++++++++++++++++++++
skywalking/log/sw_logging.py | 28 +++++++++++++-----------
skywalking/plugins/sw_requests.py | 6 ++---
skywalking/plugins/sw_urllib3.py | 6 ++---
skywalking/trace/span.py | 4 ++--
skywalking/utils/filter.py | 46 +++++++++++++++++++++++++++++++++++++++
11 files changed, 142 insertions(+), 42 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index f7e2b27..ce8e32b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,9 +2,15 @@
### 0.7.0
-- New plugins
+- Feature:
+ - Support collecting and reporting logs to backend (#147)
+
+- New plugins:
- Falcon Plugin (#146)
+- Fixes:
+ - Now properly removes HTTP basic auth credentials from segments and logs (#152)
+
### 0.6.0
- Fixes:
diff --git a/docs/EnvVars.md b/docs/EnvVars.md
index ac78283..e128c9c 100644
--- a/docs/EnvVars.md
+++ b/docs/EnvVars.md
@@ -38,3 +38,4 @@ Environment Variable | Description | Default
| `SW_AGENT_LOG_IGNORE_FILTER` | This config customizes whether to ignore the application-defined logger filters, if `True`, all logs are reported disregarding any filter rules. | `False` |
| `SW_AGENT_LOG_REPORTER_FORMATTED` | If `True`, the log reporter will transmit the logs as formatted. Otherwise, puts logRecord.msg and logRecord.args into message content and tags(`argument.n`), respectively. Along with an `exception` tag if an exception was raised. | `True` |
| `SW_AGENT_LOG_REPORTER_LAYOUT` | The log reporter formats the logRecord message based on the layout given. | `%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s` |
+| `SW_AGENT_CAUSE_EXCEPTION_DEPTH` | This config limits agent to report up to `limit` stacktrace, please refer to [Python traceback](https://docs.python.org/3/library/traceback.html#traceback.print_tb) for more explanations. | `5` |
diff --git a/docs/LogReporter.md b/docs/LogReporter.md
index b8e7937..9fc0b97 100644
--- a/docs/LogReporter.md
+++ b/docs/LogReporter.md
@@ -9,13 +9,13 @@ To utilize this feature, you will need to add some new configurations to the age
from skywalking import agent, config
config.init(collector_address='127.0.0.1:11800', service_name='your awesome service',
- log_grpc_reporter_active=True)
+ log_reporter_active=True)
agent.start()
```
-`log_grpc_reporter_active=True` - Enables the log reporter.
+`log_reporter_active=True` - Enables the log reporter.
-`log_grpc_reporter_max_buffer_size` - The maximum queue backlog size for sending log data to backend, logs beyond this are silently dropped.
+`log_reporter_max_buffer_size` - The maximum queue backlog size for sending log data to backend, logs beyond this are silently dropped.
Alternatively, you can pass configurations through environment variables.
Please refer to [EnvVars.md](EnvVars.md) for the list of environment variables associated with the log reporter.
@@ -24,7 +24,7 @@ Please refer to [EnvVars.md](EnvVars.md) for the list of environment variables a
Only the logs with a level equal to or higher than the specified will be collected and reported.
In other words, the agent ignores some unwanted logs based on your level threshold.
-`log_grpc_reporter_level` - The string name of a logger level.
+`log_reporter_level` - The string name of a logger level.
Note that it also works with your custom logger levels, simply specify its string name in the config.
@@ -40,7 +40,7 @@ class AppFilter(logging.Filter):
logger.addFilter(AppFilter())
```
-However, if you do would like to report those filtered logs, set the `log_grpc_reporter_ignore_filter` to `True`.
+However, if you do would like to report those filtered logs, set the `log_reporter_ignore_filter` to `True`.
## Formatting
@@ -51,20 +51,27 @@ Note that regardless of the formatting, Python agent will always report the foll
`logger` - the logger name
`thread` - the thread name
+
+### Limit stacktrace depth
+You can set the `cause_exception_depth` config entry to a desired level(defaults to 5), which limits the output depth of exception stacktrace in reporting.
+
+This config limits agent to report up to `limit` stacktrace, please refer to [Python traceback](https://docs.python.org/3/library/traceback.html#traceback.print_tb) for more explanations.
+
### Customize the reported log format
You can choose to report collected logs in a custom layout.
-If not set, the agent uses the layout below by default, else the agent uses your custom layout set in `log_grpc_reporter_layout`.
+If not set, the agent uses the layout below by default, else the agent uses your custom layout set in `log_reporter_layout`.
`'%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s'`
-If the layout is set to `None`, the reported log content will only contain the pre-formatted `LogRecord.message`(`msg % args`) without any additional styles, information or extra fields.
+If the layout is set to `None`, the reported log content will only contain
+the pre-formatted `LogRecord.message`(`msg % args`) without any additional styles or extra fields, stacktrace will be attached if an exception was raised.
### Transmit un-formatted logs
You can also choose to report the log messages without any formatting.
It separates the raw log msg `logRecord.msg` and `logRecord.args`, then puts them into message content and tags starting from `argument.0`, respectively, along with an `exception` tag if an exception was raised.
-Note when you set `log_grpc_reporter_formatted` to False, it ignores your custom layout introduced above.
+Note when you set `log_reporter_formatted` to False, it ignores your custom layout introduced above.
As an example, the following code:
```python
diff --git a/skywalking/agent/__init__.py b/skywalking/agent/__init__.py
index 0703b10..97cd77c 100644
--- a/skywalking/agent/__init__.py
+++ b/skywalking/agent/__init__.py
@@ -99,8 +99,8 @@ def __init_threading():
__report_thread.start()
__command_dispatch_thread.start()
- if config.log_grpc_reporter_active:
- __log_queue = Queue(maxsize=config.log_grpc_reporter_max_buffer_size)
+ if config.log_reporter_active:
+ __log_queue = Queue(maxsize=config.log_reporter_max_buffer_size)
__log_report_thread = Thread(name='LogReportThread', target=__report_log, daemon=True)
__log_report_thread.start()
@@ -122,7 +122,7 @@ def __init():
__protocol = KafkaProtocol()
plugins.install()
- if config.log_grpc_reporter_active: # todo - Add support for printing traceID/ context in logs
+ if config.log_reporter_active: # todo - Add support for printing traceID/ context in logs
from skywalking import log
log.install()
@@ -132,7 +132,7 @@ def __init():
def __fini():
__protocol.report(__queue, False)
__queue.join()
- if config.log_grpc_reporter_active:
+ if config.log_reporter_active:
__protocol.report_log(__log_queue, False)
__log_queue.join()
__finished.set()
diff --git a/skywalking/config.py b/skywalking/config.py
index 88235e5..d8149e0 100644
--- a/skywalking/config.py
+++ b/skywalking/config.py
@@ -68,17 +68,17 @@ profile_active = True if os.getenv('SW_AGENT_PROFILE_ACTIVE') and \
profile_task_query_interval = int(os.getenv('SW_PROFILE_TASK_QUERY_INTERVAL') or '20')
# NOTE - Log reporting requires a separate channel, will merge in the future.
-log_grpc_reporter_active = True if os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') and \
- os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') == 'True' else False # type: bool
-log_grpc_reporter_max_buffer_size = int(os.getenv('SW_AGENT_LOG_REPORTER_BUFFER_SIZE') or '10000') # type: int
-log_grpc_reporter_level = os.getenv('SW_AGENT_LOG_REPORTER_LEVEL') or 'WARNING' # type: str
-log_grpc_reporter_ignore_filter = True if os.getenv('SW_AGENT_LOG_IGNORE_FILTER') and \
+log_reporter_active = True if os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') and \
+ os.getenv('SW_AGENT_LOG_REPORTER_ACTIVE') == 'True' else False # type: bool
+log_reporter_max_buffer_size = int(os.getenv('SW_AGENT_LOG_REPORTER_BUFFER_SIZE') or '10000') # type: int
+log_reporter_level = os.getenv('SW_AGENT_LOG_REPORTER_LEVEL') or 'WARNING' # type: str
+log_reporter_ignore_filter = True if os.getenv('SW_AGENT_LOG_IGNORE_FILTER') and \
os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') == 'True' else False # type: bool
-log_grpc_reporter_formatted = False if os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') and \
+log_reporter_formatted = False if os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') and \
os.getenv('SW_AGENT_LOG_REPORTER_FORMATTED') == 'False' else True # type: bool
-log_grpc_reporter_layout = os.getenv('SW_AGENT_LOG_REPORTER_LAYOUT') or \
+log_reporter_layout = os.getenv('SW_AGENT_LOG_REPORTER_LAYOUT') or \
'%(asctime)s [%(threadName)s] %(levelname)s %(name)s - %(message)s' # type: str
-
+cause_exception_depth = int(os.getenv('SW_AGENT_CAUSE_EXCEPTION_DEPTH') or '5') # type: int
options = {key for key in globals() if key not in options} # THIS MUST FOLLOW DIRECTLY AFTER LIST OF CONFIG OPTIONS!
diff --git a/skywalking/log/formatter.py b/skywalking/log/formatter.py
new file mode 100644
index 0000000..c67df1e
--- /dev/null
+++ b/skywalking/log/formatter.py
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import io
+import logging
+import traceback
+
+
+class SWFormatter(logging.Formatter):
+ """ A slightly modified formatter that allows traceback depth """
+
+ def __init__(self, fmt, tb_limit):
+ logging.Formatter.__init__(self, fmt)
+ self.tb_limit = tb_limit
+
+ def formatException(self, ei):
+ sio = io.StringIO()
+ tb = ei[2]
+ traceback.print_exception(ei[0], ei[1], tb, self.tb_limit, sio)
+ s = sio.getvalue()
+ sio.close()
+ if s[-1:] == "\n":
+ s = s[:-1]
+ return s
diff --git a/skywalking/log/sw_logging.py b/skywalking/log/sw_logging.py
index 3c8380b..dd87a57 100644
--- a/skywalking/log/sw_logging.py
+++ b/skywalking/log/sw_logging.py
@@ -22,17 +22,19 @@ from skywalking.protocol.logging.Logging_pb2 import LogData, LogDataBody, TraceC
from skywalking import config, agent
from skywalking.trace.context import get_context
+from skywalking.utils.filter import sw_traceback, sw_filter
def install():
- from logging import Logger, Formatter
+ from logging import Logger
- layout = config.log_grpc_reporter_layout # type: str
+ layout = config.log_reporter_layout # type: str
if layout:
- formatter = Formatter(fmt=layout)
+ from skywalking.log.formatter import SWFormatter
+ formatter = SWFormatter(fmt=layout, tb_limit=config.cause_exception_depth)
_handle = Logger.handle
- log_reporter_level = logging.getLevelName(config.log_grpc_reporter_level) # type: int
+ log_reporter_level = logging.getLevelName(config.log_reporter_level) # type: int
def _sw_handle(self, record):
if record.name == "skywalking": # Ignore SkyWalking internal logger
@@ -41,7 +43,7 @@ def install():
if record.levelno < log_reporter_level:
return _handle(self, record)
- if not config.log_grpc_reporter_ignore_filter and not self.filter(record): # ignore filtered logs
+ if not config.log_reporter_ignore_filter and not self.filter(record): # ignore filtered logs
return _handle(self, record) # return handle to original if record is vetoed, just to be safe
def build_log_tags() -> LogTags:
@@ -53,15 +55,16 @@ def install():
l_tags = LogTags()
l_tags.data.extend(core_tags)
- if config.log_grpc_reporter_formatted:
+ if config.log_reporter_formatted:
return l_tags
for i, arg in enumerate(record.args):
l_tags.data.append(KeyStringValuePair(key='argument.' + str(i), value=str(arg)))
if record.exc_info:
- l_tags.data.append(KeyStringValuePair(key='exception', value=str(record.exc_info)))
-
+ l_tags.data.append(KeyStringValuePair(key='exception',
+ value=sw_traceback()
+ )) # \n doesn't work in tags for UI
return l_tags
context = get_context()
@@ -73,7 +76,7 @@ def install():
body=LogDataBody(
type='text',
text=TextLog(
- text=transform(record)
+ text=sw_filter(transform(record))
)
),
traceContext=TraceContext(
@@ -90,9 +93,8 @@ def install():
Logger.handle = _sw_handle
def transform(record) -> str:
- if config.log_grpc_reporter_formatted:
+ if config.log_reporter_formatted:
if layout:
return formatter.format(record=record)
- return record.getMessage()
-
- return record.msg
+ return record.getMessage() + '\n' + sw_traceback()
+ return str(record.msg) # convert possible exception to str
diff --git a/skywalking/plugins/sw_requests.py b/skywalking/plugins/sw_requests.py
index a6604f9..d44c837 100644
--- a/skywalking/plugins/sw_requests.py
+++ b/skywalking/plugins/sw_requests.py
@@ -31,8 +31,8 @@ def install():
auth=None, timeout=None, allow_redirects=True, proxies=None,
hooks=None, stream=None, verify=None, cert=None, json=None):
- from urllib.parse import urlparse
- url_param = urlparse(url)
+ from skywalking.utils.filter import sw_urlparse
+ url_param = sw_urlparse(url)
# ignore trace skywalking self request
if config.protocol == 'http' and config.collector_address.rstrip('/').endswith(url_param.netloc):
@@ -55,7 +55,7 @@ def install():
headers[item.key] = item.val
span.tag(TagHttpMethod(method.upper()))
- span.tag(TagHttpURL(url))
+ span.tag(TagHttpURL(url_param.geturl()))
res = _request(this, method, url, params, data, headers, cookies, files, auth, timeout,
allow_redirects,
diff --git a/skywalking/plugins/sw_urllib3.py b/skywalking/plugins/sw_urllib3.py
index b9d671e..b7a94c4 100644
--- a/skywalking/plugins/sw_urllib3.py
+++ b/skywalking/plugins/sw_urllib3.py
@@ -27,9 +27,9 @@ def install():
_request = RequestMethods.request
def _sw_request(this: RequestMethods, method, url, fields=None, headers=None, **urlopen_kw):
- from urllib.parse import urlparse
+ from skywalking.utils.filter import sw_urlparse
- url_param = urlparse(url)
+ url_param = sw_urlparse(url)
span = NoopSpan(NoopContext()) if config.ignore_http_method_check(method) \
else get_context().new_exit_span(op=url_param.path or "/", peer=url_param.netloc,
@@ -45,7 +45,7 @@ def install():
headers[item.key] = item.val
span.tag(TagHttpMethod(method.upper()))
- span.tag(TagHttpURL(url))
+ span.tag(TagHttpURL(url_param.geturl()))
res = _request(this, method, url, fields=fields, headers=headers, **urlopen_kw)
diff --git a/skywalking/trace/span.py b/skywalking/trace/span.py
index 4d56cd4..dd39036 100644
--- a/skywalking/trace/span.py
+++ b/skywalking/trace/span.py
@@ -16,7 +16,6 @@
#
import time
-import traceback
from abc import ABC
from collections import defaultdict
from typing import List, Union, DefaultDict
@@ -85,9 +84,10 @@ class Span(ABC):
return True
def raised(self) -> 'Span':
+ from skywalking.utils.filter import sw_traceback
self.error_occurred = True
self.logs = [Log(items=[
- LogItem(key='Traceback', val=traceback.format_exc()),
+ LogItem(key='Traceback', val=sw_traceback()),
])]
return self
diff --git a/skywalking/utils/filter.py b/skywalking/utils/filter.py
new file mode 100644
index 0000000..7c4e5d4
--- /dev/null
+++ b/skywalking/utils/filter.py
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import re
+import traceback
+from urllib.parse import urlparse
+
+from skywalking import config
+
+
+def sw_urlparse(url):
+ # Removes basic auth credentials from netloc
+ url_param = urlparse(url)
+ safe_netloc = url_param.netloc
+ try:
+ safe_netloc = str(url_param.hostname) + (':' + str(url_param.port) if url_param.port else '')
+ except ValueError: # illegal url, skip
+ pass
+
+ return url_param._replace(netloc=safe_netloc)
+
+
+def sw_filter(target: str):
+ # Remove user:pw from any valid full urls
+
+ return re.sub(r'://(.*?)@', r'://', target)
+
+
+def sw_traceback():
+ stack_trace = traceback.format_exc(limit=config.cause_exception_depth)
+
+ return sw_filter(target=stack_trace)