You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2021/08/02 14:53:43 UTC

[GitHub] [skywalking-python] tom-pytel opened a new pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

tom-pytel opened a new pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138


   Discovered this issue while using grpc_proxy in nginx, doesn't seem to happen when connecting directly to grpc server. The presence of the `connected()` protocol method could lead to a deadlock where grpc was disconnected and because of this no other calls to it would be made to reconnect. This is because both the heartbeat and creation of `SpanContext()`s instead of `NoopContext()`s (which would lead to a `report()`) depended on `connected()` being True.
   
   * For this reason removed the `connected()` method entirely since it doesn't even apply to the two other protocols - http and kafka.
   
   * Set `__finished.wait(0)` in the report thread instead of blocking for 3 or 30 seconds because this could be detrimental in high throughput scenarios. Now the only report thread blocking that happens is in the report generators of the various protocols waiting on the queue.
   
   * Also wrapped the various threaded protocol calls in try / except in order to prevent unexpected transient exceptions from killing the threads.
   
   * Added SW_AGENT_MAX_BUFFER_SIZE which sets the maximum queue length like in the Node agent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-891792572


   Screwed up the local test I guess, removed those options since they were a flawed fix for what this PR fixes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-905485882


   > Setting `__finished.wait(0)` makes it infinitely retry to connect without any delay when the OAP is down, this may keep the CPU unnecessarily busy, what do you think?
   
   It should block in agent/protocol/grpc / http / kafka on the queue wait:
   ```
                       timeout = config.QUEUE_TIMEOUT - int(time() - start)  # type: int
                       if timeout <= 0:  # this is to make sure we exit eventually instead of being fed continuously
                           return
                       segment = queue.get(block=block, timeout=timeout)  # type: Segment
   ```
   If you want to increase time between connect attempts then increase `QUEUE_TIMEOUT` in config.py from its current value of 1 second.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-905503408


   But you do have a point, if segments are being generated all the time then this block will not keep connection attempts from being made continuously.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-905502679


   > Well, the `block` is `False` by default...
   
   `True` by default.
   ```py
       def report(self, queue: Queue, block: bool = True):
           start = time()
   
           def generator():
               while True:
                   try:
                       timeout = config.QUEUE_TIMEOUT - int(time() - start)  # type: int
                       if timeout <= 0:  # this is to make sure we exit eventually instead of being fed continuously
                           return
                       segment = queue.get(block=block, timeout=timeout)  # type: Segment
                   except Empty:
                       return
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-905495738


   Well, the `block` is `False` by default...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-891782548


   @tom-pytel this usually means the agent can't even start up because of some changes, in this PR, I think this is the cause
   <img width="1092" alt="Screen Shot 2021-08-03 at 19 50 30" src="https://user-images.githubusercontent.com/15965696/128010466-749f434d-9e5d-4df4-a8a1-8b687d2bef36.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-891202080


   Hey @kezhenxu94, consult request, this is weird, this PR fails every test here but mostly succeeds locally:
   ```
   =========================================================== test session starts ============================================================
   platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /home/tom/venv/bin/python3
   cachedir: .pytest_cache
   rootdir: /home/tom/src/revdebugpy/skywalking-python
   collected 35 items                                                                                                                         
   
   tests/test_ant_matcher.py::TestFastPathMatch::test_match PASSED                                                                      [  2%]
   tests/test_counter.py::TestAtomicCounter::test_counter PASSED                                                                        [  5%]
   tests/test_version_check.py::TestVersionCheck::test_operators PASSED                                                                 [  8%]
   tests/test_version_check.py::TestVersionCheck::test_version_check PASSED                                                             [ 11%]
   tests/plugin/sw_aiohttp/test_aiohttp.py::TestPlugin::test_plugin[aiohttp==3.7.3] PASSED                                              [ 14%]
   tests/plugin/sw_django/test_django.py::TestPlugin::test_plugin[django==2.0] PASSED                                                   [ 17%]
   tests/plugin/sw_django/test_django.py::TestPlugin::test_plugin[django==2.2] PASSED                                                   [ 20%]
   tests/plugin/sw_django/test_django.py::TestPlugin::test_plugin[django==3.0] PASSED                                                   [ 22%]
   tests/plugin/sw_django/test_django.py::TestPlugin::test_plugin[django==3.1] PASSED                                                   [ 25%]
   tests/plugin/sw_elasticsearch/test_elasticsearch.py::TestPlugin::test_plugin[elasticsearch==7.9.0] PASSED                            [ 28%]
   tests/plugin/sw_flask/test_flask.py::TestPlugin::test_plugin[flask==1.1.2] PASSED                                                    [ 31%]
   tests/plugin/sw_flask/test_flask.py::TestPlugin::test_plugin[flask==1.0.4] PASSED                                                    [ 34%]
   tests/plugin/sw_http/test_http.py::TestPlugin::test_plugin PASSED                                                                    [ 37%]
   tests/plugin/sw_http_wsgi/test_http_wsgi.py::TestPlugin::test_plugin[werkzeug==1.0.1] PASSED                                         [ 40%]
   tests/plugin/sw_kafka/test_kafka.py::TestPlugin::test_plugin[kafka-python==2.0.1] PASSED                                             [ 42%]
   tests/plugin/sw_psycopg2/test_psycopg2.py::TestPlugin::test_plugin[psycopg2-binary==2.8.6] FAILED                                    [ 45%]
   tests/plugin/sw_pymongo/test_pymongo.py::TestPlugin::test_plugin[pymongo==3.11.0] PASSED                                             [ 48%]
   tests/plugin/sw_pymysql/test_pymysql.py::TestPlugin::test_plugin[PyMySQL==0.10.0] FAILED                                             [ 51%]
   tests/plugin/sw_pyramid/test_plugin.py::TestPlugin::test_plugin[pyramid==1.10] PASSED                                                [ 54%]
   tests/plugin/sw_pyramid/test_plugin.py::TestPlugin::test_plugin[pyramid==1.9] PASSED                                                 [ 57%]
   tests/plugin/sw_rabbitmq/test_rabbitmq.py::TestPlugin::test_plugin[pika==1.1.0] PASSED                                               [ 60%]
   tests/plugin/sw_redis/test_redis.py::TestPlugin::test_plugin[redis==3.5.3] PASSED                                                    [ 62%]
   tests/plugin/sw_requests/test_request.py::TestPlugin::test_plugin[requests==2.24.0] PASSED                                           [ 65%]
   tests/plugin/sw_requests/test_request.py::TestPlugin::test_plugin[requests==2.20.0] PASSED                                           [ 68%]
   tests/plugin/sw_requests/test_request.py::TestPlugin::test_plugin[requests==2.19.0] PASSED                                           [ 71%]
   tests/plugin/sw_requests/test_request.py::TestPlugin::test_plugin[requests==2.13.0] PASSED                                           [ 74%]
   tests/plugin/sw_requests/test_request.py::TestPlugin::test_plugin[requests==2.9.0] PASSED                                            [ 77%]
   tests/plugin/sw_sanic/test_sanic.py::TestPlugin::test_plugin[sanic==20.3.0] PASSED                                                   [ 80%]
   tests/plugin/sw_sanic/test_sanic.py::TestPlugin::test_plugin[sanic==20.6.0] PASSED                                                   [ 82%]
   tests/plugin/sw_sanic/test_sanic.py::TestPlugin::test_plugin[sanic==20.9.0] PASSED                                                   [ 85%]
   tests/plugin/sw_sanic/test_sanic.py::TestPlugin::test_plugin[sanic==20.9.1] PASSED                                                   [ 88%]
   tests/plugin/sw_tornado/test_tornado.py::TestPlugin::test_plugin[tornado==6.0.4] PASSED                                              [ 91%]
   tests/plugin/sw_tornado/test_tornado.py::TestPlugin::test_plugin[tornado==5.1.1] PASSED                                              [ 94%]
   tests/plugin/sw_urllib3/test_urllib3.py::TestPlugin::test_plugin[urllib3==1.25.10] PASSED                                            [ 97%]
   tests/plugin/sw_urllib3/test_urllib3.py::TestPlugin::test_plugin[urllib3==1.25.9] PASSED                                             [100%]
   ```
   And the two tests that fail locally are for a completely different reason and they have been failing locally for some time even though they succeed here.
   
   Any insight about why all tests fail?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel merged pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
tom-pytel merged pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #138: fix grpc disconnect, SW_AGENT_MAX_BUFFER_SIZE

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #138:
URL: https://github.com/apache/skywalking-python/pull/138#issuecomment-905481630


   Hi @tom-pytel 
   
   > Set __finished.wait(0) in the report thread instead of blocking for 3 or 30 seconds because this could be detrimental in high throughput scenarios. Now the only report thread blocking that happens in is the report generators of the various protocols waiting on the queue.
   
   Setting `__finished.wait(0)` makes it infinitely retry to connect without any delay when the OAP is down, this may keep the CPU unnecessarily busy, what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org