You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@skywalking.apache.org by GitBox <gi...@apache.org> on 2021/08/12 03:57:54 UTC

[GitHub] [skywalking-python] Superskyyy opened a new pull request #149: Enable HTTP log reporting

Superskyyy opened a new pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149


   This is a work in progress. 
   
   Now reporting logs in JSON through HTTP to `http://oap/v3/logs` does work. 
   
   But, I'm not sure if the `oap/v3/logs` endpoint is for such usage(It seems designed for fluent-bit batch reporting?). Reporting logs one by one through HTTP may not be ideal in terms of performance. I'm not sure whether the Java agent only implements gRPC reporter intentionally out of this reason. 
   
   Please advise.
   
     
   Signed-off-by: YihaoChen <Su...@outlook.com>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy commented on a change in pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy commented on a change in pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#discussion_r686973912



##########
File path: skywalking/client/http.py
##########
@@ -109,3 +110,20 @@ def report(self, generator):
                 } for span in segment.spans]
             })
             logger.debug('report traces response: %s', res)
+
+
+class HttpLogDataReportService(TraceSegmentReportService):
+    def __init__(self):
+        proto = 'https://' if config.force_tls else 'http://'
+        self.url_report = proto + config.collector_address.rstrip('/') + '/v3/logs'
+        self.session = requests.Session()
+
+    def fork_after_in_child(self):
+        self.session.close()
+        self.session = requests.Session()
+
+    def report(self, generator):
+        for log_data in generator:
+            json_string = json_format.MessageToJson(log_data)
+            res = self.session.post(self.url_report, json=[json.loads(json_string)])

Review comment:
       Umm, should I make a new config entry allowing the user to choose whether batch or not?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy commented on pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897377986


   @kezhenxu94 Checks done, ready to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 merged pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 merged pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on a change in pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on a change in pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#discussion_r686980125



##########
File path: skywalking/client/http.py
##########
@@ -109,3 +110,20 @@ def report(self, generator):
                 } for span in segment.spans]
             })
             logger.debug('report traces response: %s', res)
+
+
+class HttpLogDataReportService(TraceSegmentReportService):
+    def __init__(self):
+        proto = 'https://' if config.force_tls else 'http://'
+        self.url_report = proto + config.collector_address.rstrip('/') + '/v3/logs'
+        self.session = requests.Session()
+
+    def fork_after_in_child(self):
+        self.session.close()
+        self.session = requests.Session()
+
+    def report(self, generator):
+        for log_data in generator:
+            json_string = json_format.MessageToJson(log_data)
+            res = self.session.post(self.url_report, json=[json.loads(json_string)])

Review comment:
       No, batch is the right way to go (assuming this endpoint does take arrays, which is what you should check).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy commented on a change in pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy commented on a change in pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#discussion_r686988959



##########
File path: skywalking/client/http.py
##########
@@ -109,3 +110,20 @@ def report(self, generator):
                 } for span in segment.spans]
             })
             logger.debug('report traces response: %s', res)
+
+
+class HttpLogDataReportService(TraceSegmentReportService):
+    def __init__(self):
+        proto = 'https://' if config.force_tls else 'http://'
+        self.url_report = proto + config.collector_address.rstrip('/') + '/v3/logs'
+        self.session = requests.Session()
+
+    def fork_after_in_child(self):
+        self.session.close()
+        self.session = requests.Session()
+
+    def report(self, generator):
+        for log_data in generator:
+            json_string = json_format.MessageToJson(log_data)
+            res = self.session.post(self.url_report, json=[json.loads(json_string)])

Review comment:
       Got it, it does take arrays.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy commented on pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897329294


   Oops, messed up a bit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-896414409


   
   > But, I'm not sure if the `oap/v3/logs` endpoint is for such usage(It seems designed for fluent-bit batch reporting?).
   
   Yes it's correct usage. Http protocol is provided for those language that don't (or hard to ) support gRPC protocol. 
   
   > Reporting logs one by one through HTTP may not be ideal in terms of performance. I'm not sure whether the Java agent only implements gRPC reporter intentionally out of this reason.
   
   Yes, Java agent can eliminate all possible side effects via shading the gRPC libs but for Python, we still need http protocol in case that users' applications are using a different (incompatible) gRPC package version, we decided to implement http protocol in Python agent from day one to give the users a secondary choice.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897329431


   > Oops, messed up a bit.
   
   It's ok, we will squash the commits into one when merging


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897246779


   > I would say merge this first then do Kafka as a separate PR.
   
   I agree. @Superskyyy let's do one thing at a time, in a single PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on a change in pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on a change in pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#discussion_r686229622



##########
File path: skywalking/client/http.py
##########
@@ -109,3 +110,20 @@ def report(self, generator):
                 } for span in segment.spans]
             })
             logger.debug('report traces response: %s', res)
+
+
+class HttpLogDataReportService(TraceSegmentReportService):
+    def __init__(self):
+        proto = 'https://' if config.force_tls else 'http://'
+        self.url_report = proto + config.collector_address.rstrip('/') + '/v3/logs'
+        self.session = requests.Session()
+
+    def fork_after_in_child(self):
+        self.session.close()
+        self.session = requests.Session()
+
+    def report(self, generator):
+        for log_data in generator:
+            json_string = json_format.MessageToJson(log_data)
+            res = self.session.post(self.url_report, json=[json.loads(json_string)])

Review comment:
       Actually, looking at this suggests that the `/v3/logs` endpoint can take an array of logs so a batch send could be done. Change to:
   ```py
   def report(self, generator):
       json = [json_format.MessageToJson(log_data) for log_data in generator]
       res = self.session.post(self.url_report, json=json)
   ```
   Your call if you want to do now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Humbertzhang commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
Humbertzhang commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-896975603


   So far looks good to me, looking forward to your Kafka part, thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897336412


   @Superskyyy ping me when it's ready to merge


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897015804


   I would say merge this first then do Kafka as a separate PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] kezhenxu94 commented on pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
kezhenxu94 commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897378372


   > @kezhenxu94 Checks done, ready to merge.
   
   Thank you @Superskyyy very much 🙇🏻 , excellent work!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] tom-pytel commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
tom-pytel commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-896221383


   > But, I'm not sure if the `oap/v3/logs` endpoint is for such usage(It seems designed for fluent-bit batch reporting?). Reporting logs one by one through HTTP may not be ideal in terms of performance. I'm not sure whether the Java agent only implements gRPC reporter intentionally out of this reason.
   
   Not sure what the INTENDED usage of that endpoint was, but the same individual send inefficiency applies to how the spans are sent currently to `/v3/segment` instead of batching to `/v3/segments`, this should be optimized at some point. The hit is not that bad though since the http protocol uses `requests.Session` which should do persistent connections as per https://docs.python-requests.org/en/master/user/advanced/:
   
   "The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, **and will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused**, which can result in a significant performance increase (see HTTP persistent connection)."
   
   In any case, if it works but may not yet be optional it is a step forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy commented on pull request #149: Enable HTTP Kafka log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy commented on pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149#issuecomment-897324475


   > > I would say merge this first then do Kafka as a separate PR.
   > 
   > I agree. @Superskyyy let's do one thing at a time, in a single PR
   
   No problem, but let me add a commit on the batch reporting first. Then lets merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [skywalking-python] Superskyyy closed pull request #149: Enable HTTP log reporting

Posted by GitBox <gi...@apache.org>.
Superskyyy closed pull request #149:
URL: https://github.com/apache/skywalking-python/pull/149


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@skywalking.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org