Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:32:47 UTC

[GitHub] [beam] damccorm opened a new issue, #19767: bundle_processor log spam using python SDK on dataflow runner

damccorm opened a new issue, #19767:
URL: https://github.com/apache/beam/issues/19767

   When running my pipeline on Dataflow, the Stackdriver logs fill up with a large volume of the following messages (the numbers in them change from message to message):
    * [INFO] (bundle_processor.create_operation) No unique name set for transform generatedPtransform-67
    * [INFO] (bundle_processor.create_operation) No unique name for transform -19
    * [ERROR] (bundle_processor.create) Missing required coder_id on grpc_port for -19; using deprecated fallback.
   
   I tried running the pipeline locally with the direct runner under a debugger, with breakpoints set where these log messages originate, but the breakpoints were never hit, so I don't know specifically what is causing them.
   
   I also tried raising the log threshold with the logging module, and I mocked out the logging attribute in the bundle_processor module to set the log level to CRITICAL, but I still see the log messages.
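
   For illustration only, a suppression attempt of the kind described above might look roughly like the sketch below. The logger name, the filter class, and the helper function are assumptions made for this example, not names taken from the issue; the SDK may log these messages through the root logger instead, and nothing here is confirmed to work on Dataflow workers.

```python
import logging

# Assumption: the messages go through a logger named after the module.
# The issue does not confirm this; the SDK may use the root logger instead.
ASSUMED_LOGGER_NAME = "apache_beam.runners.worker.bundle_processor"

# Message prefixes copied from the log lines quoted in the issue.
NOISY_PREFIXES = (
    "No unique name set for transform",
    "No unique name for transform",
    "Missing required coder_id on grpc_port",
)


class DropNoisyBundleProcessorLogs(logging.Filter):
    """Hypothetical filter: drops records whose message starts with a noisy prefix."""

    def filter(self, record):
        # Returning False tells the logging framework to discard the record.
        return not record.getMessage().startswith(NOISY_PREFIXES)


def quiet_bundle_processor_logs():
    """Hypothetical helper combining a level change with a handler-level filter."""
    # Raise the threshold on the assumed module logger.
    logging.getLogger(ASSUMED_LOGGER_NAME).setLevel(logging.CRITICAL)
    # Attach the filter to every handler currently on the root logger, so that
    # records propagated from child loggers are filtered as well.
    noise_filter = DropNoisyBundleProcessorLogs()
    for handler in logging.getLogger().handlers:
        handler.addFilter(noise_filter)
    return noise_filter
```

   Code like this only affects the process it runs in. If it is called only while the pipeline is being constructed, it would not change logging inside the remote workers, which may be one reason such attempts appear to have no effect.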
   
   The pipeline is a streaming pipeline that reads from two Pub/Sub topics, merges the inputs, runs distinct on them over each processing-time window, fetches from an external service, does some processing, and inserts into Elasticsearch, with failures going to BigQuery. The log messages seem to cluster, and they appear early on, before any log messages from the other steps, so I wonder whether they come from the Pub/Sub read or the windowing portion.
   
   Expected behavior:
    * I don't expect to see these noisy log messages, which seem to indicate that something is wrong.
    * The missing required coder_id message is logged at the ERROR level, so it pollutes the error logs. I would expect it to be at the WARNING or INFO level.
   
   Imported from Jira [BEAM-7930](https://issues.apache.org/jira/browse/BEAM-7930). Original Jira may contain additional context.
   Reported by: jimpremise.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] udim commented on issue #19767: bundle_processor log spam using python SDK on dataflow runner

Posted by "udim (via GitHub)" <gi...@apache.org>.
udim commented on issue #19767:
URL: https://github.com/apache/beam/issues/19767#issuecomment-1450641503

   Looks like a duplicate of #19567

