Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 19:08:37 UTC

[GitHub] [beam] damccorm opened a new issue, #20757: Logging from DoFn doesn't work with Spark Runner in cluster mode

damccorm opened a new issue, #20757:
URL: https://github.com/apache/beam/issues/20757

   Log messages emitted by DoFns are not logged by the Spark executors when the pipeline runs on Spark in cluster deploy mode (on YARN). Tested on Cloudera 6 with Spark 2.4.
   
   I made a test project to reproduce the issue: [https://github.com/ventuc/beam-log-test](https://github.com/ventuc/beam-log-test). Run it with:
   
   `spark-submit --class beam.tests.log.LogTesting --name LogTesting --deploy-mode cluster --master yarn --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar`
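   
   For reference, a `log4j.properties` shipped via `--files` could look roughly like the sketch below. This is an assumption for illustration, not necessarily the file used in the test project: it assumes the log4j 1.x properties format that Spark 2.4 uses, and the appender name `console` and the pattern are arbitrary choices. The only part tied to this report is the `beam.tests.log` logger name.
   
   ```properties
   # Hypothetical minimal configuration (log4j 1.x syntax); adjust to your setup.
   # Send everything at INFO and above to stderr, which YARN collects per container.
   log4j.rootLogger=INFO, console
   log4j.appender.console=org.apache.log4j.ConsoleAppender
   log4j.appender.console.target=System.err
   log4j.appender.console.layout=org.apache.log4j.PatternLayout
   # %c prints the full logger name, which makes the DoFn's messages easy to spot.
   log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
   # Lower the threshold for the test project's package.
   log4j.logger.beam.tests.log=DEBUG
   ```
   
   With a configuration along these lines, messages from `beam.tests.log` would be expected in both the driver and the executor container logs. The issue is that they show up only in the former.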
   
   To retrieve logs from YARN run:
   
   `yarn logs -applicationId <app_id>`
   
   As you can see, log messages from the `beam.tests.log` package appear only in the driver's log, not in the executors' logs.
   
   There is no documentation on how to handle logging in Beam with the Spark runner. Please document it, as also requested in BEAM-792.
   
   Imported from Jira [BEAM-11735](https://issues.apache.org/jira/browse/BEAM-11735). Original Jira may contain additional context.
   Reported by: claventu.

