Posted to user@spark.apache.org by Andrew Davidson <ae...@ucsc.edu.INVALID> on 2022/01/20 22:32:39 UTC
How to configure log4j in pyspark to get log level, file name, and line number
Hi
When I use Python logging for my unit tests, I am able to control the output format: I get the log level, the file name and line number, then the message.
[INFO testEstimatedScalingFactors.py:166 - test_B_convertCountsToInts()] BEGIN
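For reference, the Python side is just a logging format string; a minimal sketch of the kind of configuration that produces a line like the one above (file and function names are placeholders):

import logging

# [LEVEL filename.py:lineno - funcName()] message
logging.basicConfig(
    level=logging.INFO,
    format="[%(levelname)s %(filename)s:%(lineno)d - %(funcName)s()] %(message)s")
logger = logging.getLogger(__name__)
logger.info("BEGIN")  # e.g. [INFO myTest.py:9 - <module>()] BEGIN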
In my Spark driver I am able to get the log4j logger:
spark = SparkSession \
    .builder \
    .appName("estimatedScalingFactors") \
    .getOrCreate()
#
# https://medium.com/@lubna_22592/building-production-pyspark-jobs-5480d03fd71e
# initialize logger for yarn cluster logs
#
log4jLogger = spark.sparkContext._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
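As far as I understand, the format this logger emits is controlled by the PatternLayout in the cluster's log4j.properties, not by anything on the Python side. A sketch of what I believe the relevant snippet looks like (log4j 1.x, which Spark bundles; the appender name follows Spark's default config):

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# %d = timestamp, %p = level, %c = logger name, %F:%L = file:line, %m = message
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %F:%L - %m%n

One caveat: because the call crosses from Python into the JVM through Py4J, %F and %L are computed from the Java stack, so they point at Py4J internals or show '?' rather than the Python file and line.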
However, it only outputs the message. As a hack I have been adding the function name to the message.
I wonder if this is because of the way I make my Python code available. When I submit my job using
‘$ gcloud dataproc jobs submit pyspark’
I pass my Python code in a zip file:
--py-files ${extraPkg}
I use level WARN because the driver INFO logs are very verbose.
###############################################################################
# imports needed by this snippet
from functools import reduce
from operator import add
from pyspark.sql.functions import col

def rowSums(self, countsSparkDF, columnNames):
    self.logger.warn("rowSums BEGIN")
    # https://stackoverflow.com/a/54283997/4586180
    retDF = countsSparkDF.na.fill(0).withColumn("rowSum", reduce(add, [col(x) for x in columnNames]))
    self.logger.warn("rowSums retDF numRows:{} numCols:{}"
                     .format(retDF.count(), len(retDF.columns)))
    self.logger.warn("rowSums END\n")
    return retDF
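For what it's worth, another way I have seen to cut the INFO noise is to change the root log4j level at runtime instead of logging everything at WARN; a one-line sketch using the SparkSession from above:

# suppress Spark's own INFO/DEBUG chatter on the driver
spark.sparkContext.setLogLevel("WARN")

That only changes the level, though, not the output pattern.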
kind regards
Andy
Re: How to configure log4j in pyspark to get log level, file name, and line number
Posted by Andrew Davidson <ae...@ucsc.edu.INVALID>.
Interesting. I noticed that my driver log messages have a time stamp, log level, and logger name, but no line number (it shows as '?'). Log messages from my other Python files contain only the message. All of my Python code is in a single zip file, and the zip file is a job submit argument:
2022-01-21 19:45:02 WARN __main__:? - sparkConfig: ('spark.sql.cbo.enabled', 'true')
2022-01-21 19:48:34 WARN __main__:? - readsSparkDF.rdd.getNumPartitions():1698
__init__ BEGIN
__init__ END
run BEGIN
run rawCountsSparkDF numRows:5387495 numCols:10409
My guess is that somehow I need to change the way log4j is configured on the workers?
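If so, I believe the usual recipe is to ship a custom log4j.properties alongside the job and point the driver and executor JVMs at it; a sketch of what I think the submit would look like (file names and properties are my assumptions, untested):

gcloud dataproc jobs submit pyspark myJob.py \
    --py-files ${extraPkg} \
    --files log4j.properties \
    --properties 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties,spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties'

Even so, as noted above, %F and %L would still reflect the JVM call site rather than the Python source.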
Kind regards
Andy
From: Andrew Davidson <ae...@ucsc.edu>
Date: Thursday, January 20, 2022 at 2:32 PM
To: "user @spark" <us...@spark.apache.org>
Subject: How to configure log4j in pyspark to get log level, file name, and line number