Posted to dev@spark.apache.org by Imran Rashid <ir...@cloudera.com.INVALID> on 2018/08/27 17:29:05 UTC

no logging in pyspark code?

Another question on pyspark code -- how come there is no logging at all?
does python logging have an unreasonable overhead, or is it impossible to
configure or something?

I'm really surprised nobody has ever wanted to be able to turn on some
debug or trace logging in pyspark by just configuring a logging level.

For me, I wanted this during debugging while developing -- I'd work on some
part of the code and drop in a bunch of print statements.  Then I'd rip
those out when I thought I was ready to submit a patch.  But then I'd
realize I forgot some case, then more debugging -- oh, gotta add those
print statements in again ...

does somebody just need to set up the configuration properly, or is there a
bigger reason to avoid logging in python?

thanks,
Imran
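
As a concrete sketch of what is being asked for here: standard python
logging, where debug statements stay in the code permanently and a single
configuration line turns them on. The logger name "pyspark.worker" below is
only illustrative, not an existing pyspark logger.

    import logging

    # a module-level logger; the level is flipped via configuration,
    # not by adding and removing print statements
    logger = logging.getLogger("pyspark.worker")  # name is illustrative

    def process_partition(rows):
        # this stays in the code; it is nearly free when DEBUG is off,
        # since the logger short-circuits on its level check
        logger.debug("processing partition with %d rows", len(rows))
        return [r * 2 for r in rows]

    if __name__ == "__main__":
        # enabling the output is one line of configuration
        logging.basicConfig(level=logging.DEBUG)
        process_partition([1, 2, 3])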

Re: no logging in pyspark code?

Posted by Hyukjin Kwon <gu...@gmail.com>.
FYI, we do have basic logging via the warnings module.
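
Roughly the pattern in question -- the standard library warnings module
rather than logging; the message text below is made up for illustration:

    import warnings

    # warnings print to stderr by default and carry a category that
    # users can filter on with the standard warnings machinery
    warnings.warn("this API is experimental and may change", UserWarning)

    # callers can control what shows up, e.g. silence everything:
    warnings.filterwarnings("ignore")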

Re: no logging in pyspark code?

Posted by Imran Rashid <ir...@cloudera.com.INVALID>.
ah, great, thanks!  sorry I missed that, I'll watch that jira.

Re: no logging in pyspark code?

Posted by Ilan Filonenko <if...@cornell.edu>.
A JIRA was opened on this exact topic a few days ago: SPARK-25236
<https://issues.apache.org/jira/browse/SPARK-25236>, after another case of
print(_, file=sys.stderr) turned up in a recent review. I agree that we
should include logging for PySpark workers.
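
For illustration, the kind of change being suggested -- replacing an ad-hoc
print(msg, file=sys.stderr) with a leveled logger. This handler setup is
just one possible sketch, not necessarily what SPARK-25236 will do:

    import logging
    import sys

    # a worker-side logger, still writing to stderr but with levels;
    # the name "pyspark.worker" is illustrative
    logger = logging.getLogger("pyspark.worker")
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    logger.info("worker started")  # replaces print(..., file=sys.stderr)
    logger.debug("only shown when the level is lowered to DEBUG")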
