You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by David Mollitor <da...@gmail.com> on 2020/01/23 16:25:24 UTC

Parquet Verbose Logging

Hello Team,

I have been a consumer of Apache Parquet through Apache Hive for several
years now.  For a long time, logging in Parquet has been pretty painful.
Some of the logging was going to STDOUT and some was going to Log4J.
Overall, though the framework has been too verbose, spewing many log lines
about internal details of Parquet I don't understand.

The logging has gotten a lot better with recent releases moving solidly
into SLF4J.  That is awesome and very welcomed.  However, (opinion alert) I
think the logging is still too verbose.  I think Parquet should be a silent
partner in data processing.  If everything is going well, it should be
silent (DEBUG level logging).  If things are going wrong, it should throw
an Exception.

If an operator suspects Parquet is the issue (and that's rarely the first
thing to check), they can set the logging for all of the Loggers in the
entire Parquet package (org.apache.parquet) to DEBUG to get the required
information.  Not to mention, the less logging it does, the faster it will
be.

I've opened this discussion because I've got two PRs related to this topic
ready to go:

PARQUET-1758
PARQUET-1761

Thanks,
David

Re: Parquet Verbose Logging

Posted by Maarten Ballintijn <ma...@xs4all.nl>.
Hello,

For larger projects, the logging is sometimes broken up into namespaces
that have their own (default) logging level. That way lower level sub-systems 
are quiet in normal operation but can be “activated” when debugging issues etc.

This might make sense for Arrow as well? What does the Logger provide?

Cheers,
Maarten.



> On Jan 24, 2020, at 1:29 PM, David Mollitor <da...@gmail.com> wrote:
> 
> Hello Ryan,
> 
> I appreciate you taking the time to share your thoughts.
> 
> I'd just like to point out that there is also TRACE level logging if
> Parquet requires greater granularity.
> 
> Furthermore, I'm not suggesting that there be an unbreakable rule that all
> logging must be DEBUG, but it should be the exception, not the rule.  It is
> more likely the situation the the wrapping application would be responsible
> for logging at the INFO and WARN/ERROR level.  Something like....
> 
> try {
>   LOG.info("Using Parquet to read file {}", path);
>   avroParquetReader.read();
> } catch (Exception e) {
>  LOG.error("Failed to read Parquet file", e);
> }
> 
> This is a very normal setup and doesn't require any additional logging from
> the Parquet library itself.  Once I see an error with "Failed to re Parquet
> file", then I'm going to turn on DEBUG logging and try to reproduce the
> error.
> 
> Thanks,
> David
> 
> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> 
>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>> level, but I do think that we can improve the levels of individual
>> messages.
>> 
>> If we convert all logs to debug, then turning on logs to see what Parquet
>> is doing would show everything from opening an input file to position
>> tracking in output files. That's way too much information, which is why we
>> use different log levels to begin with.
>> 
>> I think we should continue using log levels to distinguish between types of
>> information: error for errors, warn for recoverable errors that may or may
>> not indicate a problem, info for regular operations, and debug for extra
>> information if you're debugging the Parquet library. Following the common
>> convention enables people to choose what information they want instead of
>> mixing it all together.
>> 
>> If you want to only see error and warning logs from Parquet, then the right
>> way to do that is to configure your logger so that the level for
>> org.apache.parquet classes is warn. That's not to say I don't agree that we
>> can cut down on what is logged at info and clean it up; I just don't think
>> it's a good idea to abandon the idea of log levels to distinguish between
>> different information the user of a library will need.
>> 
>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org> wrote:
>> 
>>> Hi,
>>> I can help too.
>>> Lukas
>>> 
>>> Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
>>> napsal:
>>> 
>>>> Hello Team,
>>>> 
>>>> I am happy to do the work of reviewing all Parquet logging, but I need
>>> help
>>>> getting the work committed.
>>>> 
>>>> Fokko Driesprong has been a wonderfully ally in helping me get
>>> incremental
>>>> improvements into Parquet, but I wonder if there's anyone else that can
>>>> share in the load.
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>> On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
>>> wrote:
>>>> 
>>>>> Hello David,
>>>>> 
>>>>> As I mentioned on PARQUET-1758, we have been frustrated by overly
>>> verbose
>>>>> logging in Parquet for a long time.  Various workarounds have been
>> more
>>>> or
>>>>> less successful, e.g.
>>>>> 
>>>>> https://github.com/bigdatagenomics/adam/issues/851 <
>>>>> https://github.com/bigdatagenomics/adam/issues/851>
>>>>> 
>>>>> I would support a move making Parquet a silent partner.  :)
>>>>> 
>>>>>   michael
>>>>> 
>>>>> 
>>>>>> On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>> Hello Team,
>>>>>> 
>>>>>> I have been a consumer of Apache Parquet through Apache Hive for
>>>> several
>>>>>> years now.  For a long time, logging in Parquet has been pretty
>>>> painful.
>>>>>> Some of the logging was going to STDOUT and some was going to
>> Log4J.
>>>>>> Overall, though the framework has been too verbose, spewing many
>> log
>>>>> lines
>>>>>> about internal details of Parquet I don't understand.
>>>>>> 
>>>>>> The logging has gotten a lot better with recent releases moving
>>> solidly
>>>>>> into SLF4J.  That is awesome and very welcomed.  However, (opinion
>>>>> alert) I
>>>>>> think the logging is still too verbose.  I think Parquet should be
>> a
>>>>> silent
>>>>>> partner in data processing.  If everything is going well, it should
>>> be
>>>>>> silent (DEBUG level logging).  If things are going wrong, it should
>>>> throw
>>>>>> an Exception.
>>>>>> 
>>>>>> If an operator suspects Parquet is the issue (and that's rarely the
>>>> first
>>>>>> thing to check), they can set the logging for all of the Loggers in
>>> the
>>>>>> entire Parquet package (org.apache.parquet) to DEBUG to get the
>>>> required
>>>>>> information.  Not to mention, the less logging it does, the faster
>> it
>>>>> will
>>>>>> be.
>>>>>> 
>>>>>> I've opened this discussion because I've got two PRs related to
>> this
>>>>> topic
>>>>>> ready to go:
>>>>>> 
>>>>>> PARQUET-1758
>>>>>> PARQUET-1761
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>> 


Re: Parquet Verbose Logging

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Let's see how it goes moving more things to debug. I don't think it should
be a policy to move everything to debug, but we can certainly put more
information at the debug level that is currently not needed at info.

On Fri, Jan 24, 2020 at 4:08 PM David Mollitor <da...@gmail.com> wrote:

> Hey Ryan,
>
> I think you understand my position correctly and articulated it well.  My
> background is from higher up the stack; a consumer of these libraries.
>
> We may need to agree to disagree on this one.  Projects these days include
> 100+ libraries and I don't want to have to set a custom log level for each
> one.  Much easier for consumer of libraries to keep everything as quiet as
> possible and then only have to worry about a custom logging level when
> something goes wrong.  Parquet in particular logs a lot of stuff at INFO
> level that is very specific to Parquet and would only be useful (if at all)
> to someone that really knows the library, not something that would be
> helpful to the higher level application developer.
>
> Thanks.
>
>
>
> On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> It sounds like we see logging differently. My approach is that for any
>> library, the type of information should be categorized using the same
>> criteria into log levels. For example, if it is a normal event you might
>> want to know about, use info. It looks like your approach is that the
>> levels should be set for information from the perspective of the end
>> application: is this behavior relevant to the end user?
>>
>> The problem is that you don't always know whether something is relevant
>> to the end user because that context depends on the application. For the
>> Parquet CLI, much more Parquet information is relevant than for Presto that
>> is scanning Parquet files. That's why I think it's best to categorize the
>> log information using a standard definition, and rely on the end
>> application to configure log levels for its users expectations.
>>
>> On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <da...@gmail.com>
>> wrote:
>>
>>> Hello Ryan,
>>>
>>> I appreciate you taking the time to share your thoughts.
>>>
>>> I'd just like to point out that there is also TRACE level logging if
>>> Parquet requires greater granularity.
>>>
>>> Furthermore, I'm not suggesting that there be an unbreakable rule that
>>> all logging must be DEBUG, but it should be the exception, not the rule.
>>> It is more likely the situation the the wrapping application would be
>>> responsible for logging at the INFO and WARN/ERROR level.  Something
>>> like....
>>>
>>> try {
>>>    LOG.info("Using Parquet to read file {}", path);
>>>    avroParquetReader.read();
>>> } catch (Exception e) {
>>>   LOG.error("Failed to read Parquet file", e);
>>> }
>>>
>>> This is a very normal setup and doesn't require any additional logging
>>> from the Parquet library itself.  Once I see an error with "Failed to re
>>> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce
>>> the error.
>>>
>>> Thanks,
>>> David
>>>
>>> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>>>> level, but I do think that we can improve the levels of individual
>>>> messages.
>>>>
>>>> If we convert all logs to debug, then turning on logs to see what
>>>> Parquet
>>>> is doing would show everything from opening an input file to position
>>>> tracking in output files. That's way too much information, which is why
>>>> we
>>>> use different log levels to begin with.
>>>>
>>>> I think we should continue using log levels to distinguish between
>>>> types of
>>>> information: error for errors, warn for recoverable errors that may or
>>>> may
>>>> not indicate a problem, info for regular operations, and debug for extra
>>>> information if you're debugging the Parquet library. Following the
>>>> common
>>>> convention enables people to choose what information they want instead
>>>> of
>>>> mixing it all together.
>>>>
>>>> If you want to only see error and warning logs from Parquet, then the
>>>> right
>>>> way to do that is to configure your logger so that the level for
>>>> org.apache.parquet classes is warn. That's not to say I don't agree
>>>> that we
>>>> can cut down on what is logged at info and clean it up; I just don't
>>>> think
>>>> it's a good idea to abandon the idea of log levels to distinguish
>>>> between
>>>> different information the user of a library will need.
>>>>
>>>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org>
>>>> wrote:
>>>>
>>>> > Hi,
>>>> > I can help too.
>>>> > Lukas
>>>> >
>>>> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
>>>> > napsal:
>>>> >
>>>> > > Hello Team,
>>>> > >
>>>> > > I am happy to do the work of reviewing all Parquet logging, but I
>>>> need
>>>> > help
>>>> > > getting the work committed.
>>>> > >
>>>> > > Fokko Driesprong has been a wonderfully ally in helping me get
>>>> > incremental
>>>> > > improvements into Parquet, but I wonder if there's anyone else that
>>>> can
>>>> > > share in the load.
>>>> > >
>>>> > > Thanks,
>>>> > > David
>>>> > >
>>>> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
>>>> > wrote:
>>>> > >
>>>> > > > Hello David,
>>>> > > >
>>>> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
>>>> > verbose
>>>> > > > logging in Parquet for a long time.  Various workarounds have
>>>> been more
>>>> > > or
>>>> > > > less successful, e.g.
>>>> > > >
>>>> > > > https://github.com/bigdatagenomics/adam/issues/851 <
>>>> > > > https://github.com/bigdatagenomics/adam/issues/851>
>>>> > > >
>>>> > > > I would support a move making Parquet a silent partner.  :)
>>>> > > >
>>>> > > >    michael
>>>> > > >
>>>> > > >
>>>> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <dam6923@gmail.com
>>>> >
>>>> > > wrote:
>>>> > > > >
>>>> > > > > Hello Team,
>>>> > > > >
>>>> > > > > I have been a consumer of Apache Parquet through Apache Hive for
>>>> > > several
>>>> > > > > years now.  For a long time, logging in Parquet has been pretty
>>>> > > painful.
>>>> > > > > Some of the logging was going to STDOUT and some was going to
>>>> Log4J.
>>>> > > > > Overall, though the framework has been too verbose, spewing
>>>> many log
>>>> > > > lines
>>>> > > > > about internal details of Parquet I don't understand.
>>>> > > > >
>>>> > > > > The logging has gotten a lot better with recent releases moving
>>>> > solidly
>>>> > > > > into SLF4J.  That is awesome and very welcomed.  However,
>>>> (opinion
>>>> > > > alert) I
>>>> > > > > think the logging is still too verbose.  I think Parquet should
>>>> be a
>>>> > > > silent
>>>> > > > > partner in data processing.  If everything is going well, it
>>>> should
>>>> > be
>>>> > > > > silent (DEBUG level logging).  If things are going wrong, it
>>>> should
>>>> > > throw
>>>> > > > > an Exception.
>>>> > > > >
>>>> > > > > If an operator suspects Parquet is the issue (and that's rarely
>>>> the
>>>> > > first
>>>> > > > > thing to check), they can set the logging for all of the
>>>> Loggers in
>>>> > the
>>>> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
>>>> > > required
>>>> > > > > information.  Not to mention, the less logging it does, the
>>>> faster it
>>>> > > > will
>>>> > > > > be.
>>>> > > > >
>>>> > > > > I've opened this discussion because I've got two PRs related to
>>>> this
>>>> > > > topic
>>>> > > > > ready to go:
>>>> > > > >
>>>> > > > > PARQUET-1758
>>>> > > > > PARQUET-1761
>>>> > > > >
>>>> > > > > Thanks,
>>>> > > > > David
>>>> > > >
>>>> > > >
>>>> > >
>>>> >
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet Verbose Logging

Posted by David Mollitor <da...@gmail.com>.
Hey Ryan,

I think you understand my position correctly and articulated it well.  My
background is from higher up the stack; a consumer of these libraries.

We may need to agree to disagree on this one.  Projects these days include
100+ libraries and I don't want to have to set a custom log level for each
one.  Much easier for consumer of libraries to keep everything as quiet as
possible and then only have to worry about a custom logging level when
something goes wrong.  Parquet in particular logs a lot of stuff at INFO
level that is very specific to Parquet and would only be useful (if at all)
to someone that really knows the library, not something that would be
helpful to the higher level application developer.

Thanks.



On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue <rb...@netflix.com> wrote:

> It sounds like we see logging differently. My approach is that for any
> library, the type of information should be categorized using the same
> criteria into log levels. For example, if it is a normal event you might
> want to know about, use info. It looks like your approach is that the
> levels should be set for information from the perspective of the end
> application: is this behavior relevant to the end user?
>
> The problem is that you don't always know whether something is relevant to
> the end user because that context depends on the application. For the
> Parquet CLI, much more Parquet information is relevant than for Presto that
> is scanning Parquet files. That's why I think it's best to categorize the
> log information using a standard definition, and rely on the end
> application to configure log levels for its users expectations.
>
> On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <da...@gmail.com> wrote:
>
>> Hello Ryan,
>>
>> I appreciate you taking the time to share your thoughts.
>>
>> I'd just like to point out that there is also TRACE level logging if
>> Parquet requires greater granularity.
>>
>> Furthermore, I'm not suggesting that there be an unbreakable rule that
>> all logging must be DEBUG, but it should be the exception, not the rule.
>> It is more likely the situation the the wrapping application would be
>> responsible for logging at the INFO and WARN/ERROR level.  Something
>> like....
>>
>> try {
>>    LOG.info("Using Parquet to read file {}", path);
>>    avroParquetReader.read();
>> } catch (Exception e) {
>>   LOG.error("Failed to read Parquet file", e);
>> }
>>
>> This is a very normal setup and doesn't require any additional logging
>> from the Parquet library itself.  Once I see an error with "Failed to re
>> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce
>> the error.
>>
>> Thanks,
>> David
>>
>> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>>> level, but I do think that we can improve the levels of individual
>>> messages.
>>>
>>> If we convert all logs to debug, then turning on logs to see what Parquet
>>> is doing would show everything from opening an input file to position
>>> tracking in output files. That's way too much information, which is why
>>> we
>>> use different log levels to begin with.
>>>
>>> I think we should continue using log levels to distinguish between types
>>> of
>>> information: error for errors, warn for recoverable errors that may or
>>> may
>>> not indicate a problem, info for regular operations, and debug for extra
>>> information if you're debugging the Parquet library. Following the common
>>> convention enables people to choose what information they want instead of
>>> mixing it all together.
>>>
>>> If you want to only see error and warning logs from Parquet, then the
>>> right
>>> way to do that is to configure your logger so that the level for
>>> org.apache.parquet classes is warn. That's not to say I don't agree that
>>> we
>>> can cut down on what is logged at info and clean it up; I just don't
>>> think
>>> it's a good idea to abandon the idea of log levels to distinguish between
>>> different information the user of a library will need.
>>>
>>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org>
>>> wrote:
>>>
>>> > Hi,
>>> > I can help too.
>>> > Lukas
>>> >
>>> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
>>> > napsal:
>>> >
>>> > > Hello Team,
>>> > >
>>> > > I am happy to do the work of reviewing all Parquet logging, but I
>>> need
>>> > help
>>> > > getting the work committed.
>>> > >
>>> > > Fokko Driesprong has been a wonderfully ally in helping me get
>>> > incremental
>>> > > improvements into Parquet, but I wonder if there's anyone else that
>>> can
>>> > > share in the load.
>>> > >
>>> > > Thanks,
>>> > > David
>>> > >
>>> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
>>> > wrote:
>>> > >
>>> > > > Hello David,
>>> > > >
>>> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
>>> > verbose
>>> > > > logging in Parquet for a long time.  Various workarounds have been
>>> more
>>> > > or
>>> > > > less successful, e.g.
>>> > > >
>>> > > > https://github.com/bigdatagenomics/adam/issues/851 <
>>> > > > https://github.com/bigdatagenomics/adam/issues/851>
>>> > > >
>>> > > > I would support a move making Parquet a silent partner.  :)
>>> > > >
>>> > > >    michael
>>> > > >
>>> > > >
>>> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
>>> > > wrote:
>>> > > > >
>>> > > > > Hello Team,
>>> > > > >
>>> > > > > I have been a consumer of Apache Parquet through Apache Hive for
>>> > > several
>>> > > > > years now.  For a long time, logging in Parquet has been pretty
>>> > > painful.
>>> > > > > Some of the logging was going to STDOUT and some was going to
>>> Log4J.
>>> > > > > Overall, though the framework has been too verbose, spewing many
>>> log
>>> > > > lines
>>> > > > > about internal details of Parquet I don't understand.
>>> > > > >
>>> > > > > The logging has gotten a lot better with recent releases moving
>>> > solidly
>>> > > > > into SLF4J.  That is awesome and very welcomed.  However,
>>> (opinion
>>> > > > alert) I
>>> > > > > think the logging is still too verbose.  I think Parquet should
>>> be a
>>> > > > silent
>>> > > > > partner in data processing.  If everything is going well, it
>>> should
>>> > be
>>> > > > > silent (DEBUG level logging).  If things are going wrong, it
>>> should
>>> > > throw
>>> > > > > an Exception.
>>> > > > >
>>> > > > > If an operator suspects Parquet is the issue (and that's rarely
>>> the
>>> > > first
>>> > > > > thing to check), they can set the logging for all of the Loggers
>>> in
>>> > the
>>> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
>>> > > required
>>> > > > > information.  Not to mention, the less logging it does, the
>>> faster it
>>> > > > will
>>> > > > > be.
>>> > > > >
>>> > > > > I've opened this discussion because I've got two PRs related to
>>> this
>>> > > > topic
>>> > > > > ready to go:
>>> > > > >
>>> > > > > PARQUET-1758
>>> > > > > PARQUET-1761
>>> > > > >
>>> > > > > Thanks,
>>> > > > > David
>>> > > >
>>> > > >
>>> > >
>>> >
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Parquet Verbose Logging

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
It sounds like we see logging differently. My approach is that for any
library, the type of information should be categorized using the same
criteria into log levels. For example, if it is a normal event you might
want to know about, use info. It looks like your approach is that the
levels should be set for information from the perspective of the end
application: is this behavior relevant to the end user?

The problem is that you don't always know whether something is relevant to
the end user because that context depends on the application. For the
Parquet CLI, much more Parquet information is relevant than for Presto that
is scanning Parquet files. That's why I think it's best to categorize the
log information using a standard definition, and rely on the end
application to configure log levels for its users expectations.

On Fri, Jan 24, 2020 at 10:29 AM David Mollitor <da...@gmail.com> wrote:

> Hello Ryan,
>
> I appreciate you taking the time to share your thoughts.
>
> I'd just like to point out that there is also TRACE level logging if
> Parquet requires greater granularity.
>
> Furthermore, I'm not suggesting that there be an unbreakable rule that all
> logging must be DEBUG, but it should be the exception, not the rule.  It is
> more likely the situation the the wrapping application would be responsible
> for logging at the INFO and WARN/ERROR level.  Something like....
>
> try {
>    LOG.info("Using Parquet to read file {}", path);
>    avroParquetReader.read();
> } catch (Exception e) {
>   LOG.error("Failed to read Parquet file", e);
> }
>
> This is a very normal setup and doesn't require any additional logging
> from the Parquet library itself.  Once I see an error with "Failed to re
> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce
> the error.
>
> Thanks,
> David
>
> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>> level, but I do think that we can improve the levels of individual
>> messages.
>>
>> If we convert all logs to debug, then turning on logs to see what Parquet
>> is doing would show everything from opening an input file to position
>> tracking in output files. That's way too much information, which is why we
>> use different log levels to begin with.
>>
>> I think we should continue using log levels to distinguish between types
>> of
>> information: error for errors, warn for recoverable errors that may or may
>> not indicate a problem, info for regular operations, and debug for extra
>> information if you're debugging the Parquet library. Following the common
>> convention enables people to choose what information they want instead of
>> mixing it all together.
>>
>> If you want to only see error and warning logs from Parquet, then the
>> right
>> way to do that is to configure your logger so that the level for
>> org.apache.parquet classes is warn. That's not to say I don't agree that
>> we
>> can cut down on what is logged at info and clean it up; I just don't think
>> it's a good idea to abandon the idea of log levels to distinguish between
>> different information the user of a library will need.
>>
>> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org> wrote:
>>
>> > Hi,
>> > I can help too.
>> > Lukas
>> >
>> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
>> > napsal:
>> >
>> > > Hello Team,
>> > >
>> > > I am happy to do the work of reviewing all Parquet logging, but I need
>> > help
>> > > getting the work committed.
>> > >
>> > > Fokko Driesprong has been a wonderfully ally in helping me get
>> > incremental
>> > > improvements into Parquet, but I wonder if there's anyone else that
>> can
>> > > share in the load.
>> > >
>> > > Thanks,
>> > > David
>> > >
>> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
>> > wrote:
>> > >
>> > > > Hello David,
>> > > >
>> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
>> > verbose
>> > > > logging in Parquet for a long time.  Various workarounds have been
>> more
>> > > or
>> > > > less successful, e.g.
>> > > >
>> > > > https://github.com/bigdatagenomics/adam/issues/851 <
>> > > > https://github.com/bigdatagenomics/adam/issues/851>
>> > > >
>> > > > I would support a move making Parquet a silent partner.  :)
>> > > >
>> > > >    michael
>> > > >
>> > > >
>> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > Hello Team,
>> > > > >
>> > > > > I have been a consumer of Apache Parquet through Apache Hive for
>> > > several
>> > > > > years now.  For a long time, logging in Parquet has been pretty
>> > > painful.
>> > > > > Some of the logging was going to STDOUT and some was going to
>> Log4J.
>> > > > > Overall, though the framework has been too verbose, spewing many
>> log
>> > > > lines
>> > > > > about internal details of Parquet I don't understand.
>> > > > >
>> > > > > The logging has gotten a lot better with recent releases moving
>> > solidly
>> > > > > into SLF4J.  That is awesome and very welcomed.  However, (opinion
>> > > > alert) I
>> > > > > think the logging is still too verbose.  I think Parquet should
>> be a
>> > > > silent
>> > > > > partner in data processing.  If everything is going well, it
>> should
>> > be
>> > > > > silent (DEBUG level logging).  If things are going wrong, it
>> should
>> > > throw
>> > > > > an Exception.
>> > > > >
>> > > > > If an operator suspects Parquet is the issue (and that's rarely
>> the
>> > > first
>> > > > > thing to check), they can set the logging for all of the Loggers
>> in
>> > the
>> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
>> > > required
>> > > > > information.  Not to mention, the less logging it does, the
>> faster it
>> > > > will
>> > > > > be.
>> > > > >
>> > > > > I've opened this discussion because I've got two PRs related to
>> this
>> > > > topic
>> > > > > ready to go:
>> > > > >
>> > > > > PARQUET-1758
>> > > > > PARQUET-1761
>> > > > >
>> > > > > Thanks,
>> > > > > David
>> > > >
>> > > >
>> > >
>> >
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet Verbose Logging

Posted by David Mollitor <da...@gmail.com>.
Hello Ryan,

I appreciate you taking the time to share your thoughts.

I'd just like to point out that there is also TRACE level logging if
Parquet requires greater granularity.

Furthermore, I'm not suggesting that there be an unbreakable rule that all
logging must be DEBUG, but it should be the exception, not the rule.  It is
more likely the situation the the wrapping application would be responsible
for logging at the INFO and WARN/ERROR level.  Something like....

try {
   LOG.info("Using Parquet to read file {}", path);
   avroParquetReader.read();
} catch (Exception e) {
  LOG.error("Failed to read Parquet file", e);
}

This is a very normal setup and doesn't require any additional logging from
the Parquet library itself.  Once I see an error with "Failed to re Parquet
file", then I'm going to turn on DEBUG logging and try to reproduce the
error.

Thanks,
David

On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> I don't agree with the idea to convert all of Parquet's logs to DEBUG
> level, but I do think that we can improve the levels of individual
> messages.
>
> If we convert all logs to debug, then turning on logs to see what Parquet
> is doing would show everything from opening an input file to position
> tracking in output files. That's way too much information, which is why we
> use different log levels to begin with.
>
> I think we should continue using log levels to distinguish between types of
> information: error for errors, warn for recoverable errors that may or may
> not indicate a problem, info for regular operations, and debug for extra
> information if you're debugging the Parquet library. Following the common
> convention enables people to choose what information they want instead of
> mixing it all together.
>
> If you want to only see error and warning logs from Parquet, then the right
> way to do that is to configure your logger so that the level for
> org.apache.parquet classes is warn. That's not to say I don't agree that we
> can cut down on what is logged at info and clean it up; I just don't think
> it's a good idea to abandon the idea of log levels to distinguish between
> different information the user of a library will need.
>
> On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org> wrote:
>
> > Hi,
> > I can help too.
> > Lukas
> >
> > Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
> > napsal:
> >
> > > Hello Team,
> > >
> > > I am happy to do the work of reviewing all Parquet logging, but I need
> > help
> > > getting the work committed.
> > >
> > > Fokko Driesprong has been a wonderfully ally in helping me get
> > incremental
> > > improvements into Parquet, but I wonder if there's anyone else that can
> > > share in the load.
> > >
> > > Thanks,
> > > David
> > >
> > > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
> > wrote:
> > >
> > > > Hello David,
> > > >
> > > > As I mentioned on PARQUET-1758, we have been frustrated by overly
> > verbose
> > > > logging in Parquet for a long time.  Various workarounds have been
> more
> > > or
> > > > less successful, e.g.
> > > >
> > > > https://github.com/bigdatagenomics/adam/issues/851 <
> > > > https://github.com/bigdatagenomics/adam/issues/851>
> > > >
> > > > I would support a move making Parquet a silent partner.  :)
> > > >
> > > >    michael
> > > >
> > > >
> > > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hello Team,
> > > > >
> > > > > I have been a consumer of Apache Parquet through Apache Hive for
> > > several
> > > > > years now.  For a long time, logging in Parquet has been pretty
> > > painful.
> > > > > Some of the logging was going to STDOUT and some was going to
> Log4J.
> > > > > Overall, though the framework has been too verbose, spewing many
> log
> > > > lines
> > > > > about internal details of Parquet I don't understand.
> > > > >
> > > > > The logging has gotten a lot better with recent releases moving
> > solidly
> > > > > into SLF4J.  That is awesome and very welcomed.  However, (opinion
> > > > alert) I
> > > > > think the logging is still too verbose.  I think Parquet should be
> a
> > > > silent
> > > > > partner in data processing.  If everything is going well, it should
> > be
> > > > > silent (DEBUG level logging).  If things are going wrong, it should
> > > throw
> > > > > an Exception.
> > > > >
> > > > > If an operator suspects Parquet is the issue (and that's rarely the
> > > first
> > > > > thing to check), they can set the logging for all of the Loggers in
> > the
> > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
> > > required
> > > > > information.  Not to mention, the less logging it does, the faster
> it
> > > > will
> > > > > be.
> > > > >
> > > > > I've opened this discussion because I've got two PRs related to
> this
> > > > topic
> > > > > ready to go:
> > > > >
> > > > > PARQUET-1758
> > > > > PARQUET-1761
> > > > >
> > > > > Thanks,
> > > > > David
> > > >
> > > >
> > >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Parquet Verbose Logging

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I don't agree with the idea to convert all of Parquet's logs to DEBUG
level, but I do think that we can improve the levels of individual messages.

If we convert all logs to debug, then turning on logs to see what Parquet
is doing would show everything from opening an input file to position
tracking in output files. That's way too much information, which is why we
use different log levels to begin with.

I think we should continue using log levels to distinguish between types of
information: error for errors, warn for recoverable errors that may or may
not indicate a problem, info for regular operations, and debug for extra
information if you're debugging the Parquet library. Following the common
convention enables people to choose what information they want instead of
mixing it all together.

If you want to only see error and warning logs from Parquet, then the right
way to do that is to configure your logger so that the level for
org.apache.parquet classes is warn. That's not to say I don't agree that we
can cut down on what is logged at info and clean it up; I just don't think
it's a good idea to abandon the idea of log levels to distinguish between
different information the user of a library will need.

On Fri, Jan 24, 2020 at 6:30 AM lukas nalezenec <lu...@apache.org> wrote:

> Hi,
> I can help too.
> Lukas
>
> Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com>
> napsal:
>
> > Hello Team,
> >
> > I am happy to do the work of reviewing all Parquet logging, but I need
> help
> > getting the work committed.
> >
> > Fokko Driesprong has been a wonderfully ally in helping me get
> incremental
> > improvements into Parquet, but I wonder if there's anyone else that can
> > share in the load.
> >
> > Thanks,
> > David
> >
> > On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com>
> wrote:
> >
> > > Hello David,
> > >
> > > As I mentioned on PARQUET-1758, we have been frustrated by overly
> verbose
> > > logging in Parquet for a long time.  Various workarounds have been more
> > or
> > > less successful, e.g.
> > >
> > > https://github.com/bigdatagenomics/adam/issues/851 <
> > > https://github.com/bigdatagenomics/adam/issues/851>
> > >
> > > I would support a move making Parquet a silent partner.  :)
> > >
> > >    michael
> > >
> > >
> > > > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
> > wrote:
> > > >
> > > > Hello Team,
> > > >
> > > > I have been a consumer of Apache Parquet through Apache Hive for
> > several
> > > > years now.  For a long time, logging in Parquet has been pretty
> > painful.
> > > > Some of the logging was going to STDOUT and some was going to Log4J.
> > > > Overall, though the framework has been too verbose, spewing many log
> > > lines
> > > > about internal details of Parquet I don't understand.
> > > >
> > > > The logging has gotten a lot better with recent releases moving
> solidly
> > > > into SLF4J.  That is awesome and very welcomed.  However, (opinion
> > > alert) I
> > > > think the logging is still too verbose.  I think Parquet should be a
> > > silent
> > > > partner in data processing.  If everything is going well, it should
> be
> > > > silent (DEBUG level logging).  If things are going wrong, it should
> > throw
> > > > an Exception.
> > > >
> > > > If an operator suspects Parquet is the issue (and that's rarely the
> > first
> > > > thing to check), they can set the logging for all of the Loggers in
> the
> > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
> > required
> > > > information.  Not to mention, the less logging it does, the faster it
> > > will
> > > > be.
> > > >
> > > > I've opened this discussion because I've got two PRs related to this
> > > topic
> > > > ready to go:
> > > >
> > > > PARQUET-1758
> > > > PARQUET-1761
> > > >
> > > > Thanks,
> > > > David
> > >
> > >
> >
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet Verbose Logging

Posted by lukas nalezenec <lu...@apache.org>.
Hi,
I can help too.
Lukas

Dne pá 24. 1. 2020 15:29 uživatel David Mollitor <da...@gmail.com> napsal:

> Hello Team,
>
> I am happy to do the work of reviewing all Parquet logging, but I need help
> getting the work committed.
>
> Fokko Driesprong has been a wonderfully ally in helping me get incremental
> improvements into Parquet, but I wonder if there's anyone else that can
> share in the load.
>
> Thanks,
> David
>
> On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com> wrote:
>
> > Hello David,
> >
> > As I mentioned on PARQUET-1758, we have been frustrated by overly verbose
> > logging in Parquet for a long time.  Various workarounds have been more
> or
> > less successful, e.g.
> >
> > https://github.com/bigdatagenomics/adam/issues/851 <
> > https://github.com/bigdatagenomics/adam/issues/851>
> >
> > I would support a move making Parquet a silent partner.  :)
> >
> >    michael
> >
> >
> > > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com>
> wrote:
> > >
> > > Hello Team,
> > >
> > > I have been a consumer of Apache Parquet through Apache Hive for
> several
> > > years now.  For a long time, logging in Parquet has been pretty
> painful.
> > > Some of the logging was going to STDOUT and some was going to Log4J.
> > > Overall, though the framework has been too verbose, spewing many log
> > lines
> > > about internal details of Parquet I don't understand.
> > >
> > > The logging has gotten a lot better with recent releases moving solidly
> > > into SLF4J.  That is awesome and very welcomed.  However, (opinion
> > alert) I
> > > think the logging is still too verbose.  I think Parquet should be a
> > silent
> > > partner in data processing.  If everything is going well, it should be
> > > silent (DEBUG level logging).  If things are going wrong, it should
> throw
> > > an Exception.
> > >
> > > If an operator suspects Parquet is the issue (and that's rarely the
> first
> > > thing to check), they can set the logging for all of the Loggers in the
> > > entire Parquet package (org.apache.parquet) to DEBUG to get the
> required
> > > information.  Not to mention, the less logging it does, the faster it
> > will
> > > be.
> > >
> > > I've opened this discussion because I've got two PRs related to this
> > topic
> > > ready to go:
> > >
> > > PARQUET-1758
> > > PARQUET-1761
> > >
> > > Thanks,
> > > David
> >
> >
>

Re: Parquet Verbose Logging

Posted by David Mollitor <da...@gmail.com>.
Hello Team,

I am happy to do the work of reviewing all Parquet logging, but I need help
getting the work committed.

Fokko Driesprong has been a wonderfully ally in helping me get incremental
improvements into Parquet, but I wonder if there's anyone else that can
share in the load.

Thanks,
David

On Thu, Jan 23, 2020 at 11:55 AM Michael Heuer <he...@gmail.com> wrote:

> Hello David,
>
> As I mentioned on PARQUET-1758, we have been frustrated by overly verbose
> logging in Parquet for a long time.  Various workarounds have been more or
> less successful, e.g.
>
> https://github.com/bigdatagenomics/adam/issues/851 <
> https://github.com/bigdatagenomics/adam/issues/851>
>
> I would support a move making Parquet a silent partner.  :)
>
>    michael
>
>
> > On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com> wrote:
> >
> > Hello Team,
> >
> > I have been a consumer of Apache Parquet through Apache Hive for several
> > years now.  For a long time, logging in Parquet has been pretty painful.
> > Some of the logging was going to STDOUT and some was going to Log4J.
> > Overall, though the framework has been too verbose, spewing many log
> lines
> > about internal details of Parquet I don't understand.
> >
> > The logging has gotten a lot better with recent releases moving solidly
> > into SLF4J.  That is awesome and very welcomed.  However, (opinion
> alert) I
> > think the logging is still too verbose.  I think Parquet should be a
> silent
> > partner in data processing.  If everything is going well, it should be
> > silent (DEBUG level logging).  If things are going wrong, it should throw
> > an Exception.
> >
> > If an operator suspects Parquet is the issue (and that's rarely the first
> > thing to check), they can set the logging for all of the Loggers in the
> > entire Parquet package (org.apache.parquet) to DEBUG to get the required
> > information.  Not to mention, the less logging it does, the faster it
> will
> > be.
> >
> > I've opened this discussion because I've got two PRs related to this
> topic
> > ready to go:
> >
> > PARQUET-1758
> > PARQUET-1761
> >
> > Thanks,
> > David
>
>

Re: Parquet Verbose Logging

Posted by Michael Heuer <he...@gmail.com>.
Hello David,

As I mentioned on PARQUET-1758, we have been frustrated by overly verbose logging in Parquet for a long time.  Various workarounds have been more or less successful, e.g.

https://github.com/bigdatagenomics/adam/issues/851 <https://github.com/bigdatagenomics/adam/issues/851>

I would support a move making Parquet a silent partner.  :)

   michael


> On Jan 23, 2020, at 10:25 AM, David Mollitor <da...@gmail.com> wrote:
> 
> Hello Team,
> 
> I have been a consumer of Apache Parquet through Apache Hive for several
> years now.  For a long time, logging in Parquet has been pretty painful.
> Some of the logging was going to STDOUT and some was going to Log4J.
> Overall, though the framework has been too verbose, spewing many log lines
> about internal details of Parquet I don't understand.
> 
> The logging has gotten a lot better with recent releases moving solidly
> into SLF4J.  That is awesome and very welcomed.  However, (opinion alert) I
> think the logging is still too verbose.  I think Parquet should be a silent
> partner in data processing.  If everything is going well, it should be
> silent (DEBUG level logging).  If things are going wrong, it should throw
> an Exception.
> 
> If an operator suspects Parquet is the issue (and that's rarely the first
> thing to check), they can set the logging for all of the Loggers in the
> entire Parquet package (org.apache.parquet) to DEBUG to get the required
> information.  Not to mention, the less logging it does, the faster it will
> be.
> 
> I've opened this discussion because I've got two PRs related to this topic
> ready to go:
> 
> PARQUET-1758
> PARQUET-1761
> 
> Thanks,
> David