You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@daffodil.apache.org by Mike Beckerle <mb...@tresys.com> on 2018/07/31 13:25:44 UTC

logging statements - should be removing/scrutinizing them

As a part of code review for PRs, I think it's time we should start giving the logging calls some scrutiny.


Most of the logging calls are just leftover debug print statements disguised as logging, from back when we had no debug/trace capability.


We should be removing the logging calls that make no sense as actual logging. I haven't turned on logging for any reason for years. There are CLI tests that turn it on, and to some degree just display how not-useful it is.


While they are largely just clutter now, some of them do contain, by their format strings, useful code documentation that should be preserved as a comment.


This of course begs the question of what logging is for. I.e., when to use it. I think code reviews should suggest places to add logging, places where logging calls that are spurious should be removed/converted-to-comments.


The times not to use logging - not as a debug print statement,  not as an invariant or precondition check. Not as a replacement for tracer/debugger designed for end-users to use.  Not as part of test facilities for the code base. E.g.,  I think the TDML runner should not (and does not) use logging.


Logging calls are to assist with understanding what is going on in the software when it is deployed. When a system using daffodil is not doing what is intended/desired, people will want to see some evidence/log of what daffodil is doing so as to isolate the problem to either something in Daffodil, in their DFDL schema, or something in another part of the system.


Logging examples:

  *   In schema compiler - core optimizations that happen/don't happen are worth logging - e.g., if a schema is "not scannable" meaning it is a mix of text and binary, or mixture of different character set encodings - that's worth logging, as it's a core top level thing that if expected and not found, indicates a significant issue in the schema.
  *   Successful/failed schema compilation.
  *   Schema definition warnings.
  *   Successful save/restore of processor.
  *   Every top level parse/unparse call - logging of its success or failure and location in the data stream (for messaging API)
  *   Every top level parse/unparse call's validation errors (not each individual validation error, but all of them at the top level)
  *   Exceptional situations like uncaught throws, aborts/invariant failures. These should abort, but a real application has to recover from such failures, so they need to be logged.

There are undoubtedly many others, but the basic observation is that these are fairly coarse grained. Individual parse/unparse actions like scanning for a delimiter aren't things that we should be logging.

Thoughts?