Posted to dev@parquet.apache.org by "Elad Yosifon (Jira)" <ji...@apache.org> on 2021/04/25 16:16:00 UTC

[jira] [Created] (PARQUET-2036) implicitly defining DEBUG mode in MessageColumnIO causes 80% performance overhead

Elad Yosifon created PARQUET-2036:
-------------------------------------

             Summary: implicitly defining DEBUG mode in MessageColumnIO causes 80% performance overhead
                 Key: PARQUET-2036
                 URL: https://issues.apache.org/jira/browse/PARQUET-2036
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.12.0, 1.10.1, 1.10.0
            Reporter: Elad Yosifon


The *parquet-column* jar uses +slf4j with log4j as its default logger+. When no log4j configuration is defined, the log level defaults to *DEBUG*.

 
{code:java}
public class MessageColumnIO extends GroupColumnIO {
  private static final Logger LOG = LoggerFactory.getLogger(MessageColumnIO.class);

  private static final boolean DEBUG = LOG.isDebugEnabled(); // <------
}
{code}
 

This "magic behavior" silently puts the Parquet library in DEBUG mode, without any notification or warning. Unfortunately, the *RecordConsumerLoggingWrapper* implementation incurs a 5x performance overhead compared to the *MessageColumnIORecordConsumer* implementation, causing a massive performance hit and wasted server utilization.

 

+IMHO there are two things that could prevent such an issue:+
 * Printing a message to STDOUT stating that DEBUG mode is active.
 * Defaulting to the *MessageColumnIORecordConsumer* implementation, and using *RecordConsumerLoggingWrapper* only when DEBUG mode is explicitly configured.
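
The second suggestion could look roughly like the sketch below. It is not Parquet code: SimpleConsumer, PlainConsumer, LoggingConsumer and ConsumerFactory are hypothetical stand-ins for the real classes, and the system property name is made up for illustration. It only shows the idea of returning the plain consumer by default and wrapping it when debugging is explicitly requested.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Parquet's RecordConsumer, reduced to two calls. */
interface SimpleConsumer {
    void startMessage();
    void addInteger(int value);
}

/** Plays the role of MessageColumnIORecordConsumer: does the work, no logging. */
final class PlainConsumer implements SimpleConsumer {
    final List<Integer> values = new ArrayList<>();

    @Override public void startMessage() { }
    @Override public void addInteger(int value) { values.add(value); }
}

/** Plays the role of RecordConsumerLoggingWrapper: records every call, then delegates. */
final class LoggingConsumer implements SimpleConsumer {
    private final SimpleConsumer delegate;
    final List<String> log = new ArrayList<>();

    LoggingConsumer(SimpleConsumer delegate) { this.delegate = delegate; }

    @Override public void startMessage() {
        log.add("start message");
        delegate.startMessage();
    }

    @Override public void addInteger(int value) {
        log.add("value: " + value);
        delegate.addInteger(value);
    }
}

final class ConsumerFactory {
    /** Hypothetical property name, not an actual Parquet setting. */
    static final String DEBUG_PROPERTY = "example.parquet.recordconsumer.debug";

    /** Returns the plain consumer unless the property is explicitly set to "true". */
    static SimpleConsumer create(SimpleConsumer plain) {
        // Boolean.getBoolean is false when the property is missing,
        // so the fast path is the default.
        boolean debug = Boolean.getBoolean(DEBUG_PROPERTY);
        return debug ? new LoggingConsumer(plain) : plain;
    }
}
```

With this shape, a JVM started without the flag always gets the plain consumer, and passing -Dexample.parquet.recordconsumer.debug=true (again, a made-up name) opts in to the wrapper, regardless of what the logging backend defaults to.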

 

In the past 2 years, this issue has probably cost my company $50,000 in excessive cloud costs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)