Posted to dev@parquet.apache.org by "Elad Yosifon (Jira)" <ji...@apache.org> on 2021/04/25 16:16:00 UTC
[jira] [Created] (PARQUET-2036) implicitly defining DEBUG mode in MessageColumnIO causes 80% performance overhead
Elad Yosifon created PARQUET-2036:
-------------------------------------
Summary: implicitly defining DEBUG mode in MessageColumnIO causes 80% performance overhead
Key: PARQUET-2036
URL: https://issues.apache.org/jira/browse/PARQUET-2036
Project: Parquet
Issue Type: Bug
Affects Versions: 1.12.0, 1.10.1, 1.10.0
Reporter: Elad Yosifon
The *parquet-column* jar uses +slf4j with log4j as the default logger+. If no log4j configuration is defined, log4j defaults to the *DEBUG* log level.
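{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MessageColumnIO extends GroupColumnIO {
  private static final Logger LOG = LoggerFactory.getLogger(MessageColumnIO.class);
  // Evaluated once, at class initialization: true whenever DEBUG logging is enabled for this logger
  private static final boolean DEBUG = LOG.isDebugEnabled(); // <------
  // ... remainder of the class omitted
}
{code}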
This "magic behavior" puts the Parquet library into DEBUG mode by default, without any notification or warning. Unfortunately, in DEBUG mode the *RecordConsumerLoggingWrapper* implementation is used, and it is about 5x slower than the *MessageColumnIORecordConsumer* implementation (a 5x slowdown means roughly 80% of the write time is overhead), causing a massive performance hit and wasted server capacity.
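To make the mechanism concrete, here is a minimal, self-contained sketch of the pattern described above: a factory that wraps the real consumer in a logging decorator purely because DEBUG logging is enabled. All names (`RecordSink`, `DirectSink`, `LoggingSink`, `WriterFactory`) are hypothetical stand-ins, not parquet-mr classes; the real *MessageColumnIORecordConsumer* and *RecordConsumerLoggingWrapper* have much larger interfaces.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for parquet's RecordConsumer API (NOT the real interface).
interface RecordSink {
  void addInteger(int value);
}

// Plays the role of MessageColumnIORecordConsumer: does only the real work.
final class DirectSink implements RecordSink {
  final List<Integer> values = new ArrayList<>();

  @Override
  public void addInteger(int value) {
    values.add(value);
  }
}

// Plays the role of RecordConsumerLoggingWrapper: a decorator that builds a log
// string for every call before delegating, which is where the extra cost comes from.
final class LoggingSink implements RecordSink {
  private final RecordSink delegate;
  final List<String> log = new ArrayList<>();

  LoggingSink(RecordSink delegate) {
    this.delegate = delegate;
  }

  @Override
  public void addInteger(int value) {
    log.add("addInteger(" + value + ")"); // per-value string building
    delegate.addInteger(value);
  }
}

final class WriterFactory {
  // The wrapper is selected solely from whether DEBUG logging is enabled,
  // with no separate opt-in and no warning to the user.
  static RecordSink newRecordWriter(DirectSink sink, boolean debugEnabled) {
    return debugEnabled ? new LoggingSink(sink) : sink;
  }

  public static void main(String[] args) {
    DirectSink sink = new DirectSink();
    // Simulates log4j silently falling back to DEBUG because no configuration was provided.
    RecordSink writer = newRecordWriter(sink, true);
    for (int i = 0; i < 3; i++) {
      writer.addInteger(i);
    }
    System.out.println(writer.getClass().getSimpleName() + " wrote " + sink.values);
  }
}
{code}

Because the decision hinges on the logger level alone, a missing log4j configuration is enough to route every value through the slower wrapper.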
+IMHO, there are two changes that could prevent such an issue:+
* printing a message to STDOUT announcing that DEBUG mode is active.
* defaulting to the *MessageColumnIORecordConsumer* implementation, and using *RecordConsumerLoggingWrapper* only when DEBUG mode is explicitly configured.
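Combining both suggestions could look roughly like the sketch below: DEBUG mode is resolved from an explicit opt-in setting rather than from the logger's level, and a notice is printed to STDOUT when it is enabled. The property name `parquet.debug.recordConsumer` and the class `DebugModeSelector` are hypothetical, chosen for illustration only; they are not existing parquet-mr configuration keys or APIs.

{code:java}
import java.io.PrintStream;
import java.util.Properties;

final class DebugModeSelector {
  // HYPOTHETICAL property name for illustration; not an existing parquet-mr setting.
  static final String DEBUG_PROPERTY = "parquet.debug.recordConsumer";

  // Suggestion 2: DEBUG mode is off unless explicitly requested, regardless of the
  // logging backend's level.
  // Suggestion 1: when DEBUG mode is on, announce it on the given stream (STDOUT in practice).
  static boolean resolveDebugMode(Properties props, PrintStream out) {
    boolean debug = Boolean.parseBoolean(props.getProperty(DEBUG_PROPERTY, "false"));
    if (debug) {
      out.println("NOTICE: Parquet DEBUG mode is active (" + DEBUG_PROPERTY
          + "=true); record writing will be significantly slower.");
    }
    return debug;
  }

  public static void main(String[] args) {
    // Reads JVM system properties, e.g. -Dparquet.debug.recordConsumer=true
    boolean debug = resolveDebugMode(System.getProperties(), System.out);
    System.out.println("debug mode: " + debug);
  }
}
{code}

Running it with `java -Dparquet.debug.recordConsumer=true DebugModeSelector` prints the notice; without the flag, DEBUG mode stays off no matter how log4j is configured, so the faster implementation remains the default.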
Over the past 2 years, this issue has probably cost my company $50,000 in excess cloud costs!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)