Posted to issues@spark.apache.org by "Steve Scaffidi (JIRA)" <ji...@apache.org> on 2017/08/08 14:47:03 UTC

[jira] [Commented] (SPARK-4412) Parquet logger cannot be configured

    [ https://issues.apache.org/jira/browse/SPARK-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118403#comment-16118403 ] 

Steve Scaffidi commented on SPARK-4412:
---------------------------------------

This is also an issue in the version of parquet distributed in CDH 5.x. In this case, I am using {{parquet-1.5.0-cdh5.8.4}} (sources available here: http://archive.cloudera.com/cdh5/cdh/5)

However, I've found a workaround for MapReduce jobs submitted via Hive. I'm sure this can be adapted for use with Spark as well.

* Add the following properties to your job's configuration (in my case, I added them to {{hive-site.xml}}, since adding them to {{mapred-site.xml}} didn't work):
  {code}
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>
  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Djava.util.logging.config.file=parquet-logging.properties</value>
  </property>{code}

* Create a file named {{parquet-logging.properties}} with the following contents:
  {code}
# Note: I'm certain not every line here is necessary. I just added them to cover all possible
# class/facility names. You will want to tailor this to your needs.
.level=WARNING
java.util.logging.ConsoleHandler.level=WARNING

parquet.handlers=java.util.logging.ConsoleHandler
parquet.hadoop.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.handlers=java.util.logging.ConsoleHandler
org.apache.parquet.hadoop.handlers=java.util.logging.ConsoleHandler

parquet.level=WARNING
parquet.hadoop.level=WARNING
org.apache.parquet.level=WARNING
org.apache.parquet.hadoop.level=WARNING{code}

* Add the file to the job. In Hive, this is most easily done like so:
  {code}ADD FILE /path/to/parquet-logging.properties;{code}

With this done, when you run your Hive queries, Parquet should log only WARNING (and higher) level messages to the stdout container logs.
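For readers who want to see what the properties file accomplishes, here is a self-contained sketch using only the {{java.util.logging}} API. This is an illustration, not Parquet's own code; the logger names simply mirror the property keys above:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ParquetLogLevels {
    public static void main(String[] args) {
        // Programmatic equivalent of parquet-logging.properties:
        // raise each parquet logger to WARNING so INFO chatter is suppressed.
        String[] names = {
            "parquet", "parquet.hadoop",
            "org.apache.parquet", "org.apache.parquet.hadoop"
        };
        for (String name : names) {
            Logger.getLogger(name).setLevel(Level.WARNING);
        }
        // INFO (level 800) is below WARNING (level 900), so INFO records are filtered.
        System.out.println(Logger.getLogger("parquet.hadoop").isLoggable(Level.INFO));    // false
        System.out.println(Logger.getLogger("parquet.hadoop").isLoggable(Level.WARNING)); // true
    }
}
```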
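As for adapting this to Spark: a hedged, untested sketch would be to set the same JVM system property on the driver and executors and ship the file with the job, e.g. via {{spark-defaults.conf}} (these are standard Spark config keys, but I have not verified this end to end):
  {code}
# Point the driver and executor JVMs at the logging config file:
spark.driver.extraJavaOptions   -Djava.util.logging.config.file=parquet-logging.properties
spark.executor.extraJavaOptions -Djava.util.logging.config.file=parquet-logging.properties
# Ship the file to the executors' working directories:
spark.files  /path/to/parquet-logging.properties{code}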

> Parquet logger cannot be configured
> -----------------------------------
>
>                 Key: SPARK-4412
>                 URL: https://issues.apache.org/jira/browse/SPARK-4412
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.1
>            Reporter: Jim Carroll
>
> The Spark ParquetRelation.scala code assumes that the parquet.Log class has already been loaded. If ParquetRelation.enableLogForwarding executes before the parquet.Log class is loaded, the code in enableLogForwarding has no effect.
> ParquetRelation.scala attempts to override the parquet logger but, at least currently (and if your application simply reads a parquet file before doing anything else with Parquet), the parquet.Log class hasn't been loaded yet. Therefore the code in ParquetRelation.enableLogForwarding has no effect. If you look at the code in parquet.Log, there's a static initializer that needs to run prior to enableLogForwarding; otherwise, whatever enableLogForwarding does gets undone by that static initializer.
> The "fix" would be to force the static initializer in parquet.Log to run as part of enableLogForwarding.
> PR will be forthcoming.
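The static-initializer ordering described in the issue can be illustrated with a self-contained sketch ({{FakeLog}} is a hypothetical stand-in for parquet.Log; {{Class.forName}}, with its default initialize=true behavior, forces the static block to run):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ForceInit {
    static final AtomicBoolean LOADED = new AtomicBoolean(false);

    // Hypothetical stand-in for parquet.Log: its static initializer has a
    // side effect, just as parquet.Log's initializer reconfigures logging.
    static class FakeLog {
        static { LOADED.set(true); }
    }

    public static void main(String[] args) throws Exception {
        // Merely referencing the class in source does not run its initializer:
        System.out.println(LOADED.get()); // false
        // Class.forName forces initialization, like the proposed fix would:
        Class.forName("ForceInit$FakeLog");
        System.out.println(LOADED.get()); // true
    }
}
```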



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org