You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by do...@apache.org on 2020/09/02 20:33:21 UTC
[spark] branch master updated: [SPARK-32772][SQL] Reduce log messages for spark-sql CLI

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ad6b887  [SPARK-32772][SQL] Reduce log messages for spark-sql CLI
ad6b887 is described below

commit ad6b887541bf90cc3ea830a1a3322b71ccdd80ee
Author: Kousuke Saruta <sa...@oss.nttdata.com>
AuthorDate: Wed Sep 2 13:31:06 2020 -0700

    [SPARK-32772][SQL] Reduce log messages for spark-sql CLI
    
    ### What changes were proposed in this pull request?
    
    This PR reduces log messages for spark-sql CLI like spark-shell and pyspark CLI.
    
    ### Why are the changes needed?
    
    When we launch spark-sql CLI, too many log messages are shown and it's sometimes difficult to find the result of query.
    ```
    spark-sql> SELECT now();
    20/09/02 00:11:45 INFO CodeGenerator: Code generated in 10.121625 ms
    20/09/02 00:11:45 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:0
    20/09/02 00:11:45 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:0) with 1 output partitions
    20/09/02 00:11:45 INFO DAGScheduler: Final stage: ResultStage 0 (main at NativeMethodAccessorImpl.java:0)
    20/09/02 00:11:45 INFO DAGScheduler: Parents of final stage: List()
    20/09/02 00:11:45 INFO DAGScheduler: Missing parents: List()
    20/09/02 00:11:45 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0), which has no missing parents
    20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.3 KiB, free 366.3 MiB)
    20/09/02 00:11:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.2 KiB, free 366.3 MiB)
    20/09/02 00:11:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.204:42615 (size: 3.2 KiB, free: 366.3 MiB)
    20/09/02 00:11:45 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1348
    20/09/02 00:11:45 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at main at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
    20/09/02 00:11:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
    20/09/02 00:11:45 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.1.204, executor driver, partition 0, PROCESS_LOCAL, 7561 bytes) taskResourceAssignments Map()
    20/09/02 00:11:45 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
    20/09/02 00:11:45 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1446 bytes result sent to driver
    20/09/02 00:11:45 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 238 ms on 192.168.1.204 (executor driver) (1/1)
    20/09/02 00:11:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
    20/09/02 00:11:45 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:0) finished in 0.343 s
    20/09/02 00:11:45 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
    20/09/02 00:11:45 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
    20/09/02 00:11:45 INFO DAGScheduler: Job 0 finished: main at NativeMethodAccessorImpl.java:0, took 0.377489 s
    2020-09-02 00:11:45.07
    Time taken: 0.704 seconds, Fetched 1 row(s)
    20/09/02 00:11:45 INFO SparkSQLCLIDriver: Time taken: 0.704 seconds, Fetched 1 row(s)
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes. Log messages are reduced for spark-sql CLI like as follows.
    ```
    20/09/02 00:34:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
    20/09/02 00:34:53 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
    20/09/02 00:34:55 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
    20/09/02 00:34:55 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore kou192.168.1.204
    Spark master: local[*], Application Id: local-1598974492822
    spark-sql> SELECT now();
    2020-09-02 00:35:05.258
    Time taken: 2.299 seconds, Fetched 1 row(s)
    ```
    
    ### How was this patch tested?
    
    Launched spark-sql CLI and confirmed that log messages are reduced as I paste above.
    
    Closes #29619 from sarutak/suppress-log-for-spark-sql.
    
    Authored-by: Kousuke Saruta <sa...@oss.nttdata.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 .../scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
index c7848af..6676223 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
@@ -59,6 +59,7 @@ private[hive] object SparkSQLCLIDriver extends Logging {
   private var transport: TSocket = _
   private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop."
 
+  initializeLogIfNecessary(true)
   installSignalHandler()
 
   /**


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org