Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/03/08 06:59:46 UTC
[GitHub] [incubator-seatunnel] dik111 opened a new issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
dik111 opened a new issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### What happened
I am new to Apache SeaTunnel. I am testing a MySQL-to-Hive job, but it throws an exception: `org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found`
Adding enableHiveSupport() to the session builder, i.e.
`this.sparkSession = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate();`
solves the problem.
Is there any configuration I missed?
### SeaTunnel Version
dev (2022-03-08)
### SeaTunnel Config
```conf
env {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "seatunnel"
  spark.ui.port = 13000
}

source {
  jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://xx:3306/test?useSSL=false"
    table = "user_info"
    result_table_name = "user_info"
    user = "xx"
    password = "xx"
  }
}

transform {
}

sink {
  Hive {
    source_table_name = "user_info"
    result_table_name = "test.user_info1"
    save_mode = "overwrite"
    sink_columns = "id,name"
  }
}
```
### Running Command
```shell
bin/start-seatunnel-spark.sh --master local[4] --deploy-mode client --config ./config/mysql-hive-example.conf
```
### Error Exception
```log
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:420)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:414)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
at org.apache.seatunnel.spark.sink.Hive.output(Hive.scala:64)
at org.apache.seatunnel.spark.sink.Hive.output(Hive.scala:29)
at org.apache.seatunnel.spark.batch.SparkBatchExecution.sinkProcess(SparkBatchExecution.java:90)
at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:105)
at org.apache.seatunnel.Seatunnel.entryPoint(Seatunnel.java:107)
at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:65)
at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
### Flink or Spark Version
spark version 2.4.4
### Java or Scala Version
java version 1.8
mysql version 5.7
hive version 3.0.0
### Screenshots
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] yx91490 commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
yx91490 commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065797403
> Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
enableHiveSupport() is not necessary for every job; you can add `spark.sql.catalogImplementation = "hive"` to the env block instead, see https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/input-plugins/Hive
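For example, the env block of the config at the top of this thread could be extended like this (a sketch; `spark.sql.catalogImplementation` is a standard Spark SQL setting, the other keys are copied from the original config):
```conf
env {
  spark.app.name = "seatunnel"
  spark.ui.port = 13000
  # Use the Hive external catalog instead of Spark's default in-memory catalog
  spark.sql.catalogImplementation = "hive"
}
```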
[GitHub] [incubator-seatunnel] BenJFan commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
BenJFan commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065010214
Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
[GitHub] [incubator-seatunnel] BenJFan commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
BenJFan commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065798690
> > Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
>
> enableHiveSupport() is not necessary for every job; you can add `spark.sql.catalogImplementation = "hive"` to the env block instead, see https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/input-plugins/Hive
We can detect whether the user is using Hive and then call enableHiveSupport() automatically, so no extra config is needed. I think that would be better.
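The auto-detection idea could be sketched as a small helper that scans the configured plugin names; the class and method names below are hypothetical, not SeaTunnel's actual API:
```java
import java.util.List;

public class HiveSupportDetector {
    // Returns true if any configured source/sink plugin is the Hive connector,
    // in which case the Spark session builder should call enableHiveSupport().
    public static boolean needsHiveSupport(List<String> pluginNames) {
        return pluginNames.stream().anyMatch(name -> name.equalsIgnoreCase("hive"));
    }
}
```
The engine could run this check over the parsed config before building the SparkSession, keeping the user's config unchanged.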
[GitHub] [incubator-seatunnel] CalvinKirs closed issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
CalvinKirs closed issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438
[GitHub] [incubator-seatunnel] tmljob commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
tmljob commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1063656321
I also encountered the same problem (on 1.5.7), and I suspect it is caused by a Spark configuration issue, but I don't know which configuration; I hope someone can help clarify.
The task configuration is as follows:
```
spark {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "mysql_to_hive"
  spark.ui.port = 13000
  #spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}

input {
  mysql {
    url = "jdbc:mysql://10.30.4.160:3306/cdh_cm"
    table = "metrics"
    result_table_name = "scr_metrics"
    user = "root"
    password = "root"
  }
}

filter {
}

output {
  Hive {
    source_table_name = "scr_metrics"
    result_table_name = "cdh_test.metrics"
    save_mode = "overwrite"
    sink_columns = "metric_id,optimistic_lock_version,metric_identifier,name,metric"
  }
}
```
1. The same configuration runs without problems on nodes inside the CDH cluster.
2. On a node outside the CDH cluster, with Spark installed and the cluster's hdfs-site.xml, core-site.xml, hive-site.xml, and yarn-site.xml copied into the Spark configuration path, running the job reports a NoSuchDatabaseException, as follows:
```
2022-03-10 10:57:39 INFO Client:54 -
client token: N/A
diagnostics: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:405)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at scala.util.Try$.apply(Try.scala:192)
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
... 6 more
ApplicationMaster host: test-4-177
ApplicationMaster RPC port: 3158
queue: root.users.root
start time: 1646881027981
final status: FAILED
tracking URL: http://test-4-178:8088/proxy/application_1645686071960_0055/
user: root
2022-03-10 10:57:39 ERROR Client:70 - Application diagnostics message: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.
```
3. If the database name is removed from the SeaTunnel task configuration, the first run does not report an error and a directory for the table is created under the /user/hive/warehouse/ path on HDFS, but the database cannot be found via `show databases`. Repeating the run reports the following error:
```
Exception in thread "main" java.lang.Exception: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:465)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:444)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at scala.util.Try$.apply(Try.scala:192)
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
... 13 more
2022-03-10 11:05:37 INFO SparkUI:54 - Stopped Spark web UI at http://10.30.4.160:13000
```
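A possible workaround for the "associated location already exists" error on re-runs is the Spark 2.4 legacy flag that is already present, commented out, in the spark block above. Uncommented it would look like this (whether it is the appropriate fix here, rather than Hive catalog support, is an assumption):
```conf
spark {
  # Allow saveAsTable to create a managed table over a non-empty location
  spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}
```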