Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/03/08 06:59:46 UTC
[GitHub] [incubator-seatunnel] dik111 opened a new issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
dik111 opened a new issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### What happened
I am new to Apache SeaTunnel. I am testing a MySQL-to-Hive job, but it throws an exception: `org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found`
Adding enableHiveSupport() to the session builder, i.e.
`this.sparkSession = SparkSession.builder().config(sparkConf).enableHiveSupport().getOrCreate();`
solves the problem.
Is there any configuration I missed?
### SeaTunnel Version
dev (2022-03-08)
### SeaTunnel Config
```conf
env {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "seatunnel"
  spark.ui.port = 13000
}

source {
  jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://xx:3306/test?useSSL=false"
    table = "user_info"
    result_table_name = "user_info"
    user = "xx"
    password = "xx"
  }
}

transform {
}

sink {
  Hive {
    source_table_name = "user_info"
    result_table_name = "test.user_info1"
    save_mode = "overwrite"
    sink_columns = "id,name"
  }
}
```
### Running Command
```shell
bin/start-seatunnel-spark.sh --master local[4] --deploy-mode client --config ./config/mysql-hive-example.conf
```
### Error Exception
```log
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'test' not found;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:420)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:414)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:409)
at org.apache.seatunnel.spark.sink.Hive.output(Hive.scala:64)
at org.apache.seatunnel.spark.sink.Hive.output(Hive.scala:29)
at org.apache.seatunnel.spark.batch.SparkBatchExecution.sinkProcess(SparkBatchExecution.java:90)
at org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:105)
at org.apache.seatunnel.Seatunnel.entryPoint(Seatunnel.java:107)
at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:65)
at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
### Flink or Spark Version
spark version 2.4.4
### Java or Scala Version
java version 1.8
mysql version 5.7
hive version 3.0.0
### Screenshots
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] yx91490 commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
yx91490 commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065797403
> Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
enableHiveSupport() is not necessary for every job; you can add `spark.sql.catalogImplementation = "hive"` to the env block instead, see https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/input-plugins/Hive
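For example, the env block of the config at the top of this thread could be extended like this (a sketch; `spark.sql.catalogImplementation` is a standard Spark SQL setting, the other keys are copied from the original config):
```conf
env {
  spark.app.name = "seatunnel"
  spark.ui.port = 13000
  # Use the Hive external catalog instead of Spark's default in-memory catalog
  spark.sql.catalogImplementation = "hive"
}
```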
[GitHub] [incubator-seatunnel] BenJFan commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
BenJFan commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065010214
Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
[GitHub] [incubator-seatunnel] BenJFan commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
BenJFan commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1065798690
> > Seems like we forgot to call the enableHiveSupport() method when using the Hive connector. I will fix it.
>
> enableHiveSupport() is not necessary for every job; you can add `spark.sql.catalogImplementation = "hive"` to the env block instead, see https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/configuration/input-plugins/Hive
We can detect whether the user is using Hive and then call enableHiveSupport() automatically, so no extra config is needed. I think that would be better.
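The auto-detection idea could be sketched as a small helper that scans the configured plugin names; the class and method names below are hypothetical, not SeaTunnel's actual API:
```java
import java.util.List;

public class HiveSupportDetector {
    // Returns true if any configured source/sink plugin is the Hive connector,
    // in which case the Spark session builder should call enableHiveSupport().
    public static boolean needsHiveSupport(List<String> pluginNames) {
        return pluginNames.stream().anyMatch(name -> name.equalsIgnoreCase("hive"));
    }
}
```
The engine could run this check over the parsed config before building the SparkSession, keeping the user's config unchanged.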
[GitHub] [incubator-seatunnel] CalvinKirs closed issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
CalvinKirs closed issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438
[GitHub] [incubator-seatunnel] tmljob commented on issue #1438: [Bug] [Spark-Sink-Hive] NoSuchDatabaseException: Database 'test' not found
tmljob commented on issue #1438:
URL: https://github.com/apache/incubator-seatunnel/issues/1438#issuecomment-1063656321
I also encountered the same problem (on 1.5.7), and I suspect it is caused by a Spark configuration issue, but I don't know which configuration; I hope someone can help clarify.
The task configuration is as follows:
```
spark {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5
  spark.app.name = "mysql_to_hive"
  spark.ui.port = 13000
  #spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}

input {
  mysql {
    url = "jdbc:mysql://10.30.4.160:3306/cdh_cm"
    table = "metrics"
    result_table_name = "scr_metrics"
    user = "root"
    password = "root"
  }
}

filter {
}

output {
  Hive {
    source_table_name = "scr_metrics"
    result_table_name = "cdh_test.metrics"
    save_mode = "overwrite"
    sink_columns = "metric_id,optimistic_lock_version,metric_identifier,name,metric"
  }
}
```
1. The same configuration runs without problems on nodes inside the CDH cluster.
2. On a node outside the CDH cluster, with Spark installed and the cluster's hdfs-site.xml, core-site.xml, hive-site.xml, and yarn-site.xml copied into the Spark configuration path, running the job reports a NoSuchDatabaseException, as follows:
```
2022-03-10 10:57:39 INFO Client:54 -
client token: N/A
diagnostics: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:678)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'cdh_test' not found;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog$class.requireDbExists(ExternalCatalog.scala:42)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireDbExists(InMemoryCatalog.scala:45)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:331)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:142)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:415)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:405)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at scala.util.Try$.apply(Try.scala:192)
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
... 6 more
ApplicationMaster host: test-4-177
ApplicationMaster RPC port: 3158
queue: root.users.root
start time: 1646881027981
final status: FAILED
tracking URL: http://test-4-178:8088/proxy/application_1645686071960_0055/
user: root
2022-03-10 10:57:39 ERROR Client:70 - Application diagnostics message: User class threw exception: java.lang.Exception: org.apache.spark.sql.catalyst.analysis.
```
3. If the database name is removed from the SeaTunnel task configuration, the first run does not report an error and a directory for the table is created under the /user/hive/warehouse/ path on HDFS, but the database cannot be found via `show databases`. Repeating the run reports the following error:
```
Exception in thread "main" java.lang.Exception: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:62)
at io.github.interestinglab.waterdrop.Waterdrop.main(Waterdrop.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.AnalysisException: Can not create the managed table('`metrics`'). The associated location('/user/hive/warehouse/metrics') already exists.;
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.validateTableLocation(SessionCatalog.scala:331)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:170)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:465)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:444)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:400)
at io.github.interestinglab.waterdrop.output.batch.Hive.process(Hive.scala:81)
at io.github.interestinglab.waterdrop.Waterdrop$.outputProcess(Waterdrop.scala:278)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:242)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$batchProcessing$2.apply(Waterdrop.scala:241)
at scala.collection.immutable.List.foreach(List.scala:392)
at io.github.interestinglab.waterdrop.Waterdrop$.batchProcessing(Waterdrop.scala:241)
at io.github.interestinglab.waterdrop.Waterdrop$.io$github$interestinglab$waterdrop$Waterdrop$$entrypoint(Waterdrop.scala:144)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply$mcV$sp(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at io.github.interestinglab.waterdrop.Waterdrop$$anonfun$1.apply(Waterdrop.scala:57)
at scala.util.Try$.apply(Try.scala:192)
at io.github.interestinglab.waterdrop.Waterdrop$.main(Waterdrop.scala:57)
... 13 more
2022-03-10 11:05:37 INFO SparkUI:54 - Stopped Spark web UI at http://10.30.4.160:13000
```
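A possible workaround for the "associated location already exists" error on re-runs is the Spark 2.4 legacy flag that is already present, commented out, in the spark block above. Uncommented it would look like this (whether it is the appropriate fix here, rather than Hive catalog support, is an assumption):
```conf
spark {
  # Allow saveAsTable to create a managed table over a non-empty location
  spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation = true
}
```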