Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/10/12 09:30:14 UTC
[GitHub] [incubator-seatunnel] AlexNilone opened a new issue, #3076: [Bug] [Module Name] Is Hive JDBC access to CDH 5.13 not supported?
AlexNilone opened a new issue, #3076:
URL: https://github.com/apache/incubator-seatunnel/issues/3076
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### What happened
For some specific reasons we need to access the HiveServer2 service via JDBC. The cluster runs CDH 5.13. Because of Hive version compatibility, SPARK_HOME points to the Spark 2.4.0 directory installed by CDH through parcels. The driver JAR, the cluster's hive-jdbc package, is placed in a subdirectory created under the plugins directory.
Two problems:
1. The query returns no data.
2. When the job is submitted in yarn-client mode, a GSS ticket error is reported (the cluster has Kerberos authentication enabled).
### SeaTunnel Version
2.1.3
### SeaTunnel Config
```conf
env {
  spark.sql.catalogImplementation = "hive"
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 1
  spark.executor.cores = 1
  spark.num.executors = 1
  spark.executor.memory = "1g"
  execution.parallelism = 1
  spark.yarn.keytab = "/hdfs.keytab"
  spark.yarn.principal = "hdfs/server001@MYCDH"
}

source {
  jdbc {
    driver = "org.apache.hive.jdbc.HiveDriver"
    url = "jdbc:hive2://server001:10000/;principal=hive/server001@MYCDH"
    user = "hive"
    password = "hive"
    table = "test_seatunnel_source"
    result_table_name = "test_seatunnel_source"
  }
}

transform {
}

sink {
  Console {}
}
```
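Before digging into SeaTunnel itself, it can help to confirm that the same Kerberized URL and table return any rows at all, for example with `beeline` from the CDH Hive client. This is a sketch: the URL and table name are taken from the config above, and the command is only echoed for inspection rather than executed.

```shell
# JDBC URL and table taken from the config above (adjust for your cluster).
JDBC_URL="jdbc:hive2://server001:10000/;principal=hive/server001@MYCDH"
QUERY="SELECT COUNT(*) FROM test_seatunnel_source"

# Echo the command for inspection; drop the leading 'echo' to run it for real.
echo beeline -u "$JDBC_URL" -e "$QUERY"
```

If this returns a non-zero count while the SeaTunnel job prints only headers, the problem is on the SeaTunnel/Spark JDBC side rather than in Hive itself.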
### Running Command
```shell
start-seatunnel-spark.sh --master local --deploy-mode client \
--config /data/apache-seatunnel-incubating-2.1.3/config/hivejdbc-console.conf
start-seatunnel-spark.sh --master yarn --deploy-mode client \
--config /data/apache-seatunnel-incubating-2.1.3/config/hivejdbc-console.conf
```
### Error Exception
```log
The table header is printed, but no data rows are returned.
22/10/12 17:20:33 INFO jdbc.Utils: Resolved authority: cdh129135:10000
22/10/12 17:20:33 INFO jdbc.JDBCRDD: closed connection
22/10/12 17:20:33 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1069 bytes result sent to driver
22/10/12 17:20:33 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 876 ms on localhost (executor driver) (1/1)
22/10/12 17:20:33 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
22/10/12 17:20:33 INFO scheduler.DAGScheduler: ResultStage 0 (show at Console.scala:38) finished in 1.295 s
22/10/12 17:20:33 INFO scheduler.DAGScheduler: Job 0 finished: show at Console.scala:38, took 1.344624 s
+------------------------+--------------------------+
|test_seatunnel_source.id|test_seatunnel_source.name|
+------------------------+--------------------------+
+------------------------+--------------------------+
22/10/12 17:20:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/10/12 17:20:33 INFO server.AbstractConnector: Stopped Spark@1f52eb6f{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/10/12 17:20:33 INFO ui.SparkUI: Stopped Spark web UI at http://cdh129135:4040
22/10/12 17:20:33 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/10/12 17:20:33 INFO memory.MemoryStore: MemoryStore cleared
22/10/12 17:20:33 INFO storage.BlockManager: BlockManager stopped
22/10/12 17:20:33 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/10/12 17:20:33 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/10/12 17:20:33 INFO spark.SparkContext: Successfully stopped SparkContext
22/10/12 17:20:33 INFO util.ShutdownHookManager: Shutdown hook called
22/10/12 17:20:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c931a4e1-6abc-476b-813a-718773b5e110
22/10/12 17:20:33 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-7add2b21-6a02-4943-9cf1-5349f3fc3c37
--- GSS error when running in YARN mode:
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:316)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:204)
```
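The `GSS initiate failed` stack trace above usually means the submitting process has no valid Kerberos ticket when the Hive JDBC connection is opened on the client side. A common first step, independent of SeaTunnel, is to obtain a TGT with `kinit` before launching the job. This is a sketch under the assumption that the keytab path and principal from the config above are correct for the cluster; the commands are built and echoed so they can be inspected before running.

```shell
# Keytab path and principal taken from the config above (hypothetical for your cluster).
KEYTAB="/hdfs.keytab"
PRINCIPAL="hdfs/server001@MYCDH"

# Build the commands first so they can be inspected before being run.
KINIT_CMD="kinit -kt $KEYTAB $PRINCIPAL"
SUBMIT_CMD="start-seatunnel-spark.sh --master yarn --deploy-mode client --config /data/apache-seatunnel-incubating-2.1.3/config/hivejdbc-console.conf"

echo "$KINIT_CMD"
echo "$SUBMIT_CMD"
# To actually run: eval "$KINIT_CMD" && klist && eval "$SUBMIT_CMD"
```

Note that `spark.yarn.keytab`/`spark.yarn.principal` cover the YARN application side; in yarn-client mode the driver-side JDBC handshake typically still needs a ticket in the local credential cache.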
### Flink or Spark Version
Spark 2.4.0 (installed via the official CDH 5.13 parcels)
### Java or Scala Version
1.8
### Screenshots
1
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] CalvinKirs closed issue #3076: [Bug] [Module Name] Is Hive JDBC access to CDH 5.13 not supported?
Posted by GitBox <gi...@apache.org>.
CalvinKirs closed issue #3076: [Bug] [Module Name] Is Hive JDBC access to CDH 5.13 not supported?
URL: https://github.com/apache/incubator-seatunnel/issues/3076