You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@seatunnel.apache.org by "LYL41011 (via GitHub)" <gi...@apache.org> on 2023/04/11 10:07:14 UTC
[GitHub] [incubator-seatunnel] LYL41011 opened a new issue, #4549: [Bug] [hive sink] Bug title
LYL41011 opened a new issue, #4549:
URL: https://github.com/apache/incubator-seatunnel/issues/4549
### Search before asking
- [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
### What happened
I want to synchronize data from kafka to hive. When I start the task, an error occurs.
Caused by: java.lang.NullPointerException
at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.AbstractWriteStrategy.lambda$buildSchemaWithRowType$0(AbstractWriteStrategy.java:129)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.AbstractWriteStrategy.buildSchemaWithRowType(AbstractWriteStrategy.java:127)
The table creation statement is as follows:
CREATE TABLE `ods_tmp`.`prd_kafka_rta_toutiao_gl_topic`(`reqid` string COMMENT '使用项目', `reqtime` string COMMENT '上报时间', `media` string COMMENT '媒体', `isping` string COMMENT '是否ping测流量', `flowtype` string COMMENT '流量类型', `reqparams` string COMMENT '媒体下发的参数', `imeimd5` string COMMENT '设备 imei 号的 md5加密值', `oaidmd5` string COMMENT '设备 oaid 号的 md5加密值', `idfamd5` string COMMENT '设备 idfa 号的 md5加密值', `bucket` string COMMENT '', `results` string COMMENT '各账户的参竞、识别情况以及决策上下文', `exts` string COMMENT '扩展的字段', `timeconsume` string COMMENT '', `system` string COMMENT '', `debuginfo` string COMMENT '', `ts` string COMMENT '毓数消费时间戳')
PARTITIONED BY (`pday` string, `phour` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'serialization.format' = '1' ) STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
### SeaTunnel Version
2.3.1
### SeaTunnel Config
```conf
env {
execution.parallelism = 1
job.mode = "STREAMING"
checkpoint.interval = 20000
}
source {
Kafka {
result_table_name = "kafka_toutiao_gl_topic_source"
schema = {
fields {
reqId= "string"
reqTime= "string"
media= "string"
isPing= "string"
flowType= "string"
reqParams= "string"
imeiMd5= "string"
oaidMd5= "string"
idfaMd5= "string"
bucket= "string"
results= "string"
exts= "string"
timeConsume= "string"
system= "string"
debugInfo= "string"
}
}
topic = "toutiao_gl_topic"
bootstrap.servers = "10.220.193.248:39092"
kafka.max.poll.records = 10000
kafka.auto.offset.reset = earliest
consumer.group = "yushu_dis_seatunnel"
}
}
transform {
Sql {
source_table_name = "kafka_toutiao_gl_topic_source"
result_table_name = "kafka_toutiao_gl_topic_result"
query="select IFNULL(reqId,''),IFNULL(reqTime,''),IFNULL(media,''),IFNULL(isPing,''),IFNULL(flowType,''),IFNULL(reqParams,''),IFNULL(imeiMd5,''),IFNULL(oaidMd5,''),IFNULL(idfaMd5,''),IFNULL(bucket,''),IFNULL(results,''),IFNULL(exts,''),IFNULL(timeConsume,''),IFNULL(system,''),IFNULL(debugInfo,''),current_timestamp as ts,SUBSTRING(REPLACE(reqTime,'-',''),1,8) AS pday,SUBSTRING(REPLACE(reqTime,'-',''),10,2) AS phour from kafka_toutiao_gl_topic_source"
}
}
sink {
Hive {
source_table_name = "kafka_toutiao_gl_topic_result"
table_name = "ods_tmp.prd_kafka_rta_toutiao_gl_topic"
metastore_uri = "thrift://xxx:9083"
}
}
```
### Running Command
```shell
../bin/start-seatunnel-flink-13-connector-v2.sh --config kafka2hive_toutiao_gl_topic -m yarn-cluster -ynm seatunnel
```
### Error Exception
```log
[finloan@jds01 config]$ ../bin/start-seatunnel-flink-13-connector-v2.sh --config kafka2hive_toutiao_gl_topic -m yarn-cluster -ynm seatunnel
Execute SeaTunnel Flink Job: ${FLINK_HOME}/bin/flink run -m yarn-cluster -ynm seatunnel -c org.apache.seatunnel.core.starter.flink.SeaTunnelFlink /home/q/dis/apache-seatunnel-incubating-2.3.1/starter/seatunnel-flink-13-starter.jar --config kafka2hive_toutiao_gl_topic --name SeaTunnel
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR or HADOOP_CLASSPATH was set.
Setting HBASE_CONF_DIR=/etc/hbase/conf because no HBASE_CONF_DIR was set.
2023-04-11 17:58:43,199 INFO org.apache.hadoop.security.UserGroupInformation [] - Login successful for user dis@LYCC.COM using keytab file /home/q/dis/dis.keytab
------------------------------------------------------------
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Flink job executed failed
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:372)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: org.apache.seatunnel.core.starter.exception.CommandExecuteException: Flink job executed failed
at org.apache.seatunnel.core.starter.flink.command.FlinkTaskExecuteCommand.execute(FlinkTaskExecuteCommand.java:63)
at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
at org.apache.seatunnel.core.starter.flink.SeaTunnelFlink.main(SeaTunnelFlink.java:34)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 11 more
Caused by: java.lang.NullPointerException
at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.AbstractWriteStrategy.lambda$buildSchemaWithRowType$0(AbstractWriteStrategy.java:129)
at java.util.ArrayList.forEach(ArrayList.java:1259)
at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.AbstractWriteStrategy.buildSchemaWithRowType(AbstractWriteStrategy.java:127)
at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.TextWriteStrategy.setSeaTunnelRowTypeInfo(TextWriteStrategy.java:68)
at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSink.setTypeInfo(BaseFileSink.java:71)
at org.apache.seatunnel.core.starter.flink.execution.SinkExecuteProcessor.execute(SinkExecuteProcessor.java:110)
at org.apache.seatunnel.core.starter.flink.execution.FlinkExecution.execute(FlinkExecution.java:109)
at org.apache.seatunnel.core.starter.flink.command.FlinkTaskExecuteCommand.execute(FlinkTaskExecuteCommand.java:61)
```
### Flink or Spark Version
Flink 1.13.6
### Java or Scala Version
1.8.0
### Screenshots
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] LYL41011 closed issue #4549: [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException
Posted by "LYL41011 (via GitHub)" <gi...@apache.org>.
LYL41011 closed issue #4549: [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException
URL: https://github.com/apache/incubator-seatunnel/issues/4549
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [seatunnel] SinyoWong commented on issue #4549: [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException
Posted by "SinyoWong (via GitHub)" <gi...@apache.org>.
SinyoWong commented on issue #4549:
URL: https://github.com/apache/seatunnel/issues/4549#issuecomment-1736614552
给我这个PR,这个问题我解决了。
Give me a PR, this problem I've solved.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [seatunnel] RU-jiayijie commented on issue #4549: [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException
Posted by "RU-jiayijie (via GitHub)" <gi...@apache.org>.
RU-jiayijie commented on issue #4549:
URL: https://github.com/apache/seatunnel/issues/4549#issuecomment-1588980402
hive 表字段是不区分大小写的,所以你的query=中的字段必须 as 小写hive建表时的字段名 如下:
`env {
execution.parallelism = 1
job.mode = "STREAMING"
execution.checkpoint.interval = 60000
format.fail-on-missing-field = false
}
source {
Kafka {
result_table_name = "xxxxx"
format = json
json.fail-on-missing-field = false
json.ignore-parse-errors = true
schema = {
fields {
EvnetBody = "string"
EventDay = "string"
EventTime = "string"
}
}
start_mode = "group_offsets"
topic = "xxx"
consumer.group = "xxxx"
commit_on_checkpoint = true
bootstrap.servers = "xxxxxxxxx"
}
}
transform {
sql {
source_table_name = "xxxxxxxx"
result_table_name = "xxxxxxxx"
query = "select EvnetBody as eventbody, replace(EventDay,'-','') as dt, EventTime as hour from xxxxxxx",
}
}
sink {
Hive {
source_table_name = "xxxxx"
table_name = "test.xxxxx"
metastore_uri = "xxxxxxx"
fields = ["eventbody","dt","hour"]
}
}`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException [seatunnel]
Posted by "zengqinchris (via GitHub)" <gi...@apache.org>.
zengqinchris commented on issue #4549:
URL: https://github.com/apache/seatunnel/issues/4549#issuecomment-1855281676
大佬们,你们指定 start_mode = "group_offsets" 可以消费数据吗,我的一直不能消费数据,换成新的消费者组也是不行的,只能消费新的数据,之前的历史数据,只能消费新的数据
env {
parallelism = 1
job.mode = "STREAMING"
checkpoint.interval = 15
}
source {
Kafka {
format = json
field_delimiter = ","
json.fail-on-missing-field = false
json.ignore-parse-errors = true
format_error_handle_way = "fail"
topic = "kv_info"
consumer.group = "kv_info_client_12"
commit_on_checkpoint = true
start_mode = "group_offsets"
bootstrap.servers = "bd212:9092,bd213:9092,bd214:9092"
kafka.config = {
client.id = kv_info_client_12
max.poll.records = 50000
enable.auto.commit = false
}
}
}
transform {
}
sink {
Console {
}
}
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-seatunnel] yujh1981 commented on issue #4549: [Bug] [connector-hive] sink hive AbstractWriteStrategy.buildSchemaWithRowType NullPointerException
Posted by "yujh1981 (via GitHub)" <gi...@apache.org>.
yujh1981 commented on issue #4549:
URL: https://github.com/apache/incubator-seatunnel/issues/4549#issuecomment-1518551597
I find the same error。
How can i do to resolve it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org