Posted to commits@seatunnel.apache.org by "gaotong521 (via GitHub)" <gi...@apache.org> on 2024/04/12 03:36:47 UTC

[I] When the hive table storage type is orc, data sinks to the hive, and the task fails to be executed [seatunnel]

gaotong521 opened a new issue, #6694:
URL: https://github.com/apache/seatunnel/issues/6694

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When the Hive table storage type is ORC and the FieldMapper transform is configured, sinking data to Hive fails if some fields in the Hive table are not mapped. The job aborts with a `NullPointerException` in `OrcWriteStrategy.buildSchemaWithRowType` (see the log below).
   
   ### SeaTunnel Version
   
   2.3.4
   
   ### SeaTunnel Config
   
   ```conf
   {
       "env": {
           "parallelism": 3,
           "job.mode": "BATCH",
           "checkpoint.interval": 30000,
           "job.name": "seatunnel_1712823979630"
       },
       "source": [
           {
               "plugin_name": "Jdbc",
               "result_table_name": "table_source",
               "user": "postgres",
               "password": "C3kk4v5_b4f2Jr",
               "driver": "org.postgresql.Driver",
               "url": "jdbc:postgresql://10.188.15.91:5434/gis",
               "query": "select event_id,event_type,event_radius,event_source,start_time,end_time,priority,latitude,longitude,elevation,node_ids,create_time,update_time from ghcloud.gh_traffic_event_info"
           }
       ],
       "transform": [
           {
               "plugin_name": "FieldMapper",
               "source_table_name": "table_source",
               "result_table_name": "table_source_FieldMapper",
               "field_mapper": {
                   "event_id": "event_id",
                   "event_type": "event_type",
                   "event_radius": "event_radius",
                   "event_source": "event_source",
                   "start_time": "start_time",
                   "end_time": "end_time",
                   "priority": "priority",
                   "latitude": "latitude",
                   "longitude": "longitude",
                   "elevation": "elevation",
                   "node_ids": "node_ids",
                   "create_time": "create_time",
                   "update_time": "update_time"
               }
           }
       ],
       "sink": [
           {
               "plugin_name": "Hive",
               "source_table_name": "table_source_FieldMapper",
               "table_name": "gh_cloud_data_model.dwd_pub_traffic_event",
               "metastore_uri": "thrift://cloudera-hadoop-61:9083"
           }
       ]
   }
   ```
   
   
   ### Running Command
   
   ```shell
   Executed by dolphin scheduler
   ```
   
   
   ### Error Exception
   
   ```log
   SHUTDOWN
   	2024-04-12 11:31:30,246 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
   	2024-04-12 11:31:30,246 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
   	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
   	
   	===============================================================================
   	
   	
   	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Fatal Error, 
   	
   	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Please submit bug report in https://github.com/apache/seatunnel/issues
   	
   	2024-04-12 11:31:30,246 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Reason:SeaTunnel job executed failed 
   	
   	2024-04-12 11:31:30,248 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
   		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
   		at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
   		at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
   	Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: java.lang.RuntimeException: java.lang.NullPointerException
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:257)
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
   		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:75)
   		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
   		at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
   		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
   		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
   		at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:648)
   		at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:949)
   		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   		at java.lang.Thread.run(Thread.java:748)
   	Caused by: java.lang.NullPointerException
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.buildSchemaWithRowType(OrcWriteStrategy.java:196)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.getOrCreateWriter(OrcWriteStrategy.java:116)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:75)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:134)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:46)
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:247)
   		... 16 more
   	
   		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
   		... 2 more
   	 
   	2024-04-12 11:31:30,248 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
   	===============================================================================
   	
   	
   	
   	Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
   		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
   		at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
   		at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
   	Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: java.lang.RuntimeException: java.lang.NullPointerException
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:257)
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
   		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:75)
   		at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
   		at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
   		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
   		at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
   		at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
   		at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:648)
   		at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:949)
   		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   		at java.lang.Thread.run(Thread.java:748)
   	Caused by: java.lang.NullPointerException
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.buildSchemaWithRowType(OrcWriteStrategy.java:196)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.getOrCreateWriter(OrcWriteStrategy.java:116)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:75)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:134)
   		at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:46)
   		at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:247)
   		... 16 more
   	
   		at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
   		... 2 more
   	2024-04-12 11:31:30,249 INFO  [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-2] - run shutdown hook because get close signal
   [INFO] 2024-04-12 11:31:30.453 +0800 - FINALIZE_SESSION
   ```
   
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] When the hive table storage type is orc, data sinks to the hive, and the task fails to be executed [seatunnel]

Posted by "LeonYoah (via GitHub)" <gi...@apache.org>.
LeonYoah commented on issue #6694:
URL: https://github.com/apache/seatunnel/issues/6694#issuecomment-2060432346

   Please paste the DDL statement of the `gh_cloud_data_model.dwd_pub_traffic_event` table. I suspect that the mapped field names are inconsistent with the field names of the destination table, which causes the null pointer exception.




Re: [I] When the hive table storage type is orc, data sinks to the hive, and the task fails to be executed [seatunnel]

Posted by "LeonYoah (via GitHub)" <gi...@apache.org>.
LeonYoah commented on issue #6694:
URL: https://github.com/apache/seatunnel/issues/6694#issuecomment-2060771448

   Pay attention to two things. First, every field in the Hive table must have a corresponding upstream field. If upstream has no value for a field, you can pass an empty string (`''`) as a placeholder, but you cannot use `null`. Second, each mapped field name must be the same as the field name in the Hive table.
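
   As a rough sketch of that advice, and not a verified fix, the job config could supply the placeholder from the source query and map it by name. This assumes the Hive table has one extra column that the source lacks; the name `remark` is hypothetical and must be replaced with the real column name from the table's DDL. Connection settings and the other columns are trimmed here for brevity. The comments use HOCON `#` syntax; drop them if the job file is strict JSON.

   ```conf
   {
       "source": [
           {
               "plugin_name": "Jdbc",
               "result_table_name": "table_source",
               # '' as remark: empty-string placeholder for a Hive column
               # with no upstream value ("remark" is a hypothetical name)
               "query": "select event_id, event_type, '' as remark from ghcloud.gh_traffic_event_info"
           }
       ],
       "transform": [
           {
               "plugin_name": "FieldMapper",
               "source_table_name": "table_source",
               "result_table_name": "table_source_FieldMapper",
               "field_mapper": {
                   "event_id": "event_id",
                   "event_type": "event_type",
                   # target name must match the Hive column name exactly
                   "remark": "remark"
               }
           }
       ]
   }
   ```

   Whether the placeholder is best added in the query or elsewhere depends on the source; the point is only that every Hive column ends up with a same-named, non-null upstream field.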

