Posted to commits@seatunnel.apache.org by "15810785091 (via GitHub)" <gi...@apache.org> on 2024/04/23 10:38:18 UTC

[I] During Hive metastore connection, the underlying NULL data is\N error [seatunnel]

15810785091 opened a new issue, #6744:
URL: https://github.com/apache/seatunnel/issues/6744

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When the Hive source connects through the Hive metastore, SeaTunnel reads the table's data files directly. If a row contains a NULL value, the read fails, because NULL is stored in the data file as `\N`.
   Reading the same table through Hive JDBC does not have this problem.
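   
   To illustrate the behavior I would expect, here is a minimal Java sketch of a NULL-aware conversion. It assumes the table uses Hive's default text format, where NULL is written as `\N` and fields are separated by Ctrl-A. The class `HiveNullSketch` and the helper `parseLongOrNull` are hypothetical names for illustration only, not part of SeaTunnel:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class HiveNullSketch {
       // Assumption: the table uses Hive's default NULL marker, the two characters backslash + N.
       static final String HIVE_NULL = "\\N";

       // Hypothetical helper: map the NULL marker to null instead of parsing it as a number.
       static Long parseLongOrNull(String field) {
           if (field == null || HIVE_NULL.equals(field)) {
               return null;
           }
           return Long.parseLong(field);
       }

       public static void main(String[] args) {
           // One text row with three BIGINT fields separated by Ctrl-A (0x01); the second is NULL.
           String line = "1\u0001\\N\u000142";
           List<Long> row = new ArrayList<>();
           for (String field : line.split("\u0001", -1)) {
               row.add(parseLongOrNull(field));
           }
           System.out.println(row); // prints [1, null, 42]
       }
   }
   ```
   
   A table created with a custom `serialization.null.format` would need that marker instead of `\N`, so a real fix would probably have to make the marker configurable.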
   
   ### SeaTunnel Version
   
   2.3.5 and 2.3.4
   
   ### SeaTunnel Config
   
   ```conf
   env {
     parallelism = 1
     job.mode = "BATCH"
   }
   
   source {
     Hive {
       table_name = "th_db.test1"
       metastore_uri = "thrift://172.16.111.11:9083"
       hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
       result_table_name = "hive_table"
     }
   }
   sink {
     Console {}
   }
   ```
   
   
   ### Running Command
   
   ```shell
   bin/seatunnel.sh --config hive_me.conf -e local
   ```
   
   
   ### Error Exception
   
   ```log
   2024-04-23 18:35:03,593 ERROR [o.a.s.c.s.SeaTunnel           ] [main] -
   ===============================================================================
   
   
   
   Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
           at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
           at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
           at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
   Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-01], ErrorDescription:[SeaTunnel read file 'hdfs://nameservice1/user/hive/warehouse/th_db.db/test1/000000_0' failed.]
           at org.apache.seatunnel.common.exception.CommonError.fileOperationFailed(CommonError.java:60)
           at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:65)
           at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:156)
           at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:116)
           at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
           at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:121)
           at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
           at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1004)
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
           at java.util.concurrent.FutureTask.run(FutureTask.java:266)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NumberFormatException: For input string: "\N"
           at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
           at java.lang.Long.parseLong(Long.java:589)
           at java.lang.Long.parseLong(Long.java:631)
           at org.apache.seatunnel.format.text.TextDeserializationSchema.convert(TextDeserializationSchema.java:247)
           at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:152)
           at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:57)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.lambda$read$0(TextReadStrategy.java:95)
           at java.util.Iterator.forEachRemaining(Iterator.java:116)
           at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
           at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
           at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.read(TextReadStrategy.java:91)
           at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:63)
           ... 11 more
   
           at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
           ... 2 more
   2024-04-23 18:35:03,596 INFO  [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-50] - run shutdown hook because get close signal
   ```
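   
   The innermost cause in the trace can be reproduced outside SeaTunnel: `Long.parseLong` is handed the literal NULL marker. A minimal sketch, where the class `ParseLongRepro` and the method `failureMessage` are hypothetical names:
   
   ```java
   public class ParseLongRepro {
       // Returns the NumberFormatException message for the input, or null if it parses.
       static String failureMessage(String field) {
           try {
               Long.parseLong(field);
               return null;
           } catch (NumberFormatException e) {
               return e.getMessage();
           }
       }

       public static void main(String[] args) {
           // "\\N" is the two-character string backslash + N, as stored in the data file.
           System.out.println(failureMessage("\\N")); // prints For input string: "\N"
       }
   }
   ```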
   
   
   ### Zeta or Flink or Spark Version
   
   zeta
   
   ### Java or Scala Version
   
   1.8
   
   ### Screenshots
   
   ![image](https://github.com/apache/seatunnel/assets/53160760/9ba23419-d67a-454f-98d9-1a31d123e154)
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] During Hive metastore connection, the underlying NULL data is\N error [seatunnel]

Posted by "iture123 (via GitHub)" <gi...@apache.org>.
iture123 commented on issue #6744:
URL: https://github.com/apache/seatunnel/issues/6744#issuecomment-2072250975

   I am willing to submit a PR




Re: [I] During Hive metastore connection, the underlying NULL data is\N error [seatunnel]

Posted by "Pikawang (via GitHub)" <gi...@apache.org>.
Pikawang commented on issue #6744:
URL: https://github.com/apache/seatunnel/issues/6744#issuecomment-2081787009

   Use ORC when creating the table:
   
   STORED AS ORC




Re: [I] During Hive metastore connection, the underlying NULL data is\N error [seatunnel]

Posted by "15810785091 (via GitHub)" <gi...@apache.org>.
15810785091 commented on issue #6744:
URL: https://github.com/apache/seatunnel/issues/6744#issuecomment-2081823297

   @Pikawang That is not applicable to all scenarios; not all data tables are in ORC format.

