You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@seatunnel.apache.org by "ocean-zhc (via GitHub)" <gi...@apache.org> on 2023/05/10 02:25:28 UTC

[GitHub] [incubator-seatunnel] ocean-zhc opened a new issue, #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

ocean-zhc opened a new issue, #4727:
URL: https://github.com/apache/incubator-seatunnel/issues/4727

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   json Example:`{"name":"zhc","age":15}`
   
   hive DDL:
   ```
   CREATE TABLE `default.user1`(
     `name` string,
     `age` int)
   row format delimited fields terminated by '\n'
   stored as orc tblproperties ("orc.compress"="NONE");
   ```
   
   SeaTunnel conf:
   
   ```
   env {
     execution.parallelism = 1
     job.mode = "STREAMING"
     execution.planner = blink
     job.name = "kafka_console_test"
     execution.checkpoint.interval = 60000
   }
   
   source {
   
     Kafka {
       result_table_name = "kafka_table"
       schema = {
         fields {
           name = "string"
           age = "int"
         }
       }
       topic = "DZFP_DZDZ.DZDZ_FPXX_JDCFP"
       bootstrap.servers = "local:19092"
       kafka.config = {
         client.id = client_1
         max.poll.records = 500
         auto.offset.reset = "earliest"
         enable.auto.commit = "false"
       }
     }
     
   }
   
   transform {
     
   }
   
   sink {
     Hive {
       table_name = "default.user1"
       metastore_uri = "thrift://centos4:9083"
     }
   }
   
   ```
   
   ### SeaTunnel Version
   
   2.3.1
   
   ### SeaTunnel Config
   
   ```conf
   env {
     execution.parallelism = 1
     job.mode = "STREAMING"
     execution.planner = blink
     job.name = "kafka_console_test"
     execution.checkpoint.interval = 60000
   }
   
   source {
   
     Kafka {
       result_table_name = "kafka_table"
       schema = {
         fields {
           name = "string"
           age = "int"
         }
       }
       topic = "DZFP_DZDZ.DZDZ_FPXX_JDCFP"
       bootstrap.servers = "local:19092"
       kafka.config = {
         client.id = client_1
         max.poll.records = 500
         auto.offset.reset = "earliest"
         enable.auto.commit = "false"
       }
     }
     
   }
   
   transform {
     
   }
   
   sink {
     Hive {
       table_name = "default.user1"
       metastore_uri = "thrift://centos4:9083"
     }
   }
   
   ```
   ```
   
   
   ### Running Command
   
   ```shell
   ./start-seatunnel-spark-3-connector-v2.sh --master spark://centos4:7077 --deploy-mode client --config /home/app/py/tmp/jdcfp.conf
   ```
   
   
   ### Error Exception
   
   ```log
   No Error, but data cannot be written~
   ```
   
   
   ### Flink or Spark Version
   
   Spark 3.2.2
   
   ### Java or Scala Version
   
   Java:1.8.0_172
   Scala:2.12
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] devbiu commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "devbiu (via GitHub)" <gi...@apache.org>.
devbiu commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1611216550

   > Try changing the hive table storage format to text or parquet, it should be no problem.
   
   I tried three types of writing from kafka to hive
   Orc text parquet still cannot be written
   The author used Spark while I used Flink 1.13 and the Java version was 11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] liugddx commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "liugddx (via GitHub)" <gi...@apache.org>.
liugddx commented on issue #4727:
URL: https://github.com/apache/incubator-seatunnel/issues/4727#issuecomment-1551213586

   Can you provide a running log?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] devbiu commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "devbiu (via GitHub)" <gi...@apache.org>.
devbiu commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1617493965

   My problem has been solved
   
   Added execution. checkpoint. interval=60000
   
   There are no issues with writing text and parquet types to hive
   
   
   There may be errors in the ORC format
   
   
   ```java 
   switched from RUNNING to FAILED with failure cause: java.util.concurrent.CompletionException: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
   	at java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:375)
   	at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
   	at org.apache.seatunnel.connectors.seatunnel.kafka.source.KafkaSourceReader.lambda$pollNext$4(KafkaSourceReader.java:151)
   	at java.lang.Iterable.forEach(Iterable.java:75)
   	at org.apache.seatunnel.connectors.seatunnel.kafka.source.KafkaSourceReader.pollNext(KafkaSourceReader.java:109)
   	at org.apache.seatunnel.translation.source.ParallelSource.run(ParallelSource.java:128)
   	at org.apache.seatunnel.translation.flink.source.BaseSeaTunnelSourceFunction.run(BaseSeaTunnelSourceFunction.java:83)
   	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:104)
   	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:60)
   	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269)
   Caused by: org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException: Could not forward element to next operator
   	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:88)
   	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:46)
   	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:26)
   	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:50)
   	at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:28)
   	at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:317)
   	at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:411)
   	at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:45)
   	at org.apache.seatunnel.translation.flink.source.RowCollector.collect(RowCollector.java:30)
   	at org.apache.seatunnel.api.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:39)
   	at org.apache.seatunnel.connectors.seatunnel.kafka.source.KafkaSourceReader.lambda$null$3(KafkaSourceReader.java:126)
   	at org.apache.seatunnel.connectors.seatunnel.kafka.source.KafkaConsumerThread.run(KafkaConsumerThread.java:53)
   	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.NullPointerException
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.buildSchemaWithRowType(OrcWriteStrategy.java:185)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.getOrCreateWriter(OrcWriteStrategy.java:111)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.writer.OrcWriteStrategy.write(OrcWriteStrategy.java:74)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:108)
   	at org.apache.seatunnel.connectors.seatunnel.file.sink.BaseFileSinkWriter.write(BaseFileSinkWriter.java:43)
   	at org.apache.seatunnel.translation.flink.sink.FlinkSinkWriter.write(FlinkSinkWriter.java:51)
   	at org.apache.flink.streaming.runtime.operators.sink.AbstractSinkWriterOperator.processElement(AbstractSinkWriterOperator.java:80)
   	at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:71)
   	... 16 more ```
   
   
   It seems that orc cannot be obtained


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4727:
URL: https://github.com/apache/incubator-seatunnel/issues/4727#issuecomment-1550656012

   我也是这个问题,但有一次写进去了后面又写不进去了,写进hive里的那一次hive中文都是问号


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] devbiu commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "devbiu (via GitHub)" <gi...@apache.org>.
devbiu commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1611022286

   I also have the same problem
   My seatunnel version is 2.3.0
   The source side is Kafka, while the sink side has added Console for easy observation,
   No errors were reported after running, and the data has been output to Console, but no data has been written to the hive table


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] ocean-zhc commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "ocean-zhc (via GitHub)" <gi...@apache.org>.
ocean-zhc commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1612488357

   > > Try changing the hive table storage format to text or parquet, it should be no problem.
   > 
   > I tried three types of writing from kafka to hive Orc text parquet still cannot be written The author used Spark while I used Flink 1.13 and the Java version was 11
   
   后来测试使用的是flink1.16.2,忘记标记了,SeaTunnel使用的是dev分支,也就是2.3.2-SNAPSHOT。


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] SinyoWong commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "SinyoWong (via GitHub)" <gi...@apache.org>.
SinyoWong commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1736613859

   给我个PR,这个问题我解决了。
   Give me a PR, this problem I've solved. @ocean-zhc 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4727:
URL: https://github.com/apache/incubator-seatunnel/issues/4727#issuecomment-1550654838

   老兄,解决了吗,我也有这个问题


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] ocean-zhc closed issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "ocean-zhc (via GitHub)" <gi...@apache.org>.
ocean-zhc closed issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive
URL: https://github.com/apache/seatunnel/issues/4727


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] lightzhao commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "lightzhao (via GitHub)" <gi...@apache.org>.
lightzhao commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1611074240

   Try changing the hive table storage format to text or parquet, it should be no problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] devbiu commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "devbiu (via GitHub)" <gi...@apache.org>.
devbiu commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1613020842

   > > > Try changing the hive table storage format to text or parquet, it should be no problem.
   > > 
   > > 
   > > I tried three types of writing from kafka to hive Orc text parquet still cannot be written The author used Spark while I used Flink 1.13 and the Java version was 11
   > 
   > 后来测试使用的是flink1.16.2,忘记标记了,SeaTunnel使用的是dev分支,也就是2.3.2-SNAPSHOT。
   
   谢谢你的回答


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] ocean-zhc commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "ocean-zhc (via GitHub)" <gi...@apache.org>.
ocean-zhc commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1611100093

   > 老兄,解决了吗,我也有这个问题
   
   解决了呢。
   <img width="549" alt="image" src="https://github.com/apache/seatunnel/assets/46189785/d6c01895-b62e-4ad3-afa6-fb170ab6fb78">
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] devbiu commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "devbiu (via GitHub)" <gi...@apache.org>.
devbiu commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1611216162

   > -
   I tried three types of writing from kafka to hive
   Orc text parquet still cannot be written
   The author used Spark while I used Flink 1.13 and the Java version was 11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] ocean-zhc commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "ocean-zhc (via GitHub)" <gi...@apache.org>.
ocean-zhc commented on issue #4727:
URL: https://github.com/apache/seatunnel/issues/4727#issuecomment-1615839569

   但是 `max.poll.records` 参数设置好像报错!
   ![image](https://github.com/apache/seatunnel/assets/46189785/1860e401-1eb2-4ce0-bf37-a324ff692535)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4727: [Bug] [Module Name] Kafka data cannot be written to Hive

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4727:
URL: https://github.com/apache/incubator-seatunnel/issues/4727#issuecomment-1552366329

   I am very happy to receive your reply. I will send you the run log in which the Chinese characters in the data are garbled using seatunneL2.3.1 to synchronize data to hive, and I will send you both the exported csv of the source table and the csv of the synchronized table. If you need any more information, I will actively cooperate with you and look forward to your reply
   
   
   
   
   ------------------&nbsp;原始邮件&nbsp;------------------
   发件人:                                                                                                                        "apache/incubator-seatunnel"                                                                                    ***@***.***&gt;;
   发送时间:&nbsp;2023年5月17日(星期三) 晚上7:25
   ***@***.***&gt;;
   ***@***.******@***.***&gt;;
   主题:&nbsp;Re: [apache/incubator-seatunnel] [Bug] [Module Name] Kafka data cannot be written to Hive (Issue #4727)
   
   
   
   
   
    
   Can you provide a running log?
    
   —
   Reply to this email directly, view it on GitHub, or unsubscribe.
   You are receiving this because you commented.Message ID: ***@***.***&gt;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org