Posted to commits@seatunnel.apache.org by "wang-zhiang (via GitHub)" <gi...@apache.org> on 2023/05/18 02:33:16 UTC

[GitHub] [incubator-seatunnel] wang-zhiang opened a new issue, #4776: Synchronizing data to hive contains garbled characters

wang-zhiang opened a new issue, #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When SeaTunnel 2.3.1 synchronizes data from MongoDB to Hive, no error is reported, but the synchronized Chinese text is displayed as '?'. I also tested Hive-to-Hive synchronization and found the same garbled characters in Hive. SeaTunnel 2.1.2, which I used before, did not have this problem. If this is not a bug, I would appreciate any help.
   
   ### SeaTunnel Version
   
   2.3.1
   
   ### SeaTunnel Config
   
   ```conf
   env {
     job.mode = "BATCH"
     execution.parallelism = 10
   }
   
   source {
     Hive {
       table_name = "ods.mj_temp"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   
   transform {
   }
   
   sink {
     Hive {
       table_name = "tmp.mj_temp"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   /opt/module/seatunnel-2.3.1/bin/start-seatunnel-spark-2-connector-v2.sh \
     --master yarn \
     --deploy-mode client \
     --config /opt/module/seatunnel-2.1.2/script_spark/test/seatunnel-2.3.1/hive-to.conf
   ```
   
   
   ### Error Exception
   
   ```log
   No error was reported during the run. Only the Chinese data is garbled after synchronization. Here are some sample rows from the synchronized data:
   569160699454,????|??POSTURE PT????????????,?? ??
   568992142211,????|MINE MIRS???????????????LED???,?? ??
   568438068924,????|?????????????????????2????,?? ??
   566478155925,????|????????????????????U????,?? ??
   564262765693,????|????????????????? ????????,?? ??
   562042332078,????|???????? ?????? ?????????,?? ??
   559423175663,????|911?????????? ?????????????,?? ??
   557509244769,????|??????????RUAN???????????,?? ??
   557310183444,????|????????6???????????????,?? ??
   556109409857,????|?????????????????????????,?? ??
   549019815209,????|Oracleen?????????????????????,?? ??
   582054201255,?????????????????????????????,?? ??
   581048063649,????????????????????????????5?,?? ??
   ```
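
Editorial note: the sample rows keep their ASCII letters and digits while every Chinese character becomes exactly one '?'. That pattern is what a lossy encode into a charset without those characters produces. The sketch below only illustrates that mechanism; it is an assumption, not a confirmed diagnosis, since the thread never identifies a root cause. The helper name `lossy_write` and the sample string are hypothetical. If this were the cause, the default charset of the JVMs running the Spark driver and executors (the `file.encoding` system property) would be a reasonable first thing to check.

```python
def lossy_write(text: str, charset: str) -> bytes:
    """Encode text the way a writer with a fixed charset would.

    Characters the charset cannot represent are replaced with '?',
    and the original characters cannot be recovered afterwards.
    """
    return text.encode(charset, errors="replace")


# Hypothetical row content mixing Chinese and ASCII text.
sample = "詹姆斯 POSTURE PT 乔丹"

utf8_bytes = lossy_write(sample, "utf-8")    # lossless
ascii_bytes = lossy_write(sample, "ascii")   # lossy

print(utf8_bytes.decode("utf-8"))   # 詹姆斯 POSTURE PT 乔丹
print(ascii_bytes.decode("ascii"))  # ??? POSTURE PT ??
```

Because the replacement happens at encode time, the '?' bytes are what is actually stored, so no reader-side setting can restore the original text.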
   
   
   ### Flink or Spark Version
   
   spark-2.4.8
   
   ### Java or Scala Version
   
   1.8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552858540

   Thank you for your reply. I tried again following your method, but the result was still garbled. Is this related to the encoding settings of my Hive, or is it a problem with my cluster? Do you have any ideas? Thank you again.
   
   
   hive version:3.1.2
   hadoop:3.1.3
   spark:2.4.8
   
   
   
   
   
   
   CREATE TABLE `characters_source`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   INSERT INTO TABLE characters_source
   VALUES (1,'詹姆斯'),(2,'乔丹');
   
   
   drop table characters_sink;
   
   CREATE TABLE `characters_sink`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   
   SeaTunnel Hive conf:
   
   env {
     # You can set spark configuration here
     execution.parallelism = 1
   }
   
   
   source {
     Hive {
       table_name = "default.characters_source"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   
   
   sink {
     Hive {
       table_name = "default.characters_sink"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   
   
   select * from characters_sink
   
   
   
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 5:50 PM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > [quoted DDL, SeaTunnel Hive conf, and query output snipped]
   > SeaTunnel version: current dev version
   > Hive: 3.0.0
   > Hadoop: 3.0.0
   > I do not see any garbled characters in this test. Could you please recheck your configuration?




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552683631

   There's something at the bottom
   
   
   1. This is the DDL of the source table used in the synchronization:
   
   
   CREATE TABLE `ods.mj_temp`(
     `itemid` string,
     `itemname` string,
     `city` string,
     `shopcity` string,
     `leaf_classid` string,
     `leaf_class` string,
     `bid` string,
     `brandname` string,
     `shopid` string,
     `shopname` string,
     `catid1` string,
     `catid2` string,
     `catid3` string,
     `catid4` string,
     `class_1_category` string,
     `class_2_category` string,
     `class_3_category` string,
     `class_4_category` string,
     `shoptype_1` string,
     `background_level_1_category` string,
     `background_level_2_category` string,
     `background_level_3_category` string,
     `background_level_4_category` string,
     `background_leaf_category` string,
     `brand` string,
     `model` string,
     `modelid` string,
     `subtitle` string,
     `stock` string,
     `marked_price` string,
     `sales_volume` string,
     `sales_volume2` string,
     `qty` double,
     `sales__price` string,
     `price` double,
     `comments` string,
     `isjd_logistics` string,
     `isoverseas_purchase` string,
     `product_properties` string,
     `skuname` string,
     `time_` string,
     `priceall` string,
     `prices` string,
     `sales_data_crawl_date` string,
     `attribute` string,
     `buyyernum` string,
     `column_23` string,
     `salesdatasource` string,
     `comcount` string,
     `ttype` string,
     `categoryid` string,
     `brandid` string,
     `sellerid` string,
     `shop` string,
     `id` string,
     `shopuserid` string,
     `isglobal_purchase` string,
     `adddate` string,
     `grand` string,
     `ttypeold` string,
     `categoryidold` string,
     `brandidold` string,
     `gtype` string,
     `from_table_name` string,
     `pt_channel` string,
     `pt_ym` string)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
     'hdfs://mycluster/user/hive/warehouse/ods.db/mj_temp'
   TBLPROPERTIES (
     'bucketing_version'='2',
     'last_modified_by'='smartpath',
     'last_modified_time'='1663656042',
     'transient_lastDdlTime'='1663656042')
   
   
   
   2. This is the DDL of the synchronized result table. (Note: the result table was created from the source table with CREATE TABLE AS.)
   
   
   CREATE TABLE `tmp.mj_temp`(
     `itemid` string,
     `itemname` string,
     `city` string,
     `shopcity` string,
     `leaf_classid` string,
     `leaf_class` string,
     `bid` string,
     `brandname` string,
     `shopid` string,
     `shopname` string,
     `catid1` string,
     `catid2` string,
     `catid3` string,
     `catid4` string,
     `class_1_category` string,
     `class_2_category` string,
     `class_3_category` string,
     `class_4_category` string,
     `shoptype_1` string,
     `background_level_1_category` string,
     `background_level_2_category` string,
     `background_level_3_category` string,
     `background_level_4_category` string,
     `background_leaf_category` string,
     `brand` string,
     `model` string,
     `modelid` string,
     `subtitle` string,
     `stock` string,
     `marked_price` string,
     `sales_volume` string,
     `sales_volume2` string,
     `qty` double,
     `sales__price` string,
     `price` double,
     `comments` string,
     `isjd_logistics` string,
     `isoverseas_purchase` string,
     `product_properties` string,
     `skuname` string,
     `time_` string,
     `priceall` string,
     `prices` string,
     `sales_data_crawl_date` string,
     `attribute` string,
     `buyyernum` string,
     `column_23` string,
     `salesdatasource` string,
     `comcount` string,
     `ttype` string,
     `categoryid` string,
     `brandid` string,
     `sellerid` string,
     `shop` string,
     `id` string,
     `shopuserid` string,
     `isglobal_purchase` string,
     `adddate` string,
     `grand` string,
     `ttypeold` string,
     `categoryidold` string,
     `brandidold` string,
     `gtype` string,
     `from_table_name` string,
     `pt_channel` string,
     `pt_ym` string)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
     'hdfs://mycluster/user/hive/warehouse/tmp.db/mj_temp'
   TBLPROPERTIES (
     'transient_lastDdlTime'='1684315313')
   
   
   
   
   
   Hello, thank you for your reply. Besides the DDL, you also asked for data. I sent the data as a CSV file in my last email; was there anything wrong with it? Is there anything else I need to provide?
   
   
   
   
   
   
   
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 3:40 PM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > You can post the Hive SQL DDL and the data in this issue.




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552324304

   Please be generous with your advice




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552357974

   Hello, I have exported the source table data to CSV and sent it to you.
   Hello, thank you for your reply. This is the CSV export of the data I synchronized to Hive. As you can see, all the Chinese characters have been changed into '?'. This does not happen with SeaTunnel 2.1.2. If you need more material, I will gladly provide it. I look forward to your reply.
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 11:26 AM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > Could you please share your data and SQL statements so that we have more detail to work with?




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552355054

   > 
   
   Hello, I replied to you in the email




[GitHub] [incubator-seatunnel] zhilinli123 commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "zhilinli123 (via GitHub)" <gi...@apache.org>.
zhilinli123 commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552758407

   Do the garbled characters appear when you query Hive directly, or only in the CSV file you exported manually?
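
Editorial note: the distinction asked about here can be checked on the raw bytes of the sink table's data files (for example, a few lines copied out with `hdfs dfs -cat`). The sketch below is a rough heuristic under stated assumptions, not something proposed in the thread: the function name `classify_row` and its result labels are hypothetical, and it assumes the table is meant to hold UTF-8 text. It can be fooled, since some legacy-encoded byte sequences happen to be valid UTF-8.

```python
STORED_INTACT = "non-ASCII text decodes as UTF-8: likely a display or export problem"
LOST_ON_WRITE = "only ASCII with '?': characters were likely lost before or during the write"
NOT_UTF8 = "not valid UTF-8: stored in another encoding or corrupted"
PLAIN_ASCII = "plain ASCII without '?': nothing to diagnose"


def classify_row(raw: bytes) -> str:
    """Guess where Chinese text was lost, given one raw line from a data file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return NOT_UTF8
    if any(ord(ch) > 127 for ch in text):
        # Multi-byte characters survived the write, so the stored data is fine.
        return STORED_INTACT
    if "?" in text:
        # Literal 0x3F bytes where text was expected: the data itself is damaged.
        return LOST_ON_WRITE
    return PLAIN_ASCII


print(classify_row("1\t詹姆斯".encode("utf-8")))
print(classify_row(b"1\t???"))
```

If the stored bytes are literal question marks, changing the client or CSV export settings cannot help, and the write path is the place to look.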
   




[GitHub] [seatunnel] github-actions[bot] commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4776:
URL: https://github.com/apache/seatunnel/issues/4776#issuecomment-1595905551

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552840602

   Thank you for your reply. I tried again following your method, but the result was still garbled. Is this related to the encoding settings of my Hive, or is it a problem with my cluster? Do you have any ideas? Thank you again.
   
   
   
   
   
   
   CREATE TABLE `characters_source`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   INSERT INTO TABLE characters_source
   VALUES (1,'詹姆斯'),(2,'乔丹');
   
   
   drop table characters_sink;
   
   CREATE TABLE `characters_sink`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   
   SeaTunnel Hive conf:
   
   env {
     # You can set spark configuration here
     execution.parallelism = 1
   }
   
   
   source {
     Hive {
       table_name = "default.characters_source"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   
   
   sink {
     Hive {
       table_name = "default.characters_sink"
       metastore_uri = "thrift://hadoop104:9083"
     }
   }
   
   
   select * from characters_sink
   
   
   
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 5:50 PM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > [quoted DDL, SeaTunnel Hive conf, and query output snipped]
   > SeaTunnel version: current dev version
   > Hive: 3.0.0
   > Hadoop: 3.0.0
   > I do not see any garbled characters in this test. Could you please recheck your configuration?




[GitHub] [incubator-seatunnel] zhilinli123 commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "zhilinli123 (via GitHub)" <gi...@apache.org>.
zhilinli123 commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552811416

   ```
   
   drop table characters_source;
   CREATE TABLE `characters_source`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
    'hdfs://localhost:9000/user/hive/warehouse/characters_source'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   INSERT INTO TABLE characters_source 
   VALUES (1,'詹姆斯'),(2,'乔丹');
   
   
   drop table characters_sink;
   
   CREATE TABLE `characters_sink`(
   `id` string,
   `name` string)
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
   STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
   LOCATION
    'hdfs://localhost:9000/user/hive/warehouse/characters_sink'
   TBLPROPERTIES (
   'bucketing_version'='2',
   'last_modified_by'='smartpath',
   'last_modified_time'='1663656042',
   'transient_lastDdlTime'='1663656042');
   
   
   ```
   
   **SeaTunnel Hive conf:**
   ```
   env {
     # You can set spark configuration here
     execution.parallelism = 1
   }
   
   
   source {
     Hive {
       table_name = "default.characters_source"
       metastore_uri = "thrift://localhost:9083"
     }
   }
   
   
   sink {
     Hive {
       table_name = "default.characters_sink"
       metastore_uri = "thrift://localhost:9083"
     }
   }
   
   
   ```
   
   ```
   2023-05-18 17:47:15,088 INFO  [de8e2b80-70b4-448a-9733-c49260f041d0 main] exec.ListSinkOperator (Operator.java:logStats(1028)) - RECORDS_OUT_INTERMEDIATE:0, RECORDS_OUT_OPERATOR_LIST_SINK_3:2, 
   1       詹姆斯
   2       乔丹
   Time taken: 0.331 seconds, Fetched: 2 row(s)
   ```
   
   SeaTunnel version: current dev version
   Hive: 3.0.0
   Hadoop: 3.0.0
   I do not see any garbled characters in this test. Could you please recheck your configuration?
   




[GitHub] [incubator-seatunnel] zhilinli123 commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "zhilinli123 (via GitHub)" <gi...@apache.org>.
zhilinli123 commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552659574

   You can post the Hive SQL DDL and the data in this issue.
   




[GitHub] [seatunnel] github-actions[bot] commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4776:
URL: https://github.com/apache/seatunnel/issues/4776#issuecomment-1608526285

   This issue has been closed because it has not received a response for a long time. You can reopen it if you encounter similar problems in the future.




[GitHub] [incubator-seatunnel] zhilinli123 commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "zhilinli123 (via GitHub)" <gi...@apache.org>.
zhilinli123 commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552351522

   Could you please share your data and SQL statements so that we have more detail to work with?
   
   




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552354279

   Hello, thank you for your reply. This is the CSV export of the data I synchronized to Hive. As you can see, all the Chinese characters have been changed into '?'. This does not happen with SeaTunnel 2.1.2. If you need more material, I will gladly provide it. I look forward to your reply.
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 11:26 AM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > Could you please share your data and SQL statements so that we have more detail to work with?




[GitHub] [incubator-seatunnel] wang-zhiang commented on issue #4776: Synchronizing data to hive contains garbled characters

Posted by "wang-zhiang (via GitHub)" <gi...@apache.org>.
wang-zhiang commented on issue #4776:
URL: https://github.com/apache/incubator-seatunnel/issues/4776#issuecomment-1552783266

   Thank you for your reply. The garbled characters did not first appear in the CSV export: in Hive itself the Chinese characters already show up as question marks. I exported the data to CSV only to make it easier for you to inspect.
   
   ------------------ Original message ------------------
   From: "apache/incubator-seatunnel"
   Sent: Thursday, May 18, 2023, 5:04 PM
   Subject: Re: [apache/incubator-seatunnel] Synchronizing data to hive contains garbled characters (Issue #4776)
   
   > Do the garbled characters appear when you query Hive directly, or only in the CSV file you exported manually?




[GitHub] [seatunnel] github-actions[bot] closed issue #4776: Synchronizing data to hive contains garbled characters

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4776: Synchronizing data to hive contains garbled characters
URL: https://github.com/apache/seatunnel/issues/4776

