Posted to commits@seatunnel.apache.org by "seeker-jie (via GitHub)" <gi...@apache.org> on 2023/05/10 09:22:50 UTC

[GitHub] [incubator-seatunnel] seeker-jie opened a new issue, #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

seeker-jie opened a new issue, #4731:
URL: https://github.com/apache/incubator-seatunnel/issues/4731

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   When the Hive source reads a table stored as Textfile, the data from all of its columns is read into a single column.
   The table was created with the following SQL:
   
   ```sql
   create table test
   (
       name    string,
       age     int,
       detail  string,
       address string
   )
       row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
       stored as
           inputformat 'org.apache.hadoop.mapred.TextInputFormat'
           outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
       location 'hdfs://***/test'
       tblproperties ('spark.sql.create.version' = '2.4.8', 'spark.sql.sources.schema.numParts' = '1',
           'spark.sql.sources.schema.part.0' =
                   '{\"type\":\"struct\",\"fields\":[{\"name\":\"name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"age\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}},{\"name\":\"detail\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"address\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}}]}');
   ```
   
   Test data:
   
   | name    | age | detail      | address        |
   |---------|-----|-------------|----------------|
   | test_2  | 2   | sdasd       | asds           |
   | test_3  | 3   | sdfsd       | sad            |
   | test_4  | 4   | asqwer      | gsdfg          |
   | test_5  | 5   | tet         | sadf           |
   | test_6  | 6   | asdfas      | asdfasdf       |
   | test_7  | 7   | sdfasd      | fasdsd         |
   | test_8  | 8   | ghrty       | uiyuy          |
   | test_9  | 9   | rwerwe      | uiyui          |
   | test_10 | 10  | rtet        | rtytyuty       |
   | test    | 1   | shaskaslajs | jsdflsjfljdfld |
   
   
   
   
   
   ### SeaTunnel Version
   
   2.3.0
   
   ### SeaTunnel Config
   
   ```conf
   env {
     spark.streaming.batchDuration = 5
     spark.app.name = "seatunnel"
     spark.ui.port = 13000
     spark.sql.catalogImplementation = "hive"
   }
   
   source {
     Hive {
       table_name = "default.test_st"
       metastore_uri = "thrift://myhost:9083"
     }
   }
   transform {
   }
   
   sink{
     console{
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark-connector-v2.sh --master local --deploy-mode client --config ./config/hive-source.conf
   ```
   
   
   ### Error Exception
   
   ```log
   No Exception.
   ```
   
   
   ### Flink or Spark Version
   
   Spark version: 2.4.8
   
   ### Java or Scala Version
   
   Openjdk version: 1.8.0_322
   
   ### Screenshots
   
   
   ![image](https://github.com/apache/incubator-seatunnel/assets/46879926/057c9572-86b9-454b-96fb-2ef5c81ab9ea)
   
   <img width="419" alt="image" src="https://github.com/apache/incubator-seatunnel/assets/46879926/0c09b4e4-5a0a-49f3-a0b3-1ca97d11dc90">
   
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [seatunnel] github-actions[bot] commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4731:
URL: https://github.com/apache/seatunnel/issues/4731#issuecomment-1585872699

   This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.




[GitHub] [seatunnel] shysnow commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "shysnow (via GitHub)" <gi...@apache.org>.
shysnow commented on issue #4731:
URL: https://github.com/apache/seatunnel/issues/4731#issuecomment-1642993077

   I also encountered the same problem.




[GitHub] [seatunnel] github-actions[bot] commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4731:
URL: https://github.com/apache/seatunnel/issues/4731#issuecomment-1616218613

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.




[GitHub] [seatunnel] shysnow commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "shysnow (via GitHub)" <gi...@apache.org>.
shysnow commented on issue #4731:
URL: https://github.com/apache/seatunnel/issues/4731#issuecomment-1644872156

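   Reading the source code, I found that the field separator is set through `delimiter`. Setting `delimiter=","` is enough to fix the problem. My configuration file is as follows: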
   
   env {
     spark.sql.catalogImplementation = "hive"
     spark.app.name = "SeaTunnel-spark3"
     spark.executor.instances = 4
     spark.executor.cores = 2
     spark.executor.memory = "5g"
     spark.yarn.queue = "aiops"
     spark.ui.enabled = true
   }
   
   source {
     Hive {
       table_name = "default.hive_id_int"
       metastore_uri = "thrift://cdh129130:9083"
       kerberos_principal = "hive/cdh129144@MYCDH"
       kerberos_keytab_path = "/home/aiops/keytab/hive.keytab"
       hdfs_site_path = "/etc/hadoop/conf/hdfs-site.xml"
       parallelism = 1
       read_columns = ["nid","date_id","mm","latn_id","email","ctime","address"]
       fetch_size = 10000
       delimiter = ","
     }
   }
   
   transform {
   }
   
   sink {
     Console {
     }
   }
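
The effect described above can be illustrated with a small Python sketch. It is not SeaTunnel's reader code; the `split_row` helper and the sample row are hypothetical. It assumes the reader splits each line on a single configured delimiter and, when none is set, falls back to a character that does not occur in the file (here Ctrl-A, `\x01`, the default field separator of Hive's `LazySimpleSerDe`).

```python
from typing import List

# Illustrative only: mimics delimiter-based splitting of a text row.
# This is NOT SeaTunnel's actual implementation.

# Ctrl-A, the default field separator of Hive's LazySimpleSerDe.
HIVE_DEFAULT_FIELD_DELIM = "\x01"


def split_row(line: str, delimiter: str) -> List[str]:
    """Split one text-file line into column values (hypothetical helper)."""
    return line.rstrip("\n").split(delimiter)


# A hypothetical row written with "," as the field separator.
comma_row = "test_2,2,sdasd,asds\n"

# The delimiter does not occur in the line, so the whole row is one field.
merged = split_row(comma_row, HIVE_DEFAULT_FIELD_DELIM)

# The delimiter matches how the file was written, so four fields come back.
columns = split_row(comma_row, ",")

print(len(merged), merged)
print(len(columns), columns)
```

If this assumption holds, the first call reproduces the symptom in this issue (all columns merged into one), and the second shows why setting `delimiter` to the separator the files were actually written with resolves it.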
   




[GitHub] [seatunnel] github-actions[bot] closed issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.
URL: https://github.com/apache/seatunnel/issues/4731




[GitHub] [incubator-seatunnel] EricJoy2048 commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "EricJoy2048 (via GitHub)" <gi...@apache.org>.
EricJoy2048 commented on issue #4731:
URL: https://github.com/apache/incubator-seatunnel/issues/4731#issuecomment-1542726678

   @TyrantLucifer  PTAL




[GitHub] [incubator-seatunnel] TyrantLucifer commented on issue #4731: [Bug] [Hive Source Connector] During Hive source processing, data from multiple columns (Textfile type) is combined into one column.

Posted by "TyrantLucifer (via GitHub)" <gi...@apache.org>.
TyrantLucifer commented on issue #4731:
URL: https://github.com/apache/incubator-seatunnel/issues/4731#issuecomment-1543171125

   > @TyrantLucifer PTAL
   
   Please use 2.3.1 and retest.

