You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/21 09:30:16 UTC

[GitHub] [hudi] DeyinZhong opened a new pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

DeyinZhong opened a new pull request #1855:
URL: https://github.com/apache/hudi/pull/1855


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   - Add hudi support Tencent Cloud Object Storage(COS)
   
   ## Brief change log
   
   - add cosn schema in StorageSchemes.java
   
   - compile hudi after modified codes
   ```
   mvn clean package -DskipTests -DskipITs -Dhadoop.version=2.8.5 -Dhive.version=2.3.5 -Dspark.version=2.4.3
   ```
   ![image](https://user-images.githubusercontent.com/44561252/88037478-8a4bed80-cb77-11ea-8bac-e2c09528ec1c.png)
   
   
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   You can refer to the documents: 
   http://hudi.apache.org/docs/docker_demo.html
   
   Also, We have implemented this feature on Tencent cloud EMR product, please read the link: https://cloud.tencent.com/document/product/589/42955
   
   environments:
   
   - hadoop: 2.8.5
   
   - hive: 2.3.5
   
   - spark:  2.4.3
   
   - hudi: release-0.5.1-incubating
   
   The general steps for hudi in tencent object storage(cos) as follows:
   
   - step1: Upload config to cos
   
   ```
   hdfs dfs -mkdir -p cosn://[bucket]/hudi/config
   hdfs dfs -copyFromLocal demo/config/*  cosn://[bucket]/hudi/config/
   ```
   
   - Step 2: Incrementally ingest data from Kafka, and write to cos
   
   ```
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --master yarn ./hudi-utilities-bundle_2.11-0.5.1-incubating.jar   --table-type COPY_ON_WRITE --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path cosn://[bucket]/usr/hive/warehouse/stock_ticks_cow --target-table stock_ticks_cow --props cosn://[bucket]/hudi/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   
   
   spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer  --master yarn ./hudi-utilities-bundle_2.11-0.5.1-incubating.jar  --table-type MERGE_ON_READ --source-class org.apache.hudi.utilities.sources.JsonKafkaSource --source-ordering-field ts  --target-base-path cosn://[bucket]/usr/hive/warehouse/stock_ticks_mor --target-table stock_ticks_mor --props cosn://[bucket]/hudi/config/kafka-source.properties --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider --disable-compaction
   ```
   
   
   - Step3: Sync with Hive when data on cos
   ```
   bin/run_sync_tool.sh  --jdbc-url jdbc:hive2://[hiveserver2_ip:hiveserver2_port] --user hadoop --pass isd@cloud --partitioned-by dt --base-path cosn://[bucket]/usr/hive/warehouse/stock_ticks_cow --database default --table stock_ticks_cow
   
   bin/run_sync_tool.sh  --jdbc-url jdbc:hive2://[hiveserver2_ip:hiveserver2_port] --user hadoop --pass hive --partitioned-by dt --base-path cosn://[bucket]/usr/hive/warehouse/stock_ticks_mor --database default --table stock_ticks_mor --skip-ro-suffix
   
   ```
   
   - Step4: Query hudi table by hive or spark sql engine
   
   ```
   
   beeline -u jdbc:hive2://[hiveserver2_ip:hiveserver2_port] -n hadoop --hiveconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat --hiveconf hive.stats.autogather=false
   
   spark-sql --master yarn --conf spark.sql.hive.convertMetastoreParquet=false
   
   hivesqls:
   select symbol, max(ts) from stock_ticks_cow group by symbol HAVING symbol = 'GOOG';
   select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_cow where  symbol = 'GOOG';
   select symbol, max(ts) from stock_ticks_mor group by symbol HAVING symbol = 'GOOG';
   select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor where  symbol = 'GOOG';
   select symbol, max(ts) from stock_ticks_mor_rt group by symbol HAVING symbol = 'GOOG';
   select `_hoodie_commit_time`, symbol, ts, volume, open, close  from stock_ticks_mor_rt where  symbol = 'GOOG';
   ```
   
   - Step5:  Run Compaction when data in cos
   
   ```
   cli/bin/hudi-cli.sh
   connect --path cosn://[bucket]/usr/hive/warehouse/stock_ticks_mor
   compactions show all
   compaction schedule
   compaction run --compactionInstant [requestid]  --parallelism 2 --sparkMemory 1G  --schemaFilePath cosn://[bucket]/hudi/config/schema.avsc --retry 1
   ```
   
   
   
   
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1855:
URL: https://github.com/apache/hudi/pull/1855#issuecomment-661790185


   close to retrigger


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf closed pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
leesf closed pull request #1855:
URL: https://github.com/apache/hudi/pull/1855


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] DeyinZhong commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
DeyinZhong commented on pull request #1855:
URL: https://github.com/apache/hudi/pull/1855#issuecomment-667076680


   > > > @DeyinZhong Thanks for your contributing, LGTM, would you please also update the docs(http://hudi.apache.org/docs/cloud.html), the docs branch is asf-site. Please ping me if you have any problem.
   > > 
   > > 
   > > Thanks,@leesf, I try to update the docs follow the 'https://github.com/apache/hudi/tree/asf-site', add file hudi/content/docs/cos_hoodie.html and append cos link in cloud.html, rebuild by command 'bundle exec jekyll serve ' in hudi/docs, but there is no change in _site/cloud.html. please help me to do this work, thank you very much.
   > 
   > Hi @DeyinZhong you should modify `cloud.md` file rather than `cloud.html` file.
   
   document for hudi on cos: https://github.com/apache/hudi/pull/1891


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf edited a comment on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
leesf edited a comment on pull request #1855:
URL: https://github.com/apache/hudi/pull/1855#issuecomment-662352999


   > > @DeyinZhong Thanks for your contributing, LGTM, would you please also update the docs(http://hudi.apache.org/docs/cloud.html), the docs branch is asf-site. Please ping me if you have any problem.
   > 
   > Thanks,@leesf, I try to update the docs follow the 'https://github.com/apache/hudi/tree/asf-site', add file hudi/content/docs/cos_hoodie.html and append cos link in cloud.html, rebuild by command 'bundle exec jekyll serve ' in hudi/docs, but there is no change in _site/cloud.html. please help me to do this work, thank you very much.
   
   Hi @DeyinZhong you should modify `cloud.md` file rather than `cloud.html` file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
leesf commented on pull request #1855:
URL: https://github.com/apache/hudi/pull/1855#issuecomment-662352999


   > > @DeyinZhong Thanks for your contributing, LGTM, would you please also update the docs(http://hudi.apache.org/docs/cloud.html), the docs branch is asf-site. Please ping me if you have any problem.
   > 
   > Thanks,@leesf, I try to update the docs follow the 'https://github.com/apache/hudi/tree/asf-site', add file hudi/content/docs/cos_hoodie.html and append cos link in cloud.html, rebuild by command 'bundle exec jekyll serve ' in hudi/docs, but there is no change in _site/cloud.html. please help me to do this work, thank you very much.
   
   Hi @DeyinZhong you should modify `cloud.md` file rather than `cloud.hmtl` file.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] leesf merged pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
leesf merged pull request #1855:
URL: https://github.com/apache/hudi/pull/1855


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] DeyinZhong commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

Posted by GitBox <gi...@apache.org>.
DeyinZhong commented on pull request #1855:
URL: https://github.com/apache/hudi/pull/1855#issuecomment-662309823


   > @DeyinZhong Thanks for your contributing, LGTM, would you please also update the docs(http://hudi.apache.org/docs/cloud.html), the docs branch is asf-site. Please ping me if you have any problem.
   
   Thanks,@leesf, I try to update the docs follow the 'https://github.com/apache/hudi/tree/asf-site', add file  hudi/content/docs/cos_hoodie.html and append cos link in cloud.html, rebuild by command 'bundle exec jekyll serve ' in hudi/docs, but there is no change in  _site/cloud.html. please help me to do this work, thank you very much.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org