Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/25 09:57:28 UTC

[GitHub] [hudi] laurieliyang opened a new pull request #3859: Fix the "Edit this page" config and add 6 cn docs.

laurieliyang opened a new pull request #3859:
URL: https://github.com/apache/hudi/pull/3859


   
   ## Brief change log
   
   1. Fix the "Edit this page" config in `docusaurus.config.js`. The new config supports versioned docs and locales.
   2. Add 6 new cn docs.
   
   ## Verify this pull request
   
   ### About url fixing
   
   It has passed the following tests:
   
   1. with the default locale (which should be 'en', as set in the config file)
       - [x] docs of current version
       - [x] docs of 0.9.0 version
       - [x] docs of 0.8.0 version
   2. with the cn locale
       - [x] docs of current version
       - [x] docs of 0.9.0 version
       - [x] docs of 0.8.0 version
   
   ### About new cn docs
   
   1. **s3_hoodie.md**
   ![s3_hoodie](https://user-images.githubusercontent.com/11391675/138674743-3acf8c9e-c2cc-41c6-838c-de55c8bb2d50.png)
   
   2. **ibm_cos_hoodie.md**
   ![ibm_cos_hoodie](https://user-images.githubusercontent.com/11391675/138674746-2b38b9f6-f078-474c-8d25-132236c0a01b.png)
   
   3. **gcs_hoodie.md**
   ![gcs_hoodie](https://user-images.githubusercontent.com/11391675/138674750-894cd2a5-19b4-4d4d-aa09-c5570a5d25ce.png)
   
   4. **privacy.md**
   ![privacy](https://user-images.githubusercontent.com/11391675/138674752-607cdc9b-b4fd-472f-b5a6-540810c74cff.png)
   
   5. **migration_guide.md**
   ![migration_guide](https://user-images.githubusercontent.com/11391675/138674755-fa4523c6-8651-4240-8482-4e63c109520e.png)
   
   6. **overview.md**
   ![overview](https://user-images.githubusercontent.com/11391675/138674735-c458b72e-bdf2-41df-97d5-81ec10f3e98b.png)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] yihua commented on a change in pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#discussion_r767224687



##########
File path: website/i18n/cn/docusaurus-plugin-content-docs/current/ibm_cos_hoodie.md
##########
@@ -1,26 +1,26 @@
 ---
-title: IBM Cloud Object Storage Filesystem
+title: IBM Cloud Object Storage 文件系统
 keywords: [ hudi, hive, ibm, cos, spark, presto]
-summary: In this page, we go over how to configure Hudi with IBM Cloud Object Storage filesystem.
+summary: 在本页中,我们讨论在 IBM Cloud Object Storage 文件系统中配置 Hudi 。
 last_modified_at: 2020-10-01T11:38:24-10:00
 language: cn
 ---
-In this page, we explain how to get your Hudi spark job to store into IBM Cloud Object Storage.
+在本页中,我们解释如何如何将你的 Hudi Spark 作业存储到 IBM Cloud Object Storage 当中。

Review comment:
       `我们解释如何如何...` -> `我们解释如何...` [the word `如何` ("how") is duplicated]

##########
File path: website/docusaurus.config.js
##########
@@ -383,8 +383,20 @@ module.exports = {
         docs: {
           sidebarPath: require.resolve('./sidebars.js'),
           // Please change this to your repo.
-          editUrl:
-            'https://github.com/apache/hudi/edit/asf-site/website/docs/',
+          editUrl: ({ version, versionDocsDirPath, docPath, locale }) => {
+            if (locale != this.defaultLocale) {
+              return `https://github.com/apache/hudi/tree/asf-site/website/${versionDocsDirPath}/${docPath}`
+            } else {
+              return `https://github.com/apache/hudi/tree/asf-site/website/i18n/${locale}/docusaurus-plugin-content-${versionDocsDirPath}/${version}/${docPath}`
+            }
+          },
+          // type EditUrlFunction = (params: {
+          //   version: string;
+          //   versionDocsDirPath: string;
+          //   docPath: string;
+          //   permalink: string;
+          //   locale: string;
+          // }) => string | undefined;

Review comment:
       Could you remove these if not used?
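
Read as quoted, the hunk above appears to send non-default locales to the default `docs` path and the default locale to the `i18n` path, and it compares against `this.defaultLocale`, which an arrow function at this position may not resolve to the site config. The following is a minimal sketch of a locale-aware and version-aware `editUrl` callback, not the code that was merged. It assumes the usual Docusaurus layout, where `versionDocsDirPath` is `docs` or `versioned_docs/version-<name>` and translated docs live under `i18n/<locale>/docusaurus-plugin-content-docs/<current|version-<name>>`. `DEFAULT_LOCALE` and `REPO_BASE` are hypothetical constants introduced only for illustration.

```javascript
// Hypothetical constants for this sketch; adjust to the real site config.
const DEFAULT_LOCALE = 'en';
const REPO_BASE = 'https://github.com/apache/hudi/tree/asf-site/website';

// Builds the "Edit this page" link for one doc.
// Parameter names follow the EditUrlFunction type quoted in the hunk.
function editUrl({ version, versionDocsDirPath, docPath, locale }) {
  if (locale === DEFAULT_LOCALE) {
    // Default locale: the source file sits in docs/ or versioned_docs/version-X/.
    return `${REPO_BASE}/${versionDocsDirPath}/${docPath}`;
  }
  // Other locales: assumed to sit under the i18n tree, in "current" or "version-X".
  const versionDir = version === 'current' ? 'current' : `version-${version}`;
  return `${REPO_BASE}/i18n/${locale}/docusaurus-plugin-content-docs/${versionDir}/${docPath}`;
}
```

Under these assumptions, calling it with `{ version: '0.9.0', versionDocsDirPath: 'versioned_docs/version-0.9.0', docPath: 'overview.md', locale: 'cn' }` would point at the translated 0.9.0 copy of `overview.md` rather than the English one. Comparing against a named constant instead of `this.defaultLocale` avoids depending on what `this` is bound to inside the config module.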

##########
File path: website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
##########
@@ -1,58 +1,46 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset
+title: 迁移指南
+keywords: [ hudi, migration, use case, 迁移, 用例]
+summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Hudi maintains metadata such as commit timeline and indexes to manage a dataset. The commit timelines helps to understand the actions happening on a dataset as well as the current state of a dataset. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only parquet columnar formats.
-To be able to start using Hudi for your existing dataset, you will need to migrate your existing dataset into a Hudi managed dataset. There are a couple of ways to achieve this.
+Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 用来维护一个映射到文件 ID 的记录键,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。
 
+为了在你的现有数据集上开始使用 Hudi ,你需要将你的现有数据集迁移到 Hudi 管理的数据集中。以下有多种方法实现这个目的。
 
-## Approaches
 
+## 方法
 
-### Use Hudi for new partitions alone
 
-Hudi can be used to manage an existing dataset without affecting/altering the historical data already present in the
-dataset. Hudi has been implemented to be compatible with such a mixed dataset with a caveat that either the complete
-Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a dataset is a Hive
-partition. Start using the datasource API or the WriteClient to write to the dataset and make sure you start writing
-to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical
- partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI dataset.
-Take this approach if your dataset is an append only type of dataset and you do not expect to perform any updates to existing (or non Hudi managed) partitions.
+### 将 Hudi 仅用于新分区
 
+Hudi 可以被用来在不影响/改变数据集历史数据的情况下管理一个现有的数据集。 Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。因此, Hudi 管理一个数据集的最低粒度是一个 Hive 分区。使用数据源 API 或 WriteClient 来写入数据集,并确保你开始写入的是一个新分区,或者将过去的 N 个分区而非整张表转换为 Hudi 。需要注意的是,由于历史分区不是由 Hudi 管理的, Hudi 提供的任何操作在那些分区上都不生效。更具体地说,无法在这些非 Hudi 管理的旧分区上进行插入更新或增量拉取。

Review comment:
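       `Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。`
   -> `Hudi 已经实现兼容这样的数据集,需要注意的是,单个 Hive 分区要么完全由 Hudi 管理,要么不由 Hudi 管理。`
   [the translation should keep the caveat that a single Hive partition is either entirely Hudi-managed or not Hudi-managed at all]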

##########
File path: website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
##########
@@ -1,58 +1,46 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset
+title: 迁移指南
+keywords: [ hudi, migration, use case, 迁移, 用例]
+summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Hudi maintains metadata such as commit timeline and indexes to manage a dataset. The commit timelines helps to understand the actions happening on a dataset as well as the current state of a dataset. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only parquet columnar formats.
-To be able to start using Hudi for your existing dataset, you will need to migrate your existing dataset into a Hudi managed dataset. There are a couple of ways to achieve this.
+Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 用来维护一个映射到文件 ID 的记录键,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。

Review comment:
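       `...用来维护一个映射到文件 ID 的记录键...`
   -> `...用来维护记录键到文件 ID 的映射...`
   [the index maintains a mapping from record key to file ID, not "a record key that maps to a file ID"]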

##########
File path: website/i18n/cn/docusaurus-plugin-content-docs/current/migration_guide.md
##########
@@ -1,58 +1,46 @@
 ---
-title: Migration Guide
-keywords: [ hudi, migration, use case]
-summary: In this page, we will discuss some available tools for migrating your existing dataset into a Hudi dataset
+title: 迁移指南
+keywords: [ hudi, migration, use case, 迁移, 用例]
+summary: 在本页中,我们将讨论有效的工具,他们能将你的现有数据集迁移到 Hudi 数据集。
 last_modified_at: 2019-12-30T15:59:57-04:00
 language: cn
 ---
 
-Hudi maintains metadata such as commit timeline and indexes to manage a dataset. The commit timelines helps to understand the actions happening on a dataset as well as the current state of a dataset. Indexes are used by Hudi to maintain a record key to file id mapping to efficiently locate a record. At the moment, Hudi supports writing only parquet columnar formats.
-To be able to start using Hudi for your existing dataset, you will need to migrate your existing dataset into a Hudi managed dataset. There are a couple of ways to achieve this.
+Hudi 维护了元数据,包括提交的时间线和索引,来管理一个数据集。提交的时间线帮助理解一个数据集上发生的操作,以及数据集的当前状态。索引则被 Hudi 用来维护一个映射到文件 ID 的记录键,它能高效地定位一条记录。目前, Hudi 仅支持写 Parquet 列式格式 。
 
+为了在你的现有数据集上开始使用 Hudi ,你需要将你的现有数据集迁移到 Hudi 管理的数据集中。以下有多种方法实现这个目的。
 
-## Approaches
 
+## 方法
 
-### Use Hudi for new partitions alone
 
-Hudi can be used to manage an existing dataset without affecting/altering the historical data already present in the
-dataset. Hudi has been implemented to be compatible with such a mixed dataset with a caveat that either the complete
-Hive partition is Hudi managed or not. Thus the lowest granularity at which Hudi manages a dataset is a Hive
-partition. Start using the datasource API or the WriteClient to write to the dataset and make sure you start writing
-to a new partition or convert your last N partitions into Hudi instead of the entire table. Note, since the historical
- partitions are not managed by HUDI, none of the primitives provided by HUDI work on the data in those partitions. More concretely, one cannot perform upserts or incremental pull on such older partitions not managed by the HUDI dataset.
-Take this approach if your dataset is an append only type of dataset and you do not expect to perform any updates to existing (or non Hudi managed) partitions.
+### 将 Hudi 仅用于新分区
 
+Hudi 可以被用来在不影响/改变数据集历史数据的情况下管理一个现有的数据集。 Hudi 已经实现为能够兼容这样的数据集,不论整个 Hive 分区是否由 Hudi 管理。因此, Hudi 管理一个数据集的最低粒度是一个 Hive 分区。使用数据源 API 或 WriteClient 来写入数据集,并确保你开始写入的是一个新分区,或者将过去的 N 个分区而非整张表转换为 Hudi 。需要注意的是,由于历史分区不是由 Hudi 管理的, Hudi 提供的任何操作在那些分区上都不生效。更具体地说,无法在这些非 Hudi 管理的旧分区上进行插入更新或增量拉取。
 
-### Convert existing dataset to Hudi
+如果你的数据集是追加型的数据集,并且你不指望在已经存在的(或者非 Hudi 管理的)分区上进行更新操作,就使用这个方法。
 
-Import your existing dataset into a Hudi managed dataset. Since all the data is Hudi managed, none of the limitations
- of Approach 1 apply here. Updates spanning any partitions can be applied to this dataset and Hudi will efficiently
- make the update available to queries. Note that not only do you get to use all Hudi primitives on this dataset,
- there are other additional advantages of doing this. Hudi automatically manages file sizes of a Hudi managed dataset
- . You can define the desired file size when converting this dataset and Hudi will ensure it writes out files
- adhering to the config. It will also ensure that smaller files later get corrected by routing some new inserts into
- small files rather than writing new small ones thus maintaining the health of your cluster.
+### 将现有的数据集转换为 Hudi
 
-There are a few options when choosing this approach.
+将你的现有数据集导入到一个 Hudi 管理的数据集。由于全部数据都是 Hudi 管理的,方法 1 的任何限制在这里都无效了。跨分区的更新可以被应用到这个数据集,而 Hudi 会高效地让这些更新对查询可用。值得注意的是,你不仅可以在这个数据集上使用所有 Hudi 提供的操作,这样做还有额外的好处。 Hudi 会自动管理受管数据集的文件大小。你可以在转换数据集的时候设置期望的文件大小, Hudi 将确保它写出的文件符合这个配置。Hudi 还会确保小文件在后续被修正,这个过程是通过将新的插入引导到这些小文件而不是写入新的小文件来实现的,这样能维持你的集群的健康度。

Review comment:
       `方法 1 的任何限制在这里都无效了`
   ->  `方法 1 的任何限制在这里都不适用`
   [the limitations of Approach 1 "do not apply" here, rather than "are invalid"]







[GitHub] [hudi] laurieliyang commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
laurieliyang commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-992145998


   > @laurieliyang Thanks for fixing the Chinese docs. Could you fix the conflicts with the latest asf-site?
   
   I have fixed the conflicts in `overview.md`.





[GitHub] [hudi] yihua commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
yihua commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-992121886


   @leesf is there any plan to update the CN docs for 0.9.0, 0.10.0 releases, and the current version?





[GitHub] [hudi] yihua merged pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
yihua merged pull request #3859:
URL: https://github.com/apache/hudi/pull/3859


   





[GitHub] [hudi] laurieliyang commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
laurieliyang commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-993127736


   > @laurieliyang Could you check the failed build? Also, it looks like the commits you pushed don't include the changes resolving my comments.
   
   Hello, I have fixed the check error and also committed the changes requested in the comments. I forgot to add the local changes ...





[GitHub] [hudi] yihua commented on pull request #3859: [DOCS] Fix the "Edit this page" config and add 6 cn docs.

Posted by GitBox <gi...@apache.org>.
yihua commented on pull request #3859:
URL: https://github.com/apache/hudi/pull/3859#issuecomment-995344882


   > > @laurieliyang Could you check the failed build? Also, it looks like the commits you pushed don't include the changes resolving my comments.
   > 
   > Hello, I have fixed the check error and also committed the changes requested in the comments. I forgot to add the local changes ...
   
   No worries.

