Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2019/11/11 18:55:30 UTC

[GitHub] [incubator-hudi] yihua commented on a change in pull request #1006: [HUDI-276] Translate the Configurations page into Chinese

yihua commented on a change in pull request #1006: [HUDI-276] Translate the Configurations page into Chinese
URL: https://github.com/apache/incubator-hudi/pull/1006#discussion_r344858236
 
 

 ##########
 File path: docs/configurations.cn.md
 ##########
 @@ -51,385 +49,419 @@ inputDF.write()
 .save(basePath);
 ```
 
-Options useful for writing datasets via `write.format.option(...)`
+用于通过`write.format.option(...)`写入数据集的选项
 
 ##### TABLE_NAME_OPT_KEY {#TABLE_NAME_OPT_KEY}
-  Property: `hoodie.datasource.write.table.name` [Required]<br/>
-  <span style="color:grey">Hive table name, to register the dataset into.</span>
+  属性:`hoodie.datasource.write.table.name` [必须]<br/>
+  <span style="color:grey">Hive表名,用于将数据集注册到其中。</span>
   
 ##### OPERATION_OPT_KEY {#OPERATION_OPT_KEY}
-  Property: `hoodie.datasource.write.operation`, Default: `upsert`<br/>
-  <span style="color:grey">whether to do upsert, insert or bulkinsert for the write operation. Use `bulkinsert` to load new data into a table, and there on use `upsert`/`insert`. 
-  bulk insert uses a disk based write path to scale to load large inputs without need to cache it.</span>
+  属性:`hoodie.datasource.write.operation`, 默认值:`upsert`<br/>
+  <span style="color:grey">是否为写操作进行插入更新、插入或批量插入。使用`bulkinsert`将新数据加载到表中,之后使用`upsert`或`insert`。
+  批量插入使用基于磁盘的写入路径来扩展以加载大量输入,而无需对其进行缓存。</span>
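
A minimal sketch of choosing the write operation, assuming the `org.apache.hudi` format name and reusing `inputDF`/`basePath` from the example above; `updatesDF` and the table name are hypothetical, and the exact bulk-insert value string may differ by Hudi version:

```java
// Initial load: bulk insert uses a disk-based write path to scale to
// large inputs without caching them.
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.operation", "bulk_insert") // value string per your Hudi version
    .option("hoodie.datasource.write.table.name", "my_table")   // hypothetical table name
    .save(basePath);

// Later writes: upsert (the default) merges incoming records into the dataset.
updatesDF.write().format("org.apache.hudi")                     // updatesDF: a later batch (assumed)
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.datasource.write.table.name", "my_table")
    .save(basePath);
```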
   
 ##### STORAGE_TYPE_OPT_KEY {#STORAGE_TYPE_OPT_KEY}
-  Property: `hoodie.datasource.write.storage.type`, Default: `COPY_ON_WRITE` <br/>
-  <span style="color:grey">The storage type for the underlying data, for this write. This can't change between writes.</span>
+  属性:`hoodie.datasource.write.storage.type`, 默认值:`COPY_ON_WRITE` <br/>
+  <span style="color:grey">此写入的基础数据的存储类型。两次写入之间不能改变。</span>
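
A minimal sketch of pinning the storage type at first write; `MERGE_ON_READ` is the other storage type besides the `COPY_ON_WRITE` default:

```java
// Chosen once when the dataset is created; cannot change between writes.
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.storage.type", "MERGE_ON_READ")
    .save(basePath);
```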
   
 ##### PRECOMBINE_FIELD_OPT_KEY {#PRECOMBINE_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.precombine.field`, Default: `ts` <br/>
-  <span style="color:grey">Field used in preCombining before actual write. When two records have the same key value,
-we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)</span>
+  属性:`hoodie.datasource.write.precombine.field`, 默认值:`ts` <br/>
+  <span style="color:grey">实际写入之前在preCombining中使用的字段。
+  当两个记录具有相同的键值时,我们将选取precombine字段值最大的那条记录(由Object.compareTo(..)确定)。</span>
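
A small sketch of the precombine semantics, with made-up records and a hypothetical `fare` field:

```java
// Two incoming records share the record key "id1"; with precombine field "ts",
// only the record with the larger ts survives (compared via Object.compareTo):
//   {"uuid": "id1", "ts": 1, "fare": 10.0}
//   {"uuid": "id1", "ts": 2, "fare": 12.5}   <-- written
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .save(basePath);
```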
 
 ##### PAYLOAD_CLASS_OPT_KEY {#PAYLOAD_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.payload.class`, Default: `org.apache.hudi.OverwriteWithLatestAvroPayload` <br/>
-  <span style="color:grey">Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting. 
-  This will render any value set for `PRECOMBINE_FIELD_OPT_VAL` in-effective</span>
+  属性:`hoodie.datasource.write.payload.class`, 默认值:`org.apache.hudi.OverwriteWithLatestAvroPayload` <br/>
+  <span style="color:grey">使用的有效载荷类。如果您想在插入更新或插入时自定义合并逻辑,请重写该类。
+  这将使为`PRECOMBINE_FIELD_OPT_VAL`设置的任何值无效</span>
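
A minimal sketch of pointing the writer at a custom payload class; `com.example.MyMergePayload` is a hypothetical class implementing your own merge logic:

```java
// With a custom payload class, any PRECOMBINE_FIELD_OPT_VAL setting is ignored.
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.payload.class", "com.example.MyMergePayload")
    .save(basePath);
```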
   
 ##### RECORDKEY_FIELD_OPT_KEY {#RECORDKEY_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.recordkey.field`, Default: `uuid` <br/>
-  <span style="color:grey">Record key field. Value to be used as the `recordKey` component of `HoodieKey`. Actual value
-will be obtained by invoking .toString() on the field value. Nested fields can be specified using
-the dot notation eg: `a.b.c`</span>
+  属性:`hoodie.datasource.write.recordkey.field`, 默认值:`uuid` <br/>
+  <span style="color:grey">记录键字段。用作`HoodieKey`中`recordKey`部分的值。
+  实际值将通过在字段值上调用.toString()来获得。可以使用点符号指定嵌套字段,例如:`a.b.c`</span>
 
 ##### PARTITIONPATH_FIELD_OPT_KEY {#PARTITIONPATH_FIELD_OPT_KEY}
-  Property: `hoodie.datasource.write.partitionpath.field`, Default: `partitionpath` <br/>
-  <span style="color:grey">Partition path field. Value to be used at the `partitionPath` component of `HoodieKey`.
-Actual value obtained by invoking .toString()</span>
+  属性:`hoodie.datasource.write.partitionpath.field`, 默认值:`partitionpath` <br/>
+  <span style="color:grey">分区路径字段。用作`HoodieKey`中`partitionPath`部分的值。
+  通过调用.toString()获得实际的值</span>
 
 ##### KEYGENERATOR_CLASS_OPT_KEY {#KEYGENERATOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.write.keygenerator.class`, Default: `org.apache.hudi.SimpleKeyGenerator` <br/>
-  <span style="color:grey">Key generator class, that implements will extract the key out of incoming `Row` object</span>
+  属性:`hoodie.datasource.write.keygenerator.class`, 默认值:`org.apache.hudi.SimpleKeyGenerator` <br/>
+  <span style="color:grey">键生成器类,实现从输入的`Row`对象中提取键</span>
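
A sketch combining the three key-related options above; the nested field `a.b.c` follows the documented dot notation, and the generator class shown is the documented default:

```java
// HoodieKey = (recordKey, partitionPath); both values come from .toString()
// on the configured fields.
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.recordkey.field", "a.b.c")        // nested field via dot notation
    .option("hoodie.datasource.write.partitionpath.field", "partitionpath")
    .option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.SimpleKeyGenerator")
    .save(basePath);
```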
   
 ##### COMMIT_METADATA_KEYPREFIX_OPT_KEY {#COMMIT_METADATA_KEYPREFIX_OPT_KEY}
-  Property: `hoodie.datasource.write.commitmeta.key.prefix`, Default: `_` <br/>
-  <span style="color:grey">Option keys beginning with this prefix, are automatically added to the commit/deltacommit metadata.
-This is useful to store checkpointing information, in a consistent way with the hudi timeline</span>
+  属性:`hoodie.datasource.write.commitmeta.key.prefix`, 默认值:`_` <br/>
+  <span style="color:grey">以该前缀开头的选项键会自动添加到提交/增量提交的元数据中。
+  这对于以与hudi时间轴一致的方式存储检查点信息很有用</span>
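
A sketch of the metadata key prefix; the `_checkpoint` key and its value are hypothetical:

```java
// Option keys starting with the prefix (default "_") are carried into the
// commit/deltacommit metadata -- one way to checkpoint against the hudi timeline.
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.write.commitmeta.key.prefix", "_")
    .option("_checkpoint", "kafka-offset-12345") // stored in the commit metadata
    .save(basePath);
```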
 
 ##### INSERT_DROP_DUPS_OPT_KEY {#INSERT_DROP_DUPS_OPT_KEY}
-  Property: `hoodie.datasource.write.insert.drop.duplicates`, Default: `false` <br/>
-  <span style="color:grey">If set to true, filters out all duplicate records from incoming dataframe, during insert operations. </span>
+  属性:`hoodie.datasource.write.insert.drop.duplicates`, 默认值:`false` <br/>
+  <span style="color:grey">如果设置为true,则在插入操作期间从传入数据帧中过滤掉所有重复记录。</span>
   
 ##### HIVE_SYNC_ENABLED_OPT_KEY {#HIVE_SYNC_ENABLED_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.enable`, Default: `false` <br/>
-  <span style="color:grey">When set to true, register/sync the dataset to Apache Hive metastore</span>
+  属性:`hoodie.datasource.hive_sync.enable`, 默认值:`false` <br/>
+  <span style="color:grey">设置为true时,将数据集注册并同步到Apache Hive Metastore</span>
   
 ##### HIVE_DATABASE_OPT_KEY {#HIVE_DATABASE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.database`, Default: `default` <br/>
-  <span style="color:grey">database to sync to</span>
+  属性:`hoodie.datasource.hive_sync.database`, 默认值:`default` <br/>
+  <span style="color:grey">要同步到的数据库</span>
   
 ##### HIVE_TABLE_OPT_KEY {#HIVE_TABLE_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.table`, [Required] <br/>
-  <span style="color:grey">table to sync to</span>
+  属性:`hoodie.datasource.hive_sync.table`, [必须] <br/>
+  <span style="color:grey">要同步到的表</span>
   
 ##### HIVE_USER_OPT_KEY {#HIVE_USER_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.username`, Default: `hive` <br/>
-  <span style="color:grey">hive user name to use</span>
+  属性:`hoodie.datasource.hive_sync.username`, 默认值:`hive` <br/>
+  <span style="color:grey">要使用的Hive用户名</span>
   
 ##### HIVE_PASS_OPT_KEY {#HIVE_PASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.password`, Default: `hive` <br/>
-  <span style="color:grey">hive password to use</span>
+  属性:`hoodie.datasource.hive_sync.password`, 默认值:`hive` <br/>
+  <span style="color:grey">要使用的Hive密码</span>
   
 ##### HIVE_URL_OPT_KEY {#HIVE_URL_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.jdbcurl`, Default: `jdbc:hive2://localhost:10000` <br/>
+  属性:`hoodie.datasource.hive_sync.jdbcurl`, 默认值:`jdbc:hive2://localhost:10000` <br/>
   <span style="color:grey">Hive metastore url</span>
   
 ##### HIVE_PARTITION_FIELDS_OPT_KEY {#HIVE_PARTITION_FIELDS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_fields`, Default: ` ` <br/>
-  <span style="color:grey">field in the dataset to use for determining hive partition columns.</span>
+  属性:`hoodie.datasource.hive_sync.partition_fields`, 默认值:` ` <br/>
+  <span style="color:grey">数据集中用于确定Hive分区列的字段。</span>
   
 ##### HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY {#HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.partition_extractor_class`, Default: `org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor` <br/>
-  <span style="color:grey">Class used to extract partition field values into hive partition columns.</span>
+  属性:`hoodie.datasource.hive_sync.partition_extractor_class`, 默认值:`org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor` <br/>
+  <span style="color:grey">用于将分区字段值提取到Hive分区列中的类。</span>
   
 ##### HIVE_ASSUME_DATE_PARTITION_OPT_KEY {#HIVE_ASSUME_DATE_PARTITION_OPT_KEY}
-  Property: `hoodie.datasource.hive_sync.assume_date_partitioning`, Default: `false` <br/>
-  <span style="color:grey">Assume partitioning is yyyy/mm/dd</span>
+  属性:`hoodie.datasource.hive_sync.assume_date_partitioning`, 默认值:`false` <br/>
+  <span style="color:grey">假设分区格式是yyyy/mm/dd</span>
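
A sketch combining the Hive sync properties above into one write; the database and table names are placeholders, and the JDBC URL is the documented default:

```java
inputDF.write().format("org.apache.hudi")
    .option("hoodie.datasource.hive_sync.enable", "true")
    .option("hoodie.datasource.hive_sync.database", "default")
    .option("hoodie.datasource.hive_sync.table", "my_table")
    .option("hoodie.datasource.hive_sync.username", "hive")
    .option("hoodie.datasource.hive_sync.password", "hive")
    .option("hoodie.datasource.hive_sync.jdbcurl", "jdbc:hive2://localhost:10000")
    .option("hoodie.datasource.hive_sync.partition_fields", "partitionpath")
    .save(basePath);
```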
 
-#### Read Options
+#### 读选项
 
-Options useful for reading datasets via `read.format.option(...)`
+用于通过`read.format.option(...)`读取数据集的选项
 
 ##### VIEW_TYPE_OPT_KEY {#VIEW_TYPE_OPT_KEY}
-Property: `hoodie.datasource.view.type`, Default: `read_optimized` <br/>
-<span style="color:grey">Whether data needs to be read, in incremental mode (new data since an instantTime)
-(or) Read Optimized mode (obtain latest view, based on columnar data)
-(or) Real time mode (obtain latest view, based on row & columnar data)</span>
+属性:`hoodie.datasource.view.type`, 默认值:`read_optimized` <br/>
+<span style="color:grey">指定读取数据的模式:增量模式(读取自某一instantTime以来的新数据)、
+读优化模式(基于列式数据获取最新视图)
+或实时模式(基于行式和列式数据获取最新视图)</span>
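
A sketch of selecting a view type on read, assuming a `spark` SparkSession and a partition-layout-dependent path glob; besides `read_optimized`, the documented values are `incremental` and `realtime`:

```java
Dataset<Row> df = spark.read().format("org.apache.hudi")
    .option("hoodie.datasource.view.type", "read_optimized")
    .load(basePath + "/*/*/*/*"); // glob over partition directories (assumed layout)
```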
 
 ##### BEGIN_INSTANTTIME_OPT_KEY {#BEGIN_INSTANTTIME_OPT_KEY} 
-Property: `hoodie.datasource.read.begin.instanttime`, [Required in incremental mode] <br/>
-<span style="color:grey">Instant time to start incrementally pulling data from. The instanttime here need not
-necessarily correspond to an instant on the timeline. New data written with an
- `instant_time > BEGIN_INSTANTTIME` are fetched out. For e.g: '20170901080000' will get
- all new data written after Sep 1, 2017 08:00AM.</span>
+属性:`hoodie.datasource.read.begin.instanttime`, [在增量模式下必须] <br/>
+<span style="color:grey">开始增量提取数据的即时时间。这里的instanttime不必一定与时间轴上的即时相对应。
+取出以`instant_time > BEGIN_INSTANTTIME`写入的新数据。
+例如:'20170901080000'将获取2017年9月1日08:00 AM之后写入的所有新数据。</span>
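
A sketch of an incremental pull using the instant time from the example above, again assuming a `spark` SparkSession:

```java
// Fetches only records written with instant_time > 20170901080000,
// i.e. after Sep 1, 2017 08:00AM.
Dataset<Row> newData = spark.read().format("org.apache.hudi")
    .option("hoodie.datasource.view.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20170901080000")
    .load(basePath);
```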
  
 ##### END_INSTANTTIME_OPT_KEY {#END_INSTANTTIME_OPT_KEY}
-Property: `hoodie.datasource.read.end.instanttime`, Default: latest instant (i.e fetches all new data since begin instant time) <br/>
-<span style="color:grey"> Instant time to limit incrementally fetched data to. New data written with an
-`instant_time <= END_INSTANTTIME` are fetched out.</span>
+属性:`hoodie.datasource.read.end.instanttime`, 默认值:latest instant (i.e fetches all new data since begin instant time) <br/>
 
 Review comment:
   Good catch... fixed in the latest commit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services