Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/02 02:40:00 UTC

[jira] [Commented] (HUDI-2177) Virtual keys support for Compaction

    [ https://issues.apache.org/jira/browse/HUDI-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391265#comment-17391265 ] 

ASF GitHub Bot commented on HUDI-2177:
--------------------------------------

vinothchandar commented on a change in pull request #3315:
URL: https://github.com/apache/hudi/pull/3315#discussion_r680614863



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
##########
@@ -157,6 +157,11 @@
       .withDocumentation("When enabled, populates all meta fields. When disabled, no meta fields are populated "
           + "and incremental queries will not be functional. This is only meant to be used for append only/immutable data for batch processing");
 
+  public static final ConfigProperty<String> HOODIE_TABLE_KEY_GENERATOR_CLASS = ConfigProperty
+      .key("hoodie.datasource.write.keygenerator.class")

Review comment:
       do we really write this with the `hoodie.datasource` prefix? This is a table-level property, not just for the datasource write. Let's fix this?

##########
File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/CompactionTestBase.java
##########
@@ -198,7 +198,7 @@ protected void executeCompaction(String compactionInstantTime, SparkRDDWriteClie
     assertEquals(latestCompactionCommitTime, compactionInstantTime,
         "Expect compaction instant time to be the latest commit time");
     assertEquals(expectedNumRecs,
-        HoodieClientTestUtils.countRecordsSince(jsc, basePath, sqlContext, timeline, "000"),
+        HoodieClientTestUtils.countRecordsWithOptionalSince(jsc, basePath, sqlContext, timeline, Option.of("000")),

Review comment:
       rename to `countRecordOptionallySince`

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -36,16 +38,20 @@
 
   private String basePath;
 
+  private Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt = Option.empty();
+
   public HoodieRealtimeFileSplit() {
     super();
   }
 
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime,
+                                 Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt)

Review comment:
       I think it's okay to drop the `Opt` suffix everywhere. `virtualKeyInfo` is concise and already conveys the meaning; `virtualKeyInfo.isPresent()` will again convey what `Opt` would have conveyed.
   
   Also, let's drop the `hoodie` prefix everywhere in the variables as well.

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##########
@@ -52,8 +53,15 @@
    */
   String getBasePath();
 
+  /**
+   * Returns Virtual key info if meta fields are disabled.
+   * @return
+   */
+  Option<HoodieVirtualKeyInfo> getHoodieVirtualKeyInfoOpt();

Review comment:
       fix name.

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##########
@@ -204,23 +226,34 @@ private static Configuration addProjectionField(Configuration conf, String field
     return conf;
   }
 
-  public static void addRequiredProjectionFields(Configuration configuration) {
+  public static void addRequiredProjectionFields(Configuration configuration, Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt) {

Review comment:
       naming

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -211,8 +213,11 @@ protected String getCommitActionType() {
               .withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
               .build();
 
+          HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
           recordIterators.add(HoodieFileSliceReader.getFileSliceReader(baseFileReader, scanner, readerSchema,
-              table.getMetaClient().getTableConfig().getPayloadClass()));
+              tableConfig.getPayloadClass(),
+              tableConfig.populateMetaFields() ? Option.empty() : Option.of(tableConfig.getRecordKeyFieldProp()),

Review comment:
       we can use `Option<Pair<String, String>>` anywhere both a record key and partition path field needs to be passed. 
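   The suggestion above can be sketched as follows. This is a hypothetical illustration (using `java.util.Optional` and `Map.Entry` as the pair type; the names `describe` and `keyFields` are made up, not Hudi API): because the record key field and the partition path field are either both present or both absent, carrying them as a single optional pair removes the possibility of the two optionals disagreeing.

   ```java
   import java.util.Map;
   import java.util.Optional;

   public class VirtualKeyFieldsSketch {
       // One Optional pair instead of two Optionals that must stay in sync:
       // empty means meta fields are populated, present means virtual keys.
       static String describe(Optional<Map.Entry<String, String>> keyFields) {
           return keyFields
               .map(kv -> "virtual keys: record=" + kv.getKey() + ", partition=" + kv.getValue())
               .orElse("meta fields populated");
       }

       public static void main(String[] args) {
           System.out.println(describe(Optional.empty()));
           System.out.println(describe(Optional.of(Map.entry("uuid", "partitionpath"))));
       }
   }
   ```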

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordScanner.java
##########
@@ -80,6 +82,10 @@
   private final HoodieTableMetaClient hoodieTableMetaClient;
   // Merge strategy to use when combining records from log
   private final String payloadClassFQN;
+  // simple recordKey field
+  private Option<String> simpleRecordKeyField = Option.empty();

Review comment:
       let's use a Pair

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -60,6 +66,16 @@ public String getBasePath() {
     return basePath;
   }
 
+  @Override
+  public void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt) {

Review comment:
       fix the getter and setter names

##########
File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java
##########
@@ -126,8 +131,10 @@ private void initIfNeeded() {
         HoodieTimer readTimer = new HoodieTimer().startTimer();
         Option<GenericRecord> baseRecord = baseFileReader.getRecordByKey(key);
         if (baseRecord.isPresent()) {
-          hoodieRecord = SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(),
-              metaClient.getTableConfig().getPayloadClass());
+          hoodieRecord = tableConfig.populateMetaFields() ? SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(),
+              tableConfig.getPayloadClass()) : SpillableMapUtils.convertToHoodieRecordPayload(baseRecord.get(),
+              tableConfig.getPayloadClass(), tableConfig.getRecordKeyFieldProp(),

Review comment:
       let's use a Pair

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java
##########
@@ -70,13 +78,24 @@
    */
   void setBasePath(String basePath);
 
+  void setHoodieVirtualKeyInfoOpt(Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt);
+
   default void writeToOutput(DataOutput out) throws IOException {
     InputSplitUtils.writeString(getBasePath(), out);
     InputSplitUtils.writeString(getMaxCommitTime(), out);
     out.writeInt(getDeltaLogPaths().size());
     for (String logFilePath : getDeltaLogPaths()) {
       InputSplitUtils.writeString(logFilePath, out);
     }
+    if (!getHoodieVirtualKeyInfoOpt().isPresent()) {

Review comment:
       `Option.ifPresent()`/`orElse()`? Let's make use of the Option APIs!
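   A minimal sketch of that suggestion, shown here with `java.util.Optional` and a string sink rather than `DataOutput` (the class and method names are illustrative; whether Hudi's own `Option` offers the identical combinators is not verified here): the explicit `isPresent()` branch collapses into `ifPresentOrElse`.

   ```java
   import java.util.Optional;

   public class OptionalEncodeSketch {
       // Encode a presence flag, followed by the payload only when present.
       static String encode(Optional<String> virtualKeyInfo) {
           StringBuilder out = new StringBuilder();
           virtualKeyInfo.ifPresentOrElse(
               info -> out.append("true|").append(info), // flag + payload
               () -> out.append("false"));               // just the flag
           return out.toString();
       }

       public static void main(String[] args) {
           System.out.println(encode(Optional.empty()));
           System.out.println(encode(Optional.of("uuid,partitionpath")));
       }
   }
   ```

   One caveat for the real `writeToOutput`: `DataOutput` methods throw the checked `IOException`, which a plain lambda cannot propagate, so the lambda-based form would need a small wrapper or try/catch.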

##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieRealtimeFileSplit.java
##########
@@ -36,16 +38,20 @@
 
   private String basePath;
 
+  private Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt = Option.empty();
+
   public HoodieRealtimeFileSplit() {
     super();
   }
 
-  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime)
+  public HoodieRealtimeFileSplit(FileSplit baseSplit, String basePath, List<String> deltaLogPaths, String maxCommitTime,
+                                 Option<HoodieVirtualKeyInfo> hoodieVirtualKeyInfoOpt)

Review comment:
       encoding data types in the name is no longer needed in the age of modern IDEs, says Robert C. Martin in Clean Code :)

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
##########
@@ -324,6 +324,14 @@ public void validateTableProperties(Properties properties, WriteOperationType op
         && Boolean.parseBoolean((String) properties.getOrDefault(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key(), HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.defaultValue()))) {
       throw new HoodieException(HoodieTableConfig.HOODIE_POPULATE_META_FIELDS.key() + " already disabled for the table. Can't be re-enabled back");
     }
+
+    // meta fields can be disabled only with SimpleKeyGenerator
+    if (!getTableConfig().populateMetaFields()
+        && !properties.getProperty(HoodieTableConfig.HOODIE_TABLE_KEY_GENERATOR_CLASS.key(), "org.apache.hudi.keygen.SimpleKeyGenerator")

Review comment:
       use the class name and do a `.class.getName()`. I think I had a similar comment before.
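   The point can be sketched like this (a nested class stands in for the real `org.apache.hudi.keygen.SimpleKeyGenerator`, which would be referenced the same way): a `.class.getName()` reference is checked by the compiler, so it survives renames and package moves that would silently break a hardcoded string literal.

   ```java
   public class ClassNameSketch {
       // Stand-in for the real key generator class.
       static class SimpleKeyGenerator {}

       public static void main(String[] args) {
           // Compile-checked reference instead of a brittle string constant.
           String defaultKeyGen = SimpleKeyGenerator.class.getName();
           System.out.println(defaultKeyGen);
       }
   }
   ```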




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Virtual keys support for Compaction
> -----------------------------------
>
>                 Key: HUDI-2177
>                 URL: https://issues.apache.org/jira/browse/HUDI-2177
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: Writer Core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Virtual keys support for Compaction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)