You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/23 23:04:40 UTC

[GitHub] [hudi] xushiyan commented on a change in pull request #4649: [HUDI-2941] Show _hoodie_operation in spark sql results

xushiyan commented on a change in pull request #4649:
URL: https://github.com/apache/hudi/pull/4649#discussion_r790342949



##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -64,8 +64,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
 
   private val conf = sqlContext.sparkContext.hadoopConfiguration
   private val jobConf = new JobConf(conf)
+  private val withOperationField: Boolean = new TableSchemaResolver(metaClient).hasOperationField
   // use schema from latest metadata, if not present, read schema from the data file
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val schemaUtil = new TableSchemaResolver(metaClient, withOperationField)

Review comment:
       ditto

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -477,4 +478,13 @@ public static MessageType readSchemaFromLogFile(FileSystem fs, Path path) throws
     }
     return null;
   }
+
+  public boolean hasOperationField() {

Review comment:
       this class has an instance variable `withOperationField` too. can we consolidate this method and that variable?

##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -74,7 +74,8 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, lastInstant.getTimestamp))
   log.debug(s"${commitsTimelineToReturn.getInstants.iterator().toList.map(f => f.toString).mkString(",")}")
   private val commitsToReturn = commitsTimelineToReturn.getInstants.iterator().toList
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val withOperationField: Boolean = new TableSchemaResolver(metaClient).hasOperationField
+  private val schemaUtil = new TableSchemaResolver(metaClient, withOperationField)

Review comment:
       +1 to consolidate these. `TableSchemaResolver` should be the source of truth for `OperationField`, instead of being given the info from outside. The constructor with operation field seems to be an anti-pattern.

##########
File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala
##########
@@ -74,7 +74,8 @@ class MergeOnReadIncrementalRelation(val sqlContext: SQLContext,
     optParams.getOrElse(DataSourceReadOptions.END_INSTANTTIME.key, lastInstant.getTimestamp))
   log.debug(s"${commitsTimelineToReturn.getInstants.iterator().toList.map(f => f.toString).mkString(",")}")
   private val commitsToReturn = commitsTimelineToReturn.getInstants.iterator().toList
-  private val schemaUtil = new TableSchemaResolver(metaClient)
+  private val withOperationField: Boolean = new TableSchemaResolver(metaClient).hasOperationField
+  private val schemaUtil = new TableSchemaResolver(metaClient, withOperationField)

Review comment:
       Also use a name closer to the class name makes things clearer
   
   ```suggestion
     private val schemaResolver = new TableSchemaResolver(metaClient, withOperationField)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org