Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/11 13:58:39 UTC

[GitHub] [hudi] garyli1019 commented on a change in pull request #2378: [HUDI-1491] Support partition pruning for MOR snapshot query

garyli1019 commented on a change in pull request #2378:
URL: https://github.com/apache/hudi/pull/2378#discussion_r555064333



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala
##########
@@ -108,7 +111,7 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       dataSchema = tableStructSchema,
       partitionSchema = StructType(Nil),

Review comment:
       hi @yui2010, what does your dataset look like? Does it have a `dt` column? The partitioning I am referring to is this: when you call `spark.read.format('hudi').load(basePath)` and your dataset folder structure looks like `basePath/dt=20201010`, Spark is able to append a `dt` column to your dataset. When you then do something like `df.filter(dt=20201010)`, Spark will go to that partition and read only its files. What is your workflow for loading the data and passing the partition information to Spark?
   To help clarify this implementation, would you write a test that demonstrates the partition pruning?
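   For reference, the partition-discovery behavior described above can be sketched as follows. This is a minimal illustration, not code from this PR; the `basePath` value and the `dt=20201010` partition value are hypothetical examples:

```scala
import org.apache.spark.sql.SparkSession

object PartitionPruningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-pruning-demo")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical table location; directories under it are named like
    // basePath/dt=20201010/, so Spark's partition discovery infers a `dt` column.
    val basePath = "/tmp/hudi_table"

    // Load the Hudi table; the inferred `dt` column is appended to the schema.
    val df = spark.read.format("hudi").load(basePath)

    // A filter on the partition column allows Spark to prune partitions:
    // only files under basePath/dt=20201010 should be scanned.
    df.filter("dt = '20201010'").show()

    spark.stop()
  }
}
```

   Whether the filter actually prunes (rather than scanning all files and filtering rows) is exactly what a test demonstrating this relation's behavior would confirm.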




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org