Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/13 00:35:04 UTC

[GitHub] [hudi] tarunguptanit commented on issue #6174: Hudi Read Performance : Partition pruning not happening when reading Hudi table

tarunguptanit commented on issue #6174:
URL: https://github.com/apache/hudi/issues/6174#issuecomment-1244745834

   Yes, I was able to fix this issue by upgrading to Hudi 0.10.
   It seems that reading a specific partition by passing its actual path, as below, is not supported in newer versions of Hudi:
   
   ```scala
   val hudiDirectory = "s3a://podsofaupgradetesting-v2/HZ_CUST_ACCOUNTS/2022/05/04/"
   ```
   
   I had to URL-encode the partition path for my table by setting `hoodie.datasource.write.partitionpath.urlencode` when writing it, and then use Spark's `filter` function on the partition column to get partition pruning.
   
   Something like this:
   
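   ```scala
   import org.apache.spark.sql.functions.col
   // hudiDirectory should be the table base path; the filter on the partition column drives pruning
   val hoodieIncrementalView = spark.read.format("hudi").load(hudiDirectory).filter(col("partition_date") === "2022/05")
   ```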
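   To make the two steps above concrete, here is a small Scala sketch that does not touch Spark or Hudi at all. It assumes `java.net.URLEncoder` as a stand-in for Hudi's partition-path encoder (Hudi's own escaping may differ in details), and the object `PartitionSketch`, its methods, and the file names are hypothetical. It models what encoding does to a partition value containing `/`, and how a predicate on the partition value decides which partitions get scanned.

   ```scala
   import java.net.{URLDecoder, URLEncoder}
   import java.nio.charset.StandardCharsets

   // Conceptual sketch only: no Spark or Hudi calls. URLEncoder stands in for Hudi's
   // partition-path encoder (an assumption); all names here are hypothetical.
   object PartitionSketch {
     private val utf8 = StandardCharsets.UTF_8.name()

     // Encoding turns "/" inside a partition value into "%2F", so "2022/05/04" becomes
     // a single directory name rather than three nested directories.
     def encodePartition(value: String): String = URLEncoder.encode(value, utf8)

     // Hypothetical table layout: encoded partition directory -> data files inside it.
     def layout(partitions: Seq[(String, Seq[String])]): Map[String, Seq[String]] =
       partitions.map { case (value, files) => encodePartition(value) -> files }.toMap

     // Pruning model: decode each directory back to its partition value, test the
     // predicate on it, and list files only from partitions that match.
     def filesToScan(table: Map[String, Seq[String]], keep: String => Boolean): Seq[String] =
       table.toSeq
         .filter { case (dir, _) => keep(URLDecoder.decode(dir, utf8)) }
         .flatMap { case (_, files) => files }
         .sorted

     def main(args: Array[String]): Unit = {
       val table = layout(Seq(
         "2022/04" -> Seq("a.parquet"),
         "2022/05" -> Seq("b.parquet", "c.parquet"),
         "2022/06" -> Seq("d.parquet")
       ))
       // prints 2022%2F04, 2022%2F05, 2022%2F06
       println(table.keys.toSeq.sorted.mkString(", "))
       // Mirrors filter(col("partition_date") === "2022/05"): prints b.parquet, c.parquet
       println(filesToScan(table, _ == "2022/05").mkString(", "))
     }
   }
   ```

   The pruning step works on the decoded value, so the predicate is written against `2022/05` exactly as in the Spark filter, even though the directory on disk is named `2022%2F05` in this model.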
   
   This fixed my issue. I revisited the documentation but didn't find this change in behaviour noted. I'm not sure whether I missed something, but it would be good to call it out in the docs.
   


-- 
This is an automated message from the Apache Git Service.