Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/25 03:24:44 UTC

[GitHub] [hudi] boneanxs opened a new issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

boneanxs opened a new issue #3856:
URL: https://github.com/apache/hudi/issues/3856


   For a non-hoodie table at `hdfs://test/warehouse/db/table` with three partition columns (p1, p2, p3), a specific partition such as (p1=A, p2=B, p3=C) has the path `hdfs://test/warehouse/db/table/p1=A/p2=B/p3=C`. HoodieROTablePathFilter checks whether baseDir (`hdfs://test/warehouse/db/table`) is a valid HoodieTable path; otherwise, it caches `hdfs://test/warehouse/db/table/p1=A/p2=B/p3=C` in nonHoodiePathCache.
   
   I'm wondering why we don't cache baseDir in nonHoodiePathCache instead. If we cached baseDir, then for other partitions (like p1=A1, p2=B1, p3=C1) we would only need to check whether baseDir is in nonHoodiePathCache.
   
   Please correct me if I'm wrong.
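
   The idea above can be sketched roughly as follows. This is only a minimal, self-contained illustration and not the actual `HoodieROTablePathFilter` code: the class `BaseDirCacheSketch`, the method `isNonHoodieTable`, and the `hasHoodieMetadata` predicate (a stand-in for the filesystem check for table metadata) are hypothetical names, and plain strings are used in place of Hadoop `Path` objects.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of caching baseDir; not the real Hudi filter. */
final class BaseDirCacheSketch {
    // Base directories already known NOT to be Hudi tables.
    private final Set<String> nonHoodiePathCache = new HashSet<>();
    // Assumption: stand-in for the (expensive) filesystem metadata check.
    private final Predicate<String> hasHoodieMetadata;
    // Counts how often the expensive check actually ran.
    private int metadataChecks = 0;

    BaseDirCacheSketch(Predicate<String> hasHoodieMetadata) {
        this.hasHoodieMetadata = hasHoodieMetadata;
    }

    /** Returns true if baseDir is cached as, or found to be, a non-Hudi table. */
    boolean isNonHoodieTable(String baseDir) {
        if (nonHoodiePathCache.contains(baseDir)) {
            return true; // cache hit: no filesystem access needed
        }
        metadataChecks++;
        if (!hasHoodieMetadata.test(baseDir)) {
            nonHoodiePathCache.add(baseDir); // cache the table root, not the partition
            return true;
        }
        return false;
    }

    int getMetadataChecks() {
        return metadataChecks;
    }
}
```

   With the cache keyed by baseDir, a second partition of the same table (for example p1=A1, p2=B1, p3=C1) would resolve to the same baseDir and hit the cache without another metadata check.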


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #3856:
URL: https://github.com/apache/hudi/issues/3856


   





[GitHub] [hudi] boneanxs commented on issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

Posted by GitBox <gi...@apache.org>.
boneanxs commented on issue #3856:
URL: https://github.com/apache/hudi/issues/3856#issuecomment-953547374


   Thanks for noticing it.
   Sorry, I don't follow your reasoning. We can check whether baseDir is in nonHoodiePathCache here: [HoodieROTablePathFilter.java#L174](https://github.com/apache/hudi/blob/e5b6b8602c242c89cdb45440df8d2996a6c301f1/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieROTablePathFilter.java#L174).
   
   I don't think it will give wrong results if the path is a hudi path: if it is a hudi path, then baseDir must have the metadata, so it can't be in nonHoodiePathCache. If the partition level changes within the same query, baseDir will also be changed by the method `HoodieHiveUtils.getNthParent`.
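
   To illustrate how baseDir follows the partition level, here is a rough sketch of walking up n levels from a partition path. It is a string-based stand-in written for illustration only, not the actual `HoodieHiveUtils.getNthParent` implementation (assumed here to operate on Hadoop `Path` objects); `NthParentSketch` and `nthParent` are hypothetical names.

```java
/** Hypothetical, string-based sketch of deriving baseDir from a partition path. */
final class NthParentSketch {
    /**
     * Strips the last n path components from the given path.
     * Returns null if no further '/' separator remains to strip at.
     */
    static String nthParent(String path, int n) {
        String parent = path;
        for (int i = 0; i < n && parent != null; i++) {
            int slash = parent.lastIndexOf('/');
            parent = slash > 0 ? parent.substring(0, slash) : null;
        }
        return parent;
    }
}
```

   Under this sketch, a partition path three levels deep and one two levels deep under the same table both resolve to the same baseDir, as long as n matches the partition depth.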





[GitHub] [hudi] nsivabalan commented on issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #3856:
URL: https://github.com/apache/hudi/issues/3856#issuecomment-968191556


   Yeah, I also could not think of a reason. @bvaradar @n3nash: any thoughts here? If not, we will go ahead and work on a fix.
   @boneanxs: Do you think you can put up a fix for this? We can probably get it into the upcoming release.





[GitHub] [hudi] boneanxs commented on issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

Posted by GitBox <gi...@apache.org>.
boneanxs commented on issue #3856:
URL: https://github.com/apache/hudi/issues/3856#issuecomment-969870494


   Yes, I'm willing to put up a fix for it; I will do it in the next few days.





[GitHub] [hudi] codope commented on issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

Posted by GitBox <gi...@apache.org>.
codope commented on issue #3856:
URL: https://github.com/apache/hudi/issues/3856#issuecomment-953028582


   @boneanxs That's a good point, and I had the same question too. My understanding is that we cannot make assumptions about the partition path of a non-hoodie table (for example, we cannot be sure of the partition depth, or whether it uses Hive-style partitioning), so we cache the immediate parent folder.
   
   cc @nsivabalan @vinothchandar 

