Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/13 12:15:28 UTC

[GitHub] [hudi] BalaMahesh commented on issue #2251: [SUPPORT] select queries failing with InvalidInputException: Input path does not exist even though file is present in directory

BalaMahesh commented on issue #2251:
URL: https://github.com/apache/hudi/issues/2251#issuecomment-726732563


   Update 1: After adding extra log statements to the HoodieParquetInputFormat and InputPathHandler classes, I found the following:
   
   1) [InputInitializer {Map 1} #0] |hadoop.InputPathHandler|: Got the input paths : [s3a://xxx/test/hudi/data/xxx/xxx/dt=2020-11-13/.hoodie_partition_metadata, s3a://xxx/test/hudi/data/xxx/xxx/dt=2020-11-13/4e5582b0-ceb4-4d7c-ab98-bb9dfb0962e6-0_0-17038-5024094_20201113170011.parquet]conf : Configuration: incrementalTables : []
   
   The query job received the files inside the partition directory as its input paths, instead of the partition directory itself. The Hudi MR bundle then tries to append the metadata file name to these base file paths and fails to find the metadata file.
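
To make the suspected failure concrete, here is a minimal Python sketch of the path arithmetic only. It is not Hudi's actual code: `metafile_candidate` is a hypothetical helper, and the assumption is simply that the metadata file name is joined onto whatever input path the job receives. The paths and the file name are copied from the log above.

```python
import posixpath

# File name as it appears in the logged input paths above.
PARTITION_METAFILE = ".hoodie_partition_metadata"


def metafile_candidate(input_path: str) -> str:
    """Hypothetical helper: treat input_path as a directory and
    join the partition metadata file name onto it."""
    return posixpath.join(input_path, PARTITION_METAFILE)


partition_dir = "s3a://xxx/test/hudi/data/xxx/xxx/dt=2020-11-13"
base_file = (
    partition_dir
    + "/4e5582b0-ceb4-4d7c-ab98-bb9dfb0962e6-0_0-17038-5024094_20201113170011.parquet"
)

# Input path is the partition directory: the candidate points at the real metadata file.
dir_candidate = metafile_candidate(partition_dir)

# Input path is a base file: the candidate is nested under the .parquet file,
# which is not a directory, so a lookup on this path cannot succeed.
file_candidate = metafile_candidate(base_file)

print(dir_candidate)
print(file_candidate)
```

If this assumption holds, the second candidate would explain an "Input path does not exist" error even though the parquet file itself is present.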
   
   In the same Hive session, a query on a different Hudi table produces the following log:
   
   hadoop.InputPathHandler|: Got the input paths : [s3a://xxxx/test/hudi/data/xxx/xxx/dt=2020-11-13]conf : Configuration: incrementalTables : []
   
   Here the input path goes only up to the partition directory, unlike the base file paths above. In this case the partition metadata file is accessible and the query finishes.
   
   I need help figuring out where the job is getting base files as input paths instead of the partition directory. I ran describe formatted table partition(val) on both tables, and they have the same directory structure.
   
   
   
    
   

