Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/21 00:21:36 UTC

[GitHub] [incubator-hudi] vinothchandar commented on issue #1528: [SUPPORT] Issue while writing to HDFS via hudi. Only `/.hoodie` folder is written.

vinothchandar commented on issue #1528:
URL: https://github.com/apache/incubator-hudi/issues/1528#issuecomment-616877781


   @jenu9417 Thanks for taking the time to report this. 
   
   a) is weird.. The logs do indicate that tasks got scheduled at least, but I think the job died before getting to write any data. Do you have access to the Spark UI, to see how the jobs are doing?
   
   b) So `.parquet()` does not use hudi at all (I suspect). It uses the Spark parquet datasource, and you can look at the official Spark docs to understand how to partition that write (I think `.partitionBy("batch")`). `.save()` will invoke the save method of whichever datasource you configured using `format(...)`. The Spark docs will do a better job of explaining this than me :)
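
   To make the contrast concrete, here is a rough sketch of the two write paths in Scala. The dataframe `df`, the HDFS paths, and the `id`/`batch` column names are placeholders for illustration, not taken from the original issue; the exact Hudi format name and options may differ by version, so check the Hudi docs for your release:

   ```scala
   // Plain parquet write: no Hudi involved, no .hoodie/ folder, no metadata columns.
   // Partitioning is done by Spark itself via partitionBy.
   df.write
     .partitionBy("batch")
     .parquet("hdfs:///tmp/plain_parquet_table")

   // Datasource write: format(...) picks the datasource, and save() invokes
   // its save method -- here Hudi, which lays out the table with its .hoodie/
   // metadata folder and adds the _hoodie_* columns.
   df.write
     .format("hudi") // older releases may need "org.apache.hudi"
     .option("hoodie.datasource.write.recordkey.field", "id")
     .option("hoodie.datasource.write.partitionpath.field", "batch")
     .option("hoodie.table.name", "my_hudi_table")
     .mode("append")
     .save("hdfs:///tmp/hudi_table")
   ```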
   
   > The query was throwing an error that there is no such field called `_hoodie_commit_time`
   
   parquet and hudi are different things. Only hudi datasets have this field.
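   
   For example, reading a Hudi table back surfaces the Hudi metadata columns, whereas a plain parquet read of a non-Hudi path would have none of them. The path and the `id` column here are illustrative placeholders:

   ```scala
   // Read the table through the Hudi datasource; the _hoodie_* metadata
   // columns (including _hoodie_commit_time) are part of the schema.
   val hudiDf = spark.read.format("hudi").load("hdfs:///tmp/hudi_table")
   hudiDf.select("_hoodie_commit_time", "id").show()

   // A plain parquet table written with .parquet() has no such column,
   // so selecting _hoodie_commit_time on it fails with an analysis error.
   ```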
   
   c) `.hoodie` will contain all the metadata
   
   d) You can find more on compaction here https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture#DesignAndArchitecture-Compaction 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org