Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/11/12 14:08:44 UTC

[GitHub] [hudi] JoshuaZhuCN commented on issue #3981: [SUPPORT] If the HUDI table(MOR) contains only log files, the Spark Datasource cannot obtain data in snapshot mode

JoshuaZhuCN commented on issue #3981:
URL: https://github.com/apache/hudi/issues/3981#issuecomment-967144402


   > > after initializing the data with insert or upsert, only log files will be generated in the directory, and there is no parquet file.
   > 
   > This sounds very strange. When you insert into a new, empty table, it is not meant to create log files; it should only write parquet. Only subsequent updates should result in log files. You mentioned the HBase index. With your code and data, can you configure it to run data generation with the SIMPLE index, just to see whether there is any difference? I want to rule out the HBase index as the problem here.
   
   @xushiyan I have tried using the (GLOBAL)SIMPLE index and the (GLOBAL)BLOOM index for insert or upsert, which generates log files and does not generate parquet files, but using the HBASE index for insert or upsert only generates log files. The parquet file is generated only with bulk_insert.
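
For anyone trying to reproduce the comparison discussed above, the following is a minimal sketch of how the index type and write operation combinations could be enumerated as Spark Datasource options for a MOR table. The `hoodie.*` keys are standard Hudi configuration names; the helper names, the table name prefix, and the field names (`id`, `ts`, `dt`) are hypothetical placeholders, not taken from the reporter's code. The HBASE index also needs its own connection settings, which are not shown here.

```python
from itertools import product

# Index types and write operations mentioned in the discussion.
INDEX_TYPES = ["SIMPLE", "GLOBAL_SIMPLE", "BLOOM", "GLOBAL_BLOOM", "HBASE"]
OPERATIONS = ["insert", "upsert", "bulk_insert"]

# Option for reading the table back in snapshot mode.
SNAPSHOT_READ_OPTIONS = {"hoodie.datasource.query.type": "snapshot"}


def hudi_write_options(table_name, index_type, operation,
                       record_key="id", precombine="ts", partition="dt"):
    """Build Spark Datasource write options for one MOR test run.

    The field names are assumed placeholders; replace them with the
    columns of the actual dataset.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": operation,
        "hoodie.index.type": index_type,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine,
        "hoodie.datasource.write.partitionpath.field": partition,
    }


def experiment_matrix(table_prefix="mor_test"):
    """Return one option set per (index type, operation) combination."""
    return [
        hudi_write_options(f"{table_prefix}_{idx.lower()}_{op}", idx, op)
        for idx, op in product(INDEX_TYPES, OPERATIONS)
    ]


# Intended use with PySpark (not executed here; df, spark and base_path
# are assumed to exist in the caller's session):
#   for opts in experiment_matrix():
#       path = f"{base_path}/{opts['hoodie.table.name']}"
#       df.write.format("hudi").options(**opts).mode("append").save(path)
#       spark.read.format("hudi").options(**SNAPSHOT_READ_OPTIONS).load(path)
```

Writing each combination to a fresh table path and then listing the resulting files would show which combinations produce only log files and which produce parquet base files.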


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org