Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/25 08:29:00 UTC

[GitHub] [hudi] Zouxxyy opened a new issue, #6787: [SUPPORT] questions about DEFAULT_METADATA_ENABLE_FOR_READERS

Zouxxyy opened a new issue, #6787:
URL: https://github.com/apache/hudi/issues/6787

   **Describe the problem you faced**
   
   Why is `DEFAULT_METADATA_ENABLE_FOR_READERS` not exposed as a configuration parameter?
   
   How is it related to `hoodie.metadata.enable`?
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] idrismike commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

Posted by GitBox <gi...@apache.org>.
idrismike commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1273452370

   While following this discussion, I have run into another, related problem.
   
   First of all, we have data already written with Hudi `0.9.0`, and we use the Spark `hive metastore` to read it. **Note:** the metadata table is disabled (the default, per the documentation), so no `table/.hoodie/metadata` folder exists. Reading through the metastore works fine.
   
   Now we have upgraded to Hudi `0.12.0` and tried to read the data through the `metastore`. We get a continuous warning (see screenshot): every few seconds it tries to refresh the table and warns that it cannot find the metadata folder.
   ![image](https://user-images.githubusercontent.com/12491651/194896794-905df8c3-92f1-4a19-8053-3ea558e585dd.png)
   These warnings repeat indefinitely and the query never makes progress.
   
   From @yihua's comments, I understand that the reader does not (and should not) use the metadata table by default. Why, then, do we get these endless refreshes and warnings, with the query never advancing?
   
   Is the configuration `DEFAULT_METADATA_ENABLE_FOR_READERS` only used for Spark DataSource options? How can the metadata table be disabled for reads through Spark's `hive metastore`?




[GitHub] [hudi] parisni commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

parisni commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1302715131

   > So currently the metadata is only written, but never used? Unless manually modify DEFAULT_METADATA_ENABLE_FOR_READERS=true, then compile hudi ?
   
   @Zouxxyy you can turn the option on at read time: pass `.option("hoodie.metadata.enable","true")` to leverage the MDT (metadata table) when reading.
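   A minimal PySpark sketch of this read-time override, assuming an existing `SparkSession` with the Hudi 0.12 Spark bundle on the classpath. Only the `hoodie.metadata.enable` key comes from this thread; the helper names and the table path are hypothetical.

```python
from typing import Any, Dict

# Config key discussed in this issue; passing it at read time overrides the reader-side default.
METADATA_ENABLE_KEY = "hoodie.metadata.enable"


def hudi_read_options(use_metadata_table: bool) -> Dict[str, str]:
    """Build reader options (hypothetical helper); Spark options are passed as strings."""
    return {METADATA_ENABLE_KEY: "true" if use_metadata_table else "false"}


def read_hudi_table(spark: Any, base_path: str, use_metadata_table: bool = True) -> Any:
    """Read a Hudi table by path; `spark` is assumed to be a pyspark.sql.SparkSession."""
    return (
        spark.read.format("hudi")
        .options(**hudi_read_options(use_metadata_table))
        .load(base_path)
    )
```

   Usage (hypothetical path): `df = read_hudi_table(spark, "s3://my-bucket/my_table")`. Keeping the options in a small helper makes it easy to flip the flag per read, with no need to change `DEFAULT_METADATA_ENABLE_FOR_READERS` and rebuild Hudi.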
   
   @idrismike 
   > Now, We upgraded to hudi 0.12.0, and tried to read the data using metastore -
   
   Hudi 0.11 and above won't use the Hive metastore to get the partitions. Even when you think you are reading from the metastore, it falls back to reading the Hudi table by path. Then, if you don't have the metadata table enabled, it gets the partitions by scanning the whole table on the filesystem, which is rather slow and costly. You should consider enabling the MDT (at both read and write time) if you want to speed things up.
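   Following this advice means the metadata table has to be maintained by the writer and requested by the reader. The sketch below assumes PySpark with the Hudi bundle available; `hoodie.table.name` and the `hoodie.datasource.write.*` keys are standard Hudi write options, while the helper names and field values are hypothetical placeholders.

```python
from typing import Any, Dict

METADATA_ENABLE_KEY = "hoodie.metadata.enable"


def hudi_write_options(table_name: str, record_key: str, precombine_field: str,
                       partition_field: str) -> Dict[str, str]:
    """Writer options with the metadata table (MDT) explicitly enabled (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        METADATA_ENABLE_KEY: "true",  # keep the MDT up to date on the write path
    }


def write_then_read_with_mdt(spark: Any, df: Any, base_path: str, opts: Dict[str, str]) -> Any:
    """Write `df` (a pyspark DataFrame), then read the table back asking for the MDT."""
    df.write.format("hudi").options(**opts).mode("append").save(base_path)
    return (
        spark.read.format("hudi")
        .option(METADATA_ENABLE_KEY, "true")  # list partitions via the MDT, not a filesystem scan
        .load(base_path)
    )
```

   Usage (hypothetical values): `opts = hudi_write_options("my_table", "id", "ts", "dt")`, then `write_then_read_with_mdt(spark, df, "s3://my-bucket/my_table", opts)`.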




[GitHub] [hudi] parisni commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

parisni commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1315579031

   True, the Hive metastore is not mandatory with Hudi. You can use Hudi without it, but then you won't get convenient identifiers such as `database.table`; you will work with S3/HDFS paths only.
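   A small sketch of the two addressing modes, assuming a PySpark `SparkSession` and the Hudi bundle; the function name and its arguments are hypothetical. Reading by identifier needs a catalog such as the Hive metastore, while reading by path does not.

```python
from typing import Any, Optional


def load_hudi_table(spark: Any, base_path: Optional[str] = None,
                    table_ident: Optional[str] = None) -> Any:
    """Load a Hudi table either by storage path or by catalog identifier.

    `spark` is assumed to be a pyspark.sql.SparkSession; exactly one of
    `base_path` or `table_ident` must be given.
    """
    if (base_path is None) == (table_ident is None):
        raise ValueError("pass exactly one of base_path or table_ident")
    if table_ident is not None:
        # Needs a catalog (e.g. the Hive metastore) that knows "database.table".
        return spark.table(table_ident)
    # No metastore required: the table is addressed by its S3/HDFS location.
    return spark.read.format("hudi").load(base_path)
```

   With a metastore: `load_hudi_table(spark, table_ident="my_db.my_table")`; without one: `load_hudi_table(spark, base_path="s3://my-bucket/my_table")` (both names hypothetical).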




[GitHub] [hudi] yihua commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

yihua commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1258149079

   @Zouxxyy `DEFAULT_METADATA_ENABLE_FOR_READERS` is used for the reader side, deciding whether a metadata table should be used for reading a Hudi table by default.  The write side does not leverage `DEFAULT_METADATA_ENABLE_FOR_READERS`.  We plan to remove this reader-side default.
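   One way to picture how this reader-side default interacts with `hoodie.metadata.enable` is the hypothetical sketch below. It is not Hudi's source code: it assumes the reader honors an explicit `hoodie.metadata.enable` read option and otherwise falls back to `DEFAULT_METADATA_ENABLE_FOR_READERS`, taken here to be `false`, consistent with the rest of this thread.

```python
from typing import Mapping, Optional

METADATA_ENABLE_KEY = "hoodie.metadata.enable"
# Assumed reader-side fallback (False), matching the thread's observation that
# readers skip the metadata table unless told otherwise.
DEFAULT_METADATA_ENABLE_FOR_READERS = False


def reader_uses_metadata_table(read_options: Mapping[str, str]) -> bool:
    """Hypothetical model of the reader-side decision, not Hudi's implementation."""
    raw: Optional[str] = read_options.get(METADATA_ENABLE_KEY)
    if raw is None:
        # No explicit read option: use the reader-side default.
        return DEFAULT_METADATA_ENABLE_FOR_READERS
    # An explicit read option always wins over the default.
    return raw.strip().lower() == "true"
```

   Under this model, `reader_uses_metadata_table({})` returns `False`, while `reader_uses_metadata_table({"hoodie.metadata.enable": "true"})` returns `True`. The writer's own `hoodie.metadata.enable` setting is a separate decision and is not consulted here.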




[GitHub] [hudi] idrismike commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

idrismike commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1315108738

   Interesting, but I'm also curious about this: from `hudi 0.11` onward, the Hive metastore is no longer used for getting partitions. If I understand correctly, since the metastore does not do any partition pruning etc., we don't necessarily need the `hive metastore`? Is that correct?




[GitHub] [hudi] yihua closed issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

yihua closed issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict
URL: https://github.com/apache/hudi/issues/6787




[GitHub] [hudi] Zouxxyy commented on issue #6787: [SUPPORT] hoodie.metadata.enable and DEFAULT_METADATA_ENABLE_FOR_READERS conflict

Zouxxyy commented on issue #6787:
URL: https://github.com/apache/hudi/issues/6787#issuecomment-1258185449

   > @Zouxxyy `DEFAULT_METADATA_ENABLE_FOR_READERS` is used for the reader side, deciding whether a metadata table should be used for reading a Hudi table by default. The write side does not leverage `DEFAULT_METADATA_ENABLE_FOR_READERS`. We plan to remove this reader-side default.
   
   So currently the metadata table is only written, but never used, unless we manually set `DEFAULT_METADATA_ENABLE_FOR_READERS=true` and recompile Hudi?

