Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/06 20:08:56 UTC

[GitHub] [hudi] nsivabalan edited a comment on issue #3324: [SUPPORT]Slow Performance With Spark Structured Streaming

nsivabalan edited a comment on issue #3324:
URL: https://github.com/apache/hudi/issues/3324#issuecomment-894492641


   With MOR, there are three query types that could benefit you.
   Config: https://hudi.apache.org/docs/configurations#query_type_opt_key
   [Snapshot/Realtime read](https://hudi.apache.org/docs/quick-start-guide#query-data): reads the entire data set as of the latest snapshot.
   
   ReadOptimized query: set the query type to "read_optimized".
   As I was telling you earlier, for a given data file, depending on your compaction schedule, there could be some delta log files. For snapshot reads, these are merged with the base data files and then served, whereas for a ReadOptimized query only the base data files are read.
   If you can give up some freshness, your queries will be much faster, since no real-time merge is involved.
   
   And then you have [incremental read](https://hudi.apache.org/docs/quick-start-guide#incremental-query), which gives you the delta records between commits.
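
As a rough sketch of how the three query types could be selected from PySpark: the option keys `hoodie.datasource.query.type` and `hoodie.datasource.read.begin.instanttime` are the Hudi datasource read options as I understand them, while the helper names `hudi_read_options` and `read_hudi`, the table path, and the instant time in the usage note are hypothetical and only for illustration. Please check the keys against the configuration page for your Hudi version.

```python
from typing import Dict, Optional

# Hudi datasource read option keys (verify against the docs for your version).
QUERY_TYPE_KEY = "hoodie.datasource.query.type"
BEGIN_INSTANT_KEY = "hoodie.datasource.read.begin.instanttime"

ALLOWED_QUERY_TYPES = {"snapshot", "read_optimized", "incremental"}


def hudi_read_options(query_type: str,
                      begin_instant: Optional[str] = None) -> Dict[str, str]:
    """Build the reader options for one of the three query types.

    Hypothetical helper, not part of Hudi itself.
    """
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type!r}")

    opts = {QUERY_TYPE_KEY: query_type}
    if query_type == "incremental":
        # Incremental reads need a starting commit instant.
        if begin_instant is None:
            raise ValueError("incremental queries need begin_instant")
        opts[BEGIN_INSTANT_KEY] = begin_instant
    return opts


def read_hudi(spark, base_path: str, query_type: str,
              begin_instant: Optional[str] = None):
    """Load a Hudi table with the given query type.

    `spark` is an existing SparkSession with the Hudi bundle on its classpath.
    """
    options = hudi_read_options(query_type, begin_instant)
    return spark.read.format("hudi").options(**options).load(base_path)
```

Usage would look something like `read_hudi(spark, "/tmp/hudi_table", "read_optimized")` for the faster base-file-only read, or `read_hudi(spark, "/tmp/hudi_table", "incremental", "20210806000000")` for records committed after that instant; both the path and the instant here are placeholders.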
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org