
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/27 12:22:33 UTC

[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #3203: [HUDI-2086] Redo the logical of mor_incremental_view for hive

xiarixiaoyao edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-927820027


   @danny0405 I reverted this PR: [HUDI-1969] Support reading logs for MOR Hive rt table (#3033), because it introduced a critical bug that can cause query results to contain duplicate records.
   For example: suppose we have a base file (file1) that contains two records:
       key1, p1
       key2, p2
   Then we update key2, which produces a log file (logfile1) containing one record: key2, p3
   When Hive/Presto queries the table, file1 is split into two file parts: file1_part1, which contains all the records, and file1_part2, which contains no records. Both file parts are bound to logfile1, so two realtime splits are produced. The log file is then read twice, and duplicate results are produced.
   This is a very simple example; in practice it is very common for a query engine to split one Parquet file into smaller parts.
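
The duplication can be reproduced with a small toy model. This is only a sketch in Python, not Hudi's actual Java reader code: the function `read_realtime_split` and its merge rule (update matching base records, then emit any log records that had no match in the split) are assumptions made for illustration.

```python
def read_realtime_split(base_records, log_records):
    """Toy model (hypothetical) of reading one realtime split.

    base_records: list of (key, value) pairs from one part of the base file.
    log_records:  dict of key -> latest value from the bound log file.
    """
    merged = []
    seen = set()
    # Apply log updates to the base records that fall in this split.
    for key, value in base_records:
        merged.append((key, log_records.get(key, value)))
        seen.add(key)
    # Assumption: log records with no match in this split are emitted as-is.
    for key, value in log_records.items():
        if key not in seen:
            merged.append((key, value))
    return merged


# file1 is split into two parts; both parts are bound to logfile1.
file1_part1 = [("key1", "p1"), ("key2", "p2")]  # holds all the records
file1_part2 = []                                # holds no records
logfile1 = {"key2": "p3"}

result = (read_realtime_split(file1_part1, logfile1)
          + read_realtime_split(file1_part2, logfile1))

for record in result:
    print(record)
```

Under these assumptions the updated record for key2 appears once per split, so the query returns it twice.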
   
   This PR also covers [HUDI-1969].
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org