Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/27 01:56:01 UTC

[GitHub] [hudi] vinothchandar commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

vinothchandar commented on issue #1766:
URL: https://github.com/apache/hudi/issues/1766#issuecomment-650472516


   @RajasekarSribalan  as you may have guessed, the issue seems to be that the right input format is not getting invoked. Hudi's input formats filter for the latest parquet files after each commit, so when this does not happen, the query ends up reading all the files, resulting in duplicates.
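   
   If the Hive sync ran correctly, the table should be registered with HoodieParquetInputFormat. A minimal way to check this from the Hive CLI, and to force the Hudi-aware combine format at the session level (the table name below is hypothetical):
   
       -- verify the input format registered on the table
       SHOW CREATE TABLE my_hudi_table;
       -- expected: STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
   
       -- force the Hudi-aware combine input format for this Hive session
       SET hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;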
   
   Which version of Hive are you trying to use? The first error, with the combine input format, looks like a jar mismatch issue.
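   
   One quick way to rule that out is to compare the hudi-hadoop-mr bundle on the Hive classpath against the Hudi version that wrote the table; a sketch, with a hypothetical jar path and version:
   
       -- list the jars visible to the current Hive session
       LIST JARS;
       -- if the bundle is missing or stale, add the matching version explicitly
       ADD JAR hdfs:///libs/hudi-hadoop-mr-bundle-0.5.3.jar;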
   
   The second exception does seem to be coming from the actual read path, i.e. it called HoodieParquetInputFormat.getSplits properly and then errors out while trying to read parquet.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org