Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/11 00:08:47 UTC

[GitHub] [hudi] somebol opened a new issue #2166: [SUPPORT] Hive Query Latest Records

somebol opened a new issue #2166:
URL: https://github.com/apache/hudi/issues/2166


   ** The Issue **
   Is there a way we can query to get the latest record across commits?
   
   e.g.
   commit-1
   Record-1, Value A
   Record-2, Value A
   
   commit-2
   Record-1, Value B
   Record-3, Value B
   
   desired output
   Record-1, Value B
   Record-2, Value A
   Record-3, Value B
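
The desired output above amounts to keeping, for each record key, the value written by the most recent commit. A minimal sketch of that semantics in plain Python follows. It is not Hudi code; the names `commits` and `latest_records` are hypothetical and exist only for illustration.

```python
# Illustrative only: models "latest record per key" with plain Python data,
# not with any Hudi API.

def latest_records(commits):
    """Return {record_key: value}, keeping the value from the latest commit.

    `commits` is a list of (commit_id, {record_key: value}) pairs,
    ordered from oldest to newest.
    """
    latest = {}
    for _commit_id, records in commits:
        # A later commit overwrites the earlier value for the same key.
        latest.update(records)
    return latest


commits = [
    ("commit-1", {"Record-1": "Value A", "Record-2": "Value A"}),
    ("commit-2", {"Record-1": "Value B", "Record-3": "Value B"}),
]

result = latest_records(commits)
for key in sorted(result):
    print(f"{key}, {result[key]}")
```

Run on the two example commits, this prints the three lines listed under "desired output".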
   
   ** Issue Details **
   @bvaradar - the details you wanted.
   
   * Query in Hive / Hue *
   ![image](https://user-images.githubusercontent.com/29965228/95667282-ad52d000-0baf-11eb-83e2-08e0ff4c01d4.png)
   ![image](https://user-images.githubusercontent.com/29965228/95667314-fdca2d80-0baf-11eb-8bd8-010f5e3e0ff4.png)
   
   The result shows all commits for the record, not only the latest one as expected.
   
   * Query in spark shell *
   ![image](https://user-images.githubusercontent.com/29965228/95667362-b2fce580-0bb0-11eb-9efb-7842b548ecf2.png)
   ![image](https://user-images.githubusercontent.com/29965228/95667373-c7d97900-0bb0-11eb-9284-dc2c3db19161.png)
   
   This is the correct, expected output.
   
   * .hoodie contents *
   ![image](https://user-images.githubusercontent.com/29965228/95667445-ed1ab700-0bb1-11eb-931b-a4cb16ee0dbe.png)
   
   ** Using Hudi version 0.53 **
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on issue #2166: [SUPPORT] Hive Query Latest Records

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2166:
URL: https://github.com/apache/hudi/issues/2166#issuecomment-744882397


   @somebol : Please reopen if you are still seeing issues. 
   
   Balaji.V





[GitHub] [hudi] bvaradar closed issue #2166: [SUPPORT] Hive Query Latest Records

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #2166:
URL: https://github.com/apache/hudi/issues/2166


   





[GitHub] [hudi] bvaradar commented on issue #2166: [SUPPORT] Hive Query Latest Records

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2166:
URL: https://github.com/apache/hudi/issues/2166#issuecomment-706965836


   @somebol : It's hard to tell whether all 4 rows you are seeing in "Query in hue/hive" have the same record key, because the values are masked. Assuming they do, you should not be seeing duplicate record keys.
   
   Are you writing with the "upsert" operation and deduping the incoming batch with hoodie.combine.before.upsert=true (which is the default)?
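
As a rough illustration of the two settings asked about above, here is a sketch of writer options together with a plain-Python model of what deduping an incoming batch means. The column names `record_id` and `updated_at` and the helper `dedupe_batch` are assumptions made for this example. The helper only models the idea of keeping one row per record key and is not how Hudi implements it.

```python
# Writer options for an upsert with batch deduplication enabled.
# The hoodie.* keys are Hudi write configs; the column names are assumed.
hudi_options = {
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.combine.before.upsert": "true",
    "hoodie.datasource.write.recordkey.field": "record_id",    # assumed column
    "hoodie.datasource.write.precombine.field": "updated_at",  # assumed column
}


def dedupe_batch(batch, key_field, precombine_field):
    """Keep one row per key: the row with the largest precombine value.

    Plain-Python model of deduping an incoming batch; on ties the later
    row in the batch wins.
    """
    best = {}
    for row in batch:
        key = row[key_field]
        if key not in best or row[precombine_field] >= best[key][precombine_field]:
            best[key] = row
    return list(best.values())


batch = [
    {"record_id": "Record-1", "updated_at": 1, "value": "Value A"},
    {"record_id": "Record-2", "updated_at": 1, "value": "Value A"},
    {"record_id": "Record-1", "updated_at": 2, "value": "Value B"},
]

deduped = dedupe_batch(
    batch,
    hudi_options["hoodie.datasource.write.recordkey.field"],
    hudi_options["hoodie.datasource.write.precombine.field"],
)
```

If the rows in the Hive result really share one record key, checking that the write path uses options like these would be a reasonable first step.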

