Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/24 04:48:07 UTC

[GitHub] [incubator-hudi] PhatakN1 edited a comment on issue #1549: Potential issue when using Deltastreamer with DMS

PhatakN1 edited a comment on issue #1549:
URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618292312


   If MOR inserts go to a parquet file but updates go to a log file, then a query on the _ro table will show the inserts since the last compaction but not the updates. Isn't that exposing an inconsistent state of the data? I still see all inserts since the last compaction, but none of the updates.
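To make the question concrete, here is a simplified Python model of how the two MOR views read one file slice. It is a toy sketch, not Hudi's implementation: the `FileSlice` class and the `read_optimized`/`real_time` helpers are hypothetical, and the model assumes the read-optimized view scans only the base (parquet) file while the real-time view overlays log records on it by record key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileSlice:
    """Hypothetical model of one MOR file slice: a base file plus appended log rows."""
    base: Dict[str, dict]                           # record key -> row in the parquet base file
    log: List[dict] = field(default_factory=list)   # rows appended to the log after the base was written


def read_optimized(fs: FileSlice) -> Dict[str, dict]:
    # Toy _ro view: only the base file is scanned, so log rows stay invisible until compaction.
    return dict(fs.base)


def real_time(fs: FileSlice) -> Dict[str, dict]:
    # Toy _rt view: log rows are applied in order over base rows; the last row per key wins.
    merged = dict(fs.base)
    for row in fs.log:
        merged[row["_hoodie_record_key"]] = row
    return merged


if __name__ == "__main__":
    fs = FileSlice(
        base={"11": {"_hoodie_record_key": "11", "value": "base"}},
        log=[{"_hoodie_record_key": "11", "value": "updated"}],
    )
    print(read_optimized(fs)["11"]["value"])  # base
    print(real_time(fs)["11"]["value"])       # updated
```

In this model the read-optimized view trades freshness for cheaper reads: it never touches the log, so an update only becomes visible there once a compaction rewrites the base file.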
   
   These are the contents of the log file, as shown by `show logfile records` in hudi-cli:
   ```
   {"_hoodie_commit_time": "20200422083923", "_hoodie_commit_seqno": "20200422083923_1_2", "_hoodie_record_key": "11", "_hoodie_partition_path": "2019-03-14", "_hoodie_file_name": "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0", "dms_received_ts": "2020-04-22T08:38:36.873970Z", "tran_id": 11, "tran_date": "2019-03-14", "store_id": 5, "store_city": "CHICAGO", "store_state": "IL", "item_code": "XXXXXX", "quantity": 15, "total": 106.25, "Op": "D"}
   ```
   
   This is the log file metadata:
   ```
   ║ 20200422083923 │ 1           │ AVRO_DATA_BLOCK │ {"SCHEMA":"{\"type\":\"record\",\"name\":\"retail_transactions\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"dms_received_ts\",\"type\":\"string\"},{\"name\":\"tran_id\",\"type\":\"int\"},{\"name\":\"tran_date\",\"type\":\"string\"},{\"name\":\"store_id\",\"type\":\"int\"},{\"name\":\"store_city\",\"type\":\"string\"},{\"name\":\"store_state\",\"type\":\"string\"},{\"name\":\"item_code\",\"type\":\"string\"},{\"name\":\"quantity\",\"type\":\"int\"},{\"name\":\"total\",\"type\":\"float\"},{\"name\":\"Op\",\"type\":\"string\"}]}","INSTANT_TIME":"20200422083923"} │ {}             ║
   ```
   
   The name of the parquet file in the partition is `c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet` and the log file name is `c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33`.
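The two file names can be matched up mechanically. The sketch below assumes the naming patterns `<fileId>_<writeToken>_<instantTime>.parquet` for base files and `<fileId>_<baseInstant>.log.<version>_<writeToken>` for log files (the regex tolerates an optional leading dot on log names); the regexes and the `parse_base`, `parse_log` and `same_file_slice` helpers are hypothetical, not part of Hudi. Under that assumption, both names share the file ID and the base instant `20200422072539`, so the log file extends the same file slice as the parquet file.

```python
import re

# Assumed naming patterns (hedged, not taken from Hudi's source):
#   base file: <fileId>_<writeToken>_<instantTime>.parquet
#   log file:  [.]<fileId>_<baseInstant>.log.<version>_<writeToken>
BASE_RE = re.compile(r"^(?P<file_id>[^_]+)_(?P<write_token>[^_]+)_(?P<instant>\d+)\.parquet$")
LOG_RE = re.compile(
    r"^\.?(?P<file_id>[^_]+)_(?P<base_instant>\d+)\.log\.(?P<version>\d+)_(?P<write_token>.+)$"
)


def parse_base(name: str) -> dict:
    m = BASE_RE.match(name)
    if m is None:
        raise ValueError(f"not a base file name: {name}")
    return m.groupdict()


def parse_log(name: str) -> dict:
    m = LOG_RE.match(name)
    if m is None:
        raise ValueError(f"not a log file name: {name}")
    return m.groupdict()


def same_file_slice(base_name: str, log_name: str) -> bool:
    """True if the log file extends this base file (same file ID and base instant)."""
    base, log = parse_base(base_name), parse_log(log_name)
    return base["file_id"] == log["file_id"] and base["instant"] == log["base_instant"]


if __name__ == "__main__":
    base_name = "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_3-23-40_20200422072539.parquet"
    log_name = "c9df1d00-5dda-4bf7-8f27-1d4534bbbe4c-0_20200422072539.log.1_1-24-33"
    print(same_file_slice(base_name, log_name))  # True
```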
   
   The partition metadata contents are:
   ```
   commitTime=20200422072539
   partitionDepth=1
   ```
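The partition metadata above is plain `key=value` text. A minimal parsing sketch follows; the `parse_partition_metadata` helper is hypothetical and handles only simple `key=value` lines plus `#`/`!` comments, not the full Java properties syntax (escapes, `:` separators, line continuations). It makes it easy to see that the partition's `commitTime` is the same instant as the base file's, `20200422072539`.

```python
def parse_partition_metadata(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and '#'/'!' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    meta = parse_partition_metadata("commitTime=20200422072539\npartitionDepth=1\n")
    print(meta["commitTime"], int(meta["partitionDepth"]))  # 20200422072539 1
```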
   I am not sure why a query on the _rt table does not reflect the delete.
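One way to reason about the missing delete: in a merge-on-read view, a log record only removes a row if the merge logic recognizes it as a delete. The toy model below is a hedged sketch, not Hudi's code; `merge_file_slice` and the `never_delete`/`dms_style_delete` predicates are hypothetical stand-ins for whatever payload logic the Deltastreamer job is actually configured with. If `Op` is treated as an ordinary column, the record with `"Op": "D"` simply overwrites the base row as an update, so in this model the row stays visible in the _rt view.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def merge_file_slice(
    base: Dict[str, Row],
    log: List[Row],
    is_delete: Callable[[Row], bool],
) -> Dict[str, Row]:
    """Toy merge-on-read: apply log rows in order over base rows keyed by record key."""
    merged = dict(base)
    for row in log:
        key = str(row["_hoodie_record_key"])
        if is_delete(row):
            merged.pop(key, None)  # a recognized delete removes the row
        else:
            merged[key] = row      # anything else is treated as an upsert
    return merged


def never_delete(row: Row) -> bool:
    # Op is just another column: no row is ever interpreted as a delete.
    return False


def dms_style_delete(row: Row) -> bool:
    # Interpret the DMS change-type column: "D" marks a delete.
    return row.get("Op") == "D"


if __name__ == "__main__":
    base = {"11": {"_hoodie_record_key": "11"}}
    log = [{"_hoodie_record_key": "11", "Op": "D"}]
    print(sorted(merge_file_slice(base, log, never_delete)))      # ['11']
    print(sorted(merge_file_slice(base, log, dms_style_delete)))  # []
```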


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org