Posted to commits@hudi.apache.org by "duc-dn (via GitHub)" <gi...@apache.org> on 2023/02/02 10:38:19 UTC

[GitHub] [hudi] duc-dn commented on issue #7683: [SUPPORT] Querying data using Trino only returns records of the latest commit, not all records.

duc-dn commented on issue #7683:
URL: https://github.com/apache/hudi/issues/7683#issuecomment-1413524767

   Hi @codope,
   - I built the bundle from the latest master branch and used the Hive connector, but I am still facing the same problem.
   ![image](https://user-images.githubusercontent.com/64005590/216295597-f74646ac-e680-4a06-b52c-0bf66105576f.png)
   ![image](https://user-images.githubusercontent.com/64005590/216295752-66d7bfde-ce88-4b56-9095-1a63bcb14330.png)
   - After that, I tried ingesting data with Spark. I created Spark DataFrames, saved them as a COPY_ON_WRITE table in MinIO, and synced the table with the Hive metastore. I then performed 3 commits.
   ![image](https://user-images.githubusercontent.com/64005590/216298950-81da00ba-2d24-4ae1-a6fb-f3aa899f1dbd.png)
   ![image](https://user-images.githubusercontent.com/64005590/216299029-a1c358a2-5329-4d35-a0ef-758de6c0bc5f.png)
   ![image](https://user-images.githubusercontent.com/64005590/216299069-025c90fa-e266-461a-8e0f-e902634b40cc.png)
   ![image](https://user-images.githubusercontent.com/64005590/216299196-71d54898-f056-42ee-ac98-f18d7c6381e4.png)
   - Then I queried the table with Trino (I tried both the Hudi connector and the Hive connector).
   => I got all records when the data was ingested by Spark.
   ![image](https://user-images.githubusercontent.com/64005590/216300287-6f2ea436-cfb8-4089-a350-650fc6b57abe.png)
   - Besides, when I read the latest Parquet file written by the Spark ingestion, I found that it contains the records copied from the previous version of the data file, merged with the latest records. This is not the case with hudi-kafka-connector.
   => What is the problem with hudi-kafka-connector?
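
To make the difference concrete, here is a minimal sketch in plain Python of the copy-on-write behaviour observed in the Spark-written files. It is a toy model under stated assumptions, not Hudi code: the names `cow_commit`, `append_only_commit`, and `latest_snapshot`, the `id` key, and the sample batches are all hypothetical. The append-only writer only reproduces the symptom (the latest file holding just the latest batch) and is not a claim about how hudi-kafka-connector works internally.

```python
from typing import Dict, List

Record = Dict[str, object]


def cow_commit(previous_file: List[Record], incoming: List[Record], key: str = "id") -> List[Record]:
    """Toy copy-on-write commit: copy the old file version forward, then merge new records by key."""
    merged = {rec[key]: rec for rec in previous_file}  # carry every old record forward
    for rec in incoming:
        merged[rec[key]] = rec  # insert a new key or overwrite an existing one
    return list(merged.values())


def append_only_commit(previous_file: List[Record], incoming: List[Record]) -> List[Record]:
    """Hypothetical writer that does NOT copy old records into the new file version."""
    return list(incoming)


def latest_snapshot(file_versions: List[List[Record]]) -> List[Record]:
    """A reader that only looks at the newest file version."""
    return file_versions[-1] if file_versions else []


# Three commits, mirroring the three commits in the experiment (sample data is made up).
batches = [
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
    [{"id": 3, "v": "c"}],
    [{"id": 2, "v": "b2"}, {"id": 4, "v": "d"}],
]

versions: List[List[Record]] = []
append_versions: List[List[Record]] = []
for batch in batches:
    versions.append(cow_commit(latest_snapshot(versions), batch))
    append_versions.append(append_only_commit(latest_snapshot(append_versions), batch))

snapshot = latest_snapshot(versions)
append_snapshot = latest_snapshot(append_versions)

print(sorted(r["id"] for r in snapshot))         # all keys from all three commits
print(sorted(r["id"] for r in append_snapshot))  # only the keys from the last commit
```

In this model, reading only the newest file version returns every record when each commit copies the previous version forward, and returns only the last batch when it does not, which matches the two results seen in Trino.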
   
   

