Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/08/26 09:12:14 UTC

[GitHub] [hudi] liqiquan opened a new issue, #6511: [SUPPORT]

liqiquan opened a new issue, #6511:
URL: https://github.com/apache/hudi/issues/6511

   **Describe the problem you faced**
   
   With the insert_overwrite_table operation, Presto reads and returns data from every version of the parquet files rather than only the latest one.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Use the insert_overwrite_table mode to write the Hudi table at least twice.
   2. Query the table from step 1 with Presto using the hudi catalog. The read is correct.
   3. Query the table from step 1 with Presto using the hive catalog. Presto cannot distinguish between versions, so it reads the data from all versions of the parquet files.
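
The thread does not show the actual write job, so the following is only a sketch of what step 1 could look like through the Spark DataFrame writer. The option keys are standard Hudi write configs; the table name, field names, and path are hypothetical placeholders, and the commented-out write call assumes a SparkSession with the Hudi bundle on the classpath.

```python
def hudi_overwrite_options(table_name, record_key, precombine_field, hive_sync=True):
    """Build Hudi writer options for an insert_overwrite_table write (sketch)."""
    return {
        "hoodie.table.name": table_name,
        # Replace the whole table contents on each write.
        "hoodie.datasource.write.operation": "insert_overwrite_table",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Sync the table to the Hive metastore so Presto can see it.
        "hoodie.datasource.hive_sync.enable": str(hive_sync).lower(),
    }


# Hypothetical names, chosen only for illustration.
options = hudi_overwrite_options("demo_table", "id", "ts")

# With a DataFrame `df` available, the write would look roughly like this:
# df.write.format("hudi").options(**options).mode("append").save("/tmp/hudi/demo_table")
```

Running the write twice with different data would give the two table versions described in the steps above.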
   
   For example, I wrote twice, 100 rows each time. Presto should read only the 100 rows of the latest version, but it actually reads all 200 rows.
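
The 100 versus 200 row counts can be illustrated with a toy model. This is a simplified sketch, not Hudi or Presto code: it assumes that each overwrite adds a new file group and only marks the earlier ones as replaced, leaving the old parquet files on storage. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseFile:
    """One parquet base file written by a commit (toy model, not a Hudi class)."""
    file_group: str
    commit_time: str
    rows: int


def overwrite_table(files, replaced, commit_time, rows):
    """Model one insert_overwrite_table write.

    The new data goes to a new file group, and every file group that was
    live before is recorded as replaced. Old files are not deleted here.
    """
    live_before = {f.file_group for f in files} - replaced
    new_file = BaseFile(f"fg-{commit_time}", commit_time, rows)
    return files + [new_file], replaced | live_before


def count_rows_listing_only(files):
    """Reader that lists every parquet file and ignores replaced markers."""
    return sum(f.rows for f in files)


def count_rows_timeline_aware(files, replaced):
    """Reader that skips file groups marked as replaced."""
    return sum(f.rows for f in files if f.file_group not in replaced)


files, replaced = [], frozenset()
files, replaced = overwrite_table(files, replaced, "001", 100)
files, replaced = overwrite_table(files, replaced, "002", 100)

print(count_rows_listing_only(files))              # 200
print(count_rows_timeline_aware(files, replaced))  # 100
```

In this model, a reader that only lists files sees both writes, while a reader that honors the replaced markers sees only the latest one.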
   
   **Expected behavior**
   
   Presto should return only the data written by the latest insert_overwrite_table write (100 rows in the example above), including when the catalog is hive.
   
   **Environment Description**
   
   * Hudi version : 0.11.1
   
   * Spark version : 3.2.2
   
   * Hive version : 2.7.3
   
   * Hadoop version : 3.3.2
   
   * Presto version : 0.275
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan closed issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries
URL: https://github.com/apache/hudi/issues/6511




[GitHub] [hudi] nsivabalan commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6511:
URL: https://github.com/apache/hudi/issues/6511#issuecomment-1229251285

   Presto recently had a fix around intercepting replace commits. Not sure if this is related to that. @codope: do you have any idea here?




[GitHub] [hudi] liqiquan commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
liqiquan commented on issue #6511:
URL: https://github.com/apache/hudi/issues/6511#issuecomment-1229066706

   > @liqiquan Did you use `INSERT OVERWRITE TABLE` in Spark SQL to write the Hudi table? How did you create the table? Is the Hudi table synced to Hive for Presto to query?
   
   Yes, I used `INSERT OVERWRITE TABLE` in Spark SQL to write the Hudi table, and Presto reads the Hive table synced by Hudi.




[GitHub] [hudi] nsivabalan commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6511:
URL: https://github.com/apache/hudi/issues/6511#issuecomment-1301606531

   Feel free to raise a new issue if you are looking for more assistance. Thanks!




[GitHub] [hudi] yihua commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6511:
URL: https://github.com/apache/hudi/issues/6511#issuecomment-1229030208

   @liqiquan Did you use `INSERT OVERWRITE TABLE` in Spark SQL to write the Hudi table?  How did you create the table?  Is the Hudi table synced to Hive for Presto to query?




[GitHub] [hudi] nsivabalan commented on issue #6511: [SUPPORT] Using the insert_overwrite_table mode, the data of all versions of parquet files is returned when presto queries

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6511:
URL: https://github.com/apache/hudi/issues/6511#issuecomment-1287999199

   https://github.com/prestodb/presto/pull/18209
   The fix is available from Hudi 0.12.0.
   
   Let us know if you can try out 0.12.0. 
   
   

