Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/20 12:55:28 UTC

[GitHub] [hudi] rubenssoto commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

rubenssoto commented on issue #1981:
URL: https://github.com/apache/hudi/issues/1981#issuecomment-677647272


   Yeah, I could try.
   
   I ran some tests. The smaller table was partitioned by day, so I repartitioned it by year-month and now the files are larger. My simple count improved a lot: it used to take 1 minute and 30 seconds and now takes 17 seconds, but the same count on the bigger table takes only 7 seconds.
   
   I tried it on EMR, but I got this error:
   
   Query 20200820_125020_00004_h9eb5 failed: Not valid Parquet file: s3://datalake/raw/courier_api/demand_coverage/created_year_month_brt=2020-06-01/b89ad14e-8cf2-446b-934a-b27107e88e20-0_26-8-4880_20200819200116.parquet expected magic number: [80, 65, 82, 49] got: [51, -66, -112, 88] 
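
   For reference, the expected bytes [80, 65, 82, 49] in that error are the ASCII codes for "PAR1", the 4-byte magic that a Parquet file carries at its start and end; the "got" values are printed as signed bytes. Below is a minimal Python sketch for decoding those values and checking a local copy of a file for the magic. The helper names `signed_to_bytes` and `has_parquet_magic` are hypothetical, and the sketch assumes the file has already been downloaded from S3. It only confirms whether the bytes are present, it does not explain why the reader saw something else.

```python
import os

# Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def signed_to_bytes(values):
    """Convert signed byte values (as printed in the error) to raw bytes."""
    return bytes(v & 0xFF for v in values)


def has_parquet_magic(path):
    """Return True if the local file starts and ends with the Parquet magic."""
    size = os.path.getsize(path)
    if size < 2 * len(PARQUET_MAGIC):
        return False  # too small to hold both markers
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


if __name__ == "__main__":
    # Values copied from the error message above.
    print(signed_to_bytes([80, 65, 82, 49]))
    print(signed_to_bytes([51, -66, -112, 88]).hex())
```

   Running `has_parquet_magic` on a downloaded copy of the file named in the error would show whether the object itself lacks the marker or whether the problem lies in how the query engine reads it.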


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org