Posted to commits@hudi.apache.org by "guanlisheng (via GitHub)" <gi...@apache.org> on 2023/02/16 02:57:36 UTC

[GitHub] [hudi] guanlisheng opened a new issue, #7976: [SUPPORT] parquet-dereference-pushdown not working on hudi 0.10.1 and presto 0.275

guanlisheng opened a new issue, #7976:
URL: https://github.com/apache/hudi/issues/7976

   **Describe the problem you faced**
   
   
   With the `hive.enable-parquet-dereference-pushdown` property and the session property `parquet_batch_reader_verification_enabled` set, a query on a sub-field of a Hudi table does not work and always returns an error.
   
   
   
   The CLI console reports:
   ```
   Query 20230216_023851_00265_zje6y failed: Error opening Hive split s3a://xxx/yyy/2023/02/10/20/3ba89a4b-d5fa-4fca-b5b2-59b4924c34b0-0_1-18551-4732440_20230210200557831.parquet (offset=0, length=202461131): null
   ```
   
   More details from the stack trace in the Presto web UI:
   ```
   com.facebook.presto.spi.PrestoException: Error opening Hive split s3a://xxx/yyy/onlinelog/2023/02/10/20/3ba89a4b-d5fa-4fca-b5b2-59b4924c34b0-0_1-18551-4732440_20230210200557831.parquet (offset=0, length=202461131): null
   	at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:391)
   	at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:196)
   	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:452)
   	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:187)
   	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
   	at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
   	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:248)
   	at com.facebook.presto.operator.Driver.processInternal(Driver.java:426)
   	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:309)
   	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:730)
   	at com.facebook.presto.operator.Driver.processFor(Driver.java:302)
   	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
   	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:166)
   	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:599)
   	at com.facebook.presto.$gen.Presto_0_275_f3f1035____20230215_041712_1.run(Unknown Source)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ArrayIndexOutOfBoundsException: undefined
   ```
   
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 2.4.8
   
   * Hive version : 2.3.9
   
   * Hadoop version : 2.10.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   * Presto version : 0.275
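
One way to narrow this down might be to run the same sub-field query with the session property switched on and off and compare the results. The following is only a sketch that builds the statements as strings for pasting into the Presto CLI. The catalog name `hive`, the table `db.events`, and the column path `payload.user.id` are hypothetical placeholders and do not come from this report; `hive.enable-parquet-dereference-pushdown` is assumed to be already set in the catalog configuration.

```python
# Sketch only: builds statements one might paste into the Presto CLI.
# The catalog, table, and column path used below are hypothetical.


def build_repro_statements(catalog, table, field_path, verification_enabled):
    """Return a SET SESSION statement and a sub-field query as strings."""
    flag = "true" if verification_enabled else "false"
    # Connector session properties are addressed as <catalog>.<property>.
    set_stmt = (
        f"SET SESSION {catalog}.parquet_batch_reader_verification_enabled = {flag}"
    )
    # Selecting a nested field is the access pattern that dereference
    # pushdown targets.
    query = f"SELECT {field_path} FROM {catalog}.{table} LIMIT 10"
    return [set_stmt, query]


if __name__ == "__main__":
    for enabled in (True, False):
        for stmt in build_repro_statements(
            "hive", "db.events", "payload.user.id", enabled
        ):
            print(stmt + ";")
```

If the error only appears with the property enabled, that would point at the verification path rather than at the Hudi files themselves, though this has not been confirmed here.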
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on issue #7976: [SUPPORT] parquet-dereference-pushdown not working on hudi 0.10.1 and presto 0.275

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on issue #7976:
URL: https://github.com/apache/hudi/issues/7976#issuecomment-1433417238

   Hey @guanlisheng, thanks for adding the details. We need to reproduce this. @todd5167 @codope, do you have any idea why this can fail?

