Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/09/12 20:32:00 UTC

[jira] [Assigned] (HUDI-1912) Presto defaults to GenericHiveRecordCursor for all Hudi tables

     [ https://issues.apache.org/jira/browse/HUDI-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar reassigned HUDI-1912:
------------------------------------

    Assignee: Sagar Sumit

> Presto defaults to GenericHiveRecordCursor for all Hudi tables
> --------------------------------------------------------------
>
>                 Key: HUDI-1912
>                 URL: https://issues.apache.org/jira/browse/HUDI-1912
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Presto Integration
>    Affects Versions: 0.7.0
>            Reporter: satish
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 0.7.0
>
>
> See the code here: https://github.com/prestodb/presto/blob/2ad67dcf000be86ebc5ff7732bbb9994c8e324a8/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java#L168
> Starting with Hudi 0.7, HoodieInputFormat carries the UseRecordReaderFromInputFormat annotation. As a result, Presto skips all optimizations in the Parquet page source and falls back to the basic GenericHiveRecordCursor, which has several limitations:
> 1) No support for timestamp
> 2) No support for synthesized columns
> 3) No support for vectorized reading?
> Example errors we saw:
> Error#1
> {code}
> java.lang.IllegalStateException: column type must be regular
> 	at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.<init>(GenericHiveRecordCursor.java:167)
> 	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:79)
> 	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:449)
> 	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:177)
> 	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
> 	at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
> 	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
> 	at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
> 	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
> 	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
> 	at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
> 	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
> 	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
> 	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
> 	at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834) 
> {code}
> Error#2
> {code}
> java.lang.ClassCastException: class org.apache.hadoop.io.LongWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritable (org.apache.hadoop.io.LongWritable and org.apache.hadoop.hive.serde2.io.TimestampWritable are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @5c4e86e7)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:286)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:550)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
> 	at com.facebook.presto.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:233)
> 	at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:112)
> 	at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:251)
> 	at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
> 	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
> 	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
> 	at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
> 	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
> 	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
> 	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
> 	at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> In addition to the errors above, query performance also seems to have slowed down substantially.
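
For readers unfamiliar with the mechanism, the annotation-driven fallback described in the issue can be sketched roughly as below. This is a simplified illustration, not the actual Presto or Hudi code: the annotation is a local stand-in that only shares the name UseRecordReaderFromInputFormat, and PlainParquetFormat, AnnotatedHudiFormat, ReadPath and choose() are hypothetical names introduced for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch only: shows how a marker annotation on an input format class
// can route every read of that format to a generic, unoptimized path.
class ReaderSelectionSketch {

    // Local stand-in for the marker annotation named in the issue (assumption:
    // the real one is also visible at runtime, otherwise it could not be checked).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UseRecordReaderFromInputFormat {}

    // Hypothetical input format without the marker.
    static class PlainParquetFormat {}

    // Hypothetical input format carrying the marker, as HoodieInputFormat does since 0.7.
    @UseRecordReaderFromInputFormat
    static class AnnotatedHudiFormat {}

    // Hypothetical labels for the two read paths discussed in the issue.
    enum ReadPath { OPTIMIZED_PARQUET_PAGE_SOURCE, GENERIC_RECORD_CURSOR }

    // If the marker is present, the optimized path is skipped entirely.
    static ReadPath choose(Class<?> inputFormat) {
        return inputFormat.isAnnotationPresent(UseRecordReaderFromInputFormat.class)
                ? ReadPath.GENERIC_RECORD_CURSOR
                : ReadPath.OPTIMIZED_PARQUET_PAGE_SOURCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(PlainParquetFormat.class));
        System.out.println(choose(AnnotatedHudiFormat.class));
    }
}
```

Because the check in this sketch is made on the class alone, every table read through the annotated format takes the generic path, whether or not a given table needs it. That matches the behavior reported in the issue title.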



--
This message was sent by Atlassian Jira
(v8.3.4#803005)