Posted to commits@hudi.apache.org by "satish (Jira)" <ji...@apache.org> on 2021/05/18 04:26:00 UTC

[jira] [Commented] (HUDI-1912) Presto defaults to GenericHiveRecordCursor for all Hudi tables

    [ https://issues.apache.org/jira/browse/HUDI-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346580#comment-17346580 ] 

satish commented on HUDI-1912:
------------------------------

[~bhasudha] [~vinoth] [~shivnarayan] FYI. I would appreciate it if we can figure out the right way to fix this. Right now, this code path is chosen based on the static 'UseRecordReaderFromInputFormat' annotation, but we want regular COW tables to go through the parquet page source so they can leverage its built-in optimizations.
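
For context on what "static annotation" means here: the page source is selected purely from a marker annotation on the input format class. A minimal sketch of that kind of annotation-based dispatch is below; the class and method names are illustrative, not the actual Presto code.

{code}
// Illustrative sketch only: shows annotation-based dispatch by simple name,
// not the actual Presto implementation.
import java.lang.annotation.Annotation;
import java.util.Arrays;

public final class PageSourceSelectionSketch
{
    // True if the input format class carries a marker annotation named
    // "UseRecordReaderFromInputFormat". Matching by simple name avoids a
    // compile-time dependency on the library that defines the annotation.
    static boolean shouldUseRecordReaderFromInputFormat(Class<?> inputFormatClass)
    {
        return Arrays.stream(inputFormatClass.getAnnotations())
                .map(Annotation::annotationType)
                .map(Class::getSimpleName)
                .anyMatch("UseRecordReaderFromInputFormat"::equals);
    }

    static String choosePageSource(Class<?> inputFormatClass)
    {
        // The decision is made per input-format class, so it cannot distinguish a
        // plain COW parquet table from one that genuinely needs the custom record reader.
        return shouldUseRecordReaderFromInputFormat(inputFormatClass)
                ? "GenericHiveRecordCursor"
                : "ParquetPageSource";
    }
}
{code}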

> Presto defaults to GenericHiveRecordCursor for all Hudi tables
> --------------------------------------------------------------
>
>                 Key: HUDI-1912
>                 URL: https://issues.apache.org/jira/browse/HUDI-1912
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Presto Integration
>            Reporter: satish
>            Priority: Blocker
>
> See the code here: https://github.com/prestodb/presto/blob/2ad67dcf000be86ebc5ff7732bbb9994c8e324a8/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetPageSourceFactory.java#L168
> Starting with Hudi 0.7, HoodieInputFormat comes with the UseRecordReaderFromInputFormat annotation. As a result, Presto skips all of the optimizations in the parquet PageSource and uses the basic GenericHiveRecordCursor, which has several limitations (a minimal sketch of the annotation mechanism follows this list):
> 1) No support for the timestamp type
> 2) No support for synthesized columns
> 3) Possibly no support for vectorized reading
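>
> To illustrate why the behavior is static per input-format class (the sketch below is hypothetical; the annotation and class names only mimic the real Hudi/Presto ones): once a runtime-retained marker annotation sits on the input format class, every table served by that class is treated the same way, regardless of whether it actually needs the custom record reader.
> {code}
> // Hypothetical sketch; real annotation and input format names may differ.
> import java.lang.annotation.ElementType;
> import java.lang.annotation.Retention;
> import java.lang.annotation.RetentionPolicy;
> import java.lang.annotation.Target;
>
> // Marker annotation that the engine looks up by simple name at runtime.
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.TYPE)
> @interface UseRecordReaderFromInputFormat {}
>
> // A class-level marker applies to every table backed by this input format,
> // so COW and MOR tables cannot be told apart at this point.
> @UseRecordReaderFromInputFormat
> class HoodieStyleInputFormat {}
> {code}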
> Example errors we saw:
> Error#1
> {code}
> java.lang.IllegalStateException: column type must be regular
> 	at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.<init>(GenericHiveRecordCursor.java:167)
> 	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:79)
> 	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:449)
> 	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:177)
> 	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
> 	at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
> 	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
> 	at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
> 	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
> 	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
> 	at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
> 	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
> 	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
> 	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
> 	at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834) 
> {code}
> Error#2
> {code}
> java.lang.ClassCastException: class org.apache.hadoop.io.LongWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritable (org.apache.hadoop.io.LongWritable and org.apache.hadoop.hive.serde2.io.TimestampWritable are in unnamed module of loader com.facebook.presto.server.PluginClassLoader @5c4e86e7)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:39)
> 	at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveJavaObject(WritableTimestampObjectInspector.java:25)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.parseLongColumn(GenericHiveRecordCursor.java:286)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.parseColumn(GenericHiveRecordCursor.java:550)
> 	at com.facebook.presto.hive.GenericHiveRecordCursor.isNull(GenericHiveRecordCursor.java:508)
> 	at com.facebook.presto.hive.HiveRecordCursor.isNull(HiveRecordCursor.java:233)
> 	at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:112)
> 	at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:251)
> 	at com.facebook.presto.operator.Driver.processInternal(Driver.java:418)
> 	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:301)
> 	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:722)
> 	at com.facebook.presto.operator.Driver.processFor(Driver.java:294)
> 	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
> 	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
> 	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
> 	at com.facebook.presto.$gen.Presto_0_247_17f857e____20210506_210241_1.run(Unknown Source)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> In addition to the errors above, query performance also seems to have degraded substantially.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)