Posted to commits@hudi.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/11/16 21:47:00 UTC

[jira] [Updated] (HUDI-151) Fix Realtime queries on Hive on Spark engine

     [ https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-151:
--------------------------------
    Labels: pull-request-available  (was: )

> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
>                 Key: HUDI-151
>                 URL: https://issues.apache.org/jira/browse/HUDI-151
>             Project: Apache Hudi (incubating)
>          Issue Type: Task
>          Components: Realtime View
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Minor
>              Labels: pull-request-available
>
> Column id projections are handled differently by HoodieInputFormat and HoodieRealtimeInputFormat.
> We track the read column ids and names so they can be used throughout the execution and lifetime of a mapper task, which is needed for Hive on Spark. Our theory is that because {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher} does not handle an empty list correctly, the ParquetRecordReaderWrapper ends up adding the same column ids multiple times, which ultimately breaks the query. We need to find out why the RO view works fine but the RT view doesn't.
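
A minimal sketch of the deduplication idea described in the issue above, assuming the problem is duplicate entries in Hive's projection properties ("hive.io.file.readcolumn.ids" and "hive.io.file.readcolumn.names"). The class name ReadColumnDeduper and method dedupeReadColumns are hypothetical and not part of the actual Hudi patch; this only illustrates collapsing the comma-separated lists back to unique, order-preserving values before the record reader consumes them.

    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.Set;
    import org.apache.hadoop.conf.Configuration;

    // Hypothetical helper, not the actual fix: rewrites Hive's read-column
    // projection properties with duplicates removed, preserving first-seen order.
    public class ReadColumnDeduper {

      private static final String READ_COLUMN_IDS = "hive.io.file.readcolumn.ids";
      private static final String READ_COLUMN_NAMES = "hive.io.file.readcolumn.names";

      // Deduplicate both projection properties in place on the job configuration.
      public static void dedupeReadColumns(Configuration conf) {
        conf.set(READ_COLUMN_IDS, dedupe(conf.get(READ_COLUMN_IDS, "")));
        conf.set(READ_COLUMN_NAMES, dedupe(conf.get(READ_COLUMN_NAMES, "")));
      }

      // Collapse a comma-separated list to unique entries, keeping original order.
      private static String dedupe(String csv) {
        if (csv == null || csv.isEmpty()) {
          return csv;
        }
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(csv.split(",")));
        return String.join(",", unique);
      }
    }

If the ProjectionPusher theory holds, calling a helper like this on the JobConf before constructing the realtime record reader would keep the projected column ids consistent between the RO and RT paths.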



--
This message was sent by Atlassian Jira
(v8.3.4#803005)