Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/01/29 00:49:00 UTC

[jira] [Resolved] (HUDI-151) Fix Realtime queries on Hive on Spark engine

     [ https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan resolved HUDI-151.
--------------------------------------
    Fix Version/s: 0.5.2
       Resolution: Fixed

[~nishith29]: please reopen if the issue still persists

> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
>                 Key: HUDI-151
>                 URL: https://issues.apache.org/jira/browse/HUDI-151
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Hive Integration
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Minor
>              Labels: pull-request-available, user-support-issues
>             Fix For: 0.5.2
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Column id projections work differently across HoodieInputFormat and HoodieRealtimeInputFormat.
> We track the read column ids and names to be used throughout the execution and lifetime of a mapper task, which is needed for Hive on Spark. Our theory is that because {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher} does not handle an empty list correctly, the ParquetRecordReaderWrapper ends up adding the same column ids multiple times, which ultimately breaks the query. We need to find out why the read-optimized (RO) view works fine but the real-time (RT) view doesn't.
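The failure mode described above can be illustrated with a minimal sketch. This is not Hudi's actual patch; it only assumes the fix amounts to deduplicating the comma-separated column id list that Hive stores under hive.io.file.readcolumn.ids before it reaches the record reader. The class and method names below are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch (not Hudi's actual fix): deduplicate the value of
// hive.io.file.readcolumn.ids so that repeated appends by ProjectionPusher /
// ParquetRecordReaderWrapper cannot corrupt the column projection.
public class DedupeReadColumns {

  // Takes the raw comma-separated column id string and returns it with
  // duplicates removed, preserving first-seen order.
  static String dedupeColumnIds(String rawIds) {
    if (rawIds == null || rawIds.isEmpty()) {
      return rawIds; // empty projection: leave untouched
    }
    Set<String> seen = new LinkedHashSet<>();
    for (String id : rawIds.split(",")) {
      seen.add(id);
    }
    return String.join(",", seen);
  }

  public static void main(String[] args) {
    // Simulates the suspected bug: the same ids appended twice to the conf.
    System.out.println(dedupeColumnIds("0,2,3,0,2,3")); // prints 0,2,3
  }
}
```

Applied at the point where the input format hands the projection to the reader, this would make repeated pushes of the same projection idempotent, matching the behavior already observed on the RO view.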



--
This message was sent by Atlassian Jira
(v8.3.4#803005)