Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/01/26 20:02:00 UTC

[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

     [ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-829:
-------------------------------------
    Labels: user-support-issues  (was: )

> Efficiently reading hudi tables through spark-shell
> ---------------------------------------------------
>
>                 Key: HUDI-829
>                 URL: https://issues.apache.org/jira/browse/HUDI-829
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: Spark Integration
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Major
>              Labels: user-support-issues
>
> [~uditme] Created this ticket to track the discussion on the read/query path of Spark with Hudi tables.
> My understanding is that when you read Hudi tables through spark-shell, some of your queries are slower because of sequential work that Spark performs when interacting with Hudi tables (even with spark.sql.hive.convertMetastoreParquet, which can give you the same data reading speed and all the vectorization benefits). Is this slowness observed during Spark query planning? Can you please elaborate on this?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)