Posted to commits@hudi.apache.org by "Simon Zhou (Jira)" <ji...@apache.org> on 2021/05/05 02:59:00 UTC

[jira] [Commented] (HUDI-431) Design and develop parquet logging in Log file

    [ https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339382#comment-17339382 ] 

Simon Zhou commented on HUDI-431:
---------------------------------

[~vinoth] what is the intention behind inline files? In which cases would we use them? I couldn't find relevant info in HUDI-430 or its linked PR.

A given inline file cannot be a mix of formats (e.g., both parquet and hfile), correct?

If we want to add support for inline parquet, we should also have a new file format defined in HoodieFileFormat, something like .inlineParquet. Is my understanding correct?

Regarding the code structure, are you saying that we want to expose ParquetWriter/ParquetReader from HoodieLogFile? It's not common for a file class to return a reader/writer. Instead, the reader/writer should take the file object as a parameter when reading/writing. I'm thinking of how some classes in the JDK do this.
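
To illustrate the structure I have in mind, here is a minimal sketch modeled on how java.io.FileReader/FileWriter take a java.io.File. All class names below (LogFileHandle, LogRecordWriter, LogRecordReader) are hypothetical stand-ins, not existing Hudi classes, and plain text lines stand in for parquet records.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class ReaderTakesFileSketch {

    // Hypothetical file class: it only describes where the data lives
    // and does not hand out readers or writers.
    static final class LogFileHandle {
        private final File path;

        LogFileHandle(File path) {
            this.path = path;
        }

        File getPath() {
            return path;
        }
    }

    // Hypothetical writer: receives the file object as a constructor parameter.
    static final class LogRecordWriter implements AutoCloseable {
        private final BufferedWriter out;

        LogRecordWriter(LogFileHandle file) throws IOException {
            this.out = new BufferedWriter(new FileWriter(file.getPath()));
        }

        void write(String record) throws IOException {
            out.write(record);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    // Hypothetical reader: also receives the file object as a parameter.
    static final class LogRecordReader implements AutoCloseable {
        private final BufferedReader in;

        LogRecordReader(LogFileHandle file) throws IOException {
            this.in = new BufferedReader(new FileReader(file.getPath()));
        }

        // Returns the next record, or null at end of file.
        String read() throws IOException {
            return in.readLine();
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Writes one record to a temp file and reads it back.
    static String roundTrip(String record) throws IOException {
        File tmp = File.createTempFile("log-sketch", ".txt");
        tmp.deleteOnExit();
        LogFileHandle handle = new LogFileHandle(tmp);
        try (LogRecordWriter writer = new LogRecordWriter(handle)) {
            writer.write(record);
        }
        try (LogRecordReader reader = new LogRecordReader(handle)) {
            return reader.read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("key1,value1"));
    }
}
```

With this shape, the file class stays a simple description of the file, and supporting another inline format would mean adding another reader/writer pair rather than changing the file class.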

> Design and develop parquet logging in Log file
> ----------------------------------------------
>
>                 Key: HUDI-431
>                 URL: https://issues.apache.org/jira/browse/HUDI-431
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: help-requested
>
> We have a basic implementation of inline filesystem, to read a file format like Parquet, embedded "inline" into another file.  
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java] for sample usage.
>  The idea here is to see if we can embed parquet/hfile formats into the Hudi log files, to get columnar reads on the delta log files as well. This helps us speed up query performance, given the log is row based today. Once Inline FS is available, enable parquet logging support with HoodieLogFile. LogFile can expose a writer (essentially ParquetWriter) and users can write records as though writing to parquet files. Similarly on the read path, a reader (ParquetReader) will be exposed which the user can use to read data out of it. 
> This Jira tracks work to implement such parquet inlining into the log format and have the writer and reader use it. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)