Posted to commits@hudi.apache.org by "manasa (Jira)" <ji...@apache.org> on 2021/05/31 17:56:00 UTC
[jira] [Commented] (HUDI-1824) Spark Integration with ORC
[ https://issues.apache.org/jira/browse/HUDI-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354608#comment-17354608 ]
manasa commented on HUDI-1824:
------------------------------
[~Teresa] Looking at the ORC API, there don't seem to be corresponding APIs like ParquetWriteSupport, or an ORC write method that accepts InternalRow. Instead, ORC writes go through the VectorizedRowBatch abstraction.
So I presume we would have to explicitly convert from InternalRow -> VectorizedRowBatch.
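The conversion presumed above is essentially a row-to-column pivot. In the ORC Java API a VectorizedRowBatch holds one ColumnVector per schema field (each with an isNull[] array) plus a size count, and filled batches are passed to Writer.addRowBatch. The sketch below models that pivot with plain Python lists so it runs without Spark or ORC on the classpath. RowBatch, BatchingWriter, and the tuple-shaped rows are hypothetical stand-ins for VectorizedRowBatch, the proposed writer, and InternalRow; they are not Hudi or ORC classes.

```python
class RowBatch:
    """Hypothetical stand-in for ORC's VectorizedRowBatch: preallocated value
    slots and null flags for each column, reused across flushes."""

    def __init__(self, field_names, max_size=1024):
        self.field_names = list(field_names)
        self.max_size = max_size
        self.size = 0  # number of filled rows in the current batch
        self.cols = [[None] * max_size for _ in self.field_names]
        self.is_null = [[False] * max_size for _ in self.field_names]


class BatchingWriter:
    """Hypothetical row-to-column writer: copies each row into the current
    batch field by field and hands the batch to `sink` when full or on close."""

    def __init__(self, field_names, sink, max_size=1024):
        self.batch = RowBatch(field_names, max_size)
        self.sink = sink  # plays the role of ORC's Writer.addRowBatch

    def write(self, row):
        b = self.batch
        if len(row) != len(b.field_names):
            raise ValueError("row arity does not match the schema")
        r = b.size
        for c, value in enumerate(row):
            # analogous to checking InternalRow.isNullAt(c) before reading a typed value
            b.is_null[c][r] = value is None
            b.cols[c][r] = value
        b.size += 1
        if b.size == b.max_size:
            self._flush()

    def _flush(self):
        n = self.batch.size
        if n == 0:
            return
        # pass copies of the filled prefix only, since the batch is reused
        values = [col[:n] for col in self.batch.cols]
        nulls = [flags[:n] for flags in self.batch.is_null]
        self.sink((values, nulls))
        self.batch.size = 0

    def close(self):
        self._flush()


# Usage: a batch size of 2 forces one flush mid-stream and one on close().
flushed = []
writer = BatchingWriter(["id", "name"], flushed.append, max_size=2)
for row in [(1, "a"), (2, None), (3, "c")]:
    writer.write(row)
writer.close()
# flushed == [([[1, 2], ["a", None]], [[False, False], [False, True]]),
#             ([[3], ["c"]], [[False], [False]])]
```

In a real writer the per-column copy would be type-specific (for example filling LongColumnVector.vector or calling BytesColumnVector.setVal), which is likely where most of the conversion code would live.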
Also, is there an alternative class like ParquetWriteSupport for ORC that we could use to implement the bloom filter functionality?
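On the bloom filter question: ORC has column-level bloom filters built in (enabled through OrcFile.WriterOptions.bloomFilterColumns and tuned with bloomFilterFpp), but those are kept per row group in ORC's index streams. A write path mirroring the Parquet one could instead build its own filter over record keys and store the serialized bytes in the file footer via Writer.addUserMetadata(String, ByteBuffer). The sketch below assumes that approach. KeyBloomFilter and the metadata key name are hypothetical, not Hudi's own bloom filter classes or footer key, and a dict stands in for the ORC footer metadata.

```python
import hashlib
import math


class KeyBloomFilter:
    """Minimal bloom filter over record-key strings (hypothetical, not Hudi's
    bloom filter classes). Positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_entries, fpp):
        # standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln 2 hash functions
        self.num_bits = max(8, math.ceil(-num_entries * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / num_entries * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # | 1 keeps the step nonzero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def to_bytes(self):
        # layout: 4-byte bit count, 1-byte hash count, then the raw bit array
        return (self.num_bits.to_bytes(4, "big")
                + self.num_hashes.to_bytes(1, "big")
                + bytes(self.bits))

    @classmethod
    def from_bytes(cls, data):
        bf = cls.__new__(cls)
        bf.num_bits = int.from_bytes(data[:4], "big")
        bf.num_hashes = data[4]
        bf.bits = bytearray(data[5:])
        return bf


# Usage: a dict stands in for the ORC footer's user metadata map.
footer_metadata = {}
bloom = KeyBloomFilter(num_entries=1000, fpp=0.01)
for record_key in ("key-001", "key-002", "key-003"):
    bloom.add(record_key)
footer_metadata["example.bloom.filter"] = bloom.to_bytes()

restored = KeyBloomFilter.from_bytes(footer_metadata["example.bloom.filter"])
print(restored.might_contain("key-002"))  # True: bloom filters have no false negatives
```

Keeping the filter as opaque footer bytes means a reader only needs the footer, not the data stripes, to decide whether a file might hold a given key.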
> Spark Integration with ORC
> --------------------------
>
> Key: HUDI-1824
> URL: https://issues.apache.org/jira/browse/HUDI-1824
> Project: Apache Hudi
> Issue Type: Sub-task
> Components: Storage Management
> Reporter: Teresa Kang
> Assignee: manasa
> Priority: Major
>
> Implement HoodieInternalRowOrcWriter for spark datasource integration with ORC.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)