Posted to issues@beam.apache.org by "Jiadai Xia (Jira)" <ji...@apache.org> on 2020/08/10 18:02:00 UTC

[jira] [Comment Edited] (BEAM-4379) Make ParquetIO Read splittable

    [ https://issues.apache.org/jira/browse/BEAM-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174967#comment-17174967 ] 

Jiadai Xia edited comment on BEAM-4379 at 8/10/20, 6:01 PM:
------------------------------------------------------------

Created PR [BEAM-4379|http://github.com/apache/beam/pull/12223] for this issue. It exposes some of the internal Hadoop implementation and uses a Splittable DoFn to achieve row-group-level splittable reading. [~ŁukaszG] [~aromanenko]


was (Author: danielxjd):
Created PR [BEAM-4379 for this|http://github.com/apache/beam/pull/12223] issue. 

> Make ParquetIO Read splittable
> ------------------------------
>
>                 Key: BEAM-4379
>                 URL: https://issues.apache.org/jira/browse/BEAM-4379
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas, io-java-parquet
>            Reporter: Lukasz Gajowy
>            Priority: P2
>
> As the title states: ParquetIO Read is currently not splittable, which is suboptimal for runners that support splitting.


