Posted to issues@flink.apache.org by "Angelo Kastroulis (Jira)" <ji...@apache.org> on 2022/10/15 13:17:00 UTC

[jira] [Commented] (FLINK-10989) OrcRowInputFormat uses two different file systems

    [ https://issues.apache.org/jira/browse/FLINK-10989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618088#comment-17618088 ] 

Angelo Kastroulis commented on FLINK-10989:
-------------------------------------------

At the very minimum, this should be documented in one of the numerous places in the docs that tell you how to set up S3 in Flink. As it stands, those docs are inaccurate: they say nothing about the incompatibility between the filesystem and the formats, and they make it sound like all you have to do is put the filesystem plugin in the right place and you're set. You're not set at all if you want to use a sane option like Parquet or ORC. This is not a minor issue (unless folks just don't use S3 with formats in Flink).

> OrcRowInputFormat uses two different file systems
> -------------------------------------------------
>
>                 Key: FLINK-10989
>                 URL: https://issues.apache.org/jira/browse/FLINK-10989
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / ORC
>    Affects Versions: 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> The {{OrcRowInputFormat}} seems to use two different {{FileSystem}} implementations: Flink's {{FileSystem}} for listing the files and generating the {{InputSplits}}, and then Hadoop's {{FileSystem}} to actually read the input splits. This can be problematic if one only configures Flink's S3 {{FileSystem}} but does not provide an S3 implementation for Hadoop's {{FileSystem}}.
> I think this is not intuitive behaviour and can lead to hard-to-debug problems for a user.
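
To make the failure mode in the description concrete, here is a minimal toy model in plain Java. It deliberately does not use any Flink or Hadoop classes: Registry, planSplits, readSplit, and run are hypothetical names invented for this sketch, and the two registries only stand in for "the file systems Flink knows about" and "the file systems Hadoop knows about". It is an illustration of the mismatch under those assumptions, not the actual OrcRowInputFormat code path.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoFileSystemsSketch {

    /** Hypothetical stand-in for a scheme-to-file-system lookup (not a Flink or Hadoop class). */
    static final class Registry {
        private final String name;
        private final Map<String, List<String>> filesByScheme = new HashMap<>();

        Registry(String name) {
            this.name = name;
        }

        /** Registers a file system for the given scheme, holding the given files. */
        void register(String scheme, String... files) {
            filesByScheme.put(scheme, Arrays.asList(files));
        }

        /** Returns the files for a scheme, or fails if no file system is registered for it. */
        List<String> resolve(String scheme) {
            List<String> files = filesByScheme.get(scheme);
            if (files == null) {
                throw new IllegalStateException(
                        name + " has no file system for scheme '" + scheme + "'");
            }
            return files;
        }
    }

    /** Step 1: list files and plan splits using the first (planning-side) registry. */
    static List<String> planSplits(Registry planner, String scheme) {
        return planner.resolve(scheme);
    }

    /** Step 2: read a split using the second (reading-side) registry. */
    static String readSplit(Registry reader, String scheme, String file) {
        reader.resolve(scheme); // throws if the reading side was never configured
        return "rows from " + file;
    }

    /** Runs both steps and reports whether the read succeeded. */
    static String run(Registry planner, Registry reader, String scheme) {
        List<String> splits = planSplits(planner, scheme);
        try {
            for (String split : splits) {
                readSplit(reader, scheme, split);
            }
            return "ok";
        } catch (IllegalStateException e) {
            return "planned " + splits.size() + " split(s), then failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Registry flinkSide = new Registry("flink-side registry");
        Registry hadoopSide = new Registry("hadoop-side registry");

        // Only the planning side is configured for "s3": planning works, reading fails.
        flinkSide.register("s3", "part-0.orc", "part-1.orc");
        System.out.println(run(flinkSide, hadoopSide, "s3"));

        // Once the reading side is configured too, the same job succeeds.
        hadoopSide.register("s3", "part-0.orc", "part-1.orc");
        System.out.println(run(flinkSide, hadoopSide, "s3"));
    }
}
```

The point of the sketch is that the error only surfaces in the second step, after split planning has already succeeded, which is why the problem is hard to trace back to a missing configuration.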



--
This message was sent by Atlassian Jira
(v8.20.10#820010)