Posted to issues@flink.apache.org by yew1eb <gi...@git.apache.org> on 2018/06/05 07:04:15 UTC

[GitHub] flink pull request #6118: [FLINK-9525][filesystem] Add missing `META-INF/ser...

GitHub user yew1eb opened a pull request:

    https://github.com/apache/flink/pull/6118

    [FLINK-9525][filesystem] Add missing `META-INF/services/*FileSystemFactory` file for flink-hadoop-fs

    ## What is the purpose of the change
    
    For more details, see JIRA: <https://issues.apache.org/jira/browse/FLINK-9525>
    
    ## Brief change log
    
    - *Add missing `META-INF/services/org.apache.flink.core.fs.FileSystemFactory` file for flink-hadoop-fs module*
    
    
    ## Verifying this change
    
    Added the missing file, rebuilt Flink from source, and confirmed that my test Flink job (which includes the `hadoop-common` and `hadoop-dfs` dependencies) passes.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yew1eb/flink FLINK-9525

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6118.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6118
    
----
commit 73006996a2d06649cac72a53559c12689fab82b8
Author: yew1eb <ye...@...>
Date:   2018-06-05T06:50:41Z

    [FLINK-9525][filesystem] Add this missing META-INF/services/*FileSystemFactory file in flink-hadoop-fs module

----


---

[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/6118
  
    Where in the loading of the factories do you see the error?
    My suspicion is still an issue with inverted class loading.
    
    To confirm, can we check the following?
      - Are you running this on Flink 1.4.0 or 1.4.1?
      - Do you have `hadoop-common` in the job's jar, or in the `flink/lib` folder?
      - Does the error go away if you set "classloader.resolve-order: parent-first" in the config?
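
    To illustrate what "inverted" (child-first) class loading means in the questions above, here is a minimal sketch. It is not Flink's actual class loader; the class name `ChildFirstSketch` and the helper `tryLoad` are hypothetical. It assumes child-first resolution means the loader tries its own URLs before asking the parent, except for JDK classes.

    ```java
    import java.net.URL;
    import java.net.URLClassLoader;

    /** Hypothetical child-first loader: tries its own URLs before delegating to the parent. */
    public class ChildFirstSketch extends URLClassLoader {

        public ChildFirstSketch(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    if (name.startsWith("java.")) {
                        // JDK classes always come from the parent chain.
                        c = super.loadClass(name, false);
                    } else {
                        try {
                            // Child first: look in this loader's own URLs.
                            c = findClass(name);
                        } catch (ClassNotFoundException e) {
                            // Not found locally, so fall back to the usual parent-first path.
                            c = super.loadClass(name, false);
                        }
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Loads a class by name, returning null instead of throwing when it is missing. */
        static Class<?> tryLoad(ClassLoader cl, String name) {
            try {
                return cl.loadClass(name);
            } catch (ClassNotFoundException e) {
                return null;
            }
        }

        public static void main(String[] args) {
            ClassLoader parent = ChildFirstSketch.class.getClassLoader();
            ClassLoader child = new ChildFirstSketch(new URL[0], parent);
            System.out.println(tryLoad(child, "java.lang.String") == String.class);
        }
    }
    ```

    With a loader like this, a class that exists both in the job's jar and in the parent classpath is defined twice by different loaders, and the two copies are not cast-compatible. That is the kind of problem the `parent-first` setting would rule out.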


---

[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/6118
  
    I think this is a misdiagnosis, and this should not be merged.
    
    Flink does not need a file system factory for Hadoop; it uses Hadoop's FS as the general fallback for all schemes that it does not have a factory for.
    
    The exception in the linked JIRA comes from Hadoop's own file system discovery. There is probably a casting error of some kind (possibly due to inverted class loading).
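
    The fallback behavior described above can be sketched roughly as follows. This is an illustration only, not Flink's code: `FallbackLookupSketch`, `Factory`, and `resolve` are hypothetical names, and the registered scheme is an assumption made for the example.

    ```java
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    public class FallbackLookupSketch {

        /** Hypothetical stand-in for a file system factory; not Flink's actual interface. */
        interface Factory {
            String describe(URI uri);
        }

        /** Factories registered for specific schemes (illustrative only). */
        private static final Map<String, Factory> BY_SCHEME = new HashMap<>();

        /** Used when no scheme-specific factory matches, e.g. delegating to Hadoop. */
        private static final Factory FALLBACK = uri -> "fallback(hadoop) for " + uri.getScheme();

        static {
            BY_SCHEME.put("file", uri -> "local for " + uri.getScheme());
        }

        /** Picks the factory registered for the URI's scheme, or the fallback if there is none. */
        static String resolve(URI uri) {
            Factory factory = BY_SCHEME.getOrDefault(uri.getScheme(), FALLBACK);
            return factory.describe(uri);
        }

        public static void main(String[] args) {
            System.out.println(resolve(URI.create("file:///tmp/x")));
            System.out.println(resolve(URI.create("hdfs://namenode:8020/checkpoints")));
        }
    }
    ```

    In this sketch an `hdfs://` URI needs no registered factory of its own, because any scheme without an entry is handed to the fallback.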


---

[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6118
  
    This looks correct to me. @StephanEwen, are we missing anything?


---

[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...

Posted by yew1eb <gi...@git.apache.org>.
Github user yew1eb commented on the issue:

    https://github.com/apache/flink/pull/6118
  
    Yes, this is a Hadoop file system discovery issue (similar case: https://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file),
    but if a Flink job depends on `hadoop-common` and the Flink cluster uses HDFS to store checkpoints, the job throws an error when initializing the file system for checkpoints.
    ![image](https://user-images.githubusercontent.com/4133864/40985981-c101bb3a-6917-11e8-82a4-5c62e2fd7ec0.png)
    
    I think we should improve the part that loads the file system factories.  
    See this code snippet from `org.apache.flink.core.fs.FileSystem`:
    ```
    	/** All available file system factories. */
    	private static final List<FileSystemFactory> RAW_FACTORIES = loadFileSystems();
    
    	/** Mapping of file system schemes to the corresponding factories,
    	 * populated in {@link FileSystem#initialize(Configuration)}. */
    	private static final HashMap<String, FileSystemFactory> FS_FACTORIES = new HashMap<>();
    
    	/** The default factory that is used when no scheme matches. */
    	private static final FileSystemFactory FALLBACK_FACTORY = loadHadoopFsFactory();
    ```
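
    For context on what a `META-INF/services/...` file does, here is a minimal sketch of service discovery with the JDK's `java.util.ServiceLoader`. It is not the body of Flink's `loadFileSystems()`; `FactoryDiscoverySketch`, `SchemeFactory`, and `loadFactories` are hypothetical names used only for illustration.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.ServiceLoader;

    public class FactoryDiscoverySketch {

        /** Hypothetical service interface, standing in for a file system factory. */
        public interface SchemeFactory {
            String getScheme();
        }

        /**
         * Collects every provider that a META-INF/services file (named after the
         * interface's binary name) declares on the given class loader's classpath.
         */
        static List<SchemeFactory> loadFactories(ClassLoader classLoader) {
            List<SchemeFactory> found = new ArrayList<>();
            for (SchemeFactory factory : ServiceLoader.load(SchemeFactory.class, classLoader)) {
                found.add(factory);
            }
            return found;
        }

        public static void main(String[] args) {
            List<SchemeFactory> factories =
                    loadFactories(FactoryDiscoverySketch.class.getClassLoader());
            System.out.println("discovered factories: " + factories.size());
        }
    }
    ```

    If no service file for the interface is on the classpath, the returned list is simply empty. This is why a missing `META-INF/services` entry shows up as a factory that is never found, not as a load-time error.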
    
    @StephanEwen, what do you think about this?
    



---