Posted to issues@flink.apache.org by yew1eb <gi...@git.apache.org> on 2018/06/05 07:04:15 UTC
[GitHub] flink pull request #6118: [FLINK-9525][filesystem] Add missing `META-INF/ser...
GitHub user yew1eb opened a pull request:
https://github.com/apache/flink/pull/6118
[FLINK-9525][filesystem] Add missing `META-INF/services/*FileSystemFactory` file for flink-hadoop-fs
## What is the purpose of the change
For more details, see JIRA: <https://issues.apache.org/jira/browse/FLINK-9525>
## Brief change log
- *Add missing `META-INF/services/org.apache.flink.core.fs.FileSystemFactory` file for flink-hadoop-fs module*
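For readers unfamiliar with the mechanism, `java.util.ServiceLoader` only finds a provider when a `META-INF/services/<interface name>` file naming it is visible on the class path. The following is a minimal sketch of that behavior, not Flink's code: `ServiceFileSketch`, `SchemeFactory`, and `HdfsLikeFactory` are hypothetical stand-ins, and the temporary directory merely plays the role of a jar root.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical sketch; none of these names are Flink classes. */
class ServiceFileSketch {

    /** Stand-in for a service interface such as a file system factory. */
    public interface SchemeFactory {
        String getScheme();
    }

    /** Stand-in provider: a public class with a public no-arg constructor. */
    public static class HdfsLikeFactory implements SchemeFactory {
        public HdfsLikeFactory() {}

        @Override
        public String getScheme() {
            return "hdfs";
        }
    }

    /** Creates an empty directory that plays the role of a jar root. */
    static Path newEmptyRoot() {
        try {
            return Files.createTempDirectory("service-file-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes META-INF/services/<interface name> containing the provider class name. */
    static void registerProvider(Path root) {
        try {
            Path services = root.resolve("META-INF").resolve("services");
            Files.createDirectories(services);
            Files.write(
                    services.resolve(SchemeFactory.class.getName()),
                    HdfsLikeFactory.class.getName().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the schemes of all providers that ServiceLoader discovers via the given root. */
    static List<String> discover(Path root) {
        try {
            URL[] urls = {root.toUri().toURL()};
            ClassLoader parent = ServiceFileSketch.class.getClassLoader();
            try (URLClassLoader loader = new URLClassLoader(urls, parent)) {
                List<String> schemes = new ArrayList<>();
                for (SchemeFactory factory : ServiceLoader.load(SchemeFactory.class, loader)) {
                    schemes.add(factory.getScheme());
                }
                return schemes;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = newEmptyRoot();
        System.out.println("without service file: " + discover(root));
        registerProvider(root);
        System.out.println("with service file: " + discover(root));
    }
}
```

The provider class is on the class path in both calls; only the presence of the service file changes what is discovered.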
## Verifying this change
I added the missing file, rebuilt Flink from source, and confirmed that my test Flink job (which includes the `hadoop-common` and `hadoop-dfs` dependencies) passes.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)
## Documentation
- Does this pull request introduce a new feature? (yes / **no**)
- If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yew1eb/flink FLINK-9525
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/6118.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6118
----
commit 73006996a2d06649cac72a53559c12689fab82b8
Author: yew1eb <ye...@...>
Date: 2018-06-05T06:50:41Z
[FLINK-9525][filesystem] Add this missing META-INF/services/*FileSystemFactory file in flink-hadoop-fs module
----
---
[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/6118
Where in the loading of the factories do you see the error?
My suspicion is still an issue with inverted class loading.
To confirm, can we check the following?
- Are you running this on Flink 1.4.0 or 1.4.1?
- Do you have `hadoop-common` in the job's jar, or in the `flink/lib` folder?
- Does the error go away if you set `classloader.resolve-order: parent-first` in the config?
---
[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/6118
I think this is a misdiagnosis, and this should not be merged.
Flink does not need a file system factory for Hadoop. It uses Hadoop's FS as the general fallback for all schemes that it does not have a factory for.
The exception in the linked JIRA comes from Hadoop's own file system discovery. There is probably a casting error somewhere (possibly due to inverted class loading).
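The fallback behavior described here can be modeled roughly as follows. This is a hedged sketch and not Flink's implementation: `FallbackLookupSketch`, `SchemeFactory`, and the returned strings are hypothetical, and the lambdas stand in for real factories.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of scheme-based lookup with a fallback; not Flink's actual code. */
class FallbackLookupSketch {

    /** Stand-in for a factory that can serve a given URI scheme. */
    interface SchemeFactory {
        String describe(String scheme);
    }

    /** Factories registered for specific schemes. */
    private final Map<String, SchemeFactory> factories = new HashMap<>();

    /** Factory used when no registered scheme matches. */
    private final SchemeFactory fallback;

    FallbackLookupSketch(SchemeFactory fallback) {
        this.fallback = fallback;
    }

    void register(String scheme, SchemeFactory factory) {
        factories.put(scheme, factory);
    }

    /** Uses the scheme's own factory if one is registered, otherwise the fallback. */
    String resolve(String scheme) {
        return factories.getOrDefault(scheme, fallback).describe(scheme);
    }

    public static void main(String[] args) {
        FallbackLookupSketch lookup =
                new FallbackLookupSketch(s -> "hadoop-fallback(" + s + ")");
        lookup.register("file", s -> "local(" + s + ")");

        System.out.println(lookup.resolve("file"));
        System.out.println(lookup.resolve("hdfs"));
    }
}
```

In this model, no factory needs to be registered for `hdfs`: any scheme without its own entry is handed to the fallback, which is why a missing service file would not by itself explain the error.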
---
[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...
Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:
https://github.com/apache/flink/pull/6118
This looks correct to me. @StephanEwen, are we missing anything?
---
[GitHub] flink issue #6118: [FLINK-9525][filesystem] Add missing `META-INF/services/*...
Posted by yew1eb <gi...@git.apache.org>.
Github user yew1eb commented on the issue:
https://github.com/apache/flink/pull/6118
Yes, this is a Hadoop file system discovery issue (a similar case: https://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file).
However, if the Flink job depends on `hadoop-common` and the Flink cluster uses HDFS to store checkpoints, the job throws an error when it initializes the file system for checkpointing.
![image](https://user-images.githubusercontent.com/4133864/40985981-c101bb3a-6917-11e8-82a4-5c62e2fd7ec0.png)
I think we should improve the part that loads the file system factories.
See this snippet from `org.apache.flink.core.fs.FileSystem`:
```
/** All available file system factories. */
private static final List<FileSystemFactory> RAW_FACTORIES = loadFileSystems();

/**
 * Mapping of file system schemes to the corresponding factories,
 * populated in {@link FileSystem#initialize(Configuration)}.
 */
private static final HashMap<String, FileSystemFactory> FS_FACTORIES = new HashMap<>();

/** The default factory that is used when no scheme matches. */
private static final FileSystemFactory FALLBACK_FACTORY = loadHadoopFsFactory();
```
@StephanEwen, what do you think about this?
---