Posted to commits@flink.apache.org by al...@apache.org on 2017/11/13 15:41:09 UTC

[09/12] flink git commit: [FLINK-7973] Fix shading and relocating Hadoop for the S3 filesystems

[FLINK-7973] Fix shading and relocating Hadoop for the S3 filesystems

- do not shade everything, especially not JDK classes!
-> instead define include patterns explicitly
- do not shade core Flink classes (only those imported from flink-hadoop-fs)
- work around Hadoop loading (unshaded/non-relocated) classes by the names
  listed in core-default.xml by replacing Hadoop's Configuration class with an
  adapted copy (we may need to extend this to mapred-default.xml and
  hdfs-default.xml):
-> provide a core-default-shaded.xml file with shaded class names, and copy and
  adapt the Configuration class of the respective Hadoop version to load this
  file instead of core-default.xml.
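
As a rough illustration of the second point, the shaded configuration file could be derived from Hadoop's original by a plain text substitution. This is only a sketch: the function name `shade_class_names` is hypothetical, the prefix is taken from the `shadedPattern` entries of the flink-s3-fs-hadoop pom, and the commit itself does not say how the file was actually produced.

```python
import re

# Relocation prefix used by flink-s3-fs-hadoop (from the pom's shadedPattern).
# flink-s3-fs-presto would use "org.apache.flink.fs.s3presto.shaded." instead.
SHADED_PREFIX = "org.apache.flink.fs.s3hadoop.shaded."


def shade_class_names(xml_text, prefix=SHADED_PREFIX):
    """Prefix every not-yet-relocated 'org.apache.hadoop' with `prefix`.

    Hypothetical helper: the negative lookbehind skips names that already
    sit directly behind a 'shaded.' segment, so running it twice is harmless.
    """
    return re.sub(
        r"(?<!shaded\.)org\.apache\.hadoop",
        prefix + "org.apache.hadoop",
        xml_text,
    )


# Example input modelled on a core-default.xml entry.
sample = "<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>"
shaded = shade_class_names(sample)
```

The lookbehind is a design choice of this sketch, not something the commit prescribes; a plain `str.replace` would also work on a fresh copy of `core-default.xml`.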

Add checkstyle suppression pattern for the Hadoop Configuration classes

Also fix the (integration) tests, which failed because they tried to load the
relocated classes, which are apparently not available there

Remove minimizeJar from shading of flink-s3-fs-presto because this was
causing "java.lang.ClassNotFoundException:
org.apache.flink.fs.s3presto.shaded.org.apache.commons.logging.impl.LogFactoryImpl"
since these classes are not referenced statically (they are loaded by name)
and are thus removed when minimizing.

Fix s3-fs-presto not shading org.HdrHistogram

Fix log4j being relocated in the S3 fs implementations

Add shading checks to travis


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/0e5fb0b7
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/0e5fb0b7
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/0e5fb0b7

Branch: refs/heads/master
Commit: 0e5fb0b78cd0a3ccb144071a47579eb6c3d0570a
Parents: 32e5194
Author: Nico Kruber <ni...@data-artisans.com>
Authored: Mon Nov 6 19:53:37 2017 +0100
Committer: Aljoscha Krettek <al...@gmail.com>
Committed: Mon Nov 13 16:37:51 2017 +0100

----------------------------------------------------------------------
 flink-filesystems/flink-s3-fs-hadoop/README.md  |   27 +
 flink-filesystems/flink-s3-fs-hadoop/pom.xml    |   84 +-
 .../org/apache/hadoop/conf/Configuration.java   | 3002 ++++++++++++++++++
 .../src/main/resources/core-default-shaded.xml  | 2312 ++++++++++++++
 .../src/test/resources/core-site.xml            | 2312 ++++++++++++++
 flink-filesystems/flink-s3-fs-presto/README.md  |   28 +
 flink-filesystems/flink-s3-fs-presto/pom.xml    |   73 +-
 .../org/apache/hadoop/conf/Configuration.java   | 2951 +++++++++++++++++
 .../src/main/resources/core-default-shaded.xml  | 1978 ++++++++++++
 .../src/test/resources/core-site.xml            | 1978 ++++++++++++
 tools/maven/suppressions.xml                    |    4 +
 tools/travis_mvn_watchdog.sh                    |   53 +-
 12 files changed, 14778 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/0e5fb0b7/flink-filesystems/flink-s3-fs-hadoop/README.md
----------------------------------------------------------------------
diff --git a/flink-filesystems/flink-s3-fs-hadoop/README.md b/flink-filesystems/flink-s3-fs-hadoop/README.md
new file mode 100644
index 0000000..3ad90e3
--- /dev/null
+++ b/flink-filesystems/flink-s3-fs-hadoop/README.md
@@ -0,0 +1,27 @@
+This project is a wrapper around Hadoop's s3a file system. By pulling a smaller dependency tree and
+shading all dependencies away, this keeps the appearance of Flink being Hadoop-free,
+from a dependency perspective.
+
+We also relocate the shaded Hadoop version to allow running in a different
+setup. For this to work, however, we needed to adapt Hadoop's `Configuration`
+class to load a (shaded) `core-default-shaded.xml` configuration with the
+relocated class names of classes loaded via reflection
+(in the future, we may need to extend this to `mapred-default.xml` and `hdfs-default.xml` and their respective configuration classes).
+
+# Changing the Hadoop Version
+
+If you want to change the Hadoop version this project depends on, the following
+steps are required to keep the shading correct:
+
+1. copy `org/apache/hadoop/conf/Configuration.java` from the respective Hadoop jar file to this project
+  - adapt the `Configuration` class by replacing `core-default.xml` with `core-default-shaded.xml`.
+2. copy `core-default.xml` from the respective Hadoop jar file to this project as
+  - `src/main/resources/core-default-shaded.xml` (replacing every occurrence of `org.apache.hadoop` with `org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop`)
+  - `src/test/resources/core-site.xml` (as is)
+3. verify the shaded jar:
+  - does not contain any unshaded classes except for `org.apache.flink.fs.s3hadoop.S3FileSystemFactory`
+  - all other classes should be under `org.apache.flink.fs.s3hadoop.shaded`
+  - there should be a `META-INF/services/org.apache.flink.fs.s3hadoop.S3FileSystemFactory` file pointing to the `org.apache.flink.fs.s3hadoop.S3FileSystemFactory` class
+  - other service files under `META-INF/services` should have their names and contents in the relocated `org.apache.flink.fs.s3hadoop.shaded` package
+  - contains a `core-default-shaded.xml` file
+  - does not contain a `core-default.xml` or `core-site.xml` file
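
Part of the verification in step 3 of the README above can be automated. The following is a minimal sketch, not the check this commit adds to `tools/travis_mvn_watchdog.sh`: the function names `check_entries` and `check_jar` are hypothetical, and the service-file checks under `META-INF/services` are left out.

```python
import zipfile

# Package paths taken from the README; everything else here is illustrative.
SHADED = "org/apache/flink/fs/s3hadoop/shaded/"
FACTORY = "org/apache/flink/fs/s3hadoop/S3FileSystemFactory"


def check_entries(names):
    """Return a list of problems found among the given jar entry names."""
    problems = []
    for name in names:
        if name.endswith(".class"):
            # Only the factory (and its inner classes) may stay unshaded.
            if not (name.startswith(SHADED) or name.startswith(FACTORY)):
                problems.append("unshaded class: " + name)
        elif name in ("core-default.xml", "core-site.xml"):
            problems.append("unexpected resource: " + name)
    if "core-default-shaded.xml" not in names:
        problems.append("missing core-default-shaded.xml")
    return problems


def check_jar(path):
    """Run check_entries on the entries of the jar file at `path`."""
    with zipfile.ZipFile(path) as jar:
        return check_entries(jar.namelist())
```

Calling `check_jar` with the path of the built shaded jar returns an empty list when these checks pass; the remaining README items still need a manual look.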

http://git-wip-us.apache.org/repos/asf/flink/blob/0e5fb0b7/flink-filesystems/flink-s3-fs-hadoop/pom.xml
----------------------------------------------------------------------
diff --git a/flink-filesystems/flink-s3-fs-hadoop/pom.xml b/flink-filesystems/flink-s3-fs-hadoop/pom.xml
index 6d6db4c..26d1df2 100644
--- a/flink-filesystems/flink-s3-fs-hadoop/pom.xml
+++ b/flink-filesystems/flink-s3-fs-hadoop/pom.xml
@@ -33,6 +33,7 @@ under the License.
 	<packaging>jar</packaging>
 
 	<properties>
+		<!-- Do not change this without updating the copied Configuration class! -->
 		<s3hadoop.hadoop.version>2.8.1</s3hadoop.hadoop.version>
 		<s3hadoop.aws.version>1.11.95</s3hadoop.aws.version>
 	</properties>
@@ -234,28 +235,87 @@ under the License.
 							</artifactSet>
 							<relocations>
 								<relocation>
-									<pattern>org</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org</shadedPattern>
+									<pattern>com.amazonaws</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.amazonaws</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.fasterxml</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.fasterxml</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.google</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.google</shadedPattern>
+									<excludes>
+										<!-- provided -->
+										<exclude>com.google.code.findbugs.**</exclude>
+									</excludes>
+								</relocation>
+								<relocation>
+									<pattern>com.nimbusds</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.nimbusds</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>com.squareup</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com.squareup</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>net.jcip</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.jcip</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>net.minidev</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net.minidev</shadedPattern>
+								</relocation>
+
+								<!-- relocate everything from the flink-hadoop-fs project -->
+								<relocation>
+									<pattern>org.apache.flink.runtime.fs.hdfs</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.fs.hdfs</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.apache.flink.runtime.util</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache.flink.runtime.util</shadedPattern>
+									<includes>
+										<include>org.apache.flink.runtime.util.**Hadoop*</include>
+									</includes>
+								</relocation>
+
+								<relocation>
+									<pattern>org.apache</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.apache</shadedPattern>
 									<excludes>
-										<exclude>org.apache.flink.core.fs.FileSystemFactory</exclude>
-										<exclude>org.apache.flink.fs.s3hadoop.**</exclude>
+										<!-- keep all other classes of flink as they are (exceptions above) -->
+										<exclude>org.apache.flink.**</exclude>
+										<exclude>org.apache.log4j.**</exclude> <!-- provided -->
 									</excludes>
 								</relocation>
 								<relocation>
-									<pattern>com</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.com</shadedPattern>
+									<pattern>org.codehaus</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.codehaus</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.joda</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.joda</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.mortbay</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.mortbay</shadedPattern>
+								</relocation>
+								<relocation>
+									<pattern>org.tukaani</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.tukaani</shadedPattern>
 								</relocation>
 								<relocation>
-									<pattern>net</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.net</shadedPattern>
+									<pattern>org.znerd</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.org.znerd</shadedPattern>
 								</relocation>
 								<relocation>
 									<pattern>okio</pattern>
 									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.okio</shadedPattern>
 								</relocation>
 								<relocation>
-									<pattern>software</pattern>
-									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software</shadedPattern>
+									<pattern>software.amazon</pattern>
+									<shadedPattern>org.apache.flink.fs.s3hadoop.shaded.software.amazon</shadedPattern>
 								</relocation>
 							</relocations>
 							<filters>
@@ -277,6 +337,10 @@ under the License.
 										<exclude>META-INF/maven/org.apache.commons/**</exclude>
 										<exclude>META-INF/maven/org.apache.flink/flink-hadoop-fs/**</exclude>
 										<exclude>META-INF/maven/org.apache.flink/force-shading/**</exclude>
+										<!-- we use our own "shaded" core-default.xml: core-default-shaded.xml -->
+										<exclude>core-default.xml</exclude>
+										<!-- we only add a core-site.xml with unshaded classnames for the unit tests -->
+										<exclude>core-site.xml</exclude>
 									</excludes>
 								</filter>
 							</filters>
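
To see why the explicit patterns above leave JDK, core Flink, and log4j classes alone, it can help to model the relocation decision. The sketch below is a simplified model under stated assumptions, not the Maven Shade Plugin's implementation: it assumes the first matching relocation wins, treats `pattern` as a prefix, and approximates the include/exclude patterns with Python's `fnmatch` globbing. Only a subset of the relocations from the pom is listed, and the class names in the usage lines are illustrative.

```python
from fnmatch import fnmatchcase

PREFIX = "org.apache.flink.fs.s3hadoop.shaded."

# (pattern, includes, excludes), copied from a subset of the pom's relocations.
RELOCATIONS = [
    ("com.google", [], ["com.google.code.findbugs.**"]),
    ("org.apache.flink.runtime.fs.hdfs", [], []),
    ("org.apache.flink.runtime.util",
     ["org.apache.flink.runtime.util.**Hadoop*"], []),
    ("org.apache", [], ["org.apache.flink.**", "org.apache.log4j.**"]),
]


def relocate(class_name):
    """Return the relocated class name, or the name unchanged if no rule applies."""
    for pattern, includes, excludes in RELOCATIONS:
        if not class_name.startswith(pattern):
            continue
        # A non-empty include list restricts the rule to matching names.
        if includes and not any(fnmatchcase(class_name, g) for g in includes):
            continue
        if any(fnmatchcase(class_name, g) for g in excludes):
            continue
        return PREFIX + class_name
    return class_name


hadoop_class = relocate("org.apache.hadoop.fs.FileSystem")   # relocated
flink_core = relocate("org.apache.flink.core.fs.FileSystem") # kept as is
jdk_class = relocate("java.util.List")                       # kept as is
```

With the old catch-all `org` pattern, the Flink core class in this model would have been relocated as well, which is the problem the narrower patterns avoid.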