Posted to issues@flink.apache.org by StephanEwen <gi...@git.apache.org> on 2018/01/18 17:41:58 UTC

[GitHub] flink pull request #5313: [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'p...

GitHub user StephanEwen opened a pull request:

    https://github.com/apache/flink/pull/5313

    [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'parent-first' classloading pattern

    ## What is the purpose of the change
    
    This change avoids duplication of Hadoop classes between the Flink runtime and the user code.
    Hadoop (and transitively its dependencies) should be part of the application class loader.
    The user code classloader is allowed to duplicate transitive dependencies, but not Hadoop's
    classes directly.
    
    This change addresses an issue that various users have reported (mainly using the BucketingSink) where they get ClassCastExceptions related to Hadoop classes.
    
    In all cases, users had Hadoop dependencies bundled into their application jar files. To improve the experience, I suggest always loading Hadoop's classes parent-first.
    
    ## Brief change log
    
      - Add `org.apache.hadoop.` to the parent-first patterns.
      - Add some tests for the parent-first patterns.
    
    ## Verifying this change
    
    This change added self-contained tests.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented)
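
For readers unfamiliar with the term, the following is a minimal sketch of what a 'parent-first' pattern means for a child-first class loader. It is an illustration under stated assumptions, not Flink's actual implementation: the class name `ChildFirstSketch`, the helper `matches`, and the example pattern list are hypothetical, and only the prefix `org.apache.hadoop.` is taken from this pull request.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first class loader that delegates to its parent
 * first for class names matching a configured list of prefixes.
 * This is an illustrative sketch, not Flink's class loader.
 */
public class ChildFirstSketch extends URLClassLoader {

    private final String[] parentFirstPatterns;

    public ChildFirstSketch(URL[] urls, ClassLoader parent, String[] parentFirstPatterns) {
        super(urls, parent);
        this.parentFirstPatterns = parentFirstPatterns;
    }

    /** Returns true if the class name starts with any of the given prefixes. */
    static boolean matches(String className, String[] patterns) {
        for (String prefix : patterns) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                // Parent-first: classes such as org.apache.hadoop.* always come
                // from the parent, so only one copy of each class exists.
                if (matches(name, parentFirstPatterns)) {
                    return super.loadClass(name, resolve);
                }
                try {
                    // Child-first: prefer the copy bundled in the user jar.
                    loaded = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user jar: fall back to the parent.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    public static void main(String[] args) {
        // Example pattern list (an assumption for illustration).
        String[] patterns = {"java.", "org.apache.hadoop."};
        System.out.println(matches("org.apache.hadoop.fs.Path", patterns));
        System.out.println(matches("com.example.MyJob", patterns));
    }
}
```

With such a rule in place, a Hadoop class bundled in the user jar would be ignored in favor of the parent's copy, which is the situation the description above aims for to avoid ClassCastExceptions between two copies of the same class.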


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink hadoop_parent_first

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5313
    
----
commit b6bf9c9e32b0aef079662ff7040969614afa5bdc
Author: Stephan Ewen <se...@...>
Date:   2018-01-18T16:57:10Z

    [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'parent-first' classloading pattern.
    
    This change avoids duplication of Hadoop classes between the Flink runtime and the user code.
    Hadoop (and transitively its dependencies) should be part of the application class loader.
    The user code classloader is allowed to duplicate transitive dependencies, but not Hadoop's
    classes directly.
    
    This also adds tests to validate parent-first classloading patterns.

----


---

[GitHub] flink pull request #5313: [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'p...

Posted by StephanEwen <gi...@git.apache.org>.
GitHub user StephanEwen closed the pull request at:

    https://github.com/apache/flink/pull/5313


---

[GitHub] flink pull request #5313: [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'p...

Posted by StephanEwen <gi...@git.apache.org>.
GitHub user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5313#discussion_r162560352
  
    --- Diff: flink-core/src/test/java/org/apache/flink/configuration/ParentFirstPatternsTest.java ---
    @@ -0,0 +1,76 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.configuration;
    +
    +import org.junit.Test;
    +
    +import java.util.Arrays;
    +import java.util.HashSet;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +/**
    + * Test that checks that all packages that need to be loaded 'parent-first' are also
    + * in the parent-first patterns.
    + */
    +public class ParentFirstPatternsTest {
    --- End diff --
    
    will do


---

[GitHub] flink pull request #5313: [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'p...

Posted by tillrohrmann <gi...@git.apache.org>.
GitHub user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5313#discussion_r162559654
  
    --- Diff: flink-core/src/test/java/org/apache/flink/configuration/ParentFirstPatternsTest.java ---
    @@ -0,0 +1,76 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.configuration;
    +
    +import org.junit.Test;
    +
    +import java.util.Arrays;
    +import java.util.HashSet;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +/**
    + * Test that checks that all packages that need to be loaded 'parent-first' are also
    + * in the parent-first patterns.
    + */
    +public class ParentFirstPatternsTest {
    --- End diff --
    
    `extends TestLogger` would be a nice addition.


---

[GitHub] flink issue #5313: [FLINK-8455] [core] Make 'org.apache.hadoop.' a 'parent-f...

Posted by StephanEwen <gi...@git.apache.org>.
GitHub user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/5313
  
    Thanks, adding the `TestLogger` and merging this...


---