You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by NicoK <gi...@git.apache.org> on 2018/01/05 09:32:54 UTC

[GitHub] flink pull request #5243: [FLINK-8362][elasticsearch] shade all dependencies

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/5243

    [FLINK-8362][elasticsearch] shade all dependencies

    ## What is the purpose of the change
    
    The Elasticsearch connectors have some dependencies that need to be available and should not conflict with other user or system code. Similarly to the cassandra connector and the S3 file systems, it should thus shade all its dependencies and become self-contained.
    
    ## Brief change log
    
    - shade dependencies of `flink-connector-elasticsearch`, `flink-connector-elasticsearch2`, and `flink-connector-elasticsearch5` and relocate to our namespace
    - add a test into `travis_mvn_watchdog.sh` to verify classes are shaded (one gap though: since `flink-connector-elasticsearch` and `flink-connector-elasticsearch-base` use the same package, we cannot really test that the latter is shaded correctly there)
    - update documentation
    
    ## Verifying this change
    
    This can be verified as follows:
    
    In the shaded jar files,
    - check for non-shaded `service` files (also their contents)
    - check for non-shaded `.class` files (this test is automated in `travis_mvn_watchdog.sh`, but `flink-connector-elasticsearch` should be verified manually as well - see the note above)
    - check for unnecessary files in `META-INF/maven` (only `org.apache.flink/flink-connector-elasticsearch<variant>_<scala_version>/` should remain)
    - manually test the connectors and verify they are working without adding other dependencies to a user jar (TODO! @tzulitai since you have been working with the connector in the past, can you chime in here?)
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): **no**
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no**
      - The serializers: **no**
      - The runtime per-record code paths (performance sensitive): **no**
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: **no**
      - The S3 file system connector: **no**
    
    ## Documentation
    
      - Does this pull request introduce a new feature? **no**
      - If yes, how is the feature documented? **docs**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-8362

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5243
    
----
commit e0c219dadcda5a7771b20c9873006ef8ebc699b0
Author: Nico Kruber <ni...@...>
Date:   2018-01-04T19:45:04Z

    [build][hotfix] fix s3-fs shading checks not exiting for a non-exiting file
    
    + improve the output a bit

commit b1dcb6d06fed7096c0e2e8a88882037559b29923
Author: Nico Kruber <ni...@...>
Date:   2018-01-04T20:09:09Z

    [FLINK-8362][elasticsearch] shade all dependencies

----


---

[GitHub] flink pull request #5243: [FLINK-8362][elasticsearch] shade all dependencies

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5243#discussion_r160617501
  
    --- Diff: docs/dev/connectors/elasticsearch.md ---
    @@ -440,36 +440,7 @@ For the execution of your Flink program, it is recommended to build a
     so-called uber-jar (executable jar) containing all your dependencies
     (see [here]({{site.baseurl}}/dev/linking.html) for further information).
     
    -However, when an uber-jar containing an Elasticsearch sink is executed,
    -an `IllegalArgumentException` may occur, which is caused by conflicting
    -files of Elasticsearch and it's dependencies in `META-INF/services`:
    -
    -```
    -IllegalArgumentException[An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist.  You need to add the corresponding JAR file supporting this SPI to your classpath.  The current classpath supports the following names: [es090, completion090, XBloomFilter]]
    -```
    -
    -If the uber-jar is built using Maven, this issue can be avoided by
    -adding the following to the Maven POM file in the plugins section:
    -
    -~~~xml
    -<plugin>
    -    <groupId>org.apache.maven.plugins</groupId>
    -    <artifactId>maven-shade-plugin</artifactId>
    -    <version>2.4.3</version>
    -    <executions>
    -        <execution>
    -            <phase>package</phase>
    -            <goals>
    -                <goal>shade</goal>
    -            </goals>
    -            <configuration>
    -                <transformers>
    -                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
    -                </transformers>
    -            </configuration>
    -        </execution>
    -    </executions>
    -</plugin>
    -~~~
    +Alternatively, you can put the connector's jar file into Flink's `lib/` folder to make it available
    +system-wide, i.e. for all job being run.
    --- End diff --
    
    Not sure whether we should really recommend / mention this, as creating an uber jar is the usual recommended approach.


---

[GitHub] flink issue #5243: [FLINK-8362][elasticsearch] shade all dependencies

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/5243
  
    I can't recall any license issues with shading the ES connectors ... will keep an extra eye and double check on that before merging this.
    
    @NicoK thanks a lot for your work on this. I'll first proceed to manually test this, and after that get back to the licensing issues that Stephan mentioned.


---

[GitHub] flink pull request #5243: [FLINK-8362][elasticsearch] shade all dependencies

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5243#discussion_r160618041
  
    --- Diff: flink-connectors/flink-connector-elasticsearch/pom.xml ---
    @@ -93,6 +93,106 @@ under the License.
     					<rerunFailingTestsCount>3</rerunFailingTestsCount>
     				</configuration>
     			</plugin>
    +			<plugin>
    +				<groupId>org.apache.maven.plugins</groupId>
    +				<artifactId>maven-shade-plugin</artifactId>
    +				<executions>
    +					<execution>
    +						<id>shade-flink</id>
    +						<phase>package</phase>
    +						<goals>
    +							<goal>shade</goal>
    +						</goals>
    +						<configuration>
    +							<shadeTestJar>false</shadeTestJar>
    +							<artifactSet>
    +								<includes>
    +									<include>*:*</include>
    +								</includes>
    +							</artifactSet>
    +							<relocations>
    +								<!-- elasticsearch 1.x already shades this but forgets the service file -->
    +								<relocation>
    +									<pattern>com.fasterxml.jackson</pattern>
    +									<shadedPattern>org.apache.flink.streaming.connectors.elasticsearch.shaded.org.elasticsearch.common.jackson</shadedPattern>
    +								</relocation>
    +								<relocation>
    +									<pattern>com.spatial4j</pattern>
    +									<shadedPattern>org.apache.flink.streaming.connectors.elasticsearch.shaded.com.spatial4j</shadedPattern>
    +								</relocation>
    +
    +								<!-- relocate everything from the flink-connector-elasticsearch base project -->
    --- End diff --
    
    Not sure if this is necessary. @NicoK can you comment on this?


---

[GitHub] flink pull request #5243: [FLINK-8362][elasticsearch] shade all dependencies

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5243


---

[GitHub] flink issue #5243: [FLINK-8362][elasticsearch] shade all dependencies

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/5243
  
    I vaguely recall that there was a reason why we did not shade the Elasticsearch connectors initially. Maybe a license issue. @zentol and @tzulitai can you chime in here?
    
    If we want to go ahead with this, few comments:
      - Not sure we should relocate the Flink elasticsearch code. Why is that needed?
      - Would be good to add for each dependency that is shaded should have a comment about the license. Some licenses are not compatible with shading.
      - Some licenses may need an additional license file or a mention in an extra bundled NOTICE file, see the `flink-s3-fs-presto`
      - In general, we cannot suppress all forwarded files from `META-INF` for dependencies, especiallylicense related files.


---