You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by tweise <gi...@git.apache.org> on 2018/04/03 05:20:14 UTC

[GitHub] flink pull request #5803: [FLINK-9124] [kinesis] Allow customization of Kine...

GitHub user tweise opened a pull request:

    https://github.com/apache/flink/pull/5803

    [FLINK-9124] [kinesis] Allow customization of KinesisProxy.getRecords read timeout and retry.

    ## What is the purpose of the change
    
    This pull request enables overrides for the AWS ClientConfiguration and getRecords retry in KinesisProxy.
    
    ## Brief change log
    - option to override retry for any SdkClientException
    - option to customize the ClientConfiguration used to construct the Kinesis client
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    
    Added test to verify the configuration override.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tweise/flink FLINK-9124.Kinesis.getRecords

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5803
    
----
commit 563763fa6eb6eede1654a6b157a1b006b360731d
Author: Thomas Weise <th...@...>
Date:   2018-04-03T03:49:50Z

    [FLINK-9124] Allow customization of KinesisProxy.getRecords read timeout and retry.

----


---

[GitHub] flink pull request #5803: [FLINK-9124] [kinesis] Allow customization of Kine...

Posted by tweise <gi...@git.apache.org>.
Github user tweise commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5803#discussion_r180967646
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java ---
    @@ -176,6 +179,16 @@ protected KinesisProxy(Properties configProps) {
     
     	}
     
    +	/**
    +	 * Create the Kinesis client, using the provided configuration properties and default {@link ClientConfiguration}.
    +	 * Derived classes can override this method to customize the client configuration.
    +	 * @param configProps
    +	 * @return
    +	 */
    +	protected AmazonKinesis createKinesisClient(Properties configProps) {
    --- End diff --
    
    Although it is theoretically possible to override the method and not look at `configProps`, it is  rather unlikely that this would be unintended. The user that ends up working at this level will probably be in need to control how the client config is initialized and the client
    is constructed, to make the connector work. My vote is strongly in favor of not locking down things unless they are extremely well understood and there is a specific reason.
    
    The connectors in general are fluent by nature and warrant a more flexible approach that 
    empowers users to customize what they need without wholesale forking. By now we have run into several cases where behavior of the Kinesis connector had to be amended but private constructors or methods got into the way. Who would not prefer to spend time improving the connector functionality vs. opening JIRAs and PRs for access modification changes?
    
    In our internal custom code we currently have an override that can generically set any simple property on the client config from the config properties. The approach comes with its own pros and cons and I think it should be discussed separately. If there is interest in having it in the Flink codebase as default behavior, I'm happy to take it up as a separate PR. I would still want to have the ability to override it though.


---

[GitHub] flink issue #5803: [FLINK-9124] [kinesis] Allow customization of KinesisProx...

Posted by tweise <gi...@git.apache.org>.
Github user tweise commented on the issue:

    https://github.com/apache/flink/pull/5803
  
    @aljoscha @tzulitai can you take a look?


---

[GitHub] flink pull request #5803: [FLINK-9124] [kinesis] Allow customization of Kine...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5803#discussion_r180909552
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java ---
    @@ -176,6 +179,16 @@ protected KinesisProxy(Properties configProps) {
     
     	}
     
    +	/**
    +	 * Create the Kinesis client, using the provided configuration properties and default {@link ClientConfiguration}.
    +	 * Derived classes can override this method to customize the client configuration.
    +	 * @param configProps
    +	 * @return
    +	 */
    +	protected AmazonKinesis createKinesisClient(Properties configProps) {
    --- End diff --
    
    My main concern with allowing overrides of this method, is that override implementations can potentially completely ignore the `configProps` settings and create a Kinesis client entirely irrelevant from the original configuration. IMO, this is not nice design-wise.
    
    As a different approach, would it be possible to traverse keys in the `configProps` and set the `ClientConfiguration` appropriately, such that we won't need to be aware of all updated / new keys in the AWS Kinesis SDK? Ideally, Flink should not need to maintain its own set of config keys and just rely on AWS's keys (for example, Flink actually should not need to define its own config keys for AWS credentials).


---

[GitHub] flink issue #5803: [FLINK-9124] [kinesis] Allow customization of KinesisProx...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/5803
  
    As discussed on the mailing list (http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Kinesis-getRecords-read-timeout-and-retry-td21844.html), lets merge the PR as it is for now.
    
    Some final remarks:
    
    - I see that we already have a JIRA to improve generic configuration support: https://issues.apache.org/jira/browse/FLINK-9188.
    
    - For the fullJitterBackoff and retry handling, I wonder if a user-provided retry handler would be more appropriate (as a future improvement). For example, our Elasticsearch connector has something similar.
    
    Merging this for 1.6.0 and 1.5.0 ..


---

[GitHub] flink pull request #5803: [FLINK-9124] [kinesis] Allow customization of Kine...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/5803


---