Posted to issues@flink.apache.org by bowenli86 <gi...@git.apache.org> on 2017/08/04 04:06:09 UTC

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

GitHub user bowenli86 opened a pull request:

    https://github.com/apache/flink/pull/4473

    [FLINK-7367][kinesis connector] Parameterize more configs for FlinkKinesisProducer (RecordMaxBufferedTime, MaxConnections, RequestTimeout, etc)

    
    ## What is the purpose of the change
    
    Right now, FlinkKinesisProducer only exposes two configs for the underlying KinesisProducer:
    
    - AGGREGATION_MAX_COUNT
    - COLLECTION_MAX_COUNT
    
    According to the AWS documentation and their sample on GitHub, developers can set more options to get the most out of KinesisProducer and to make it more fault-tolerant (e.g. by increasing the timeout). I selected a few more configs that we need when using Flink with Kinesis:
    
    - MAX_CONNECTIONS
    - RATE_LIMIT
    - RECORD_MAX_BUFFERED_TIME
    - RECORD_TIME_TO_LIVE
    - REQUEST_TIMEOUT
    
    We need to parameterize FlinkKinesisProducer so that the above params can be passed in, to meet our needs.
    
    ## Brief change log
    
      - *Added more config values into `ProducerConfigConstants`*
      - *Made FlinkKinesisProducer pick up more configs*
      - *Added an example in doc*
    
    
    ## Verifying this change
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bowenli86/flink FLINK-7363

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4473
    
----
commit ac0b4d22fde763dde62b26ed9a022d537bb29e58
Author: Bowen Li <bo...@gmail.com>
Date:   2017-08-04T03:59:02Z

    FLINK-7367 Parameterize FlinkKinesisProducer on RecordMaxBufferedTime, MaxConnections, RequestTimeout, and more

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    One other thing:
    We also need to add validation for these new configurations.
    That should be placed in `KinesisConfigUtil.validateProducerConfigs`
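
A minimal sketch of what such validation could look like, using only `java.util.Properties`. The class `ProducerConfigValidationSketch`, the helper `validateOptionalPositiveLong`, and the key strings are assumptions made for illustration; the actual code in `KinesisConfigUtil` may look different.

```java
import java.util.Properties;

/** Hypothetical stand-in for the validation logic; not the actual KinesisConfigUtil. */
class ProducerConfigValidationSketch {

    /** Key names are assumed here; they are not taken from the pull request. */
    static final String[] NUMERIC_KEYS = {
        "MaxConnections", "RateLimit", "RecordMaxBufferedTime", "RecordTtl", "RequestTimeout"
    };

    /** Throws IllegalArgumentException if the key is set but is not a positive long. */
    static void validateOptionalPositiveLong(Properties config, String key) {
        String value = config.getProperty(key);
        if (value == null) {
            return; // unset keys fall back to the producer's own defaults
        }
        long parsed;
        try {
            parsed = Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid value for " + key + ": '" + value + "' is not a number", e);
        }
        if (parsed <= 0) {
            throw new IllegalArgumentException(
                "Invalid value for " + key + ": must be positive, got " + parsed);
        }
    }

    /** Validates every numeric key that is present in the given properties. */
    static void validateProducerConfigs(Properties config) {
        for (String key : NUMERIC_KEYS) {
            validateOptionalPositiveLong(config, key);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("RequestTimeout", "6000");
        validateProducerConfigs(config); // passes silently for valid values
        System.out.println("configuration accepted");
    }
}
```

Failing at construction time with a message that names the offending key is the point of the sketch; which keys are checked and which bounds apply would be decided in the pull request itself.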


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132879596
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -57,6 +55,9 @@
     	/** Properties to parametrize settings such as AWS service region, access key etc. */
     	private final Properties configProps;
     
    +	/** Configuration for KinesisProducer. */
    +	private final KinesisProducerConfiguration producerConfig;
    --- End diff --
    
    agree. I'm moving it to `open()`


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 I made some comments on the serializability of `KinesisProducerConfiguration`. It isn't serializable, and perhaps that is what is causing the failure for you. I'm not really sure why it isn't failing for 1.3.0, though.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 no, they're provided via a `Properties` when creating a `FlinkKinesisProducer`


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132879159
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -57,6 +55,9 @@
     	/** Properties to parametrize settings such as AWS service region, access key etc. */
     	private final Properties configProps;
     
    +	/** Configuration for KinesisProducer. */
    +	private final KinesisProducerConfiguration producerConfig;
    --- End diff --
    
    `KinesisProducerConfiguration` isn't serializable. We need to make it `transient`.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132636379
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -22,12 +22,14 @@
     /**
      * Optional producer specific configuration keys for {@link FlinkKinesisProducer}.
      */
    +@Deprecated
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    -	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    -
    -	/** Maximum number of items to pack into an aggregated record. **/
    -	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +	/** Deprecated key. **/
    +	@Deprecated
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
     
    + 	/** Deprecated key. **/
    +	@Deprecated
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    --- End diff --
    
    This cannot be renamed, as it completely breaks user code that uses `ProducerConfigConstants.AGGREGATION_MAX_COUNT`.
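
One way to deprecate without breaking callers, sketched under assumptions: the constants keep their original names and values and only gain the `@Deprecated` annotation plus a Javadoc `@deprecated` tag. The class name `DeprecatedKeysSketch` is hypothetical, and the Javadoc wording is illustrative rather than taken from the pull request.

```java
/** Hypothetical sketch: the old constant names stay, they are only marked as deprecated. */
class DeprecatedKeysSketch {

    /**
     * Maximum number of items to pack into a PutRecords request.
     *
     * @deprecated Kept under its original name so that existing user code keeps compiling.
     */
    @Deprecated
    public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";

    /**
     * Maximum number of items to pack into an aggregated record.
     *
     * @deprecated Kept under its original name so that existing user code keeps compiling.
     */
    @Deprecated
    public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
}
```

With this shape, code that references `AGGREGATION_MAX_COUNT` still compiles and only sees a deprecation warning, whereas a rename to `DEPRECATED_AGGREGATION_MAX_COUNT` would be a compile error for it.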


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132373651
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    Are you ok with using the plain strings "CollectionMaxCount" and "AggregationMaxCount" in code directly?
    
    I actually used those strings directly in code at the beginning, then changed to these static final definitions because I worried that plain strings might not conform to checkstyle as "magic strings", similar to "magic numbers".


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132372254
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    --- End diff --
    
    Add `@deprecated` annotation


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132879203
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -115,13 +116,13 @@ public String getTargetStream(OUT element) {
     	 * This is a constructor supporting {@see KinesisSerializationSchema}.
     	 *
     	 * @param schema Kinesis serialization schema for the data type
    -	 * @param configProps The properties used to configure AWS credentials and AWS region
    +	 * @param configProps The properties used to configure KinesisProducer, including AWS credentials and AWS region
     	 */
     	public FlinkKinesisProducer(KinesisSerializationSchema<OUT> schema, Properties configProps) {
    -		this.configProps = checkNotNull(configProps, "configProps can not be null");
    -
    -		// check the configuration properties for any conflicting settings
    -		KinesisConfigUtil.validateProducerConfiguration(this.configProps);
    +		checkNotNull(configProps, "configProps can not be null");
    +		this.configProps = KinesisConfigUtil.replaceDeprecatedProducerKeys(configProps);
    +		// check the configuration properties for any invalid settings
    +		this.producerConfig = KinesisConfigUtil.validateProducerConfiguration(configProps);
    --- End diff --
    
    For non-serializable fields that need to be `transient`, we should only initialize them in `open`.
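
The pattern described above can be sketched with plain Java serialization. Everything here is a stand-in: `NonSerializableConfig` plays the role of a non-serializable object such as `KinesisProducerConfiguration`, `TransientFieldSinkSketch` plays the role of the sink (the real class would extend Flink's sink base class), and the `"aws.region"` key is assumed for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Properties;

/** Hypothetical stand-in for a configuration object that is not serializable. */
class NonSerializableConfig {
    final String region;

    NonSerializableConfig(String region) {
        this.region = region;
    }
}

/** Hypothetical sink sketch; only the serialization-related parts are shown. */
class TransientFieldSinkSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Serializable settings travel with the function when it is shipped. */
    private final Properties configProps;

    /** Not serializable, so it is transient and built in open(), not in the constructor. */
    private transient NonSerializableConfig producerConfig;

    TransientFieldSinkSketch(Properties configProps) {
        this.configProps = configProps;
    }

    /** Runs after deserialization, where the transient field can be created safely. */
    void open() {
        producerConfig = new NonSerializableConfig(configProps.getProperty("aws.region"));
    }

    /** Returns the region of the built config, or null if open() has not run yet. */
    String region() {
        return producerConfig == null ? null : producerConfig.region;
    }

    /** Round-trips the sink through Java serialization, as shipping it would. */
    static TransientFieldSinkSketch roundTrip(TransientFieldSinkSketch sink) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(sink);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TransientFieldSinkSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("sink could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("aws.region", "us-east-1");
        TransientFieldSinkSketch shipped = roundTrip(new TransientFieldSinkSketch(props));
        shipped.open();
        System.out.println("region after open(): " + shipped.region());
    }
}
```

If `producerConfig` were a non-transient field assigned in the constructor, `writeObject` would fail with a `NotSerializableException`, which matches the kind of failure reported elsewhere in this thread.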


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 on the other hand, in the constructors of `FlinkKinesisProducer`, we probably should add eager safe checks on the serializability of the provided `KinesisSerializationSchema` and have good error messages in case it isn't serializable. I'll open a separate JIRA / PR for that.
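
An eager check of this kind could be sketched as follows. The helper `checkSerializable` and the class `SerializabilityCheckSketch` are hypothetical and use only the JDK; they are not Flink API, and the follow-up JIRA / PR may take a different approach.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

/** Hypothetical helper: fail fast in the constructor instead of at job submission. */
class SerializabilityCheckSketch {

    /**
     * Returns the object unchanged if it can be written with Java serialization,
     * otherwise throws an IllegalArgumentException that names the offending class.
     */
    static <T> T checkSerializable(T obj, String description) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalArgumentException(
                "The provided " + description + " is not serializable: "
                    + obj.getClass().getName(), e);
        }
        return obj;
    }

    public static void main(String[] args) {
        // A String is serializable, so the call returns its argument.
        System.out.println(checkSerializable("schema placeholder", "serialization schema"));
    }
}
```

A constructor could then assign `this.schema = checkSerializable(schema, "serialization schema")`, so the user sees the error where the sink is created rather than deep inside the closure cleaner.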


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    LGTM! I'll rebase this on #4537, and will merge as soon as Travis gives green.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r131578056
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,30 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    -	/** Maximum number of items to pack into an PutRecords request. **/
    +	/** Maximum number of KPL user records to store in a single Kinesis Streams record (an aggregated record). */
    +	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
    +	/** Maximum number of Kinesis Streams records to pack into an PutRecords request. */
     	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
     
    -	/** Maximum number of items to pack into an aggregated record. **/
    -	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +	/** Maximum number of connections to open to the backend. HTTP requests are
    +	 * sent in parallel over multiple connections */
    --- End diff --
    
    Style consistency: missing period at the end.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Thanks! Overall I think the changes are good.
    I've left some minor cosmetic-related comments, and some more involved comments on how we perform the deprecation.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132372291
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    --- End diff --
    
    Add `@deprecated` annotation


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai it's failing on 
    
    ```
    The implementation of the RichSinkFunction is not serializable. The object probably contains or references non serializable fields.
    	org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:100)
    	org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1548)
    	org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:183)
    	org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1131)
    ```
    
    FYI, I merged this PR into my 1.3.2 source code.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai Thank you, Gordon!


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132375356
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    Sounds like a good idea.
    It seems that `RATE_LIMIT`, `CollectionMaxCount`, and `AggregationMaxCount` can be moved to `KinesisConfigUtil` as private static final Strings, though, as that should be the only place where they are used.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132376112
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner<OUT> partitioner) {
     	public void open(Configuration parameters) throws Exception {
     		super.open(parameters);
     
    -		KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
    -
    -		producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
     		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
    -		if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
    -			producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
    -		}
    -		if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
    -			producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
    +		// Override KPL default value if it's not specified by user
    +		if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
    +			producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
    --- End diff --
    
    Yeah, either way. This is not really a validation, but a replacement.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132376988
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    `CollectionMaxCount` and `AggregationMaxCount` will be `protected` since there's a unit test using them.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Also cc'ing @aljoscha to this PR, since we also had a bit of offline discussion on this configuration gap issue.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132375079
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    `ProducerConfigConstants` and `FlinkKinesisProducer` don't live in the same package. Let me use plain strings and see how it looks.
    
    Besides, why don't we go a step further now? How about I move RATE_LIMIT to `FlinkKinesisProducer`, and deprecate `ProducerConfigConstants`?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    I'm in favor of the 3rd option. In that way, we don't need to 1) duplicate all KPL keys and documentation to flink-kinesis-connector and 2) always keep an eye on KPL configs when upgrading its version.
    
    Since you point out that migrating existing keys is neither hard nor harmful, I'm leaning even more towards it :) Do we all agree on the approach? @tzulitai @aljoscha 
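
Migrating the existing keys could look roughly like the sketch below. The method name `replaceDeprecatedProducerKeys` appears in a diff elsewhere in this thread, but its body is not shown there, so this implementation is an assumption. In particular, the choice to copy the input and to let an explicitly set new key win over the deprecated one is illustrative only.

```java
import java.util.Properties;

/** Hypothetical sketch of translating the deprecated Flink keys to plain KPL-style keys. */
class DeprecatedKeyMigrationSketch {

    /** Pairs of {deprecated key, replacement key}. */
    static final String[][] RENAMES = {
        {"aws.producer.collectionMaxCount", "CollectionMaxCount"},
        {"aws.producer.aggregationMaxCount", "AggregationMaxCount"},
    };

    /** Returns a copy of the input in which deprecated keys are replaced by the new ones. */
    static Properties replaceDeprecatedProducerKeys(Properties original) {
        Properties migrated = new Properties();
        for (String name : original.stringPropertyNames()) {
            migrated.setProperty(name, original.getProperty(name));
        }
        for (String[] rename : RENAMES) {
            String oldValue = (String) migrated.remove(rename[0]);
            // Assumption: a value set under the new key takes precedence over the old key.
            if (oldValue != null && !migrated.containsKey(rename[1])) {
                migrated.setProperty(rename[1], oldValue);
            }
        }
        return migrated;
    }

    public static void main(String[] args) {
        Properties userProps = new Properties();
        userProps.setProperty("aws.producer.collectionMaxCount", "500");
        userProps.setProperty("RequestTimeout", "6000");
        System.out.println(replaceDeprecatedProducerKeys(userProps));
    }
}
```

Keys that are not listed in `RENAMES` pass through untouched, which is what lets the connector hand the remaining properties to the KPL without duplicating its key list.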



---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai any more feedback? We have a ticket at my company for this task, and I'd like to mark it as finished if possible :)


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 what is it failing on? AFAIK, the only serialization changes in 1.3.2 compared to 1.3.0 are serialization of some specific `TypeSerializer`s.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai done!


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132879261
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -57,6 +55,9 @@
     	/** Properties to parametrize settings such as AWS service region, access key etc. */
     	private final Properties configProps;
     
    +	/** Configuration for KinesisProducer. */
    +	private final KinesisProducerConfiguration producerConfig;
    --- End diff --
    
    On second thought: do we actually really need to have a field for this?
    Or can we just instantiate it in the method (it doesn't seem to be used across different methods)?


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r131577895
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -169,14 +169,27 @@ public void open(Configuration parameters) throws Exception {
     
     		producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
     		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
    -		if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
    -			producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
    -		}
    -		if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
    -			producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
    -		}
    +
    +		producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
    +				ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
    +
    +		producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
    +				ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
    +
    +		producerConfig.setMaxConnections(PropertiesUtil.getLong(configProps,
    +				ProducerConfigConstants.MAX_CONNECTIONS, producerConfig.getMaxConnections(), LOG));
    +
    +		producerConfig.setRateLimit(PropertiesUtil.getLong(configProps,
    +			ProducerConfigConstants.RATE_LIMIT, producerConfig.getRateLimit(), LOG));
    --- End diff --
    
    Starting from this line, the indentation is not consistent.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    hmmm... I was deploying our Flink job with this change. The Flink job failed to start, and log reports:
    
    ```
    The implementation of the RichSinkFunction is not serializable. The object probably contains or references non serializable fields.
    	org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:100)
    	org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1548)
    	org.apache.flink.streaming.api.datastream.DataStream.clean(DataStream.java:183)
    	org.apache.flink.streaming.api.datastream.DataStream.addSink(DataStream.java:1131)
    ```
    
    Here the RichSinkFunction implementation is FlinkKinesisProducer. The only field I added to FlinkKinesisProducer is the KinesisProducerConfiguration, so I made that field `transient` and ran the Flink job again, but it still fails.
    
    So I suspected that FlinkKinesisProducer is already not serializable. To verify that, I wrote a test for the current FlinkKinesisProducer in master, which doesn't have my PR change. The unit test is:
    
    ```
    @Test
    	public void testProducerSerializable() {
    		Properties testConfig = new Properties();
    		testConfig.setProperty(AWSConfigConstants.AWS_ACCESS_KEY_ID, "accessKey");
    		testConfig.setProperty(AWSConfigConstants.AWS_SECRET_ACCESS_KEY, "secretKey");
    		testConfig.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1");
    		testConfig.setProperty(AWSConfigConstants.AWS_CREDENTIALS_PROVIDER, "BASIC");
    		KinesisConfigUtil.validateAwsConfiguration(testConfig);
    
    		FlinkKinesisProducer producer = new FlinkKinesisProducer(new KinesisSerializationSchema() {
    			@Override
    			public ByteBuffer serialize(Object element) {
    				return null;
    			}
    
    			@Override
    			public String getTargetStream(Object element) {
    				return null;
    			}
    		}, testConfig);
    
    		ClosureCleaner.ensureSerializable(producer);
    	}
    ```
    
    And it fails. So I think something in FlinkKinesisProducer might not be serializable, and that already breaks FlinkKinesisProducer.
    
    @tzulitai @aljoscha  Any insights on this?


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r133119309
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -20,14 +20,24 @@
     import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
     
     /**
    + * @deprecated
    --- End diff --
    
    @bowenli86 one thing to mention:
    We should always try to have good Javadoc explaining why something is deprecated. Sorry, I overlooked this while reviewing. I'll address it myself this time while merging.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai or other Flink committers, can you please merge this so I can submit more PRs that depend on it? Thanks!


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r133121728
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -20,14 +20,24 @@
     import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
     
     /**
    + * @deprecated
    --- End diff --
    
    @tzulitai  Thanks for the reminder!


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4473


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132373930
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    Hmmm, maybe have package-private static Strings for "CollectionMaxCount" and "AggregationMaxCount", with comments on what they are used for?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 the third option actually strikes me as an overall good solution, I didn't know that was possible.
    
    I think it would make sense to deprecate the previous configuration keys in favor of the `fromProperties` solution. Ideally, it would be better if we can simply let the user freely configure the internal Kinesis client depending on whatever keys are available from AWS.
    
    It also shouldn't be too much of a hassle to adapt the deprecated keys such that if the user still used them, we simply replace them with the new renamed keys, and in open() always use `KinesisProducerConfiguration#fromProperties(Properties props)`.
    
    What do you think?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Can anyone from data Artisans please take a look at this PR?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai Thanks Gordon for the quick response! You are right: I made the schema in the unit test a static class and it passes.
    
    As for my Flink job: when I build it against Flink 1.3.0, it runs well on EMR. When I build it against Flink 1.3.2, it fails to run. Has anything related to serialization changed since Flink 1.3.0?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by aljoscha <gi...@git.apache.org>.
Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    I think option 3) is the way to go. For backwards compatibility we have to manually convert those keys that we already support (via `ProducerConfigConstants`) but luckily they're not that many.
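
The manual conversion of already-supported keys could look roughly like the sketch below. It uses only `java.util.Properties`; the helper name `replaceDeprecatedKeys` and the "new key wins if both are set" policy are assumptions for illustration, not what the PR necessarily implements. The four key strings are the ones quoted in the diff under review.

```java
import java.util.Properties;

public class DeprecatedKeyMigrationSketch {

    // Old Flink-specific keys and the KPL-style names they map to (as quoted in the diff).
    static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";

    /** Hypothetical helper: returns a copy of the config with deprecated keys renamed. */
    public static Properties replaceDeprecatedKeys(Properties config) {
        Properties result = new Properties();
        result.putAll(config);
        rename(result, DEPRECATED_COLLECTION_MAX_COUNT, COLLECTION_MAX_COUNT);
        rename(result, DEPRECATED_AGGREGATION_MAX_COUNT, AGGREGATION_MAX_COUNT);
        return result;
    }

    private static void rename(Properties props, String oldKey, String newKey) {
        String value = props.getProperty(oldKey);
        if (value == null) {
            return; // deprecated key not used, nothing to do
        }
        props.remove(oldKey);
        // Assumed policy: if the user set both keys, the new key takes precedence.
        if (!props.containsKey(newKey)) {
            props.setProperty(newKey, value);
        }
    }
}
```

After such a step, the resulting `Properties` contains only the new-style names and can be passed on to whatever parses them in `open()`.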


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132372556
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
     
     	/** Maximum number of items to pack into an aggregated record. **/
    -	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +	public static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
    +
    +	/** Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
    +	 * The default value is set as 100% in Flink. KPL's default value is 150% but it throws RateLimitExceededException
    +	 * too frequently and breaks Flink sink.
    +	 **/
    +	public static final String RATE_LIMIT = "RateLimit";
    +
    +
    --- End diff --
    
    nit: unnecessary 2 empty lines


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Hi @bowenli86, sorry about this. I had the commit ready to merge but was waiting for another test PR to be merged first. Merging this now!


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai Let me know what you think about this PR now :)


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132374210
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    Ideally, we should deprecate `ProducerConfigConstants` and remove the class in the long run, since we rely solely on `Properties` parsing now and have no "Flink-specific" key.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132636582
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java ---
    @@ -38,6 +39,23 @@
      * Utilities for Flink Kinesis connector configuration.
      */
     public class KinesisConfigUtil {
    +	/** Maximum number of items to pack into an PutRecords request. **/
    +	@Deprecated
    +	protected static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    This doesn't require deprecation, as it isn't exposed to the user.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132372454
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
     
     	/** Maximum number of items to pack into an aggregated record. **/
    -	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +	public static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
    +
    +	/** Limits the maximum allowed put rate for a shard, as a percentage of the backend limits.
    +	 * The default value is set as 100% in Flink. KPL's default value is 150% but it throws RateLimitExceededException
    +	 * too frequently and breaks Flink sink.
    --- End diff --
    
    We might need to document this, if we're not following the KPL defaults.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Trying to understand the keys. Are keys like "aws.producer.collectionMaxCount" specified in flink-conf.yaml?



---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132373167
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    I think what we should do is just deprecate `COLLECTION_MAX_COUNT` and `AGGREGATION_MAX_COUNT`, and in the Javadoc direct the user to directly refer to the KPL config keys.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai Hi Gordon, shall we add more explanation of these keys in Flink docs, or do you think the java doc is sufficient?


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @tzulitai @aljoscha  Let me know if this PR looks good. Thanks!


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by bowenli86 <gi...@git.apache.org>.
Github user bowenli86 commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    Hi @tzulitai ,
        Let's reach a consensus before I do any more work.
    
         1) I didn't add all KPL's configs in this PR. I only added some configs that I think might be useful to Flink users. The full list of KPL configs is [here](https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer-sample/default_config.properties). I don't think exposing all KPL configs is a good idea. What do you think?
          2) I absolutely agree that there should be a consistent way to keep configs up-to-date. But there probably isn't one as long as Flink wraps around KPL, because right now we always have to set KPL configs in `FlinkKinesisProducer#open()` with `setXxxx()`
          3) If Flink uses [`KinesisProducerConfiguration#fromProperties(Properties props)`](https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java) to init `KinesisProducerConfiguration`, users can always pass in the configs as k-v string pairs. But, it requires changing all config values in `ProducerConfigConstants`, e.g. from `aws.producer.aggregationMaxCount` to `AggregationMaxCount`, because KPL library itself doesn't provide a util class with key names as string. How much would the impact be to rename those config keys? 
    
    Bowen
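
To make option 3 concrete, here is a small sketch of what user-supplied properties might look like when they use KPL-style key names directly. The key names are taken from the KPL `default_config.properties` linked above as best I recall them and should be checked against that file; the values are purely illustrative, and the class and method names are hypothetical.

```java
import java.util.Properties;

public class KplStylePropertiesSketch {

    /** Builds an illustrative set of KPL-style string key/value pairs. */
    public static Properties producerProperties() {
        Properties props = new Properties();
        // Key names assumed to match the KPL default_config.properties; values are examples only.
        props.setProperty("AggregationMaxCount", "100");
        props.setProperty("CollectionMaxCount", "500");
        props.setProperty("MaxConnections", "24");
        props.setProperty("RateLimit", "100");
        props.setProperty("RecordMaxBufferedTime", "100");
        props.setProperty("RecordTtl", "30000");
        props.setProperty("RequestTimeout", "6000");
        // With the KPL on the classpath, these pairs would be handed to
        // KinesisProducerConfiguration#fromProperties(Properties props), as named in point 3.
        return props;
    }

    public static void main(String[] args) {
        producerProperties().list(System.out);
    }
}
```

The point of this shape is that no Flink-side setter has to exist per key: any key the KPL understands can be passed through as a string pair.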


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    @bowenli86 I think the test you posted above is failing because the provided `KinesisSerializationSchema` is not serializable. It is an anonymous class, which contains a reference to its enclosing instance (which I guess is the test class, hence not serializable).
    
    To make it work you either need to make the test class serializable, or define a static `KinesisSerializationSchema` subclass instead of using an anonymous class.
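
The effect described here can be reproduced with plain Java serialization, without Flink. The sketch below is illustrative only: `Schema` is a hypothetical stand-in for a serializable schema interface, and the anonymous class deliberately reads a field of the enclosing instance so that the captured reference is explicit.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class AnonymousClassSerializationSketch {

    /** Hypothetical stand-in for a serializable schema interface. */
    public interface Schema extends Serializable {
        String targetStream(Object element);
    }

    /** Static nested class: holds no reference to an enclosing instance. */
    public static class StaticSchema implements Schema {
        private static final long serialVersionUID = 1L;

        @Override
        public String targetStream(Object element) {
            return "stream";
        }
    }

    // The enclosing class is intentionally NOT Serializable, like a typical test class.
    private final String streamName = "stream";

    /** The anonymous class uses streamName, so it keeps a reference to this instance. */
    public Schema anonymousSchema() {
        return new Schema() {
            @Override
            public String targetStream(Object element) {
                return streamName;
            }
        };
    }

    /** Returns true if Java serialization accepts the object, false if it rejects it. */
    public static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + isSerializable(new StaticSchema()));
        System.out.println("anonymous:     "
                + isSerializable(new AnonymousClassSerializationSketch().anonymousSchema()));
    }
}
```

The anonymous instance fails because serializing it also requires serializing the non-serializable enclosing object; the static nested class has no such hidden field.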


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    +1 to option 3. Looks like we all agree on that :) @bowenli86 please feel free to continue with that.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132636498
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java ---
    @@ -38,6 +39,23 @@
      * Utilities for Flink Kinesis connector configuration.
      */
     public class KinesisConfigUtil {
    +	/** Maximum number of items to pack into an PutRecords request. **/
    --- End diff --
    
    nit: Add empty line before comment block.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132373023
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -24,10 +24,28 @@
      */
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    +	/** Deprecated key. **/
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +
    + 	/** Deprecated key. **/
    +	public static final String DEPRECATED_AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +
     	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    +	public static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    --- End diff --
    
    Do we really want to expose these config keys now?
    AFAIK, with this change, users who want to tweak these can simply refer to the KPL docs to see what keys are available. Explicitly exposing some keys in Flink but not others is a bit weird, IMO.


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132372133
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner<OUT> partitioner) {
     	public void open(Configuration parameters) throws Exception {
     		super.open(parameters);
     
    -		KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
    -
    -		producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
     		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
    -		if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
    -			producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
    -		}
    -		if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
    -			producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
    +		// Override KPL default value if it's not specified by user
    +		if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
    +			producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
    --- End diff --
    
    Why do we need to explicitly override the default `RATE_LIMIT`?


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132636335
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/config/ProducerConfigConstants.java ---
    @@ -22,12 +22,14 @@
     /**
      * Optional producer specific configuration keys for {@link FlinkKinesisProducer}.
      */
    +@Deprecated
     public class ProducerConfigConstants extends AWSConfigConstants {
     
    -	/** Maximum number of items to pack into an PutRecords request. **/
    -	public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    -
    -	/** Maximum number of items to pack into an aggregated record. **/
    -	public static final String AGGREGATION_MAX_COUNT = "aws.producer.aggregationMaxCount";
    +	/** Deprecated key. **/
    +	@Deprecated
    +	public static final String DEPRECATED_COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";
    --- End diff --
    
    This cannot be renamed, as it completely breaks user code that uses `ProducerConfigConstants.COLLECTION_MAX_COUNT`.
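
One way to deprecate the key without renaming the constant, sketched under assumptions: the old constant keeps its name and value, and a helper rewrites the old key to the KPL-style key before the producer config is built. `DeprecatedKeySketch` and `replaceDeprecatedKeys` are hypothetical names for illustration; only the two key strings appear in the diffs quoted in this thread:

```java
import java.util.Properties;

public class DeprecatedKeySketch {

    /** Old key, kept under its original name so existing user code still compiles. */
    @Deprecated
    public static final String COLLECTION_MAX_COUNT = "aws.producer.collectionMaxCount";

    /** KPL-style key that the old key is mapped to. */
    static final String KPL_COLLECTION_MAX_COUNT = "CollectionMaxCount";

    /** Returns a copy of the user properties with the deprecated key translated. */
    static Properties replaceDeprecatedKeys(Properties userProps) {
        Properties out = new Properties();
        out.putAll(userProps);
        String oldValue = out.getProperty(COLLECTION_MAX_COUNT);
        if (oldValue != null) {
            out.remove(COLLECTION_MAX_COUNT);
            // If the user set both keys, the new-style key wins.
            out.putIfAbsent(KPL_COLLECTION_MAX_COUNT, oldValue);
        }
        return out;
    }
}
```

User code that references `COLLECTION_MAX_COUNT` keeps compiling and only sees a deprecation warning, while the connector internally works with the new key.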


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132636613
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java ---
    @@ -38,6 +39,23 @@
      * Utilities for Flink Kinesis connector configuration.
      */
     public class KinesisConfigUtil {
    +	/** Maximum number of items to pack into an PutRecords request. **/
    +	@Deprecated
    +	protected static final String COLLECTION_MAX_COUNT = "CollectionMaxCount";
    +
    +	/** Maximum number of items to pack into an aggregated record. **/
    +	@Deprecated
    +	protected static final String AGGREGATION_MAX_COUNT = "AggregationMaxCount";
    --- End diff --
    
    Same here; deprecation not required.


---

[GitHub] flink issue #4473: [FLINK-7367][kinesis connector] Parameterize more configs...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on the issue:

    https://github.com/apache/flink/pull/4473
  
    No problem! And really sorry for the wait.
    I'm waiting for a Travis run before merging: https://travis-ci.org/tzulitai/flink/builds/272758087?utm_source=github_status&utm_medium=notification


---

[GitHub] flink pull request #4473: [FLINK-7367][kinesis connector] Parameterize more ...

Posted by tzulitai <gi...@git.apache.org>.
Github user tzulitai commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4473#discussion_r132373446
  
    --- Diff: flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java ---
    @@ -165,17 +167,10 @@ public void setCustomPartitioner(KinesisPartitioner<OUT> partitioner) {
     	public void open(Configuration parameters) throws Exception {
     		super.open(parameters);
     
    -		KinesisProducerConfiguration producerConfig = new KinesisProducerConfiguration();
    -
    -		producerConfig.setRegion(configProps.getProperty(ProducerConfigConstants.AWS_REGION));
     		producerConfig.setCredentialsProvider(AWSUtil.getCredentialsProvider(configProps));
    -		if (configProps.containsKey(ProducerConfigConstants.COLLECTION_MAX_COUNT)) {
    -			producerConfig.setCollectionMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.COLLECTION_MAX_COUNT, producerConfig.getCollectionMaxCount(), LOG));
    -		}
    -		if (configProps.containsKey(ProducerConfigConstants.AGGREGATION_MAX_COUNT)) {
    -			producerConfig.setAggregationMaxCount(PropertiesUtil.getLong(configProps,
    -					ProducerConfigConstants.AGGREGATION_MAX_COUNT, producerConfig.getAggregationMaxCount(), LOG));
    +		// Override KPL default value if it's not specified by user
    +		if (!configProps.containsKey(ProducerConfigConstants.RATE_LIMIT)) {
    +			producerConfig.setRateLimit(ProducerConfigConstants.DEFAULT_RATE_LIMIT);
    --- End diff --
    
    Ah I see why now, just read your Javadocs :)
    Should we move this override to `KinesisConfigUtil.validateProducerConfiguration`, where the producer config is built?
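
A rough sketch of applying the default in the config-building step rather than in `open()`, assuming the configuration is assembled from a `Properties` object. `RateLimitDefaultSketch`, `applyProducerDefaults`, the `"RateLimit"` key string, and the default value `"100"` are all hypothetical and chosen only for illustration:

```java
import java.util.Properties;

public class RateLimitDefaultSketch {

    /** Assumed KPL-style key name for the rate limit. */
    static final String RATE_LIMIT = "RateLimit";

    /** Illustrative default; the real value is whatever the connector defines. */
    static final String DEFAULT_RATE_LIMIT = "100";

    /** Copies the user properties and fills in the rate limit only if the user did not set it. */
    static Properties applyProducerDefaults(Properties userProps) {
        if (userProps == null) {
            throw new IllegalArgumentException("producer properties must not be null");
        }
        Properties effective = new Properties();
        effective.putAll(userProps);
        if (!effective.containsKey(RATE_LIMIT)) {
            effective.setProperty(RATE_LIMIT, DEFAULT_RATE_LIMIT);
        }
        return effective;
    }
}
```

Keeping the override next to validation means `open()` only consumes a finished configuration, and the defaulting logic can be unit-tested without starting a producer.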


---