You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/13 12:37:20 UTC

[GitHub] [beam] oakad opened a new issue, #23206: [Bug]: KinesisIO incorrectly selects Aws Credentials provider and crashes with NullPointerException on resource not found errors

oakad opened a new issue, #23206:
URL: https://github.com/apache/beam/issues/23206

   ### What happened?
   
   Beam v2.41.0 under Flink 1.13.
   Possibly related to #22707.
   
   `org.apache.beam.sdk.io.aws2.kinesis.KinesisIO` consumer has several issues with its current implementation.
   
   1. First, it's not handling the pipeline level `AwsOptions` correctly. Let's say a pipeline is configured to use a profile to authenticate:
   ```
   options.setAwsCredentialsProvider(
   	ProfileCredentialsProvider.create("some_profile")
   );
   Pipeline.create(options).apply(
   	"KinesisRead",
   	KinesisIO.read().withStreamName("stream_name")
   );
   ```
   When in "running" stage, the created pipeline step will use the "default' credentials from the local credentials file or whatever other means to authenticate, default AWS client factory can come with. It will not use `some_profile` set in the `options`, even though debug printouts can be used to confirm that `ProfileCredentialsProvider` in the `options` bean is correctly initialized.
   
   To make the situation worse, the `options` will be correctly used during initialization, so the issue will be postponed until after the job start and made even more cryptic.
   
   2. The not unexpected outcome from the above issue is `ResourceNotFoundException` from the various AWS SDK calls (if we are hitting the wrong account, the expected stream will not be there). However, current implementation of Kinesis consumer swallows those exceptions and proceeds as if no error had happened, eventually failing with cryptic "NullPointerException" errors (because consumer iterators can not be read/updated properly), like shown below:
   
   >> Caused by: java.lang.NullPointerException
           at org.apache.beam.sdk.io.aws2.kinesis.ShardReadersPool.getMinTimestamp(ShardReadersPool.java:257)
           at org.apache.beam.sdk.io.aws2.kinesis.ShardReadersPool.getWatermark(ShardReadersPool.java:249)
           at org.apache.beam.sdk.io.aws2.kinesis.KinesisReader.getWatermark(KinesisReader.java:133)
           at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.onProcessingTime(UnboundedSourceWrapper.java:460)
           at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1425)
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-kinesis


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche closed issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
mosche closed issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider
URL: https://github.com/apache/beam/issues/23206


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] oakad commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
oakad commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1250477626

   AWS_PROFILE env var does not work - the serializer breaks it just the same. :-)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] oakad commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
oakad commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1271161870

   It's not easy to check the configuration on the worker and rather counterintuitive too. Requires custom pipeline steps and what not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
mosche commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1263716233

   Thinking this through, I fear this feature cannot be supported. The resulting behavior would be very inconsistent and even more confusing.
   
   Let's imagine there's optional support for a `profileName` field when serializing `ProfileCredentialsProvider`. It cannot be required, that would be backwards incompatible.
   
   If we provide the options as CLI args it works as expected for both alternatives:
   - Use the given profile on the worker
      ```
      --awsCredentialsProvider={"@type": "ProfileCredentialsProvider, "profileName":"myprofile"}
     ```
   - Use the default profile as configured on the respective worker where we load credentials
     ```
      --awsCredentialsProvider={"@type": "ProfileCredentialsProvider"}
      ```   
   Unfortunately this would NOT be consistent with a programmatic usage :/
   -  Use the given profile on the worker
      ```
      ProfileCredentialsProvider.create("myprofile")
      ```  
   - Use the default profile as configured on the machine where the DAG is generated and tell the worker to use that profile as well :/ This is due to the fact how the provider is initialized and when serializing the provider we don't know if it was explicitly provided or not.
      ```
      ProfileCredentialsProvider.create()
      ```  
   
   Any objections closing the ticket?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on issue #23206: [Bug]: KinesisIO incorrectly selects Aws Credentials provider and crashes with NullPointerException on resource not found errors

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1248258933

   CC: @mosche 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on issue #23206: [Bug]: KinesisIO incorrectly selects Aws Credentials provider and crashes with NullPointerException on resource not found errors

Posted by GitBox <gi...@apache.org>.
mosche commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1248296235

   @oakad I acknowledge this is confusing and by far not optimal... `ProfileCredentialsProvider` allows for additional configuration options. However, these are currently not supported as documented in the Javadocs of `AwsOptions`.
   
   The issue is that credential providers have to be serialized, and there's no decent generic way to do that :/
   Unfortunately same applies to verifying the configuration of credentials providers if provided programmatically.
   But the latter could certainly be improved a lot and catch more such issues upfront!
   
   Would you be willing to submit a PR that adds support for a custom profile name?
   
   From the [Javadocs]:(https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/aws2/options/AwsOptions.html#setAwsCredentialsProvider-software.amazon.awssdk.auth.credentials.AwsCredentialsProvider-):
   
   The class name of the provider must be set in the @type field. 
   
   Note: Not all available providers are supported and some configuration options might be ignored.
   
   Most providers rely on the system's environment to follow AWS conventions, there's no further configuration supported:
   
   - [DefaultCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html?is-external=true)
   - [EnvironmentVariableCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/EnvironmentVariableCredentialsProvider.html?is-external=true)
   - [SystemPropertyCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/SystemPropertyCredentialsProvider.html?is-external=true)
   - [ProfileCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/ProfileCredentialsProvider.html?is-external=true)
   - [ContainerCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/ContainerCredentialsProvider.html?is-external=true)
     Example:
      ```
       --awsCredentialsProvider={"@type": "ProfileCredentialsProvider"}
      ```
   
   Some other providers require additional configuration:
   
   - [StaticCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/auth/2.17.127/software/amazon/awssdk/auth/credentials/StaticCredentialsProvider.html?is-external=true)
   - [StsAssumeRoleCredentialsProvider](https://static.javadoc.io/software.amazon.awssdk/sts/2.17.127/software/amazon/awssdk/services/sts/auth/StsAssumeRoleCredentialsProvider.html?is-external=true)
      Examples:
      ```
       --awsCredentialsProvider={
         "@type": "StaticCredentialsProvider",
         "awsAccessKeyId": "key_id_value",
         "awsSecretKey": "secret_value"
       }
      ```
      ```
       --awsCredentialsProvider={
         "@type": "StaticCredentialsProvider",
         "awsAccessKeyId": "key_id_value",
         "awsSecretKey": "secret_value",
         "sessionToken": "token_value"
       }
      ```
      ```
       --awsCredentialsProvider={
         "@type": "StsAssumeRoleCredentialsProvider",
         "roleArn": "role_arn_Value",
         "roleSessionName": "session_name_value",
         "policy": "policy_value",
         "durationSeconds": 3600
       }
      ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1249344805

   Agree with @mosche comments and we definitively should avoid NPE situation in any case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
mosche commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1250620785

   It's not meant to be serialized... AWS_PROFILE works, but it has to be set on every node where credentials are used 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
mosche commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1248374566

   @oakad BTW, until this is addressed you can configure the credentials provider through the respective environment variable `AWS_PROFILE`. Though, obviously, this has to be set for all worker nodes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aromanenko-dev commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
aromanenko-dev commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1270378675

   Isn't it a responsibility of user to check their configuration on workers and not Beam/KinesisIO? I mean why KinesisIO (or any other AWS IO connector) has to check if it's matching or not?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] mosche commented on issue #23206: [New feature]: Support custom profile name in AWS ProfileCredentialsProvider

Posted by GitBox <gi...@apache.org>.
mosche commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1271636243

   > Isn't it a responsibility of users to check their configuration on workers and not Beam/KinesisIO? I mean why KinesisIO (or any other AWS IO connector) has to check if it's matching or not?
   
   Beam's responsibility is backwards compatibility in this case, @aromanenko-dev.
   If we add support for custom profile names for `ProfileCredentialsProvider` (will say we extract/serialize the profile name in the respective pipeline option), it gets a bit tricky as I tried to explain above. When serializing a provider that was set programmatically, we cannot distinguish if the user provided the profile name or if it was loaded from the system environment (e.g. `AWS_PROFILE=master`). If the latter is the case and the system environment on a worker node, where the credentials are eventually used, looks differently (e.g. `AWS_PROFILE=worker`) things can easily break for pipelines that used to run  successfully before: we would force the `master` profile on a worker even though it should use the `worker` profile there 💥 
   
   On the other hand, in case we should decide to not support this, it's still important to check the system environment during serialization of pipeline options. It's the only way to detect if the user set an unsupported ProfileCredentialsProvider programmatically with a custom profile name (`ProfileCredentialsProvider.create("myprofile")`). If we don't check it and  silently ignore the configuration (as done now), things break fairly randomly on workers as experienced here in this ticket.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] oakad commented on issue #23206: [Bug]: KinesisIO incorrectly selects Aws Credentials provider and crashes with NullPointerException on resource not found errors

Posted by GitBox <gi...@apache.org>.
oakad commented on issue #23206:
URL: https://github.com/apache/beam/issues/23206#issuecomment-1245525890

   The problem in my case appears to originate from here:
   
   https://github.com/apache/beam/blob/ebacef90771d3fbc83f2c17e517e3f2195f746d0/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/options/AwsModule.java#L162-L164
   
   Clearly, it's a very bad design to "support" serialization of objects having non-trivial internal state while actually breaking the said state on serialization. It would be much nicer to plug the yet unsupported classes with appropriate exceptions instead of breaking apps the way it is happening right now.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org