Posted to dev@beam.apache.org by Jack McCluskey <jr...@google.com> on 2022/05/12 15:56:23 UTC

S3ClientBuilder Logging

Hey everyone,

I've noticed that the logging in the AWS2 DefaultS3ClientBuilderFactory
<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/DefaultS3ClientBuilderFactory.java#L53>
is extremely verbose when running Beam-on-Flink. For reference, the logged
statement "INFO: The AWS S3 Beam extension was included in this build, but
the awsRegion flag was not specified. If you don't plan to use S3, then
ignore this message." appears in a full run of the Beam Go Flink
integration tests 79,970 times. Is this necessary? This makes parsing
output in the event of an error much more difficult and time consuming. I
don't have the full context around this class and its usage, but it seems
excessive at face value.

Thanks,

Jack McCluskey

-- 


Jack McCluskey
SWE - DataPLS PLAT/ Beam Go
RDU
jrmccluskey@gmail.com

SerializablePipelineOptions / FileSystems.setDefaultPipelineOptions

Posted by Moritz Mack <mm...@talend.com>.
Does anybody here have any insight into this? I’m really wondering about the numbers: initializing all filesystems ~80k times for a pipeline run doesn’t seem right.

On 13.05.22, 09:10, "Moritz Mack" <mm...@talend.com> wrote:

Hi Jack,

Silencing info logs for that class during IT tests would be a quick fix, but also removing logging there entirely shouldn’t hurt.
If the S3 filesystem is used it’ll fail on first usage and the issue should be fairly obvious…

Though I’m wondering: this is logged once each time file systems are initialized… so seeing it ~80k times basically means all file systems, including the S3 one, get initialized that many times.
That doesn’t seem right! But I know very little about the portable runners and lifecycle of components / JVMs :/
I suspect this behavior is related to calling FileSystems.setDefaultPipelineOptions (triggering initialization of all Filesystems) on deserialization of SerializablePipelineOptions, see https://issues.apache.org/jira/browse/BEAM-2712.

Best,
Moritz (mosche)


From: Pablo Estrada <pa...@google.com>
Date: Thursday, 12. May 2022 at 20:50
To: dev <de...@beam.apache.org>, Alexey Romanenko <ar...@gmail.com>
Subject: Re: S3ClientBuilder Logging
Hi Jack,
Frankly, I think you should feel free to make a change to reduce the logging levels for some of the logs, and get a review from @Alexey Romanenko <ar...@gmail.com> @mosche(github), or myself. This is valuable feedback, so it would be great if we can get the logging to a good spot : )
Best
-P.


Re: S3ClientBuilder Logging

Posted by Moritz Mack <mm...@talend.com>.
Hi Jack,

Silencing info logs for that class during IT tests would be a quick fix, but also removing logging there entirely shouldn’t hurt.
If the S3 filesystem is used it’ll fail on first usage and the issue should be fairly obvious…
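
As a rough illustration of the "quick fix" above, here is a minimal sketch that raises the level of that one logger so INFO messages are dropped. It assumes the test JVM routes logging through java.util.logging, which this thread does not confirm (Beam code logs through SLF4J, and the backend depends on the runtime classpath). The logger name is derived from the class path in the linked source file; the class QuietS3FactoryLogging and its method are hypothetical names used only for this example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Beam.
final class QuietS3FactoryLogging {
  // Logger name derived from the package and class in the linked source file.
  static final String NOISY_LOGGER =
      "org.apache.beam.sdk.io.aws.s3.DefaultS3ClientBuilderFactory";

  // Keep a strong reference: java.util.logging holds loggers weakly, so a
  // level set on an otherwise unreferenced logger can be lost after GC.
  private static final Logger LOGGER = Logger.getLogger(NOISY_LOGGER);

  // Raise the threshold for this one logger; other loggers are unaffected.
  static void silenceInfo() {
    LOGGER.setLevel(Level.WARNING);
  }

  public static void main(String[] args) {
    silenceInfo();
    // INFO is now below the threshold, WARNING is not.
    System.out.println("INFO loggable: " + LOGGER.isLoggable(Level.INFO));
    System.out.println("WARNING loggable: " + LOGGER.isLoggable(Level.WARNING));
  }
}
```

With a different backend (Log4j, Logback) the equivalent would be a per-logger level entry in that backend's configuration rather than this call.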

Though I’m wondering: this is logged once each time file systems are initialized… so seeing it ~80k times basically means all file systems, including the S3 one, get initialized that many times.
That doesn’t seem right! But I know very little about the portable runners and lifecycle of components / JVMs :/
I suspect this behavior is related to calling FileSystems.setDefaultPipelineOptions (triggering initialization of all Filesystems) on deserialization of SerializablePipelineOptions, see https://issues.apache.org/jira/browse/BEAM-2712.
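
To make the suspected mechanism concrete, here is a small self-contained sketch. It is not Beam code: Options and initFileSystems are hypothetical stand-ins for an options holder that re-runs filesystem setup in its deserialization hook. It shows how every deserialization triggers another initialization, and how a once-per-JVM guard would cap the log message at one occurrence without changing how often initialization runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Beam.
final class DeserializationInitDemo {
  static final AtomicInteger initCalls = new AtomicInteger();
  static final AtomicInteger messagesLogged = new AtomicInteger();
  private static final AtomicBoolean warned = new AtomicBoolean();

  // Stand-in for filesystem setup that notices a missing region setting.
  static void initFileSystems() {
    initCalls.incrementAndGet();
    // compareAndSet succeeds only for the first caller in this JVM,
    // so the message is emitted once no matter how often we initialize.
    if (warned.compareAndSet(false, true)) {
      messagesLogged.incrementAndGet();
      System.out.println("awsRegion was not specified (logged once)");
    }
  }

  // Stand-in for an options holder that re-initializes on deserialization.
  static final class Options implements Serializable {
    private static final long serialVersionUID = 1L;

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      initFileSystems();
    }
  }

  // Serialize one Options instance, then deserialize it the given number of times.
  static void roundTrip(int times) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(new Options());
      }
      byte[] payload = bytes.toByteArray();
      for (int i = 0; i < times; i++) {
        try (ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(payload))) {
          in.readObject();
        }
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    roundTrip(1000);
    System.out.println(
        "init calls: " + initCalls.get() + ", messages: " + messagesLogged.get());
  }
}
```

The guard only addresses the log volume. Whether initializing on every deserialization is itself avoidable is the open question raised above.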

Best,
Moritz (mosche)




Re: S3ClientBuilder Logging

Posted by Pablo Estrada <pa...@google.com>.
Hi Jack,
Frankly, I think you should feel free to make a change to reduce the
logging levels for some of the logs, and get a review from @Alexey Romanenko
<ar...@gmail.com> @mosche(github), or myself. This is valuable
feedback, so it would be great if we can get the logging to a good spot : )
Best
-P.
