Posted to dev@beam.apache.org by Moritz Mack <mm...@talend.com> on 2022/05/17 08:31:25 UTC

SerializablePipelineOptions / FileSystems.setDefaultPipelineOptions

Does anybody here have any insight on this? I'm really wondering about the numbers: initializing all filesystems ~80k times for a pipeline run doesn’t seem right.

On 13.05.22, 09:10, "Moritz Mack" <mm...@talend.com> wrote:

Hi Jack,

Silencing info logs for that class during IT tests would be a quick fix, but also removing logging there entirely shouldn’t hurt.
If the S3 filesystem is used it’ll fail on first usage and the issue should be fairly obvious…
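
A minimal sketch of the quick fix, assuming the log output is routed through java.util.logging and that the logger is named after the class (the package and class name are taken from the source path linked in the original message; the wrapper class QuietS3FactoryLogging is hypothetical). If a different logging backend is bound, the equivalent setting would go into that backend's configuration instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietS3FactoryLogging {
  // Assumption: the logger is named after the fully qualified class name.
  static final String LOGGER_NAME =
      "org.apache.beam.sdk.io.aws.s3.DefaultS3ClientBuilderFactory";

  // Keep a strong reference: java.util.logging holds loggers weakly, so a
  // level set on an otherwise unreferenced logger can be lost after GC.
  static final Logger S3_FACTORY_LOGGER = Logger.getLogger(LOGGER_NAME);

  // Raise the threshold so INFO messages from this logger are dropped
  // while warnings and errors still get through.
  static void silenceInfo() {
    S3_FACTORY_LOGGER.setLevel(Level.WARNING);
  }

  public static void main(String[] args) {
    silenceInfo();
    System.out.println("INFO loggable: " + S3_FACTORY_LOGGER.isLoggable(Level.INFO));
    System.out.println("WARNING loggable: " + S3_FACTORY_LOGGER.isLoggable(Level.WARNING));
  }
}
```

This only hides the message; it does not reduce how often the factory is invoked.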

I'm wondering, though: this is logged once when file systems are initialized, so seeing it ~80k times basically means all file systems, including the S3 one, get initialized that many times.
That doesn’t seem right! But I know very little about the portable runners and lifecycle of components / JVMs :/
I suspect this behavior is related to calling FileSystems.setDefaultPipelineOptions (triggering initialization of all Filesystems) on deserialization of SerializablePipelineOptions, see https://issues.apache.org/jira/browse/BEAM-2712.
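
To illustrate the suspected mechanism, here is a self-contained sketch that does not use Beam at all. FakeFileSystems and FakeOptions are hypothetical stand-ins for FileSystems and SerializablePipelineOptions; the only thing it is meant to show is that an initialization call placed in a deserialization hook runs once per deserialized copy, so the count grows with the number of times the options are shipped around.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class DeserializationInitSketch {

  /** Hypothetical stand-in for a filesystem registry; not Beam's FileSystems. */
  static final class FakeFileSystems {
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static void setDefaultOptions(FakeOptions options) {
      // In the real scenario every registered filesystem would be
      // re-initialized here, which is where a per-init log line would fire.
      INIT_COUNT.incrementAndGet();
    }
  }

  /** Hypothetical stand-in for a serializable options wrapper. */
  static final class FakeOptions implements Serializable {
    private static final long serialVersionUID = 1L;
    final String region;

    FakeOptions(String region) {
      this.region = region;
    }

    // Custom deserialization hook: runs once for every deserialized copy.
    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      FakeFileSystems.setDefaultOptions(this);
    }
  }

  /** Serializes and deserializes the options, as shipping them to a worker would. */
  static FakeOptions roundTrip(FakeOptions options)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(options);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (FakeOptions) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    FakeOptions options = new FakeOptions(null); // no region set, as in the log message
    for (int i = 0; i < 1000; i++) {
      roundTrip(options);
    }
    // One initialization per deserialization.
    System.out.println("initializations: " + FakeFileSystems.INIT_COUNT.get());
  }
}
```

If the real code follows this pattern, the number of log lines would track the number of deserializations rather than the number of JVMs, which would be consistent with the counts reported here.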

Best,
Moritz (mosche)


From: Pablo Estrada <pa...@google.com>
Date: Thursday, 12. May 2022 at 20:50
To: dev <de...@beam.apache.org>, Alexey Romanenko <ar...@gmail.com>
Subject: Re: S3ClientBuilder Logging
Hi Jack,
Frankly, I think you should feel free to make a change to reduce the logging levels for some of the logs, and get a review from @Alexey Romanenko, @mosche (GitHub), or myself. This is valuable feedback, so it would be great if we can get the logging to a good spot : )
Best
-P.

On Thu, May 12, 2022 at 8:56 AM Jack McCluskey <jr...@google.com> wrote:
Hey everyone,

I've noticed that the logging in the AWS2 DefaultS3ClientBuilderFactory<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/io/amazon-web-services/src/main/java/org/apache/beam/sdk/io/aws/s3/DefaultS3ClientBuilderFactory.java#L53> is extremely verbose when running Beam-on-Flink. For reference, the logged statement "INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message." appears in a full run of the Beam Go Flink integration tests 79,970 times. Is this necessary? This makes parsing output in the event of an error much more difficult and time consuming. I don't have the full context around this class and its usage, but it seems excessive at face value.

Thanks,

Jack McCluskey

--
Jack McCluskey
SWE - DataPLS PLAT/ Beam Go
RDU
jrmccluskey@gmail.com<ma...@gmail.com>


As a recipient of an email from Talend, your contact personal data will be on our systems. Please see our privacy notice. <https://www.talend.com/privacy/>

