Posted to issues@beam.apache.org by "Udi Meiri (Jira)" <ji...@apache.org> on 2020/11/02 20:38:00 UTC

[jira] [Commented] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle

    [ https://issues.apache.org/jira/browse/BEAM-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224940#comment-17224940 ] 

Udi Meiri commented on BEAM-7763:
---------------------------------

It seems that credentials are fetched through google.auth.default(): https://google-auth.readthedocs.io/en/latest/_modules/google/auth/_default.html#default
That function doesn't appear to cache its result, so each new client triggers another credential fetch. Caching the client would avoid that.

If you could run a load test with and without client caching, that would help in determining the performance difference. It is not required though.
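
For illustration, here is a minimal sketch of the client caching suggested above. It is not the Beam implementation: CachedClient, FakeSubscriberClient and process_bundles are hypothetical names, and the fake client only counts how often it is constructed, so the sketch runs without the Pub/Sub library installed.

```python
import threading


class CachedClient:
    """Lazily builds one client and hands the same instance to every caller."""

    def __init__(self, factory):
        # factory: zero-argument callable that builds a client (assumption).
        self._factory = factory
        self._lock = threading.Lock()
        self._client = None

    def get(self):
        # The lock keeps concurrent callers from building two clients.
        with self._lock:
            if self._client is None:
                self._client = self._factory()
            return self._client


class FakeSubscriberClient:
    """Hypothetical stand-in for a Pub/Sub client; counts constructions."""

    created = 0

    def __init__(self):
        FakeSubscriberClient.created += 1


def process_bundles(num_bundles, get_client):
    """Simulates one client lookup per bundle and returns the clients used."""
    return [get_client() for _ in range(num_bundles)]
```

Passing FakeSubscriberClient directly as get_client models the current per-bundle behaviour (one construction per bundle), while passing CachedClient(FakeSubscriberClient).get models the cached variant (a single construction). In a real pipeline the factory would presumably be the Pub/Sub subscriber client constructor, and a load test would compare wall-clock time rather than construction counts.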

> Python DirectRunner _PubSubReadEvaluator creates new client per bundle
> ----------------------------------------------------------------------
>
>                 Key: BEAM-7763
>                 URL: https://issues.apache.org/jira/browse/BEAM-7763
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Priority: P3
>              Labels: easy
>
> Lots of credential fetches.
> Similar to https://issues.apache.org/jira/browse/BEAM-2264
> but in this case the DirectRunner implementation seems to be creating a new client for each bundle:
> https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474
> From: https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens



--
This message was sent by Atlassian Jira
(v8.3.4#803005)