You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Xinbin Huang (Jira)" <ji...@apache.org> on 2020/10/29 15:51:00 UTC

[jira] [Comment Edited] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle

    [ https://issues.apache.org/jira/browse/BEAM-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222976#comment-17222976 ] 

Xinbin Huang edited comment on BEAM-7763 at 10/29/20, 3:50 PM:
---------------------------------------------------------------

[~udim] I tried to reproduce this, but it doesn't seem to be reproducible on my end. Do you think this may already be solved in other parts of the codebase?

Test script that I used:
{code:bash}
#! /usr/bin/sh
set -a
LOCAL_TEXT_FILE=README.md

INPUT_TOPIC=<input-topic>
OUTPUT_TOPIC=<output-topic>
set +a


function create_topics {
    gcloud pubsub topics create $INPUT_TOPIC
    gcloud pubsub topics create $OUTPUT_TOPIC
}


function publish_words {
    echo "Publishing words to topic ${INPUT_TOPIC}"
    cat $LOCAL_TEXT_FILE | while read line; do gcloud pubsub topics publish $INPUT_TOPIC --message "$line"; done
}

function run_wc_beam_streaming {
    python -m apache_beam.examples.streaming_wordcount \
    --input_topic $INPUT_TOPIC \
    --output_topic $OUTPUT_TOPIC \
    --streaming
}


############################


create_topics
run_wc_beam_streaming &
publish_words
{code}



was (Author: xbhuang):
[~udim] I tried to reproduce this, but it doesn't seem to be reproducible on my end. Do you think this may already be solved in other parts of the codebase?


{code:bash}
#! /usr/bin/sh
set -a
LOCAL_TEXT_FILE=README.md

INPUT_TOPIC=<input-topic>
OUTPUT_TOPIC=<output-topic>
set +a


function create_topics {
    gcloud pubsub topics create $INPUT_TOPIC
    gcloud pubsub topics create $OUTPUT_TOPIC
}


function publish_words {
    echo "Publishing words to topic ${INPUT_TOPIC}"
    cat $LOCAL_TEXT_FILE | while read line; do gcloud pubsub topics publish $INPUT_TOPIC --message "$line"; done
}

function run_wc_beam_streaming {
    python -m apache_beam.examples.streaming_wordcount \
    --input_topic $INPUT_TOPIC \
    --output_topic $OUTPUT_TOPIC \
    --streaming
}


############################


create_topics
run_wc_beam_streaming &
publish_words
{code}


> Python DirectRunner _PubSubReadEvaluator creates new client per bundle
> ----------------------------------------------------------------------
>
>                 Key: BEAM-7763
>                 URL: https://issues.apache.org/jira/browse/BEAM-7763
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>            Reporter: Udi Meiri
>            Priority: P3
>              Labels: easy
>
> Lots of credential fetches.
> Similar to https://issues.apache.org/jira/browse/BEAM-2264
> but in this case the DirectRunner implementation seems to be creating a new client for each bundle:
> https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474
> From: https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens



--
This message was sent by Atlassian Jira
(v8.3.4#803005)