Posted to user@beam.apache.org by Sumit Desai via user <us...@beam.apache.org> on 2023/12/20 09:34:30 UTC

Environmental variables not accessible in Dataflow pipeline

Hi all,

I have a Python application that uses Apache Beam with Dataflow as the
runner. The application uses a non-public Python package
'uplight-telemetry', which is supplied through 'extra_packages' while
creating the pipeline_options object. This package expects an environment
variable named 'OTEL_SERVICE_NAME', and since that variable is not present
on the Dataflow workers, it causes an error during application
startup.

I am passing this variable using custom pipeline options. The code that
creates the pipeline options is as follows:

pipeline_options = ProcessBillRequests.CustomOptions(
    project=gcp_project_id,
    region="us-east1",
    job_name=job_name,
    temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
    staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
    runner='DataflowRunner',
    save_main_session=True,
    service_account_email=service_account,
    subnetwork=os.environ.get(SUBNETWORK_URL),
    extra_packages=[uplight_telemetry_tar_file_path],
    setup_file=setup_file_path,
    OTEL_SERVICE_NAME=otel_service_name,
    OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes,
    # Set values for additional custom variables as needed
)
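
For reference, ProcessBillRequests.CustomOptions itself is not shown in the
thread; a minimal sketch of how such a PipelineOptions subclass is typically
defined (assuming the two OTEL fields are plain string options) would be:

from apache_beam.options.pipeline_options import PipelineOptions

class CustomOptions(PipelineOptions):
    # Hypothetical sketch of the CustomOptions class referenced above
    # (nested inside ProcessBillRequests in the original code).
    @classmethod
    def _add_argparse_args(cls, parser):
        # Each custom option becomes a regular --flag that Beam parses and
        # serializes along with the rest of the pipeline options.
        parser.add_argument("--OTEL_SERVICE_NAME", type=str, default=None)
        parser.add_argument("--OTEL_RESOURCE_ATTRIBUTES", type=str, default=None)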


And the code that executes the pipeline is as follows:


result = (
        pipeline
        | "ReadPendingRecordsFromDB" >> read_from_db
        | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
        | "Fetch bills" >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
)

pipeline.run().wait_until_finish()

Is there a way to make the environment variables set in the custom options
available on the worker?

Thanks & Regards,
Sumit Desai

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sumit Desai via user <us...@beam.apache.org>.
Thanks XQ and Evan. I am going to try it out. Thanks for your suggestions.

Regards,
Sumit Desai

On Sat, Dec 23, 2023 at 12:16 AM Evan Galpin <eg...@apache.org> wrote:

> I assume from the previous messages that GCP Dataflow is being used as the
> pipeline runner.  Even without Flex Templates, the v2 runner can use docker
> containers to install all dependencies from various sources[1].  I have
> used docker containers to solve the same problem you mention: installing a
> python dependency from a private package repository.  The process is
> roughly:
>
>
>    1. Build a docker container from the apache beam base images,
>    customizing as you need[2]
>    2. Tag and push that image to Google Container Registry
>    3. When you deploy your Dataflow job, include the options
>    "--experiment=use_runner_v2 --worker_harness_container_image=
>    gcr.io/my-project/my-image-name:my-image-tag" (there may be other
>    ways, but this is what I have seen working first-hand)
>
> Your docker file can be as simple as:
>
> # Python:major:minor-slim must match apache/beam_python[major:minor]_sdk
> FROM python:3.10-slim
>
> # authenticate with private python package repo, install all various
> # dependencies, set env vars, COPY your pipeline code to the container, etc
> #
> #  ...
> #
> #
>
> # Copy files from official SDK image, including script/dependencies.
> # Apache SDK version must match python image major:minor version
> # Based on
> https://cloud.google.com/dataflow/docs/guides/using-custom-containers#python_1
> COPY --from=apache/beam_python3.10_sdk:2.52.0  /opt/apache/beam
> /opt/apache/beam
>
> # Set the entrypoint to Apache Beam SDK launcher.
> ENTRYPOINT ["/opt/apache/beam/boot"]
>
> [1]
> https://cloud.google.com/dataflow/docs/guides/using-custom-containers#python_1
> [2]
> https://cloud.google.com/dataflow/docs/guides/build-container-image#python
>
>
> On Fri, Dec 22, 2023 at 6:32 AM XQ Hu via user <us...@beam.apache.org>
> wrote:
>
>> You can use the same docker image for both template launcher and Dataflow
>> job. Here is one example:
>> https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60
>>
>> On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai <su...@uplight.com>
>> wrote:
>>
>>> Yes, I will have to try it out.
>>>
>>> Regards
>>> Sumit Desai
>>>
>>> On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mm...@gmail.com>
>>> wrote:
>>>
>>>> I guess so, i am not an expert on using env variables in dataflow
>>>> pipelines as any config dependencies i  need, i pass them as job input
>>>> params
>>>>
>>>> But perhaps you can configure variables in your docker file (i am not
>>>> an expert in this either),  as  flex templates use Docker?
>>>>
>>>>
>>>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
>>>>
>>>> hth
>>>>   Marco
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <su...@uplight.com>
>>>> wrote:
>>>>
>>>>> We are using an external non-public package which expects
>>>>> environmental variables only. If environmental variables are not found, it
>>>>> will throw an error. We can't change source of this package.
>>>>>
>>>>> Does this mean we will face same problem with flex templates also?
>>>>>
>>>>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The flex template will allow you to pass input params with dynamic
>>>>>> values to your data flow job so you could replace the env variable with
>>>>>> that input? That is, unless you have to have env bars..but from your
>>>>>> snippets it appears you are just using them to configure one of your
>>>>>> components?
>>>>>> Hth
>>>>>>
>>>>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sofia and XQ,
>>>>>>>
>>>>>>> The application is failing because I have loggers defined in every
>>>>>>> file and the method to create a logger tries to create an object of
>>>>>>> UplightTelemetry. If I use flex templated, will the environmental variables
>>>>>>> I supply be loaded before the application gets loaded? If not, it would not
>>>>>>> serve my purpose.
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Sumit Desai
>>>>>>>
>>>>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <
>>>>>>> sumit.desai@uplight.com> wrote:
>>>>>>>
>>>>>>>> Thank you HQ. Will take a look at this.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sumit Desai
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <xq...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Dataflow VMs cannot know your local env variable. I think you
>>>>>>>>> should use custom container:
>>>>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>>>>> Here is a sample project:
>>>>>>>>> https://github.com/google/dataflow-ml-starter
>>>>>>>>>
>>>>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Sumit
>>>>>>>>>>  Thanks. Sorry...I guess if the value of the env variable is
>>>>>>>>>> always the same u can pass it as job params?..though it doesn't sound like
>>>>>>>>>> a viable option...
>>>>>>>>>> Hth
>>>>>>>>>>
>>>>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Sofia,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the response. For now, we have decided not to use
>>>>>>>>>>> flex template. Is there a way to pass environmental variables without using
>>>>>>>>>>> any template?
>>>>>>>>>>>
>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>> Sumit Desai
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <
>>>>>>>>>>> mmistroni@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>  My 2 cents. .have u ever considered using flex templates to
>>>>>>>>>>>> run your pipeline? Then you can pass all your parameters at runtime..
>>>>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a Python application which is using Apache beam and
>>>>>>>>>>>>> Dataflow as runner. The application uses a non-public Python package
>>>>>>>>>>>>> 'uplight-telemetry' which is configured using 'extra_packages' while
>>>>>>>>>>>>> creating pipeline_options object. This package expects an environmental
>>>>>>>>>>>>> variable named 'OTEL_SERVICE_NAME' and since this variable is not present
>>>>>>>>>>>>> in the Dataflow worker, it is resulting in an error during application
>>>>>>>>>>>>> startup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am passing this variable using custom pipeline options. Code
>>>>>>>>>>>>> to create pipeline options is as follows-
>>>>>>>>>>>>>
>>>>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>>>>     region="us-east1",
>>>>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>>>>     service_account_email= service_account,
>>>>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> result = (
>>>>>>>>>>>>>         pipeline
>>>>>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>>>>         | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>>>>> )
>>>>>>>>>>>>>
>>>>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a way I can set the environmental variables in custom
>>>>>>>>>>>>> options available in the worker?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>> Sumit Desai
>>>>>>>>>>>>>
>>>>>>>>>>>>

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Evan Galpin <eg...@apache.org>.
I assume from the previous messages that GCP Dataflow is being used as the
pipeline runner.  Even without Flex Templates, the v2 runner can use docker
containers to install all dependencies from various sources[1].  I have
used docker containers to solve the same problem you mention: installing a
python dependency from a private package repository.  The process is
roughly:


   1. Build a docker container from the apache beam base images,
   customizing as you need[2]
   2. Tag and push that image to Google Container Registry
   3. When you deploy your Dataflow job, include the options
   "--experiment=use_runner_v2
   --worker_harness_container_image=gcr.io/my-project/my-image-name:my-image-tag"
   (there may be other ways, but this is what I have seen working first-hand)

Your Dockerfile can be as simple as:

# The python:<major>.<minor>-slim base must match apache/beam_python<major>.<minor>_sdk
FROM python:3.10-slim

# authenticate with private python package repo, install all various
# dependencies, set env vars, COPY your pipeline code to the container, etc
#
#  ...
#
#

# Copy files from the official SDK image, including scripts/dependencies.
# The SDK image's Python <major>.<minor> must match the base image above.
# Based on
# https://cloud.google.com/dataflow/docs/guides/using-custom-containers#python_1
COPY --from=apache/beam_python3.10_sdk:2.52.0 /opt/apache/beam /opt/apache/beam

# Set the entrypoint to Apache Beam SDK launcher.
ENTRYPOINT ["/opt/apache/beam/boot"]
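
Since the original problem is an env variable the package needs at startup,
values can also be baked into this image with ENV (a sketch; the values
below are placeholders, not from the thread):

# Placeholder values for whatever uplight-telemetry expects; ENV makes
# them visible to the worker process at startup.
ENV OTEL_SERVICE_NAME=my-service
ENV OTEL_RESOURCE_ATTRIBUTES=service.namespace=my-namespace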

[1]
https://cloud.google.com/dataflow/docs/guides/using-custom-containers#python_1
[2]
https://cloud.google.com/dataflow/docs/guides/build-container-image#python
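
For anyone launching from Python rather than the CLI, the same flags map
onto PipelineOptions keyword arguments (a sketch; the image path is a
placeholder):

from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder image path, mirroring the command-line flags above.
pipeline_options = PipelineOptions(
    runner="DataflowRunner",
    experiments=["use_runner_v2"],
    worker_harness_container_image="gcr.io/my-project/my-image-name:my-image-tag",
)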


On Fri, Dec 22, 2023 at 6:32 AM XQ Hu via user <us...@beam.apache.org> wrote:

> You can use the same docker image for both template launcher and Dataflow
> job. Here is one example:
> https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60
>
> On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai <su...@uplight.com>
> wrote:
>
>> Yes, I will have to try it out.
>>
>> Regards
>> Sumit Desai
>>
>> On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mm...@gmail.com>
>> wrote:
>>
>>> I guess so, i am not an expert on using env variables in dataflow
>>> pipelines as any config dependencies i  need, i pass them as job input
>>> params
>>>
>>> But perhaps you can configure variables in your docker file (i am not an
>>> expert in this either),  as  flex templates use Docker?
>>>
>>>
>>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
>>>
>>> hth
>>>   Marco
>>>
>>>
>>>
>>>
>>> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <su...@uplight.com>
>>> wrote:
>>>
>>>> We are using an external non-public package which expects environmental
>>>> variables only. If environmental variables are not found, it will throw an
>>>> error. We can't change source of this package.
>>>>
>>>> Does this mean we will face same problem with flex templates also?
>>>>
>>>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com>
>>>> wrote:
>>>>
>>>>> The flex template will allow you to pass input params with dynamic
>>>>> values to your data flow job so you could replace the env variable with
>>>>> that input? That is, unless you have to have env bars..but from your
>>>>> snippets it appears you are just using them to configure one of your
>>>>> components?
>>>>> Hth
>>>>>
>>>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Sofia and XQ,
>>>>>>
>>>>>> The application is failing because I have loggers defined in every
>>>>>> file and the method to create a logger tries to create an object of
>>>>>> UplightTelemetry. If I use flex templated, will the environmental variables
>>>>>> I supply be loaded before the application gets loaded? If not, it would not
>>>>>> serve my purpose.
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Sumit Desai
>>>>>>
>>>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <su...@uplight.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you HQ. Will take a look at this.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sumit Desai
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <xq...@google.com> wrote:
>>>>>>>
>>>>>>>> Dataflow VMs cannot know your local env variable. I think you
>>>>>>>> should use custom container:
>>>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>>>> Here is a sample project:
>>>>>>>> https://github.com/google/dataflow-ml-starter
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Sumit
>>>>>>>>>  Thanks. Sorry...I guess if the value of the env variable is
>>>>>>>>> always the same u can pass it as job params?..though it doesn't sound like
>>>>>>>>> a viable option...
>>>>>>>>> Hth
>>>>>>>>>
>>>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sofia,
>>>>>>>>>>
>>>>>>>>>> Thanks for the response. For now, we have decided not to use flex
>>>>>>>>>> template. Is there a way to pass environmental variables without using any
>>>>>>>>>> template?
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards,
>>>>>>>>>> Sumit Desai
>>>>>>>>>>
>>>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <
>>>>>>>>>> mmistroni@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>>  My 2 cents. .have u ever considered using flex templates to run
>>>>>>>>>>> your pipeline? Then you can pass all your parameters at runtime..
>>>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have a Python application which is using Apache beam and
>>>>>>>>>>>> Dataflow as runner. The application uses a non-public Python package
>>>>>>>>>>>> 'uplight-telemetry' which is configured using 'extra_packages' while
>>>>>>>>>>>> creating pipeline_options object. This package expects an environmental
>>>>>>>>>>>> variable named 'OTEL_SERVICE_NAME' and since this variable is not present
>>>>>>>>>>>> in the Dataflow worker, it is resulting in an error during application
>>>>>>>>>>>> startup.
>>>>>>>>>>>>
>>>>>>>>>>>> I am passing this variable using custom pipeline options. Code
>>>>>>>>>>>> to create pipeline options is as follows-
>>>>>>>>>>>>
>>>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>>>     region="us-east1",
>>>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>>>     service_account_email= service_account,
>>>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>>>> )
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> result = (
>>>>>>>>>>>>         pipeline
>>>>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>>>         | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>>>> )
>>>>>>>>>>>>
>>>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way I can set the environmental variables in custom
>>>>>>>>>>>> options available in the worker?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>> Sumit Desai
>>>>>>>>>>>>
>>>>>>>>>>>

Re: Environmental variables not accessible in Dataflow pipeline

Posted by XQ Hu via user <us...@beam.apache.org>.
You can use the same Docker image for both the template launcher and the
Dataflow job. Here is one example:
https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile#L60
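
In outline, the linked Dockerfile builds one image that serves as both the
flex-template launcher and the Beam worker (a rough sketch with assumed
paths from Google's flex-template docs; see the link for the full version):

FROM apache/beam_python3.10_sdk:2.52.0

# Launcher binary copied from Google's template-launcher base image
# (assumed path, per Google's flex-template docs).
COPY --from=gcr.io/dataflow-templates-base/python3-template-launcher-base \
    /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher

# Placeholder pipeline file; FLEX_TEMPLATE_PYTHON_PY_FILE tells the
# launcher which file to run.
COPY pipeline.py /template/pipeline.py
ENV FLEX_TEMPLATE_PYTHON_PY_FILE=/template/pipeline.py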

On Fri, Dec 22, 2023 at 8:04 AM Sumit Desai <su...@uplight.com> wrote:

> Yes, I will have to try it out.
>
> Regards
> Sumit Desai
>
> On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mm...@gmail.com> wrote:
>
>> I guess so, i am not an expert on using env variables in dataflow
>> pipelines as any config dependencies i  need, i pass them as job input
>> params
>>
>> But perhaps you can configure variables in your docker file (i am not an
>> expert in this either),  as  flex templates use Docker?
>>
>>
>> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
>>
>> hth
>>   Marco
>>
>>
>>
>>
>> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <su...@uplight.com>
>> wrote:
>>
>>> We are using an external non-public package which expects environmental
>>> variables only. If environmental variables are not found, it will throw an
>>> error. We can't change source of this package.
>>>
>>> Does this mean we will face same problem with flex templates also?
>>>
>>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com> wrote:
>>>
>>>> The flex template will allow you to pass input params with dynamic
>>>> values to your data flow job so you could replace the env variable with
>>>> that input? That is, unless you have to have env bars..but from your
>>>> snippets it appears you are just using them to configure one of your
>>>> components?
>>>> Hth
>>>>
>>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com>
>>>> wrote:
>>>>
>>>>> Hi Sofia and XQ,
>>>>>
>>>>> The application is failing because I have loggers defined in every
>>>>> file and the method to create a logger tries to create an object of
>>>>> UplightTelemetry. If I use flex templated, will the environmental variables
>>>>> I supply be loaded before the application gets loaded? If not, it would not
>>>>> serve my purpose.
>>>>>
>>>>> Thanks & Regards,
>>>>> Sumit Desai
>>>>>
>>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <su...@uplight.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you HQ. Will take a look at this.
>>>>>>
>>>>>> Regards,
>>>>>> Sumit Desai
>>>>>>
>>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <xq...@google.com> wrote:
>>>>>>
>>>>>>> Dataflow VMs cannot know your local env variable. I think you should
>>>>>>> use custom container:
>>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>>> Here is a sample project:
>>>>>>> https://github.com/google/dataflow-ml-starter
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello Sumit
>>>>>>>>  Thanks. Sorry...I guess if the value of the env variable is always
>>>>>>>> the same u can pass it as job params?..though it doesn't sound like a
>>>>>>>> viable option...
>>>>>>>> Hth
>>>>>>>>
>>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sofia,
>>>>>>>>>
>>>>>>>>> Thanks for the response. For now, we have decided not to use flex
>>>>>>>>> template. Is there a way to pass environmental variables without using any
>>>>>>>>> template?
>>>>>>>>>
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Sumit Desai
>>>>>>>>>
>>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mm...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>  My 2 cents. .have u ever considered using flex templates to run
>>>>>>>>>> your pipeline? Then you can pass all your parameters at runtime..
>>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>>
>>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I have a Python application which is using Apache beam and
>>>>>>>>>>> Dataflow as runner. The application uses a non-public Python package
>>>>>>>>>>> 'uplight-telemetry' which is configured using 'extra_packages' while
>>>>>>>>>>> creating pipeline_options object. This package expects an environmental
>>>>>>>>>>> variable named 'OTEL_SERVICE_NAME' and since this variable is not present
>>>>>>>>>>> in the Dataflow worker, it is resulting in an error during application
>>>>>>>>>>> startup.
>>>>>>>>>>>
>>>>>>>>>>> I am passing this variable using custom pipeline options. Code
>>>>>>>>>>> to create pipeline options is as follows-
>>>>>>>>>>>
>>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>>     region="us-east1",
>>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>>     service_account_email= service_account,
>>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> result = (
>>>>>>>>>>>         pipeline
>>>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>>         | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>>> )
>>>>>>>>>>>
>>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>>
>>>>>>>>>>> Is there a way I can set the environmental variables in custom
>>>>>>>>>>> options available in the worker?
>>>>>>>>>>>
>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>> Sumit Desai
>>>>>>>>>>>
>>>>>>>>>>

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sumit Desai via user <us...@beam.apache.org>.
Yes, I will have to try it out.

Regards
Sumit Desai

On Fri, Dec 22, 2023 at 3:53 PM Sofia’s World <mm...@gmail.com> wrote:

> I guess so, i am not an expert on using env variables in dataflow
> pipelines as any config dependencies i  need, i pass them as job input
> params
>
> But perhaps you can configure variables in your docker file (i am not an
> expert in this either),  as  flex templates use Docker?
>
>
> https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates
>
> hth
>   Marco
>
>
>
>
> On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <su...@uplight.com>
> wrote:
>
>> We are using an external non-public package which expects environmental
>> variables only. If environmental variables are not found, it will throw an
>> error. We can't change source of this package.
>>
>> Does this mean we will face same problem with flex templates also?
>>
>> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com> wrote:
>>
>>> The flex template will allow you to pass input params with dynamic
>>> values to your data flow job so you could replace the env variable with
>>> that input? That is, unless you have to have env bars..but from your
>>> snippets it appears you are just using them to configure one of your
>>> components?
>>> Hth
>>>
>>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com> wrote:
>>>
>>>> Hi Sofia and XQ,
>>>>
>>>> The application is failing because I have loggers defined in every file
>>>> and the method to create a logger tries to create an object of
>>>> UplightTelemetry. If I use flex templated, will the environmental variables
>>>> I supply be loaded before the application gets loaded? If not, it would not
>>>> serve my purpose.
>>>>
>>>> Thanks & Regards,
>>>> Sumit Desai
>>>>
>>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <su...@uplight.com>
>>>> wrote:
>>>>
>>>>> Thank you HQ. Will take a look at this.
>>>>>
>>>>> Regards,
>>>>> Sumit Desai
>>>>>
>>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <xq...@google.com> wrote:
>>>>>
>>>>>> Dataflow VMs cannot know your local env variable. I think you should
>>>>>> use custom container:
>>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>>> Here is a sample project:
>>>>>> https://github.com/google/dataflow-ml-starter
>>>>>>
>>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Sumit
>>>>>>>  Thanks. Sorry...I guess if the value of the env variable is always
>>>>>>> the same u can pass it as job params?..though it doesn't sound like a
>>>>>>> viable option...
>>>>>>> Hth
>>>>>>>
>>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Sofia,
>>>>>>>>
>>>>>>>> Thanks for the response. For now, we have decided not to use flex
>>>>>>>> template. Is there a way to pass environmental variables without using any
>>>>>>>> template?
>>>>>>>>
>>>>>>>> Thanks & Regards,
>>>>>>>> Sumit Desai
>>>>>>>>
>>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mm...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>  My 2 cents. .have u ever considered using flex templates to run
>>>>>>>>> your pipeline? Then you can pass all your parameters at runtime..
>>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>>
>>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have a Python application which is using Apache beam and
>>>>>>>>>> Dataflow as runner. The application uses a non-public Python package
>>>>>>>>>> 'uplight-telemetry' which is configured using 'extra_packages' while
>>>>>>>>>> creating pipeline_options object. This package expects an environmental
>>>>>>>>>> variable named 'OTEL_SERVICE_NAME' and since this variable is not present
>>>>>>>>>> in the Dataflow worker, it is resulting in an error during application
>>>>>>>>>> startup.
>>>>>>>>>>
>>>>>>>>>> I am passing this variable using custom pipeline options. Code to
>>>>>>>>>> create pipeline options is as follows-
>>>>>>>>>>
>>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>>     region="us-east1",
>>>>>>>>>>     job_name=job_name,
>>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>>     save_main_session=True,
>>>>>>>>>>     service_account_email= service_account,
>>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> result = (
>>>>>>>>>>         pipeline
>>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>>         | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>>> )
>>>>>>>>>>
>>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>>
>>>>>>>>>> Is there a way I can set the environmental variables in custom
>>>>>>>>>> options available in the worker?
>>>>>>>>>>
>>>>>>>>>> Thanks & Regards,
>>>>>>>>>> Sumit Desai
>>>>>>>>>>
>>>>>>>>>

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sofia’s World <mm...@gmail.com>.
I guess so; I am not an expert on using env variables in Dataflow pipelines,
as I pass any config dependencies I need as job input params.

But perhaps you can configure variables in your Dockerfile (I am not an
expert in this either), as flex templates use Docker?

https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates

hth
  Marco
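
Concretely, "job input params" here means flex-template parameters: custom
pipeline options like the OTEL ones earlier in the thread can be supplied
at launch with --parameters (a sketch; the job name, bucket, and values are
placeholders):

gcloud dataflow flex-template run "process-bill-requests" \
    --template-file-gcs-location gs://my-bucket/templates/my-template.json \
    --region us-east1 \
    --parameters OTEL_SERVICE_NAME=my-service,OTEL_RESOURCE_ATTRIBUTES=service.namespace=my-namespace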




On Fri, Dec 22, 2023 at 10:17 AM Sumit Desai <su...@uplight.com>
wrote:

> We are using an external non-public package which expects environmental
> variables only. If environmental variables are not found, it will throw an
> error. We can't change source of this package.
>
> Does this mean we will face same problem with flex templates also?
>
> On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com> wrote:
>
>> The flex template will allow you to pass input params with dynamic values
>> to your data flow job so you could replace the env variable with that
>> input? That is, unless you have to have env bars..but from your snippets it
>> appears you are just using them to configure one of your components?
>> Hth
>>
>> On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com> wrote:
>>
>>> Hi Sofia and XQ,
>>>
>>> The application is failing because I have loggers defined in every file
>>> and the method to create a logger tries to create an object of
>>> UplightTelemetry. If I use flex templated, will the environmental variables
>>> I supply be loaded before the application gets loaded? If not, it would not
>>> serve my purpose.
>>>
>>> Thanks & Regards,
>>> Sumit Desai
>>>
>>> On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <su...@uplight.com>
>>> wrote:
>>>
>>>> Thank you HQ. Will take a look at this.
>>>>
>>>> Regards,
>>>> Sumit Desai
>>>>
>>>> On Wed, Dec 20, 2023 at 8:13 PM XQ Hu <xq...@google.com> wrote:
>>>>
>>>>> Dataflow VMs cannot know your local env variable. I think you should
>>>>> use custom container:
>>>>> https://cloud.google.com/dataflow/docs/guides/using-custom-containers.
>>>>> Here is a sample project:
>>>>> https://github.com/google/dataflow-ml-starter
>>>>>
>>>>> On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hello Sumit
>>>>>>  Thanks. Sorry...I guess if the value of the env variable is always
>>>>>> the same u can pass it as job params?..though it doesn't sound like a
>>>>>> viable option...
>>>>>> Hth
>>>>>>
>>>>>> On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Sofia,
>>>>>>>
>>>>>>> Thanks for the response. For now, we have decided not to use flex
>>>>>>> template. Is there a way to pass environmental variables without using any
>>>>>>> template?
>>>>>>>
>>>>>>> Thanks & Regards,
>>>>>>> Sumit Desai
>>>>>>>
>>>>>>> On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mm...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>  My 2 cents. .have u ever considered using flex templates to run
>>>>>>>> your pipeline? Then you can pass all your parameters at runtime..
>>>>>>>> (Apologies in advance if it does not cover your use case...)
>>>>>>>>
>>>>>>>> On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <
>>>>>>>> user@beam.apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have a Python application which is using Apache beam and
>>>>>>>>> Dataflow as runner. The application uses a non-public Python package
>>>>>>>>> 'uplight-telemetry' which is configured using 'extra_packages' while
>>>>>>>>> creating pipeline_options object. This package expects an environmental
>>>>>>>>> variable named 'OTEL_SERVICE_NAME' and since this variable is not present
>>>>>>>>> in the Dataflow worker, it is resulting in an error during application
>>>>>>>>> startup.
>>>>>>>>>
>>>>>>>>> I am passing this variable using custom pipeline options. Code to
>>>>>>>>> create pipeline options is as follows-
>>>>>>>>>
>>>>>>>>> pipeline_options = ProcessBillRequests.CustomOptions(
>>>>>>>>>     project=gcp_project_id,
>>>>>>>>>     region="us-east1",
>>>>>>>>>     job_name=job_name,
>>>>>>>>>     temp_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/temp',
>>>>>>>>>     staging_location=f'gs://{TAS_GCS_BUCKET_NAME_PREFIX}{os.getenv("UP_PLATFORM_ENV")}/staging',
>>>>>>>>>     runner='DataflowRunner',
>>>>>>>>>     save_main_session=True,
>>>>>>>>>     service_account_email= service_account,
>>>>>>>>>     subnetwork=os.environ.get(SUBNETWORK_URL),
>>>>>>>>>     extra_packages=[uplight_telemetry_tar_file_path],
>>>>>>>>>     setup_file=setup_file_path,
>>>>>>>>>     OTEL_SERVICE_NAME=otel_service_name,
>>>>>>>>>     OTEL_RESOURCE_ATTRIBUTES=otel_resource_attributes
>>>>>>>>>     # Set values for additional custom variables as needed
>>>>>>>>> )
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> And the code that executes the pipeline is as follows-
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> result = (
>>>>>>>>>         pipeline
>>>>>>>>>         | "ReadPendingRecordsFromDB" >> read_from_db
>>>>>>>>>         | "Parse input PCollection" >> beam.Map(ProcessBillRequests.parse_bill_data_requests)
>>>>>>>>>         | "Fetch bills " >> beam.ParDo(ProcessBillRequests.FetchBillInformation())
>>>>>>>>> )
>>>>>>>>>
>>>>>>>>> pipeline.run().wait_until_finish()
>>>>>>>>>
>>>>>>>>> Is there a way I can set the environmental variables in custom
>>>>>>>>> options available in the worker?
>>>>>>>>>
>>>>>>>>> Thanks & Regards,
>>>>>>>>> Sumit Desai
>>>>>>>>>
>>>>>>>>

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sumit Desai via user <us...@beam.apache.org>.
We are using an external, non-public package which expects environment
variables only. If the environment variables are not found, it throws an
error. We can't change the source of this package.

Does this mean we will face the same problem with flex templates also?

On Fri, 22 Dec 2023, 3:39 pm Sofia’s World, <mm...@gmail.com> wrote:

> The flex template will allow you to pass input params with dynamic values
> to your data flow job so you could replace the env variable with that
> input? That is, unless you have to have env bars..but from your snippets it
> appears you are just using them to configure one of your components?
> Hth

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sofia’s World <mm...@gmail.com>.
A flex template will allow you to pass input params with dynamic values
to your Dataflow job, so you could replace the env variable with that
input? That is, unless you have to have env vars... but from your snippets
it appears you are just using them to configure one of your components?
Hth
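
For example, the value can be declared as a custom pipeline option, which a
flex template then surfaces as a job parameter at launch time. A minimal
sketch of the idea (the option name --otel_service_name and its default are
just illustrative, not part of any existing API):

from apache_beam.options.pipeline_options import PipelineOptions

class TelemetryOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Surfaces as a --otel_service_name job parameter when the
        # pipeline is launched, e.g. from a flex template.
        parser.add_argument("--otel_service_name", default="unknown-service")

options = PipelineOptions().view_as(TelemetryOptions)
service_name = options.otel_service_name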

On Fri, 22 Dec 2023, 10:01 Sumit Desai, <su...@uplight.com> wrote:

> Hi Sofia and XQ,
>
> The application is failing because I have loggers defined in every file,
> and the method that creates a logger tries to create an object of
> UplightTelemetry. If I use flex templates, will the environment variables
> I supply be loaded before the application itself is loaded? If not, it
> would not serve my purpose.
>
> Thanks & Regards,
> Sumit Desai

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sumit Desai via user <us...@beam.apache.org>.
Hi Sofia and XQ,

The application is failing because I have loggers defined in every file,
and the method that creates a logger tries to create an object of
UplightTelemetry. If I use flex templates, will the environment variables
I supply be loaded before the application itself is loaded? If not, it
would not serve my purpose.

Thanks & Regards,
Sumit Desai
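
For what it's worth, the workaround I am considering is to stop touching
the telemetry package at import time and instead recreate the environment
inside the DoFn lifecycle, where values passed through the custom options
are already available on the worker. A rough sketch of the idea (the
uplight_telemetry import and attribute names are placeholders for the real
package API; this only helps if the logger creation is also deferred, since
save_main_session replays module-level code at worker startup):

import os

import apache_beam as beam


class FetchBillInformation(beam.DoFn):
    def __init__(self, otel_service_name, otel_resource_attributes):
        # Plain strings are pickled with the DoFn, so they reach every worker.
        self._service_name = otel_service_name
        self._resource_attributes = otel_resource_attributes

    def setup(self):
        # Runs once per worker before any element is processed: restore the
        # variables uplight-telemetry expects, then import it. Deferring the
        # import (and the logger creation) past module load avoids the
        # startup failure described above.
        os.environ["OTEL_SERVICE_NAME"] = self._service_name
        os.environ["OTEL_RESOURCE_ATTRIBUTES"] = self._resource_attributes
        import uplight_telemetry  # placeholder: actual module/API may differ
        self._telemetry = uplight_telemetry

    def process(self, element):
        yield element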

On Thu, Dec 21, 2023 at 10:02 AM Sumit Desai <su...@uplight.com>
wrote:

> Thank you XQ. Will take a look at this.
>
> Regards,
> Sumit Desai

Re: Environmental variables not accessible in Dataflow pipeline

Posted by XQ Hu via user <us...@beam.apache.org>.
Dataflow VMs cannot see your local env variables. I think you should use a
custom container:
https://cloud.google.com/dataflow/docs/guides/using-custom-containers. Here
is a sample project: https://github.com/google/dataflow-ml-starter
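
Once such an image is built and pushed, pointing the job at it is one more
pipeline option. A sketch with placeholder project/bucket/image names
(sdk_container_image is the current Beam worker option; older SDKs used
worker_harness_container_image instead):

from apache_beam.options.pipeline_options import PipelineOptions

# The image below is a placeholder: build it FROM the matching
# apache/beam_python*_sdk base and bake the variable in with
#   ENV OTEL_SERVICE_NAME=bill-processing
# so it exists before any user code is imported on the worker.
pipeline_options = PipelineOptions(
    runner="DataflowRunner",
    project="my-gcp-project",
    region="us-east1",
    temp_location="gs://my-bucket/temp",
    sdk_container_image="gcr.io/my-gcp-project/beam-worker:latest",
)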

On Wed, Dec 20, 2023 at 4:57 AM Sofia’s World <mm...@gmail.com> wrote:

> Hello Sumit
> Thanks. Sorry... I guess if the value of the env variable is always the
> same, you can pass it as a job param? Though it doesn't sound like a
> viable option...
> Hth

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sofia’s World <mm...@gmail.com>.
Hello Sumit
Thanks. Sorry... I guess if the value of the env variable is always the
same, you can pass it as a job param? Though it doesn't sound like a
viable option...
Hth

On Wed, 20 Dec 2023, 09:49 Sumit Desai, <su...@uplight.com> wrote:

> Hi Sofia,
>
> Thanks for the response. For now, we have decided not to use a flex
> template. Is there a way to pass environment variables without using any
> template?
>
> Thanks & Regards,
> Sumit Desai

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sumit Desai via user <us...@beam.apache.org>.
Hi Sofia,

Thanks for the response. For now, we have decided not to use a flex template.
Is there a way to pass environment variables without using any template?

Thanks & Regards,
Sumit Desai

On Wed, Dec 20, 2023 at 3:16 PM Sofia’s World <mm...@gmail.com> wrote:

> Hi
> My 2 cents: have you ever considered using flex templates to run your
> pipeline? Then you can pass all your parameters at runtime.
> (Apologies in advance if it does not cover your use case...)

Re: Environmental variables not accessible in Dataflow pipeline

Posted by Sofia’s World <mm...@gmail.com>.
Hi
My 2 cents: have you ever considered using flex templates to run your
pipeline? Then you can pass all your parameters at runtime.
(Apologies in advance if it does not cover your use case...)

On Wed, 20 Dec 2023, 09:35 Sumit Desai via user, <us...@beam.apache.org>
wrote:

> Hi all,
>
> Is there a way I can set the environmental variables in custom options
> available in the worker?
>
> Thanks & Regards,
> Sumit Desai
>