Posted to dev@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2021/09/24 13:33:13 UTC

Re: Importing dependencies of Python Pipeline

+dev <ma...@beam.apache.org>

I hit a very similar issue even with a standard module (math). No matter 
where I put the import statement (even on the line directly preceding the 
use), the module cannot be found, which causes

NameError: name 'math' is not defined

I therefore think that --setup_file works fine, but that there is a more 
general problem (or a misunderstanding on my side) with importing 
modules. Can this be runner-dependent? I use FlinkRunner and submit jobs 
with --flink_submit_uber_jar; could that be the problem?

  Jan

On 9/23/21 3:12 PM, Jan Lukavský wrote:
> Oops, sorry, the illustration of the three files is wrong. It was 
> meant to be
>
> src/
>
>  | ---- script.py
>
>  | ---- service_pb2.py
>
>  | ---- service_pb2_grpc.py
>
> The three files are in the same directory.
>
> On 9/23/21 3:08 PM, Jan Lukavský wrote:
>> Hi,
>>
>> I'm facing issues importing dependencies of my Python Pipeline. I 
>> intend to use gRPC to communicate with a remote RPC service, hence I 
>> have the following project structure:
>>
>> script.py
>>
>>     |---- service_pb2.py
>>
>>     |---- service_pb2_grpc.py
>>
>> I created setup.py with something like
>>
>> setup(name='...',
>>   version='1.0',
>>   description='...',
>>   py_modules=['service_pb2', 'service_pb2_grpc'])
>>
>>
>> That seems to work: it packages the dependencies, for example with 
>> 'python3 setup.py sdist'. I pass this file to the Pipeline using 
>> --setup_file, but I have no luck using the module. Although the script 
>> is executed, it fails once I try to open a channel in DoFn.setup:
>>
>>   def setup(self):
>>     self.channel = grpc.insecure_channel(self.address)
>>     self.stub = service_pb2_grpc.RpcServiceStub(self.channel)
>>
>> with exception ModuleNotFoundError: No module named 'service_pb2_grpc'.
>>
>> Am I doing something obviously wrong?
>>
>>  Jan
>>

Re: Importing dependencies of Python Pipeline

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Robert,

-dev <ma...@beam.apache.org>, as this really seems to be related to 
improper use. Thanks for the pointer (I somehow missed this in the 
docs). I tried --save_main_session, but without luck. When I add the 
flag, serialization fails with

RecursionError: maximum recursion depth exceeded

My modules do not import one another in a cyclic way (in case that could 
cause this problem).

If I try to use the "standard" way through setup.py, I still get errors, 
even when I try to import the module in the function (DoFn, actually) as 
described in [1]. It looks like the module is not known, even though it 
is referenced in setup.py (via py_modules). Is there anything that I'm 
still doing wrong?
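
One way to narrow this down is to ask the import system directly whether 
it can locate the module. The sketch below uses only the standard library 
and is not Beam-specific. The module name service_pb2_grpc_demo and the 
temporary directory are hypothetical stand-ins for the real generated 
module and for wherever it ends up on the worker. It only shows that 
"No module named ..." means that no entry on sys.path contains the 
module.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated module such as service_pb2_grpc.
MODULE_NAME = "service_pb2_grpc_demo"


def is_importable(name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


with tempfile.TemporaryDirectory() as tmp:
    # Create a tiny module file in a directory that is not on sys.path yet.
    Path(tmp, MODULE_NAME + ".py").write_text("STUB = 'ok'\n")

    # Not found: the directory holding the file is not on sys.path.
    before = is_importable(MODULE_NAME)

    # Found once the directory is on sys.path.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    after = is_importable(MODULE_NAME)
    module = importlib.import_module(MODULE_NAME)

    # Leave sys.path as we found it.
    sys.path.remove(tmp)
```

Running the same is_importable check from inside DoFn.setup (an 
assumption about where it would be most informative) should show 
whether the packaged modules were actually installed on the worker.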

Thanks,

  Jan

[1] 
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors

https://cloud.google.com/dataflow/docs/resources/faq#programming_with_the_cloud_dataflow_sdk_for_python


On 9/24/21 6:14 PM, Robert Bradshaw wrote:
> On Fri, Sep 24, 2021 at 6:33 AM Jan Lukavský <je...@seznam.cz> wrote:
>> +dev
>>
>> I hit a very similar issue even with a standard module (math). No matter where I put the import statement (even on the line directly preceding the use), the module cannot be found, which causes
>>
>> NameError: name 'math' is not defined
> This sounds like it was imported in the __main__ module, but
> save_main_session was not used.
>
>> I therefore think that --setup_file works fine, but that there is a more general problem (or a misunderstanding on my side) with importing modules. Can this be runner-dependent? I use FlinkRunner and submit jobs with --flink_submit_uber_jar; could that be the problem?
>>
>>   Jan
>>
>> On 9/23/21 3:12 PM, Jan Lukavský wrote:
>>
>> Oops, sorry, the illustration of the three files is wrong. It was meant to be
>>
>> src/
>>
>>   | ---- script.py
>>
>>   | ---- service_pb2.py
>>
>>   | ---- service_pb2_grpc.py
>>
>> The three files are in the same directory.
>>
>> On 9/23/21 3:08 PM, Jan Lukavský wrote:
>>
>> Hi,
>>
>> I'm facing issues importing dependencies of my Python Pipeline. I intend to use gRPC to communicate with a remote RPC service, hence I have the following project structure:
>>
>> script.py
>>
>>      |---- service_pb2.py
>>
>>      |---- service_pb2_grpc.py
>>
>> I created setup.py with something like
>>
>> setup(name='...',
>>    version='1.0',
>>    description='...',
>>    py_modules=['service_pb2', 'service_pb2_grpc'])
>>
>>
>> That seems to work: it packages the dependencies, for example with 'python3 setup.py sdist'. I pass this file to the Pipeline using --setup_file, but I have no luck using the module. Although the script is executed, it fails once I try to open a channel in DoFn.setup:
>>
>>    def setup(self):
>>      self.channel = grpc.insecure_channel(self.address)
>>      self.stub = service_pb2_grpc.RpcServiceStub(self.channel)
>>
>> with exception ModuleNotFoundError: No module named 'service_pb2_grpc'.
>>
>> Am I doing something obviously wrong?
>>
>>   Jan
>>

Re: Importing dependencies of Python Pipeline

Posted by Robert Bradshaw <ro...@google.com>.
On Fri, Sep 24, 2021 at 6:33 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> +dev
>
> I hit a very similar issue even with a standard module (math). No matter where I put the import statement (even on the line directly preceding the use), the module cannot be found, which causes
>
> NameError: name 'math' is not defined

This sounds like it was imported in the __main__ module, but
save_main_session was not used.
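
To illustrate the mechanism, here is a rough standard-library sketch. It 
is not Beam's actual serialization code. It assumes that a function 
defined in __main__ reaches the worker without the module-level names it 
relied on, and approximates that by rebuilding the function with a 
globals dict that holds only the builtins. The helper 
rebuild_without_globals and the two area functions are hypothetical 
names made up for this example.

```python
import builtins
import math
import types


def area(radius):
    # Looks up the global name `math` at call time.
    return math.pi * radius ** 2


def area_local_import(radius):
    # Imports inside the function body, so no global `math` is needed.
    import math
    return math.pi * radius ** 2


def rebuild_without_globals(fn):
    """Recreate fn with only builtins in its globals.

    This loosely mimics a function that arrives on a worker without the
    state of the module it was defined in.
    """
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(fn.__code__, fresh_globals, fn.__name__)


remote_area = rebuild_without_globals(area)
try:
    remote_area(1.0)
    error_message = None
except NameError as err:
    # The global `math` no longer exists for the rebuilt function.
    error_message = str(err)

# The variant with the local import still works after rebuilding.
remote_area_ok = rebuild_without_globals(area_local_import)(1.0)
```

Importing inside the function body sidesteps the missing global, while 
save_main_session is meant to ship the state of __main__ along with the 
function.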

> I therefore think that --setup_file works fine, but that there is a more general problem (or a misunderstanding on my side) with importing modules. Can this be runner-dependent? I use FlinkRunner and submit jobs with --flink_submit_uber_jar; could that be the problem?
>
>  Jan
>
> On 9/23/21 3:12 PM, Jan Lukavský wrote:
>
> Oops, sorry, the illustration of the three files is wrong. It was meant to be
>
> src/
>
>  | ---- script.py
>
>  | ---- service_pb2.py
>
>  | ---- service_pb2_grpc.py
>
> The three files are in the same directory.
>
> On 9/23/21 3:08 PM, Jan Lukavský wrote:
>
> Hi,
>
> I'm facing issues importing dependencies of my Python Pipeline. I intend to use gRPC to communicate with a remote RPC service, hence I have the following project structure:
>
> script.py
>
>     |---- service_pb2.py
>
>     |---- service_pb2_grpc.py
>
> I created setup.py with something like
>
> setup(name='...',
>   version='1.0',
>   description='...',
>   py_modules=['service_pb2', 'service_pb2_grpc'])
>
>
> That seems to work: it packages the dependencies, for example with 'python3 setup.py sdist'. I pass this file to the Pipeline using --setup_file, but I have no luck using the module. Although the script is executed, it fails once I try to open a channel in DoFn.setup:
>
>   def setup(self):
>     self.channel = grpc.insecure_channel(self.address)
>     self.stub = service_pb2_grpc.RpcServiceStub(self.channel)
>
> with exception ModuleNotFoundError: No module named 'service_pb2_grpc'.
>
> Am I doing something obviously wrong?
>
>  Jan
>
