Posted to user@beam.apache.org by Jan Lukavský <je...@seznam.cz> on 2021/09/23 13:08:57 UTC
Importing dependencies of Python Pipeline
Hi,
I'm facing issues importing dependencies of my Python Pipeline. I intend
to use gRPC to communicate with a remote RPC service, hence I have the
following project structure:
script.py
|---- service_pb2.py
|---- service_pb2_grpc.py
I created setup.py with something like
from setuptools import setup

setup(name='...',
      version='1.0',
      description='...',
      py_modules=['service_pb2', 'service_pb2_grpc'])
That seems to work: running 'python3 setup.py sdist', for example,
packages the dependencies. I pass this file to the Pipeline using
--setup_file, but I have no luck using the module. The script does
start, but it fails once I try to open a channel in DoFn.setup:
def setup(self):
    self.channel = grpc.insecure_channel(self.address)
    self.stub = service_pb2_grpc.RpcServiceStub(self.channel)
with the exception ModuleNotFoundError: No module named 'service_pb2_grpc'.
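A quick way to confirm what the sdist really contains is to list the .py files inside the generated tarball. The following is a minimal stdlib-only sketch; modules_in_sdist is a hypothetical helper name, and the archive path in the usage note is a placeholder, not a path taken from this thread.

```python
import tarfile


def modules_in_sdist(sdist_path):
    """Return the sorted base names of all .py files inside an sdist tarball."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        return sorted(
            # Keep only the file name, dropping the top-level "<name>-<version>/" folder.
            member.name.rsplit("/", 1)[-1]
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".py")
        )
```

Calling modules_in_sdist on the archive that 'python3 setup.py sdist' writes under dist/ should list service_pb2.py and service_pb2_grpc.py if py_modules was picked up. This only checks the package contents, not whether the workers install it.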
Am I doing something obviously wrong?
Jan
Re: Importing dependencies of Python Pipeline
Posted by Jan Lukavský <je...@seznam.cz>.
Hi Robert,
-dev <ma...@beam.apache.org>, as this seems to be really related to
improper use. Thanks for the pointer (I somehow missed this in the
docs), I tried --save_main_session, but without luck. When adding the
flag, the serialization fails with
RecursionError: maximum recursion depth exceeded
My modules do not import one another in a cyclic way (in case that could
cause this problem).
If I try to use the "standard" way through setup.py, I still get errors,
even when I try to import the module in the function (DoFn, actually) as
described in [1]. It looks like the module is not known, even though it
is referenced in setup.py (via py_modules). Is there anything that I'm
still doing wrong?
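One way to narrow this down is to make the failing import report which interpreter and search path the worker process is using. The sketch below is stdlib-only; import_or_explain is a hypothetical helper name, and calling it from DoFn.setup is only an assumption about where the import would be placed.

```python
import importlib
import sys


def import_or_explain(module_name):
    """Import module_name, or re-raise with the interpreter and sys.path attached."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Add the environment details of the process that attempted the import.
        raise ModuleNotFoundError(
            f"{exc}; sys.executable={sys.executable}; sys.path={sys.path}"
        ) from exc
```

Inside DoFn.setup this could be used as service_pb2_grpc = import_or_explain('service_pb2_grpc'). The resulting error message then shows whether the environment running the DoFn is the one the package was installed into.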
Thanks,
Jan
[1]
https://cloud.google.com/dataflow/docs/resources/faq#how_do_i_handle_nameerrors
https://cloud.google.com/dataflow/docs/resources/faq#programming_with_the_cloud_dataflow_sdk_for_python
On 9/24/21 6:14 PM, Robert Bradshaw wrote:
> On Fri, Sep 24, 2021 at 6:33 AM Jan Lukavský <je...@seznam.cz> wrote:
>> +dev
>>
>> I hit a very similar issue even with a standard module (math). No matter where I put the import statement (even one line before the use), the module cannot be found, which causes
>>
>> NameError: name 'math' is not defined
> This sounds like it was imported in the __main__ module, but
> save_main_session was not used.
>
>> I therefore think that --setup_file works fine, but there is a more general problem (or a misunderstanding on my side) with importing modules. Can this be runner-dependent? I use FlinkRunner and submit jobs with --flink_submit_uber_jar; could that be the problem?
>>
>> Jan
Re: Importing dependencies of Python Pipeline
Posted by Robert Bradshaw <ro...@google.com>.
On Fri, Sep 24, 2021 at 6:33 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> +dev
>
> I hit a very similar issue even with a standard module (math). No matter where I put the import statement (even one line before the use), the module cannot be found, which causes
>
> NameError: name 'math' is not defined
This sounds like it was imported in the __main__ module, but
save_main_session was not used.
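To illustrate this failure mode, here is a minimal stdlib-only sketch. It does not use Beam: it only imitates what happens when a function defined in __main__ reaches a worker without the module-level names it relies on. The helper rebuild_without_globals and both area functions are hypothetical names made up for this illustration, and Beam's real pickling is more involved.

```python
import builtins
import math
import types


def area(radius):
    # Relies on the module-level name `math` of the defining module.
    return math.pi * radius ** 2


def area_local_import(radius):
    # Imports inside the function body, so no module-level name is needed.
    import math
    return math.pi * radius ** 2


def rebuild_without_globals(func):
    """Rebuild func from its code object with only the builtins available."""
    bare_globals = {"__builtins__": builtins}
    return types.FunctionType(func.__code__, bare_globals, func.__name__)


if __name__ == "__main__":
    try:
        rebuild_without_globals(area)(1.0)
    except NameError as exc:
        print("stripped area():", exc)
    print("stripped area_local_import():", rebuild_without_globals(area_local_import)(1.0))
```

The first call fails with NameError because the name math is looked up in the function's globals, which are empty here. The second one succeeds because the import happens at call time, inside the function.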
> I therefore think that --setup_file works fine, but there is a more general problem (or a misunderstanding on my side) with importing modules. Can this be runner-dependent? I use FlinkRunner and submit jobs with --flink_submit_uber_jar; could that be the problem?
>
> Jan
Re: Importing dependencies of Python Pipeline
Posted by Jan Lukavský <je...@seznam.cz>.
+dev <ma...@beam.apache.org>
I hit a very similar issue even with a standard module (math). No matter
where I put the import statement (even one line before the use), the
module cannot be found, which causes
NameError: name 'math' is not defined
I therefore think that --setup_file works fine, but there is a more
general problem (or a misunderstanding on my side) with importing
modules. Can this be runner-dependent? I use FlinkRunner and submit jobs
with --flink_submit_uber_jar; could that be the problem?
Jan
Re: Importing dependencies of Python Pipeline
Posted by Jan Lukavský <je...@seznam.cz>.
Oops, sorry, the illustration of the three files is wrong. It was meant
to be
src/
| ---- script.py
| ---- service_pb2.py
| ---- service_pb2_grpc.py
The three files are in the same directory.