Posted to user@beam.apache.org by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org> on 2018/07/12 15:51:06 UTC

google.cloud.bigQuery version on workers - please HELP

Hi all,

I am running a Python pipeline that uses the google.cloud.bigquery library.
On the local runner everything runs fine, and bigquery.__version__ is
0.28.0.

On the Dataflow runner, bigquery.__version__ is 0.23.0, and there are many
API changes between these two versions.

What is the best way to change the version installed on the workers? I was
assuming that the workers have all of the master machine's libraries
installed when the job is launched from Datalab. Is that true?
I am not generating a requirements.txt; the job is launched with the run
button in the Datalab UI.


Please help me solve this issue.
Thanks,
-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/

Re: google.cloud.bigQuery version on workers - please HELP

Posted by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org>.
Hi Ahmet, thank you for the detailed explanation. Looking forward to the
Beam upgrade to the latest BigQuery version. Best, Eila

On Fri, Jul 13, 2018 at 9:02 PM, Ahmet Altay <al...@google.com> wrote:

>
>
> On Thu, Jul 12, 2018 at 7:35 PM, OrielResearch Eila Arich-Landkof <
> eila@orielresearch.org> wrote:
>
>> Hi Ahmet,
>>
>>
>> I read the version on the worker with the following commands:
>>
>>     import logging
>>     from google.cloud import bigquery
>>     logging.info('bigquery.__version__ is %s', bigquery.__version__)
>>
>> I tried a few times to install google-cloud-bigquery on the workers
>> using setup.py, without much success:
>>
>>     from setuptools import setup, find_packages
>>
>>     setup(
>>       name='label-or',
>>       version='1.0.0',
>>       packages=find_packages(),
>>       keywords=[],
>>       license="Apache Software License",
>>       install_requires=[
>>         'google-cloud-bigquery==0.28.0',
>>       ],
>>       package_data={},
>>       data_files=[],
>>     )
>>
>>
>> The job report UI shows this message (I don't know whether it is
>> relevant to the dependencies):
>> SDK version
>> Google Cloud Dataflow SDK for Python 2.0.0
>>  A newer version of this SDK is available.
>> <https://cloud.google.com/dataflow/support?hl=en_US>
>>
>
> Yes, that message is related: it shows the SDK version you are using.
> Dataflow worker containers have a different set of dependencies for each
> SDK version. 2.0.0 is an old version, which explains why you were seeing
> 0.23.0 as the installed version.
>
>
>>
>>
>> I was able to upgrade to bigquery.__version__ 0.25.0 but not to 0.28.0
>> (which has a different API). Could you please advise what I am missing?
>> Is it impossible to work with a newer version?
>>
>
> Beam supports google-cloud-bigquery up to version 0.25.0. There was a
> recent attempt to upgrade it, and it uncovered issues due to the API
> differences. (Details: https://github.com/apache/beam/pull/5895). There
> is a recent push for Beam to upgrade all dependencies to their latest
> versions, and I assume this will be addressed as part of it.
>
> Unfortunately, until that fix lands it is not possible to use the latest
> version of the bigquery library.
>


-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/

Re: google.cloud.bigQuery version on workers - please HELP

Posted by Ahmet Altay <al...@google.com>.
On Thu, Jul 12, 2018 at 7:35 PM, OrielResearch Eila Arich-Landkof <
eila@orielresearch.org> wrote:

> Hi Ahmet,
>
>
> I read the version on the worker with the following commands:
>
>     import logging
>     from google.cloud import bigquery
>     logging.info('bigquery.__version__ is %s', bigquery.__version__)
>
> I tried a few times to install google-cloud-bigquery on the workers using
> setup.py, without much success:
>
>     from setuptools import setup, find_packages
>
>     setup(
>       name='label-or',
>       version='1.0.0',
>       packages=find_packages(),
>       keywords=[],
>       license="Apache Software License",
>       install_requires=[
>         'google-cloud-bigquery==0.28.0',
>       ],
>       package_data={},
>       data_files=[],
>     )
>
>
> The job report UI shows this message (I don't know whether it is relevant
> to the dependencies):
> SDK version
> Google Cloud Dataflow SDK for Python 2.0.0
>  A newer version of this SDK is available.
> <https://cloud.google.com/dataflow/support?hl=en_US>
>

Yes, that message is related: it shows the SDK version you are using.
Dataflow worker containers have a different set of dependencies for each
SDK version. 2.0.0 is an old version, which explains why you were seeing
0.23.0 as the installed version.


>
>
> I was able to upgrade to bigquery.__version__ 0.25.0 but not to 0.28.0
> (which has a different API). Could you please advise what I am missing? Is
> it impossible to work with a newer version?
>

Beam supports google-cloud-bigquery up to version 0.25.0. There was a recent
attempt to upgrade it, and it uncovered issues due to the API differences.
(Details: https://github.com/apache/beam/pull/5895). There is a recent push
for Beam to upgrade all dependencies to their latest versions, and I assume
this will be addressed as part of it.

Unfortunately, until that fix lands it is not possible to use the latest
version of the bigquery library.
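
To make the constraint above concrete, here is a small sketch that checks a
requested pin against the 0.25.0 ceiling stated in this reply. The ceiling
applies to the Beam release discussed here and is an assumption for any other
release; parse_version and exceeds_ceiling are hypothetical helpers, and they
only handle plain dotted numeric versions.

```python
# Ceiling taken from this thread; an assumption for any other Beam release.
SUPPORTED_CEILING = '0.25.0'


def parse_version(text):
    """Turn a plain dotted version such as '0.25.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def exceeds_ceiling(requested, ceiling=SUPPORTED_CEILING):
    """Return True when the requested version is newer than the ceiling."""
    return parse_version(requested) > parse_version(ceiling)
```

With these definitions, exceeds_ceiling('0.28.0') is true, while the 0.25.0
and 0.23.0 versions mentioned in the thread are within the ceiling.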



Re: google.cloud.bigQuery version on workers - please HELP

Posted by OrielResearch Eila Arich-Landkof <ei...@orielresearch.org>.
Hi Ahmet,


I read the version on the worker with the following commands:

    import logging
    from google.cloud import bigquery
    logging.info('bigquery.__version__ is %s', bigquery.__version__)
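
The check above reads the module's __version__ attribute. A minimal sketch of
an alternative that asks the installed package metadata instead, assuming
Python 3.8 or newer (which may be newer than the environment discussed in
this thread). installed_version and log_versions are hypothetical helper
names, not part of Beam or the BigQuery client.

```python
import logging
from importlib import metadata  # standard library in Python 3.8+


def installed_version(distribution):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def log_versions(distributions):
    """Log one line per distribution and return a name-to-version dict."""
    found = {name: installed_version(name) for name in distributions}
    for name, version in found.items():
        logging.info('%s version is %s', name, version or 'not installed')
    return found
```

Calling log_versions(['google-cloud-bigquery', 'apache-beam']) from code that
runs on a worker, for example inside a DoFn, would record what that worker
actually has installed.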

I tried a few times to install google-cloud-bigquery on the workers using
setup.py, without much success:

    from setuptools import setup, find_packages

    setup(
      name='label-or',
      version='1.0.0',
      packages=find_packages(),
      keywords=[],
      license="Apache Software License",
      install_requires=[
        'google-cloud-bigquery==0.28.0',
      ],
      package_data={},
      data_files=[],
    )


The job report UI shows this message (I don't know whether it is relevant
to the dependencies):
SDK version
Google Cloud Dataflow SDK for Python 2.0.0
 A newer version of this SDK is available.
<https://cloud.google.com/dataflow/support?hl=en_US>


I was able to upgrade to bigquery.__version__ 0.25.0 but not to 0.28.0
(which has a different API). Could you please advise what I am missing? Is
it impossible to work with a newer version?

Many thanks,
Eila


On Thu, Jul 12, 2018 at 9:40 PM, Ahmet Altay <al...@google.com> wrote:

> Hi Eila,
>
> You can find a list of dependencies installed in Dataflow workers in [1].
> Dataflow workers will have a set of dependencies that will satisfy the
> requirements from setup.py.
>
> Which bigquery library are you using? There is a
> google-cloud-bigquery==0.25.0 dependency; I am not sure where 0.23.0 is
> coming from.
>
> Workers do not pick up libraries from the client environment as part of
> the job submission. I am not sure how the Datalab UI integration works;
> however, you have a few options for installing any set of dependencies on
> the workers. Using requirements.txt is one of those options.
>
> Ahmet
>
> [1] https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#version-250_1


-- 
Eila
www.orielresearch.org
https://www.meetup.com/Deep-Learning-In-Production/

Re: google.cloud.bigQuery version on workers - please HELP

Posted by Ahmet Altay <al...@google.com>.
Hi Eila,

You can find a list of the dependencies installed on Dataflow workers in [1].
Dataflow workers have a set of dependencies that satisfies the
requirements from setup.py.

Which bigquery library are you using? There is a
google-cloud-bigquery==0.25.0 dependency; I am not sure where 0.23.0 is
coming from.

Workers do not pick up libraries from the client environment as part of the
job submission. I am not sure how the Datalab UI integration works; however,
you have a few options for installing any set of dependencies on the
workers. Using requirements.txt is one of those options.

Ahmet

[1]
https://cloud.google.com/dataflow/docs/concepts/sdk-worker-dependencies#version-250_1
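
As a rough sketch of the requirements.txt option mentioned above: the
helpers below write a pinned requirements file and build the command-line
style options that point a Dataflow job at it. The --requirements_file and
--setup_file flags are Beam pipeline options; the helper names, the project
id, and the bucket path are hypothetical, and the 0.25.0 pin is the version
named in this reply.

```python
def write_requirements(path, pins):
    """Write one 'name==version' line per pinned package."""
    with open(path, 'w') as handle:
        for name, version in sorted(pins.items()):
            handle.write('%s==%s\n' % (name, version))


def dataflow_args(project, temp_location,
                  requirements_file=None, setup_file=None):
    """Build pipeline options for a Beam job submitted to Dataflow."""
    args = [
        '--runner=DataflowRunner',
        '--project=' + project,
        '--temp_location=' + temp_location,
    ]
    if requirements_file:
        # Packages listed in this file are installed on the workers.
        args.append('--requirements_file=' + requirements_file)
    if setup_file:
        # Alternative: ship a local package described by a setup.py.
        args.append('--setup_file=' + setup_file)
    return args


if __name__ == '__main__':
    write_requirements('requirements.txt',
                       {'google-cloud-bigquery': '0.25.0'})
    # Hypothetical project and bucket; the resulting list could be passed
    # to apache_beam.options.pipeline_options.PipelineOptions(args).
    print(dataflow_args('my-project', 'gs://my-bucket/tmp',
                        requirements_file='requirements.txt'))
```

Building the options as a plain list keeps the sketch independent of how the
job is launched, whether from a notebook or from a script.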
