
Posted to user@beam.apache.org by Reuven Lax <re...@google.com> on 2022/06/12 15:53:43 UTC

Re: Not Able to Get Code to Work for BigQuery using DataFlow

Did you create a pipeline object?
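
For context on why this matters: the quoted code imports the `pipeline` module and puts it on the left of `|`, and the traceback below ends in `tuple(values)` failing on a module. The following is a toy sketch of that mechanism in plain Python. `ToyTransform` and `apply_to` are made-up names for illustration and are not Beam's actual implementation; the only assumption is that, as the traceback suggests, a left operand that is not a pipeline value gets iterated.

```python
import types


class ToyTransform:
    """Toy stand-in for a Beam transform; NOT Beam's real implementation."""

    def __ror__(self, left):
        # Python calls this for `left | ToyTransform()` when `left` does not
        # implement `|` itself. As in the traceback, an unknown left operand
        # is treated as a collection of values and iterated.
        values = tuple(left)
        return "applied to {} value(s)".format(len(values))


def apply_to(left):
    """Return the result of `left | ToyTransform()`, or the error text."""
    try:
        return left | ToyTransform()
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# A module on the left of `|` reproduces the message from the traceback.
module_result = apply_to(types)
# An iterable on the left is accepted.
list_result = apply_to([1, 2, 3])
print(module_result)
print(list_result)
```

In other words, the name `pipeline` in the quoted code refers to a module, so the first `|` has nothing pipeline-like to attach the read to.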

On Sun, Jun 12, 2022 at 8:36 AM Vega, Juan <Ju...@schwab.com> wrote:

> I’m trying to read data from a simple query in BigQuery and cannot get it
> to work.
>
> I’m following the steps from this URL:
>
> https://beam.apache.org/documentation/io/built-in/google-bigquery/
>
> The process is trying to query a very small table with one column and five
> records.
>
> I have this code from the URL:
>
> from apache_beam import pipeline
> import apache_beam as beam
>
> customer_id = (
>     pipeline
>     | 'QueryTable' >> beam.io.ReadFromBigQuery(
>         query='SELECT customer_id FROM [cs-clientu-ad00007609-sbx5615:cuwi_acq_int.jv_test_data]')
>     # Each row is a dictionary where the keys are the BigQuery columns
>     | beam.Map(lambda elem: elem['customer_id']))
>
> Below is the error output:
>
> C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\python.exe C:/Users/juan.vega/PycharmProjects/us_1715_BQ_Stg_to_Intg/us_1715_main.py
> Traceback (most recent call last):
>   File "C:/Users/juan.vega/PycharmProjects/us_1715_BQ_Stg_to_Intg/us_1715_main.py", line 18, in <module>
>     | beam.Map(lambda elem: elem['customer_id']))
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 1092, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 609, in __ror__
>     for (ix, v) in enumerate(pvalues)
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 610, in <dictcomp>
>     if not isinstance(v, pvalue.PValue) and v is not None
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\core.py", line 3179, in __init__
>     self.values = tuple(values)
> TypeError: 'module' object is not iterable
>
> Process finished with exit code 1
>
> I’ve tried many things and I really need help.
>
> Do you have sample code to simply query some data from BigQuery using
> Dataflow?
>
> Thanks.
>

Re: Not Able to Get Code to Work for BigQuery using DataFlow

Posted by Sofia’s World <mm...@gmail.com>.

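Juan, FYI I am using this. HTH.

import logging
from datetime import date

import apache_beam as beam
# BDay is assumed to be the pandas business-day offset.
from pandas.tseries.offsets import BDay


# The function name is illustrative; the original post showed only the body.
def read_stock_selection(p):
  # Cutoff date: 60 business days before today.
  cutoff_date_str = (date.today() - BDay(60)).date().strftime('%Y-%m-%d')
  logging.info('Cutoff is:{}'.format(cutoff_date_str))
  bq_sql = """SELECT TICKER, LABEL, COUNT(*) as COUNTER FROM
`datascience-projects.gcp_shareloader.stock_selection`
      WHERE AS_OF_DATE > PARSE_DATE("%F", "{}") AND LABEL <>
'STOCK_UNIVERSE' GROUP BY TICKER,LABEL
""".format(cutoff_date_str)
  logging.info('executing SQL :{}'.format(bq_sql))
  # p is a beam.Pipeline object, not the apache_beam.pipeline module.
  return (p | 'Reading-{}'.format(cutoff_date_str) >> beam.io.Read(
      beam.io.BigQuerySource(query=bq_sql, use_standard_sql=True)))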
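
One detail worth noting when adapting the snippet above: it sets use_standard_sql=True and quotes its table with backticks, whereas the query in the original question uses the legacy bracket form [project:dataset.table]. Mixing the two styles is likely to get the query rejected. Below is a small sketch of rewriting one form into the other; `legacy_to_standard` is a hypothetical helper written for this thread, not part of Beam or the BigQuery client, and it only handles fully qualified references.

```python
import re

# Legacy SQL table reference: [project:dataset.table]
_LEGACY_TABLE = re.compile(r'\[([^\[\]:]+):([^\[\].]+)\.([^\[\].]+)\]')


def legacy_to_standard(query):
    """Rewrite legacy-SQL table references as standard-SQL ones.

    Hypothetical helper for illustration; it only handles fully
    qualified [project:dataset.table] references.
    """
    return _LEGACY_TABLE.sub(r'`\1.\2.\3`', query)


legacy = ('SELECT customer_id FROM '
          '[cs-clientu-ad00007609-sbx5615:cuwi_acq_int.jv_test_data]')
standard = legacy_to_standard(legacy)
print(standard)
```

The alternative is to leave the query in legacy form and not request standard SQL at all.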






Re: Not Able to Get Code to Work for BigQuery using DataFlow

Posted by Chamikara Jayalath <ch...@google.com>.

Please see here for an example pipeline:
https://github.com/apache/beam/blob/35bac6a62f1dc548ee908cfeff7f73ffcac38e6f/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py#L90
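
For readers who cannot open the link, here is a minimal sketch of the same idea applied to the query from the original question. It is not copied from the linked file. The function names `extract_customer_id` and `run` and the step labels are made up for illustration, and it assumes the usual GCP options (such as a project and a GCS temp location) are supplied through `argv`, since reading from BigQuery generally needs them.

```python
def extract_customer_id(row):
    """Each BigQuery row arrives as a dict keyed by column name."""
    return row['customer_id']


def run(query, argv=None):
    """Build and run a pipeline that reads `query` and prints customer IDs."""
    # Imported here so the helper above stays usable without the SDK installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv)
    # beam.Pipeline(...) creates the pipeline object; the `pipeline` module
    # imported in the question is not one.
    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | 'QueryTable' >> beam.io.ReadFromBigQuery(query=query)
            | 'ExtractId' >> beam.Map(extract_customer_id)
            | 'Print' >> beam.Map(print)
        )
```

Calling `run(...)` with the SELECT statement from the question and the appropriate command-line style options would then execute the read; leaving the `with` block is what actually runs the pipeline.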

