Posted to user@beam.apache.org by Reuven Lax <re...@google.com> on 2022/06/12 15:53:43 UTC
Re: Not Able to Get Code to Work for BigQuery using DataFlow
Did you create a pipeline object?
On Sun, Jun 12, 2022 at 8:36 AM Vega, Juan <Ju...@schwab.com> wrote:
> I’m trying to read data from a simple query in BigQuery and cannot get it
> to work.
>
> I’m following the steps from this URL:
>
> https://beam.apache.org/documentation/io/built-in/google-bigquery/
>
> The process is trying to query a very small table with one column and five
> records.
>
> I have this code from the URL:
>
> from apache_beam import pipeline
> import apache_beam as beam
>
> customer_id = (
>     pipeline
>     | 'QueryTable' >> beam.io.ReadFromBigQuery(
>         query='SELECT customer_id FROM [cs-clientu-ad00007609-sbx5615:cuwi_acq_int.jv_test_data]')
>     # Each row is a dictionary where the keys are the BigQuery columns
>     | beam.Map(lambda elem: elem['customer_id']))
>
> Below is the error output:
>
> C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\python.exe C:/Users/juan.vega/PycharmProjects/us_1715_BQ_Stg_to_Intg/us_1715_main.py
>
> Traceback (most recent call last):
>   File "C:/Users/juan.vega/PycharmProjects/us_1715_BQ_Stg_to_Intg/us_1715_main.py", line 18, in <module>
>     | beam.Map(lambda elem: elem['customer_id']))
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 1092, in __ror__
>     return self.transform.__ror__(pvalueish, self.label)
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 609, in __ror__
>     for (ix, v) in enumerate(pvalues)
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\ptransform.py", line 610, in <dictcomp>
>     if not isinstance(v, pvalue.PValue) and v is not None
>   File "C:\Users\juan.vega\AppData\Local\Continuum\anaconda3\lib\site-packages\apache_beam\transforms\core.py", line 3179, in __init__
>     self.values = tuple(values)
> TypeError: 'module' object is not iterable
>
> Process finished with exit code 1
>
> I’ve tried many things and I really need help.
>
> Do you have sample code to simply query some data from BigQuery using
> Dataflow?
>
> Thanks.
>
> Classification: Schwab Internal
>
Re: Not Able to Get Code to Work for BigQuery using DataFlow
Posted by Sofia’s World <mm...@gmail.com>.
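Juan, FYI, I am using this; hope that helps.

from datetime import date
import logging

import apache_beam as beam
from pandas.tseries.offsets import BDay  # assumption: BDay is pandas' business-day offset


def read_stock_selection(p):  # illustrative name; p is an existing beam.Pipeline object
    # Cutoff date: 60 business days before today.
    cutoff_date_str = (date.today() - BDay(60)).date().strftime('%Y-%m-%d')
    logging.info('Cutoff is:{}'.format(cutoff_date_str))
    bq_sql = """SELECT TICKER, LABEL, COUNT(*) as COUNTER FROM
    `datascience-projects.gcp_shareloader.stock_selection`
    WHERE AS_OF_DATE > PARSE_DATE("%F", "{}") AND LABEL <>
    'STOCK_UNIVERSE' GROUP BY TICKER,LABEL
    """.format(cutoff_date_str)
    logging.info('executing SQL :{}'.format(bq_sql))
    return (p | 'Reading-{}'.format(cutoff_date_str) >> beam.io.Read(
        beam.io.BigQuerySource(query=bq_sql, use_standard_sql=True))
    )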
On Sun, Jun 12, 2022 at 5:17 PM Chamikara Jayalath <ch...@google.com>
wrote:
> Please see here for an example pipeline:
> https://github.com/apache/beam/blob/35bac6a62f1dc548ee908cfeff7f73ffcac38e6f/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py#L90
Re: Not Able to Get Code to Work for BigQuery using DataFlow
Posted by Chamikara Jayalath <ch...@google.com>.
Please see here for an example pipeline:
https://github.com/apache/beam/blob/35bac6a62f1dc548ee908cfeff7f73ffcac38e6f/sdks/python/apache_beam/examples/cookbook/bigquery_tornadoes.py#L90
On Sun, Jun 12, 2022 at 8:54 AM Reuven Lax <re...@google.com> wrote:
> Did you create a pipeline object?