Posted to user@beam.apache.org by Mark Striebeck <ma...@gmail.com> on 2021/10/17 05:50:08 UTC

coder in ReadFromBigQuery doesn't do "anything"

Hi,

I have the following BigQuery table:

Name: DailyVwap
Field name      Type
TIMESTAMP       DATE
SYMBOL_NAME     STRING
DAILY_VWAP      STRING
inserted_time   TIMESTAMP

I want to read it into the following proto:
message DailyVwap {
    google.protobuf.Timestamp TIMESTAMP = 1;
    string SYMBOL_NAME = 2;
    float DAILY_VWAP = 3;
    google.protobuf.Timestamp inserted_time = 4;
}

This is the call that I make in my pipeline:
ReadFromBigQuery(query='SELECT * FROM `my_project.my_dataset.DailyVwap`', 
                    use_standard_sql=True, 
                    project='my_project',
                    coder=beam.coders.ProtoCoder(DailyVwap().__class__),
                    gcs_location=temp_location)

But the result is always a dictionary in the form:
{'TIMESTAMP': datetime.date(2021, 9, 17), 'SYMBOL_NAME': 'AACIU', 'DAILY_VWAP': 'null', 'inserted_time': datetime.datetime(2021, 9, 27, 16, 45, 33, 779000, tzinfo=datetime.timezone.utc)}

The result is the same with or without the coder in the call, and there is no error message or warning in the logs.

Any help or pointer appreciated!

Thanks
     Mark

Re: coder in ReadFromBigQuery doesn't do "anything"

Posted by Chamikara Jayalath <ch...@google.com>.
I haven't looked at the code, but the usual recommendation is to perform the
conversion in a subsequent ParDo rather than changing the coder passed
to the source.

Thanks,
Cham
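
A minimal sketch of the conversion step suggested above, assuming the rows
arrive as the dictionaries shown in the original post. DailyVwapRecord and
row_to_record are hypothetical names, and the dataclass is only a stand-in
for the generated DailyVwap proto class so that the example runs without
Beam or protobuf installed. Treating the text 'null' as a missing value is
also an assumption based on the sample row.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class DailyVwapRecord:
    """Hypothetical stand-in for the generated DailyVwap proto class."""
    timestamp: datetime.datetime
    symbol_name: str
    daily_vwap: Optional[float]
    inserted_time: datetime.datetime


def row_to_record(row: dict) -> DailyVwapRecord:
    """Convert one row dict from the BigQuery read into a typed record."""
    day = row['TIMESTAMP']  # a BigQuery DATE arrives as datetime.date
    # Promote the date to midnight UTC so it can fill a timestamp field.
    ts = datetime.datetime(day.year, day.month, day.day,
                           tzinfo=datetime.timezone.utc)
    raw_vwap = row['DAILY_VWAP']  # STRING column; the sample row holds 'null'
    # Assumption: None, '' and 'null' all mean "no value".
    vwap = None if raw_vwap in (None, '', 'null') else float(raw_vwap)
    return DailyVwapRecord(ts, row['SYMBOL_NAME'], vwap, row['inserted_time'])
```

In a pipeline, a function like this would typically be applied right after
the read with beam.Map(row_to_record), or wrapped in a DoFn for use with
ParDo. With the real proto class, the function would build and return a
DailyVwap message instead of the dataclass.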

On Sun, Oct 17, 2021 at 8:02 AM Mark Striebeck <ma...@gmail.com>
wrote:

> It's the same if I do a beam.Map(print) or write a test against the result.
>
> On Sun, Oct 17, 2021 at 4:42 AM Evan Galpin <ev...@gmail.com> wrote:
>
>> Is “the result” being printed or viewed via debugger? Is there a chance
>> that the __repr__ or similar method for proto produces a dict strictly for
>> printing/serialization?
>>
>> Thanks,
>> Evan

Re: coder in ReadFromBigQuery doesn't do "anything"

Posted by Mark Striebeck <ma...@gmail.com>.
It's the same if I do a beam.Map(print) or write a test against the result.

On Sun, Oct 17, 2021 at 4:42 AM Evan Galpin <ev...@gmail.com> wrote:

> Is “the result” being printed or viewed via debugger? Is there a chance
> that the __repr__ or similar method for proto produces a dict strictly for
> printing/serialization?
>
> Thanks,
> Evan

Re: coder in ReadFromBigQuery doesn't do "anything"

Posted by Evan Galpin <ev...@gmail.com>.
Is “the result” being printed or viewed via debugger? Is there a chance
that the __repr__ or similar method for proto produces a dict strictly for
printing/serialization?

Thanks,
Evan
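
One way to settle the question above is to inspect the element's concrete
type rather than its printed form. The sketch below is illustrative only:
LooksLikeDict is a hypothetical class whose repr imitates a dict, and
describe is a hypothetical helper; neither comes from Beam or protobuf.

```python
class LooksLikeDict:
    """Hypothetical object whose printed form imitates a dict."""

    def __init__(self, **fields):
        self._fields = fields

    def __repr__(self):
        # Printing this object shows the same text a real dict would.
        return repr(self._fields)


def describe(element):
    """Return the element's concrete type name, ignoring how it prints."""
    return type(element).__name__
```

Mapping a helper like describe over the read output (for example with
beam.Map) would show whether the elements are really of type dict or only
print like one.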
