Posted to user@beam.apache.org by Mark Striebeck <ma...@gmail.com> on 2021/10/17 05:50:08 UTC
coder in ReadFromBigQuery doesn't do "anything"
Hi,
I have the following BigQuery table:
Name: DailyVwap
Field name      Type
TIMESTAMP       DATE
SYMBOL_NAME     STRING
DAILY_VWAP      STRING
inserted_time   TIMESTAMP
I want to read it into the following proto:
syntax = "proto3";

import "google/protobuf/timestamp.proto";

message DailyVwap {
  google.protobuf.Timestamp TIMESTAMP = 1;
  string SYMBOL_NAME = 2;
  float DAILY_VWAP = 3;
  google.protobuf.Timestamp inserted_time = 4;
}
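The schema and the message do not line up field for field, regardless of the coder question. DAILY_VWAP is a BigQuery STRING but a proto float, and the result dictionary shown in this message carries the literal text 'null' in that column. Below is a minimal sketch of a parser for that column. It assumes that both SQL NULL (None in Python) and the text 'null' mean "no value". The name parse_vwap is hypothetical.

```python
from typing import Optional


def parse_vwap(raw: Optional[str]) -> Optional[float]:
    """Parse the STRING DAILY_VWAP column into a value for the proto's float field.

    Assumption: SQL NULL (None), empty strings, and the literal text 'null'
    all mean "no value"; None is returned so the caller can leave the field unset.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not text or text.lower() == "null":
        return None
    # float() raises ValueError for any other non-numeric text, surfacing bad data.
    return float(text)
```

A proto `float` is a 32-bit value. If full precision of the VWAP matters, a `double` field may be a better fit.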
This is the call that I make in my pipeline:
ReadFromBigQuery(query='SELECT * FROM `my_project.my_dataset.DailyVwap`',
                 use_standard_sql=True,
                 project='my_project',
                 coder=beam.coders.ProtoCoder(DailyVwap().__class__),
                 gcs_location=temp_location)
But the result is always a dictionary in the form:
{'TIMESTAMP': datetime.date(2021, 9, 17), 'SYMBOL_NAME': 'AACIU', 'DAILY_VWAP': 'null', 'inserted_time': datetime.datetime(2021, 9, 27, 16, 45, 33, 779000, tzinfo=datetime.timezone.utc)}
With or without the coder in the call. No error message or warning in the logs.
Any help or pointers appreciated!
Thanks
Mark
Re: coder in ReadFromBigQuery doesn't do "anything"
Posted by Chamikara Jayalath <ch...@google.com>.
I haven't looked at the code, but the usual recommendation is to perform the
conversion in a subsequent ParDo instead of updating the coder provided
to the source.
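Here is a minimal sketch of that recommendation. The idea is to read without the coder argument, so ReadFromBigQuery yields plain dicts as observed in this thread, and then convert each dict in a following step such as beam.Map(row_to_record). The sketch makes these assumptions:

* DailyVwapRecord is a hypothetical dataclass that stands in for the generated DailyVwap class, so the code runs without Beam or protoc output.
* row_to_record and to_seconds_nanos are hypothetical helper names.
* A DATE is treated as midnight UTC.
* Naive datetimes are treated as UTC.

```python
import dataclasses
import datetime
from typing import Any, Dict, Optional, Tuple

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


@dataclasses.dataclass
class DailyVwapRecord:
    """Hypothetical stand-in for the generated DailyVwap proto class.

    Timestamp fields hold (seconds, nanos) pairs, the two fields of
    google.protobuf.Timestamp.
    """
    TIMESTAMP: Tuple[int, int]
    SYMBOL_NAME: str
    DAILY_VWAP: Optional[float]
    inserted_time: Tuple[int, int]


def to_seconds_nanos(value: datetime.date) -> Tuple[int, int]:
    """Convert a date or datetime into (seconds, nanos) since the Unix epoch."""
    # datetime.datetime subclasses datetime.date, so test for it first.
    if isinstance(value, datetime.datetime):
        # Assumption: naive datetimes are already in UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=datetime.timezone.utc)
        dt = value
    else:
        # Assumption: a BigQuery DATE maps to midnight UTC on that day.
        dt = datetime.datetime(value.year, value.month, value.day,
                               tzinfo=datetime.timezone.utc)
    delta = dt - _EPOCH
    # timedelta normalizes to days/seconds/microseconds with non-negative
    # seconds and microseconds, matching Timestamp's non-negative nanos.
    return delta.days * 86400 + delta.seconds, delta.microseconds * 1000


def row_to_record(row: Dict[str, Any]) -> DailyVwapRecord:
    """Map one BigQuery row dict onto the record, e.g. via beam.Map(row_to_record)."""
    raw_vwap = row.get("DAILY_VWAP")
    # The STRING column can hold the literal text 'null'; treat it as missing.
    vwap = None if raw_vwap in (None, "", "null") else float(raw_vwap)
    return DailyVwapRecord(
        TIMESTAMP=to_seconds_nanos(row["TIMESTAMP"]),
        SYMBOL_NAME=row["SYMBOL_NAME"],
        DAILY_VWAP=vwap,
        inserted_time=to_seconds_nanos(row["inserted_time"]),
    )
```

With the real generated class, the same values would go into msg.TIMESTAMP.seconds and msg.TIMESTAMP.nanos. DAILY_VWAP would simply be left unset when the parsed value is None.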
Thanks,
Cham
On Sun, Oct 17, 2021 at 8:02 AM Mark Striebeck <ma...@gmail.com>
wrote:
> It's the same if I do a beam.Map(print) or write a test against the result.
Re: coder in ReadFromBigQuery doesn't do "anything"
Posted by Mark Striebeck <ma...@gmail.com>.
It's the same if I do a beam.Map(print) or write a test against the result.
On Sun, Oct 17, 2021 at 4:42 AM Evan Galpin <ev...@gmail.com> wrote:
> Is “the result” being printed or viewed via debugger? Is there a chance
> that the __repr__ or similar method for proto produces a dict strictly for
> printing/serialization?
Re: coder in ReadFromBigQuery doesn't do "anything"
Posted by Evan Galpin <ev...@gmail.com>.
Is “the result” being printed or viewed via debugger? Is there a chance
that the __repr__ or similar method for proto produces a dict strictly for
printing/serialization?
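You can check whether the elements really are dicts without relying on how they print. Inspecting the concrete type shows whether each one is a dict or a generated message class. Below is a minimal sketch. describe_element is a hypothetical helper: apply it with beam.Map(describe_element) before printing, or call it directly in a test.

```python
from typing import Any


def describe_element(element: Any) -> str:
    """Return the element's fully qualified type name, independent of __repr__."""
    cls = type(element)
    return f"{cls.__module__}.{cls.__qualname__}"
```

For the rows shown in this thread, this would report builtins.dict. A generated proto message would instead report its own class name.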
Thanks,
Evan