Posted to issues@beam.apache.org by "Valentyn Tymofieiev (JIRA)" <ji...@apache.org> on 2019/03/13 16:31:00 UTC

[jira] [Comment Edited] (BEAM-6769) BigQuery IO does not support bytes in Python 3

    [ https://issues.apache.org/jira/browse/BEAM-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791863#comment-16791863 ] 

Valentyn Tymofieiev edited comment on BEAM-6769 at 3/13/19 4:30 PM:
--------------------------------------------------------------------

Thanks a lot, [~Juta]. 

> The expected way of writing bytes to bq is by passing base-64 encoded strings to the bigquery client

Could you please link where this is prescribed? Thanks.

> Do we expect users to handle the base-64 encoding themselves (as they should when using the bigquery client) or should this happen in bigquery io?

I think Beam would be more user-friendly if we accepted raw bytes and took care of base64 encoding. Do you have a different opinion on this, [~chamikara], [~pabloem]?
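For illustration, here is a minimal sketch of the kind of handling I have in mind (a hypothetical helper, not actual Beam code): replace bytes values with their base64-encoded string form before the row hits json.dumps, so users can pass raw bytes on Python 3.

```python
import base64
import json

def encode_bytes_for_bq(row):
    # Hypothetical helper: base64-encode any bytes values so the row
    # survives json.dumps on Python 3; other values pass through unchanged.
    return {
        key: base64.b64encode(value).decode('ascii')
        if isinstance(value, bytes) else value
        for key, value in row.items()
    }

row = {'test': b'test'}
print(json.dumps(encode_bytes_for_bq(row)))  # {"test": "dGVzdA=="}
```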

> The current test tests reading bytes from bq and then writing them.

Interesting... where do the bytes in bq come from in the first place? Is it a pre-populated table?

> Would it be good to add another test that first writes and then reads the bytes to actually test writing bytes from python instead of reading them from bq?

Yes, if we don't have such a test, we need one, and you are welcome to add it.
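The core of such a round trip can be sketched like this (plain base64, independent of Beam, just to pin down the expected wire format for BYTES in the JSON API):

```python
import base64

original = b'\x00\xffbinary payload'
# Write path: bytes -> base64 string, which is what the BigQuery
# JSON representation of a BYTES field carries.
wire_value = base64.b64encode(original).decode('ascii')
# Read path: base64 string -> bytes.
round_tripped = base64.b64decode(wire_value)
assert round_tripped == original
```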

 



> BigQuery IO does not support bytes in Python 3
> ----------------------------------------------
>
>                 Key: BEAM-6769
>                 URL: https://issues.apache.org/jira/browse/BEAM-6769
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Juta Staes
>            Assignee: Juta Staes
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
>  [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186]
> Python 3 does not support
> {noformat}
> json.dumps({'test': b'test'}){noformat}
> which is used to encode the data in
>  [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959]
>  
> How should writing bytes to BigQuery be handled in Python 3?
>  * Forbid writing bytes into BigQuery on Python 3
>  * Guess the encoding (utf-8?)
>  * Pass the encoding to BigQuery
> cc: [~tvalentyn]
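
The failure described above is easy to reproduce on Python 3 (bytes are not JSON serializable, so serialization raises TypeError):

```python
import json

try:
    json.dumps({'test': b'test'})
except TypeError as err:
    # Python 3 refuses to serialize bytes to JSON.
    print(err)
```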



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)