Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/10/10 17:25:01 UTC

[jira] [Commented] (BEAM-12669) UpdateDestinationSchema PTransform does not respect source format

    [ https://issues.apache.org/jira/browse/BEAM-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17426835#comment-17426835 ] 

Beam JIRA Bot commented on BEAM-12669:
--------------------------------------

This issue was marked "stale-P2" and has not received a public comment in 14 days. It is now automatically moved to P3. If you are still affected by it, you can comment and move it back to P2.

> UpdateDestinationSchema PTransform does not respect source format
> -----------------------------------------------------------------
>
>                 Key: BEAM-12669
>                 URL: https://issues.apache.org/jira/browse/BEAM-12669
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp, runner-dataflow
>    Affects Versions: 2.30.0
>            Reporter: Sayat Satybaldiyev
>            Priority: P3
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When multiple load jobs are needed to write data to a destination table, e.g., when the data is spread over more than 10,000 URIs, WriteToBigQuery in FILE_LOADS mode writes the data into temporary tables and then, if schema additions are allowed, updates the destination table's schema before the temporary tables are copied into it.
> However, this schema update does not respect the source format specified for the load files (e.g., JSON, AVRO).
> UpdateDestinationSchema issues the schema modification command with the default CSV setting, which causes AVRO or JSON loads with nested schemas to fail with the following error (a minimal pipeline sketch that reaches this code path follows the log):
> {code:java}
> apache_beam.io.gcp.bigquery_file_loads: INFO: Triggering schema modification job beam_bq_job_LOAD_satybald7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46 on <TableReference
>  datasetId: 'python_write_to_table_1627431111435'
>  projectId: 'DELETED'
>  tableId: 'python_append_schema_update'>
> apache_beam.io.gcp.bigquery_tools: INFO: Failed to insert job <JobReference
>  jobId: 'beam_bq_job_LOAD7_SCHEMA_MOD_STEP_994_3869e4dc1dd08c68d20fd047e242161a_7c553f684cce4963a75d669f38a4ec46'
>  projectId: 'DELETED'>: HttpError accessing ....
>  'content-type': 'application/json; charset=UTF-8', 'content-length': '332', 'date': 'Wed, 28 Jul 2021 00:12:03 GMT', 'server': 'UploadServer', 'status': '400'}>, content <{
>   "error": {
>     "code": 400,
>     "message": "Cannot load CSV data with a nested schema. Field: nested_field",
>     "errors": [
>       {
>         "message": "Cannot load CSV data with a nested schema. Field: nested_field",
>         "domain": "global",
>         "reason": "invalid"
>       }
>     ],
>     "status": "INVALID_ARGUMENT"
>   }
> }
> {code}
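> For reference, a minimal sketch of a pipeline configuration that exercises this code path, assuming hypothetical project, dataset, and table names. The UpdateDestinationSchema step only runs when temporary tables are needed (i.e., when multiple load jobs are required), so a small local run will not trigger it by itself; the sketch only shows the relevant knobs:
> {code:python}
> import apache_beam as beam
>
> # Destination schema containing a nested RECORD field. CSV load jobs
> # cannot express nested schemas, which is why the CSV-default schema
> # modification job fails.
> SCHEMA = {
>     'fields': [
>         {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
>         {'name': 'nested_field', 'type': 'RECORD', 'mode': 'NULLABLE',
>          'fields': [
>              {'name': 'value', 'type': 'STRING', 'mode': 'NULLABLE'},
>          ]},
>     ],
> }
>
> with beam.Pipeline() as p:
>     _ = (
>         p
>         | beam.Create([{'id': 1, 'nested_field': {'value': 'a'}}])
>         | beam.io.WriteToBigQuery(
>             # Hypothetical destination table.
>             table='my-project:python_write_to_table.python_append_schema_update',
>             schema=SCHEMA,
>             method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>             # Load files are written as JSON, not CSV ...
>             temp_file_format='NEWLINE_DELIMITED_JSON',
>             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
>             # ... and schema additions are allowed, so the schema
>             # modification job runs before temp tables are copied.
>             additional_bq_parameters={
>                 'schemaUpdateOptions': ['ALLOW_FIELD_ADDITION'],
>             }))
> {code}
> With this configuration the per-file load jobs correctly use NEWLINE_DELIMITED_JSON, but the schema modification job issued by UpdateDestinationSchema is submitted without an explicit sourceFormat, so BigQuery applies its default of CSV and rejects the nested schema, producing the error above.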



--
This message was sent by Atlassian Jira
(v8.3.4#803005)