Posted to commits@beam.apache.org by "Pei He (JIRA)" <ji...@apache.org> on 2017/01/10 00:57:58 UTC
[jira] [Created] (BEAM-1252) BigQueryIO.Read: validate exported files with GCS glob.
Pei He created BEAM-1252:
----------------------------
Summary: BigQueryIO.Read: validate exported files with GCS glob.
Key: BEAM-1252
URL: https://issues.apache.org/jira/browse/BEAM-1252
Project: Beam
Issue Type: Bug
Components: sdk-java-gcp
Reporter: Pei He
Assignee: Pei He
BigQuery has started creating user-visible temp files in the export location. We notice them when listing files and start reading from them, but they are then moved. This can cause job failures and data duplication.
On the Beam side, we can add stronger validation:
1. When listing files, validate that they match the expected URI.
2. When BQ has finished the job, run an integrity check to verify that # files read from == # files BQ claims exist.
3. If possible, add a prefix to the filename in the glob (for example, change *.avro to step*.avro). The prefix could be the step name or something else. This might be as easy as dropping a '/' in the middle of the path. A la #7.
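
As a rough illustration of checks 1 and 2, here is a minimal Java sketch. It is not the BigQueryIO implementation. The class and method names (ExportedFileValidator, filterExpected, checkFileCount), the bucket paths, and the "step" prefix are all hypothetical, and the code assumes exported shards are named <prefix> followed by a 12-digit shard number and ".avro". The file count that BQ reports is passed in as a plain number; how it is obtained is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Beam.
class ExportedFileValidator {

    // Check 1: keep only listed files that match the expected export URI.
    // Assumption: shards are named <expectedPrefix><12 digits>.avro.
    static List<String> filterExpected(List<String> listed, String expectedPrefix) {
        Pattern expected =
            Pattern.compile(Pattern.quote(expectedPrefix) + "\\d{12}\\.avro");
        List<String> matched = new ArrayList<>();
        for (String uri : listed) {
            if (expected.matcher(uri).matches()) {
                matched.add(uri);
            }
        }
        return matched;
    }

    // Check 2: fail if the number of files we will read differs from the
    // number of files the finished export job claims to have written.
    static void checkFileCount(List<String> matched, long reportedCount) {
        if (matched.size() != reportedCount) {
            throw new IllegalStateException(
                "Expected " + reportedCount + " exported files, but matched "
                    + matched.size());
        }
    }

    public static void main(String[] args) {
        // Hypothetical listing result: two real shards plus one stray temp file.
        List<String> listed = Arrays.asList(
            "gs://my-bucket/tmp/job-1/step000000000000.avro",
            "gs://my-bucket/tmp/job-1/step000000000001.avro",
            "gs://my-bucket/tmp/job-1/unexpected-temp.avro");
        List<String> matched =
            filterExpected(listed, "gs://my-bucket/tmp/job-1/step");
        checkFileCount(matched, 2); // passes: the stray file was filtered out
        System.out.println(matched);
    }
}
```

Matching against a strict pattern rather than the bare glob means a stray temp file is ignored instead of read, and the count check turns any remaining mismatch into an explicit failure rather than silent data duplication.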
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)