Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/08 16:12:04 UTC
[jira] [Commented] (BEAM-2122) Writing to partitioned BigQuery tables from Dataflow is causing errors
[ https://issues.apache.org/jira/browse/BEAM-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001005#comment-16001005 ]
ASF GitHub Bot commented on BEAM-2122:
--------------------------------------
GitHub user reuvenlax opened a pull request:
https://github.com/apache/beam/pull/2953
[BEAM-2122] Allow table descriptions to be null
Wrap the coder with a NullableCoder.
R: @jkff
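For readers unfamiliar with the idea, here is a minimal sketch of what a nullable wrapper around a coder does in general: it writes a one-byte presence marker before the payload so that null can round-trip. This is not Beam's code. In Beam the wrapping would normally be expressed as NullableCoder.of(someCoder); the class name, method names, and marker values below are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Beam's NullableCoder implementation.
final class NullableStringCodecSketch {
    // Marker values are an assumption made for this sketch.
    private static final byte NULL_MARKER = 0;
    private static final byte PRESENT_MARKER = 1;

    // Encodes a possibly-null string as: marker byte, then UTF-8 payload if present.
    static byte[] encode(String value) {
        if (value == null) {
            return new byte[] {NULL_MARKER};
        }
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        byte[] encoded = new byte[payload.length + 1];
        encoded[0] = PRESENT_MARKER;
        System.arraycopy(payload, 0, encoded, 1, payload.length);
        return encoded;
    }

    // Decodes bytes produced by encode(), returning null for the null marker.
    static String decode(byte[] encoded) {
        if (encoded.length == 0) {
            throw new IllegalArgumentException("Missing marker byte");
        }
        if (encoded[0] == NULL_MARKER) {
            return null;
        }
        if (encoded[0] != PRESENT_MARKER) {
            throw new IllegalArgumentException("Unexpected marker byte: " + encoded[0]);
        }
        return new String(encoded, 1, encoded.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A null table description and a non-null one both survive a round trip.
        System.out.println(decode(encode(null)));
        System.out.println(decode(encode("daily events")));
    }
}
```

Without the marker byte, a coder that expects a non-null string has no way to represent a missing table description, which is the situation the change above addresses.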
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/reuvenlax/incubator-beam allow_null_table_description
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/2953.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2953
----
commit df8cf750e62846531b0b0260e4c84d3bb6b8d2c7
Author: Reuven Lax <re...@google.com>
Date: 2017-05-08T16:06:55Z
TableDescription is allowed to be null.
----
> Writing to partitioned BigQuery tables from Dataflow is causing errors
> ----------------------------------------------------------------------
>
> Key: BEAM-2122
> URL: https://issues.apache.org/jira/browse/BEAM-2122
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-gcp
> Environment: Running with Beam 0.7.0-SNAPSHOT version 48 for beam-sdks-java-io-google-cloud-platform, 49 for beam-sdks-java-core and beam-runners-google-cloud-dataflow-java in Eclipse using Dataflow service.
> Reporter: Matthias Baetens
> Assignee: Reuven Lax
>
> Using the latest Beam SNAPSHOT, which has a new BigQuery connector, and trying to write to partitioned tables according to the docs (or this Stack Overflow question: http://stackoverflow.com/questions/43505534/writing-different-values-to-different-bigquery-tables-in-apache-beam/43655461#43655461):
> static class PartitionedTableGeneration
>     implements SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination> {
>   @Override
>   public TableDestination apply(ValueInSingleWindow<TableRow> value) {
>     // Alternative day pattern (see error 2 below):
>     // DateTimeFormat.forPattern("yyyy_MM_dd").withZone(DateTimeZone.UTC)
>     String dayString = DateTimeFormat.forPattern("yyyyMMdd").withZone(DateTimeZone.UTC)
>         .print(((IntervalWindow) value.getWindow()).start());
>     // Append the partition decorator ('$' plus the day) to the table spec.
>     TableDestination td = new TableDestination(
>         "project:dataset.table$" + dayString, "");
>     return td;
>   }
> }
> causes the following issues when running (depending on the specification of the dayString):
> 1. "Invalid table ID \"partitioned_sample$20150905\". Table IDs must be alphanumeric (plus underscores) and must be at most 1024 characters long. Also, Table decorators cannot be used.",
> 2. java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: Failed to create load job with id prefix
> ...
> "errorResult" : {
> "message" : "Invalid date partitioned table suffix: 2015_11_26",
> "reason" : "invalid"
> }
> Writing to sharded tables (without the '$'-sign) is working fine.
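The decorator construction described in the report can be tried in isolation, without Beam or BigQuery. The following is a minimal sketch using java.time instead of the Joda-Time classes in the reporter's snippet. The class and method names are hypothetical, and the eight-digit suffix check is an assumption inferred from the error messages above (which reject "2015_11_26"), not a statement of BigQuery's full validation rules.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Beam or the BigQuery client.
final class PartitionDecoratorSketch {
    // Day format used by the reporter's working variant: yyyyMMdd in UTC.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Assumption: a date partition suffix is exactly eight digits.
    private static final Pattern SUFFIX = Pattern.compile("\\d{8}");

    // Builds "<tableSpec>$<yyyyMMdd>" from the start of a window.
    static String partitionedTableSpec(String tableSpec, Instant windowStart) {
        return tableSpec + "$" + DAY.format(windowStart);
    }

    // Returns true if the suffix matches the assumed eight-digit form.
    static boolean isValidSuffix(String suffix) {
        return SUFFIX.matcher(suffix).matches();
    }

    public static void main(String[] args) {
        Instant windowStart = Instant.parse("2015-09-05T00:00:00Z");
        System.out.println(partitionedTableSpec("project:dataset.table", windowStart));
        System.out.println(isValidSuffix("20150905"));
        System.out.println(isValidSuffix("2015_11_26"));
    }
}
```

Checking the suffix locally only catches the malformed-date case from error 2. Error 1 in the report occurs even with a well-formed "20150905" suffix, so it is not something this check would detect.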