You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Fallon (JIRA)" <ji...@apache.org> on 2017/08/10 17:40:00 UTC
[jira] [Updated] (BEAM-2761) Write to empty BigQuery partition
fails with "No schema specified on job or table." despite having provided
schema
[ https://issues.apache.org/jira/browse/BEAM-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fallon updated BEAM-2761:
-------------------------
Description:
In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a BigQuery partition fail with "java.lang.RuntimeException: Failed to create load job with id prefix". This is associated with a message "No schema specified on job or table" even though a schema is provided. See attached stack trace for the more detail on the error.
Command to run job:
{code}
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
-Dexec.args="--runner=DataflowRunner --project=<GCP project> \
--gcpTempLocation=<tmp location>" \
-Pdataflow-runner
{code}
Code to reproduce the problem:
{code:title=EmptyPCollection.java|borderStyle=solid}
public class EmptyPCollection {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
options.setTempLocation("<your tmp location>");
Pipeline pipeline = Pipeline.create(options);
String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
String table = "mydataset.pets";
List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
@ProcessElement
public void processElement(ProcessContext c) {
String text = c.element();
if (text.startsWith("X")) { // change to (D)og and works fine
TableRow row = new TableRow();
row.set("pet", text);
c.output(row);
}
}
}));
rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
pipeline.run().waitUntilFinish();
}
}
{code}
was:
In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a BigQuery partition fail with "java.lang.RuntimeException: Failed to create load job with id prefix". This is associated with a message "No schema specified on job or table" even though a schema is provided.
Command to run job:
{code}
mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
-Dexec.args="--runner=DataflowRunner --project=<GCP project> \
--gcpTempLocation=<tmp location>" \
-Pdataflow-runner
{code}
Code to reproduce the problem:
{code:title=EmptyPCollection.java|borderStyle=solid}
public class EmptyPCollection {
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
options.setTempLocation("<your tmp location>");
Pipeline pipeline = Pipeline.create(options);
String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
String table = "mydataset.pets";
List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
@ProcessElement
public void processElement(ProcessContext c) {
String text = c.element();
if (text.startsWith("X")) { // change to (D)og and works fine
TableRow row = new TableRow();
row.set("pet", text);
c.output(row);
}
}
}));
rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
pipeline.run().waitUntilFinish();
}
}
{code}
> Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema
> ------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-2761
> URL: https://issues.apache.org/jira/browse/BEAM-2761
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Fallon
> Assignee: Thomas Groh
> Priority: Minor
> Attachments: beam-2761-stacktrace.txt
>
>
> In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a BigQuery partition fail with "java.lang.RuntimeException: Failed to create load job with id prefix". This is associated with a message "No schema specified on job or table" even though a schema is provided. See attached stack trace for the more detail on the error.
> Command to run job:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
> -Dexec.args="--runner=DataflowRunner --project=<GCP project> \
> --gcpTempLocation=<tmp location>" \
> -Pdataflow-runner
> {code}
> Code to reproduce the problem:
> {code:title=EmptyPCollection.java|borderStyle=solid}
> public class EmptyPCollection {
> public static void main(String[] args) {
> PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
> options.setTempLocation("<your tmp location>");
> Pipeline pipeline = Pipeline.create(options);
> String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
> String table = "mydataset.pets";
> List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
> PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
> PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
> @ProcessElement
> public void processElement(ProcessContext c) {
> String text = c.element();
> if (text.startsWith("X")) { // change to (D)og and works fine
> TableRow row = new TableRow();
> row.set("pet", text);
> c.output(row);
> }
> }
> }));
> rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
> .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
> .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
> pipeline.run().waitUntilFinish();
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)