You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Lars Bakke Krogvig (JIRA)" <ji...@apache.org> on 2017/08/29 14:58:00 UTC

[jira] [Commented] (BEAM-2761) Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema

    [ https://issues.apache.org/jira/browse/BEAM-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16145431#comment-16145431 ] 

Lars Bakke Krogvig commented on BEAM-2761:
------------------------------------------

I have the same issue in 2.1.0. Is there a recommended workaround?

> Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2761
>                 URL: https://issues.apache.org/jira/browse/BEAM-2761
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Fallon
>            Assignee: Reuven Lax
>            Priority: Minor
>         Attachments: beam-2761-stacktrace.txt
>
>
> In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a BigQuery partition fail with "java.lang.RuntimeException: Failed to create load job with id prefix". This is associated with a message "No schema specified on job or table" even though a schema is provided. See attached stack trace for the more detail on the error.
> Command to run job:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
>      -Dexec.args="--runner=DataflowRunner --project=<GCP project> \
>                   --gcpTempLocation=<tmp location>" \
>      -Pdataflow-runner
> {code}
> Code to reproduce the problem:
> {code:title=EmptyPCollection.java|borderStyle=solid}
> public class EmptyPCollection {
>   public static void main(String[] args) {
>     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
>     options.setTempLocation("<your tmp location>");
>     Pipeline pipeline = Pipeline.create(options);
>     String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
>     String table = "mydataset.pets";
>     List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
>     PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
>     PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
>       @ProcessElement
>       public void processElement(ProcessContext c) {
>         String text = c.element();
>         if (text.startsWith("X")) {  // change to (D)og and works fine
>           TableRow row = new TableRow();
>           row.set("pet", text);
>           c.output(row);
>         }
>       }
>     }));
>     rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
>         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
>     pipeline.run().waitUntilFinish();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)