Posted to issues@beam.apache.org by "Chamikara Jayalath (JIRA)" <ji...@apache.org> on 2019/01/10 16:15:00 UTC
[jira] [Comment Edited] (BEAM-5434) Issue with BigQueryIO in Template

    [ https://issues.apache.org/jira/browse/BEAM-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739564#comment-16739564 ] 

Chamikara Jayalath edited comment on BEAM-5434 at 1/10/19 4:14 PM:
-------------------------------------------------------------------

You have to use withTemplateCompatibility() for templates. 

Please re-open if this is still failing for you.
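
On the question of how two launches of the same template can be related: a template is staged once and launched many times, so anything decided while the template is built is shared by every launch. The sketch below is a toy model of that distinction only. It is not Beam's implementation, every name in it (TemplateStateSketch, StagedTemplate, buildWithFixedDir, buildWithPerLaunchDir) is hypothetical, and the thread does not confirm that this is the exact mechanism behind the FileNotFoundException. It may still help explain why a second launch could look for a temporary file that the first launch already cleaned up.

```java
import java.util.UUID;
import java.util.function.Supplier;

/** Toy model only: none of these names are Beam or Dataflow APIs. */
public class TemplateStateSketch {

    /** A staged "template": built once, launched many times. */
    static final class StagedTemplate {
        private final Supplier<String> tempDir;

        StagedTemplate(Supplier<String> tempDir) {
            this.tempDir = tempDir;
        }

        /** Each launch asks the template where its temporary files live. */
        String launch() {
            return tempDir.get();
        }
    }

    /** Directory chosen at build time: every launch sees the same one. */
    static StagedTemplate buildWithFixedDir() {
        String fixed = "temp/" + UUID.randomUUID();
        return new StagedTemplate(() -> fixed);
    }

    /** Directory chosen at launch time: every launch gets a fresh one. */
    static StagedTemplate buildWithPerLaunchDir() {
        return new StagedTemplate(() -> "temp/" + UUID.randomUUID());
    }

    public static void main(String[] args) {
        StagedTemplate fixed = buildWithFixedDir();
        // Both launches share the directory picked when the template was built.
        System.out.println(fixed.launch().equals(fixed.launch()));

        StagedTemplate fresh = buildWithPerLaunchDir();
        // Each launch picks its own directory.
        System.out.println(fresh.launch().equals(fresh.launch()));
    }
}
```

In the fixed case, if the first launch deletes its temporary directory on completion, a second launch would go looking in a location that no longer exists. Under that reading, withTemplateCompatibility() moves BigQueryIO to a source implementation intended for repeated template launches.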


was (Author: chamikara):
You have to use withTemplateCompatiblity() for templates. 

Please re-open if this is still failing for you.

> Issue with BigQueryIO in Template
> ---------------------------------
>
>                 Key: BEAM-5434
>                 URL: https://issues.apache.org/jira/browse/BEAM-5434
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 2.5.0
>            Reporter: Amarendra Kumar
>            Assignee: Chamikara Jayalath
>            Priority: Blocker
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I am trying to build a Google Dataflow template to be run from a Cloud Function.
> The issue is with BigQueryIO trying to execute a SQL query.
> The opening step of my Dataflow template is
> {code:java}
> BigQueryIO.readTableRows().withQueryLocation("US").withoutValidation().fromQuery(options.getSql()).usingStandardSql()
> {code}
> When the template is triggered for the first time, it runs fine.
> But when it is triggered a second time, it fails with the following error.
> {code}
> // Some comments here
> java.io.FileNotFoundException: No files matched spec: gs://test-notification/temp/Notification/BigQueryExtractTemp/34d42a122600416c9ea748a6e325f87a/000000000000.avro
> 	at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
> 	at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
> 	at org.apache.beam.sdk.io.FileBasedSource.createReader(FileBasedSource.java:329)
> 	at com.google.cloud.dataflow.worker.WorkerCustomSources$1.iterator(WorkerCustomSources.java:360)
> 	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:177)
> 	at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
> 	at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
> 	at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
> 	at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
> 	at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
> 	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
> 	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
> 	at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> In the second run, why is the process expecting a file in the GCS location?
> This file does get created while the first job is running, but it is deleted after that job completes.
> How are the two jobs related?
> Could you please let me know whether I am missing something or this is a bug?


