Posted to user@beam.apache.org by Siyu Lin <si...@unity3d.com> on 2022/04/06 06:28:55 UTC

Breaking change for FileIO WriteDynamic in Beam 2.34?

Hi Beam community,

We have a batch pipeline that does not run regularly. We recently
upgraded to Beam 2.36, and this broke the FileIO.writeDynamic
step.

We are using the Dataflow runner, and when there are multiple workers
the errors look like this:

Error message from worker: java.lang.NoClassDefFoundError: Could not
initialize class
org.apache.beam.runners.dataflow.worker.ApplianceShuffleWriter
org.apache.beam.runners.dataflow.worker.ShuffleSink.writer(ShuffleSink.java:348)
org.apache.beam.runners.dataflow.worker.SizeReportingSinkWrapper.writer(SizeReportingSinkWrapper.java:46)
org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation.initializeWriter(WriteOperation.java:71)
org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation.start(WriteOperation.java:78)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:92)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:420)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:389)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:314)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

However, when there is only a single worker, the error looks like this:

The job failed because a work item has failed 4 times. Look in
previous log entries for the cause of each one of the 4 failures. For
more information, see
https://cloud.google.com/dataflow/docs/guides/common-errors. The work
item was attempted on these workers: xxx Root cause: The worker lost
contact with the service.

The error guidance suggested upgrading the machine type.

These errors occur with SDK 2.34 and later. When I switched back to
SDK 2.33, everything worked without any issues. I tried SDK 2.34,
2.35, and 2.36, and all of them hit the same issue.

Context: The code simply reads a fixed table of 4,034 records from
BigQuery, applies some transforms, and writes the output to GCS with
FileIO.writeDynamic. All tests were performed with the same machine
type and the same number of workers.
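
Roughly, the pipeline looks like the sketch below. This is a minimal
stand-in rather than our real code: the project, table, field, and
bucket names are all placeholders.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Contextful;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WriteDynamicSketch {
  public static void main(String[] args) {
    Pipeline pipeline =
        Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        // Read the fixed table (placeholder name).
        .apply("ReadFromBQ",
            BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"))
        // Key each record by a field so it can choose its own output file.
        .apply("KeyByCategory", MapElements
            .into(TypeDescriptors.kvs(
                TypeDescriptors.strings(), TypeDescriptors.strings()))
            .via((TableRow row) ->
                KV.of((String) row.get("category"), row.toString())))
        // Route each record to a destination derived from its key.
        .apply("WriteDynamic", FileIO.<String, KV<String, String>>writeDynamic()
            .by(KV::getKey)
            .via(Contextful.fn((KV<String, String> kv) -> kv.getValue()),
                TextIO.sink())
            .withDestinationCoder(StringUtf8Coder.of())
            .withNaming(key -> FileIO.Write.defaultNaming("out-" + key, ".txt"))
            .to("gs://my-bucket/output/"));

    pipeline.run();
  }
}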

Does anyone know of any breaking changes in the SDK or the Dataflow
runner that could explain this?

Thanks so much!
Siyu

Re: Breaking change for FileIO WriteDynamic in Beam 2.34?

Posted by Steve Niemitz <sn...@apache.org>.
Without the full logs it's hard to say, but I've definitely seen that error
in the past when the worker disks are full.  ApplianceShuffleWriter needs
to extract a native library to a temp location, and if the disk is full
that'll fail, resulting in the NoClassDefFoundError.
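
If that's what's happening here, a quick experiment is to give the
workers bigger disks (and, per the error's hint, possibly a bigger
machine type). Here's a sketch using the standard Dataflow pipeline
options; the values are arbitrary placeholders:

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Bump worker disk size and machine type to rule out a full disk during
// the shuffle native-library extraction. The values here are arbitrary.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
options.setDiskSizeGb(100);
options.setWorkerMachineType("n1-standard-2");

Equivalently, pass --diskSizeGb=100 and --workerMachineType=n1-standard-2
on the command line.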


Re: Breaking change for FileIO WriteDynamic in Beam 2.34?

Posted by Chamikara Jayalath <ch...@google.com>.
I'm not aware of a breaking change along these lines off the top of my
head. Sounds like the classes required for Dataflow shuffle are missing
somehow. Unless someone here can point to something, you might have to
contact Google Cloud support so they can look at your job.

Thanks,
Cham


Re: Breaking change for FileIO WriteDynamic in Beam 2.34?

Posted by Ahmet Altay <al...@google.com>.
/cc @Chamikara Jayalath <ch...@google.com>
