Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/11/19 17:12:00 UTC

[jira] [Commented] (BEAM-11002) XmlIO buffer overflow exception

    [ https://issues.apache.org/jira/browse/BEAM-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235596#comment-17235596 ] 

Beam JIRA Bot commented on BEAM-11002:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> XmlIO buffer overflow exception 
> --------------------------------
>
>                 Key: BEAM-11002
>                 URL: https://issues.apache.org/jira/browse/BEAM-11002
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-xml
>    Affects Versions: 2.23.0, 2.24.0
>            Reporter: Duncan Lew
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P1
>              Labels: Clarified, stale-assigned
>
> We're making use of Apache Beam on Google Dataflow.
> We're using XmlIO to read in an XML file with the following setup:
> {code:java}
> pipeline
>                     .apply("Read Storage Bucket",
>                             XmlIO.read<XmlProduct>()
>                                     .from(sourcePath)
>                                     .withRootElement(xmlProductRoot)
>                                     .withRecordElement(xmlProductRecord)
>                                     .withRecordClass(XmlProduct::class.java)
>                     )
> {code}
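> For context, XmlIO's withRecordClass() expects a JAXB-annotated type, so the XmlProduct class referenced above would look roughly like this (a hypothetical sketch; the real field names aren't shown in this report):
> {code:java}
> import javax.xml.bind.annotation.XmlAccessType
> import javax.xml.bind.annotation.XmlAccessorType
> import javax.xml.bind.annotation.XmlElement
> import javax.xml.bind.annotation.XmlRootElement
>
> // Hypothetical record class: JAXB-annotated, with a root element name that
> // corresponds to the record element passed to withRecordElement().
> @XmlRootElement(name = "product")
> @XmlAccessorType(XmlAccessType.FIELD)
> data class XmlProduct(
>     @field:XmlElement val id: String? = null,
>     @field:XmlElement val name: String? = null
> )
> {code}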
> However, from time to time, we get a buffer overflow exception when reading random XML files:
> {code:java}
> "Error message from worker: java.io.IOException: Failed to start reading from source: gs://path-to-xml-file.xml range [1722550, 2684411)
> 	org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:610)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:359)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
> 	java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.nio.BufferOverflowException
> 	java.base/java.nio.Buffer.nextPutIndex(Buffer.java:662)
> 	java.base/java.nio.HeapByteBuffer.put(HeapByteBuffer.java:196)
> 	org.apache.beam.sdk.io.xml.XmlSource$XMLReader.getFirstOccurenceOfRecordElement(XmlSource.java:285)
> 	org.apache.beam.sdk.io.xml.XmlSource$XMLReader.startReading(XmlSource.java:192)
> 	org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:476)
> 	org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
> 	org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:607)
> 	... 14 more
> {code}
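> For what it's worth, java.nio.BufferOverflowException comes from a relative ByteBuffer.put() once the buffer's position has reached its limit; a minimal illustration of that failure mode (not Beam code):
> {code:java}
> import java.nio.BufferOverflowException
> import java.nio.ByteBuffer
>
> // Illustration only: a relative put() past a buffer's limit throws the same
> // exception type that appears in the stack trace above.
> fun main() {
>     val buf = ByteBuffer.allocate(4)
>     repeat(4) { buf.put(0x3C.toByte()) } // fills the buffer to its limit
>     try {
>         buf.put(0x3E.toByte())           // position == limit -> overflow
>     } catch (e: BufferOverflowException) {
>         println("put() past the limit: $e")
>     }
> }
> {code}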
> We can't reproduce this buffer overflow exception locally with the DirectRunner. If we rerun the Dataflow job in Google Cloud, it completes without any exceptions.
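> For reference, a local reproduction attempt on the DirectRunner looks roughly like this (a sketch; the path and element names are placeholders):
> {code:java}
> import org.apache.beam.runners.direct.DirectRunner
> import org.apache.beam.sdk.Pipeline
> import org.apache.beam.sdk.io.xml.XmlIO
> import org.apache.beam.sdk.options.PipelineOptionsFactory
>
> // Sketch: the same XmlIO read, executed locally on the DirectRunner.
> fun main() {
>     val options = PipelineOptionsFactory.create()
>     options.runner = DirectRunner::class.java
>     val pipeline = Pipeline.create(options)
>     pipeline.apply("Read Storage Bucket",
>         XmlIO.read<XmlProduct>()
>             .from("gs://path-to-xml-file.xml") // placeholder path
>             .withRootElement("products")       // hypothetical root element
>             .withRecordElement("product")      // hypothetical record element
>             .withRecordClass(XmlProduct::class.java))
>     pipeline.run().waitUntilFinish()
> }
> {code}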



--
This message was sent by Atlassian Jira
(v8.3.4#803005)