Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/11/19 17:12:00 UTC

[jira] [Updated] (BEAM-11002) XmlIO buffer overflow exception

     [ https://issues.apache.org/jira/browse/BEAM-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-11002:
---------------------------------
    Labels: Clarified stale-assigned  (was: Clarified)

> XmlIO buffer overflow exception 
> --------------------------------
>
>                 Key: BEAM-11002
>                 URL: https://issues.apache.org/jira/browse/BEAM-11002
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-xml
>    Affects Versions: 2.23.0, 2.24.0
>            Reporter: Duncan Lew
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P1
>              Labels: Clarified, stale-assigned
>
> We're making use of Apache Beam on Google Dataflow.
> We're using XmlIO to read an XML file with the following setup:
> {code:java}
> pipeline
>     .apply("Read Storage Bucket",
>         XmlIO.read<XmlProduct>()
>             .from(sourcePath)
>             .withRootElement(xmlProductRoot)
>             .withRecordElement(xmlProductRecord)
>             .withRecordClass(XmlProduct::class.java)
>     )
> {code}
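> For context, XmlIO unmarshals each record with JAXB, so the class passed to withRecordClass must be JAXB-annotatable. The actual XmlProduct class is not included in this report; below is only a minimal Kotlin sketch of what such a class could look like (the element names and fields are placeholders, not taken from our code):
> {code:java}
> import javax.xml.bind.annotation.XmlElement
> import javax.xml.bind.annotation.XmlRootElement
>
> // Hypothetical stand-in for the XmlProduct record class referenced above;
> // the real class is not part of this report. JAXB needs a no-arg
> // constructor, which the default property values provide here.
> @XmlRootElement(name = "product") // placeholder record element name
> data class XmlProduct(
>     @get:XmlElement var id: String = "",
>     @get:XmlElement var name: String = ""
> )
> {code}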
> However, from time to time we get a buffer overflow exception when reading seemingly random XML files:
> {code:java}
> "Error message from worker: java.io.IOException: Failed to start reading from source: gs://path-to-xml-file.xml range [1722550, 2684411)
> 	org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:610)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:359)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
> 	org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
> 	org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
> 	org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
> 	java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.nio.BufferOverflowException
> 	java.base/java.nio.Buffer.nextPutIndex(Buffer.java:662)
> 	java.base/java.nio.HeapByteBuffer.put(HeapByteBuffer.java:196)
> 	org.apache.beam.sdk.io.xml.XmlSource$XMLReader.getFirstOccurenceOfRecordElement(XmlSource.java:285)
> 	org.apache.beam.sdk.io.xml.XmlSource$XMLReader.startReading(XmlSource.java:192)
> 	org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:476)
> 	org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
> 	org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:607)
> 	... 14 more
> {code}
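> The root cause in the trace is a java.nio.BufferOverflowException thrown from HeapByteBuffer.put inside XmlSource's getFirstOccurenceOfRecordElement, which suggests the reader tried to write more bytes into a fixed-size buffer than it had room for. For illustration only (this is plain JDK behaviour, not Beam code): a relative put() into a ByteBuffer with no remaining capacity throws exactly this exception:
> {code:java}
> import java.nio.ByteBuffer
>
> fun main() {
>     // Fixed-capacity heap buffer of 4 bytes.
>     val buffer = ByteBuffer.allocate(4)
>     // The first four put() calls succeed; the fifth has no remaining
>     // capacity and throws java.nio.BufferOverflowException, the same
>     // exception type seen in the worker stack trace above.
>     repeat(5) { i -> buffer.put(i.toByte()) }
> }
> {code}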
> We can't reproduce this buffer overflow exception locally with the DirectRunner, and if we rerun the same Dataflow job in Google Cloud, it can complete without any exceptions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)