Posted to issues@flink.apache.org by "Arvid Heise (Jira)" <ji...@apache.org> on 2020/02/17 16:32:00 UTC

[jira] [Commented] (FLINK-16014) S3 plugin ClassNotFoundException SAXParser

    [ https://issues.apache.org/jira/browse/FLINK-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038494#comment-17038494 ] 

Arvid Heise commented on FLINK-16014:
-------------------------------------

Preliminary assessment:

The root cause is a bug in JDK 8: `XMLReaderFactory` [caches|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/jdk8u222-b10/jaxp/src/org/xml/sax/helpers/XMLReaderFactory.java#L146-L174] the driver class name independent of the classloader. On EMR, xercesImpl is on the classpath (because of HDFS) and will be loaded at some point, so its class name can end up in the cache even though the plugin classloader cannot see that class.
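
To illustrate the failure mode, here is a heavily simplified model of that caching behavior. It is not the JDK code: `CachedDriverSketch` and `createReader` are hypothetical names, a classloader is modeled as the collection of class names it can see, and one instance stands in for the JVM-wide static state.

```java
import java.util.Arrays;
import java.util.Collection;

public class CachedDriverSketch {
    // One instance stands in for the JVM-wide static state of the factory.
    private boolean lookedUp;
    private String cachedDriverName;

    /**
     * visibleClasses: class names the caller's (modeled) classloader can see.
     * discoveredDriver: driver name the caller's own lookup would find.
     */
    public String createReader(Collection<String> visibleClasses, String discoveredDriver) {
        if (!lookedUp) {
            // The first caller wins: its result is cached for every later caller,
            // regardless of which classloader that later caller uses.
            lookedUp = true;
            cachedDriverName = discoveredDriver;
        }
        if (!visibleClasses.contains(cachedDriverName)) {
            throw new IllegalStateException(
                    "SAX2 driver class " + cachedDriverName + " not found");
        }
        return cachedDriverName;
    }

    public static void main(String[] args) {
        CachedDriverSketch factory = new CachedDriverSketch();
        String xerces = "org.apache.xerces.parsers.SAXParser";
        Collection<String> classpathLoader = Arrays.asList(xerces);
        Collection<String> pluginLoader = Arrays.asList("some.other.Parser");

        // Code that sees xercesImpl on the classpath triggers the first lookup.
        System.out.println(factory.createReader(classpathLoader, xerces));

        // The plugin asks later and gets the cached name, which it cannot load.
        try {
            factory.createReader(pluginLoader, "some.other.Parser");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call fails although the plugin's own lookup would have found a usable parser, which matches the shape of the `ClassNotFoundException` in the trace.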

We have some workarounds, none of which would solve it in all cases. Because users may bundle any `SAXParser` in their code or put one in `lib`, the cached `XMLReader` class may always turn out to be one that is not visible from the plugin.

Workarounds:

1) Add xercesImpl to the s3 plugin. This would blow up the file size but cover the most common case of a `SAXParser`.

2) Add another smaller `SAXParser` to avoid having none.

3) Just add a service descriptor that points to the fallback implementation, such that the fallback will be cached.
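
For option 3, the descriptor would be a one-line resource at `META-INF/services/org.xml.sax.driver` in the plugin jar. Assuming the fallback meant here is the JDK 8 built-in parser, that line would be `com.sun.org.apache.xerces.internal.parsers.SAXParser`. The sketch below only checks which driver name a given classloader would resolve from such a descriptor; `SaxDriverDescriptor` and `declaredDriver` are hypothetical helper names, not existing Flink code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public final class SaxDriverDescriptor {
    /** Resource path of the SAX driver service descriptor. */
    public static final String RESOURCE = "META-INF/services/org.xml.sax.driver";

    private SaxDriverDescriptor() {}

    /** Returns the driver class name the loader's descriptor declares, or null if there is none. */
    public static String declaredDriver(ClassLoader loader) {
        try (InputStream in = loader.getResourceAsStream(RESOURCE)) {
            if (in == null) {
                return null; // no descriptor visible from this loader
            }
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line = reader.readLine(); // only the first line is considered here
            return line == null ? null : line.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `SaxDriverDescriptor.declaredDriver(pluginClassLoader)` in a test could confirm that the plugin jar really exposes the intended class name.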

For all options, we should eagerly initialize the `XMLReaderFactory` in `S3AFileSystem` so that the plugin's lookup wins against user code and hopefully even `lib`.
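
A minimal sketch of what the eager initialization could look like, assuming it is enough to call `XMLReaderFactory.createXMLReader()` once with the plugin classloader set as the thread context classloader. `EagerSaxInit` and `initWith` are hypothetical names, and where exactly this would be invoked during file system setup is left open. On its own this does not fix the issue; it only makes the first lookup happen from the plugin, so it would be combined with one of the options above.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated on newer JDKs
public final class EagerSaxInit {

    private EagerSaxInit() {}

    /**
     * Triggers the factory's driver lookup while the given loader is the
     * thread context classloader, then restores the previous loader.
     */
    public static XMLReader initWith(ClassLoader pluginClassLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginClassLoader);
        try {
            // The first call performs the lookup whose result is then reused.
            return XMLReaderFactory.createXMLReader();
        } catch (SAXException e) {
            throw new IllegalStateException("Could not initialize a SAX driver", e);
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}
```

Restoring the previous context classloader in `finally` keeps the call free of side effects on the calling thread, apart from the factory's own cached state.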

> S3 plugin ClassNotFoundException SAXParser
> ------------------------------------------
>
>                 Key: FLINK-16014
>                 URL: https://issues.apache.org/jira/browse/FLINK-16014
>             Project: Flink
>          Issue Type: Bug
>          Components: FileSystems
>    Affects Versions: 1.10.0, 1.11.0
>            Reporter: Arvid Heise
>            Priority: Major
>
> Observed while stress-testing the s3 plugin on EMR.
>  
> {noformat}
> org.apache.flink.util.FlinkRuntimeException: Could not perform checkpoint 2 for operator Map (114/160).
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:839)
> 	at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:104)
> 	at org.apache.flink.streaming.runtime.io.CheckpointBarrierUnaligner.notifyBarrierReceived(CheckpointBarrierUnaligner.java:149)
> 	at org.apache.flink.streaming.runtime.io.InputProcessorUtil$1.lambda$notifyBarrierReceived$0(InputProcessorUtil.java:80)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:255)
> 	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:508)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:492)
> 	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3://emr-unaligned-checkpoints-testing-eu-central-1/inflight/9ae223e41008b17568d7f63c12360268_output/part-file.-1: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
> 	at org.apache.flink.runtime.io.network.BufferPersisterImpl$Writer.checkErroneousUnsafe(BufferPersisterImpl.java:262)
> 	at org.apache.flink.runtime.io.network.BufferPersisterImpl$Writer.add(BufferPersisterImpl.java:137)
> 	at org.apache.flink.runtime.io.network.BufferPersisterImpl.addBuffers(BufferPersisterImpl.java:66)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.prepareInflightDataSnapshot(StreamTask.java:935)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:898)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:870)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:826)
> 	... 12 more
> Caused by: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3://emr-unaligned-checkpoints-testing-eu-central-1/inflight/9ae223e41008b17568d7f63c12360268_output/part-file.-1: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader: Couldn't initialize a SAX driver to create an XMLReader
> 	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
> 	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:145)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2251)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1734)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2970)
> 	at org.apache.flink.fs.s3hadoop.common.HadoopFileSystem.exists(HadoopFileSystem.java:152)
> 	at org.apache.flink.core.fs.PluginFileSystemFactory$ClassLoaderFixingFileSystem.exists(PluginFileSystemFactory.java:143)
> 	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.exists(SafetyNetWrapperFileSystem.java:102)
> 	at org.apache.flink.runtime.io.network.BufferPersisterImpl$Writer.get(BufferPersisterImpl.java:213)
> 	at org.apache.flink.runtime.io.network.BufferPersisterImpl$Writer.run(BufferPersisterImpl.java:167)
> Caused by: com.amazonaws.SdkClientException: Couldn't initialize a SAX driver to create an XMLReader
> 	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:118)
> 	at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:87)
> 	at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
> 	at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
> 	at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
> 	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1554)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1272)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
> 	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
> 	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)
> 	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4266)
> 	at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:876)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1262)
> 	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:317)
> 	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:280)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1255)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223)
> 	... 9 more
> Caused by: org.xml.sax.SAXException: SAX2 driver class org.apache.xerces.parsers.SAXParser not found
> java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
> 	at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:230)
> 	at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(XMLReaderFactory.java:191)
> 	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.<init>(XmlResponsesSaxParser.java:115)
> 	... 32 more
> Caused by: java.lang.ClassNotFoundException: org.apache.xerces.parsers.SAXParser
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	at org.apache.flink.core.plugin.PluginLoader$PluginClassLoader.loadClass(PluginLoader.java:149)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at org.xml.sax.helpers.NewInstance.newInstance(NewInstance.java:82)
> 	at org.xml.sax.helpers.XMLReaderFactory.loadClass(XMLReaderFactory.java:228)
> 	... 34 more{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)