Posted to dev@flink.apache.org by "dejan miljkovic (JIRA)" <ji...@apache.org> on 2018/03/26 08:29:00 UTC

[jira] [Created] (FLINK-9075) BucketingSink S3 does not work on local cluster

dejan miljkovic created FLINK-9075:
--------------------------------------

             Summary: BucketingSink S3 does not work on local cluster
                 Key: FLINK-9075
                 URL: https://issues.apache.org/jira/browse/FLINK-9075
             Project: Flink
          Issue Type: Bug
          Components: Streaming Connectors
    Affects Versions: 1.4.2
            Reporter: dejan miljkovic


I am trying to write to S3 using BucketingSink. I get the error below when the code is executed on a local Flink 1.4.2 cluster. The same code works when run from IntelliJ. I followed the procedure for S3 connection from the documentation (copied flink-s3-fs-hadoop-1.4.2.jar to lib). I have reported similar issues before; they all appeared to be related to class loading.

At [https://github.com/dmiljkovic/test-flink-bucketingsink-s3] I have provided code that reproduces the error below. The pom.xml contains more than is needed; I just copied the pom from a project that needs to write to S3.

 
javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2567)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2543)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2426)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.get(Configuration.java:1240)
	at org.apache.flink.fs.s3hadoop.S3FileSystemFactory.create(S3FileSystemFactory.java:98)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:397)
	at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1126)
	at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
	at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:258)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
	at java.lang.Thread.run(Thread.java:748)
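
Since the trace fails inside DocumentBuilderFactory.newInstance(), one way to narrow down the suspected class loading problem is to print which provider gets resolved and where it was loaded from. The following is only a minimal diagnostic sketch using plain JDK calls; the class name XmlFactoryDiagnostics and the helper describe() are hypothetical names chosen for illustration and are not part of Flink or of the linked project.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

// Hypothetical helper for illustration only; not part of Flink.
public class XmlFactoryDiagnostics {

    /** Describes which jar or location a class came from and which class loader loaded it. */
    static String describe(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        // JDK classes may report no code source and a null (bootstrap) class loader.
        String location = (source == null || source.getLocation() == null)
                ? "<no code source>"
                : source.getLocation().toString();
        ClassLoader loader = cls.getClassLoader();
        return cls.getName() + " from " + location
                + " via " + (loader == null ? "<bootstrap>" : loader.toString());
    }

    public static void main(String[] args) {
        // The JAXP provider lookup depends on the thread context class loader.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("Context class loader: " + context);
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            System.out.println("Resolved provider: " + describe(factory.getClass()));
        } catch (FactoryConfigurationError e) {
            // Same error type as in the trace above; print the nested cause if there is one.
            System.out.println("Provider lookup failed: " + e.getMessage());
            if (e.getException() != null) {
                System.out.println("Caused by: " + e.getException());
            }
        }
    }
}
```

Running the same lookup from inside the job (for example from the code that sets up the sink) on the local cluster and from IntelliJ, then comparing the two outputs, may show whether different class loaders or different XML parser jars are involved. This is an assumption about where the difference lies, not a confirmed diagnosis.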
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (FLINK-9075) BucketingSink S3 does not work on local cluster

Posted by deanding <di...@gmail.com>.
Hi,

I have a similar issue when trying to run on HDFS:

java.io.IOException: Error opening the Input Split hdfs://localhost:9000/user/yliu/test_big.csv [0,111513]: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:705)
	at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:477)
	at org.apache.flink.api.common.io.GenericCsvInputFormat.open(GenericCsvInputFormat.java:301)
	at org.apache.flink.api.java.io.CsvInputFormat.open(CsvInputFormat.java:53)
	at org.apache.flink.api.java.io.CsvInputFormat.open(CsvInputFormat.java:36)
	at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:145)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2516)
	at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2492)
	at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2405)
	at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
	at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1031)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2189)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
	at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:99)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
	at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:864)

Can anyone help?



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/