Posted to user@flink.apache.org by Steve Bistline <sr...@gmail.com> on 2018/11/15 23:58:40 UTC

Writing to S3

I am trying to write out to S3 from Flink with the following code, and I am getting the error below. I tried adding the parser as a dependency, etc.

Any help would be appreciated.

Table result = tableEnv.sql("SELECT 'Alert ', t_sKey, t_deviceID, t_sValue " +
        "FROM SENSORS WHERE t_sKey='TEMPERATURE' AND t_sValue > " + TEMPERATURE_THRESHOLD);
tableEnv.toAppendStream(result, Row.class).print();
// Write to S3 bucket
DataStream<Row> dsRow = tableEnv.toAppendStream(result, Row.class);
dsRow.writeAsText("s3://csv-lifeai-ai/flink-alerts");


===========================

javax.xml.parsers.FactoryConfigurationError: Provider for class
javax.xml.parsers.DocumentBuilderFactory cannot be created
	at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:311)
	at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
	at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2565)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2541)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
	at org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration.get(Configuration.java:1238)
	at org.apache.flink.fs.s3hadoop.S3FileSystemFactory.create(S3FileSystemFactory.java:98)
	at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:397)
	at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:320)
	at org.apache.flink.core.fs.Path.getFileSystem(Path.java:298)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:218)
	at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:88)
	at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:64)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:420)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:296)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
	at java.lang.Thread.run(Thread.java:748)

Re: Writing to S3

Posted by Steve Bistline <sr...@gmail.com>.
Hi Ken,

Thank you for the link... I had just found this, and when I removed the
Hadoop dependencies (not used in this project anyway) things worked fine.

Now just trying to figure out the credentials.
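
In case it helps anyone hitting the same issue: one way to supply credentials
to the flink-s3-fs-hadoop filesystem is via flink-conf.yaml. A minimal sketch,
assuming plain access-key/secret-key auth (placeholder values, obviously):

    s3.access-key: YOUR_ACCESS_KEY
    s3.secret-key: YOUR_SECRET_KEY

If the job runs on EC2/EMR with an IAM role attached, the role credentials
should be picked up automatically and these keys can be left out.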

Thanks,

Steve

On Thu, Nov 15, 2018 at 7:12 PM Ken Krugler <kk...@transpac.com>
wrote:

> Hi Steve,
>
> This looks similar to
> https://stackoverflow.com/questions/52009823/flink-shaded-hadoop-s3-filesystems-still-requires-hdfs-default-and-hdfs-site-con
>
> I see that you have classes
> like org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration
> in your jar.
>
> Which makes me think you’re not excluding those properly.
>
> — Ken

Re: Writing to S3

Posted by Ken Krugler <kk...@transpac.com>.
Hi Steve,

This looks similar to https://stackoverflow.com/questions/52009823/flink-shaded-hadoop-s3-filesystems-still-requires-hdfs-default-and-hdfs-site-con

I see that you have classes like org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.conf.Configuration in your jar.

Which makes me think you’re not excluding those properly.
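
For example, a rough sketch of what that could look like in a Maven build (the
artifact below is just illustrative; the same applies to whichever Hadoop
dependencies your pom actually pulls in):

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>

That keeps the Hadoop (and XML parser) classes out of your fat jar, so S3
support comes only from the flink-s3-fs-hadoop jar that the Flink docs have
you copy from opt/ into the lib/ directory.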

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra