Posted to user@beam.apache.org by "S. Sahayaraj" <ss...@quark.com> on 2018/05/29 13:26:32 UTC
Java based AWS IO connector
Hello All,
The data source for our Beam pipeline is in an S3 bucket. Is there a built-in I/O connector available, ideally with Java samples? If so, can you please guide me on how to integrate it?
I am using the Beam SDK for Java version 2.4.0 and the Spark runner in a clustered deployment.
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.4.0</version>
</dependency>
Cheers,
S. Sahayaraj
Re: Java based AWS IO connector
Posted by Alexey Romanenko <ar...@gmail.com>.
Hi Sahayaraj,
Yes, there is a module “beam-sdks-java-io-amazon-web-services” which can help with this.
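A minimal sketch of pulling that module into the build, assuming Maven and the same 2.4.0 version used for beam-sdks-java-core above (keeping every Beam artifact on one version is the safer choice). With the module on the classpath, file-based transforms can resolve s3://... paths; the AWS region is usually supplied through the module's pipeline options (for example --awsRegion), though that option name is an assumption worth checking against the 2.4.0 Javadoc.

```xml
<!-- AWS module: provides the S3 filesystem so s3:// paths can be read and written.
     Assumption: version kept in sync with beam-sdks-java-core (2.4.0 here). -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-amazon-web-services</artifactId>
  <version>2.4.0</version>
</dependency>
```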
I’d also suggest taking a look at this example, which reads data from an S3 bucket:
https://github.com/jbonofre/beam-samples/blob/master/amazon-web-services/src/main/java/org/apache/beam/samples/ingest/amazon/IngestToS3.java
WBR,
Alexey
Re: Java based AWS IO connector
Posted by Eugene Kirpichov <ki...@google.com>.
Beam works with S3 out of the box: you can provide s3://... paths to
any transform that works with files. I don't remember whether this is already
available in 2.4 or only starting with 2.5. Does it currently not work for you?