Posted to user@beam.apache.org by "S. Sahayaraj" <ss...@quark.com> on 2018/05/29 13:26:32 UTC

Java based AWS IO connector

Hello All,
The data source for our Beam pipeline is an S3 bucket. Is there a built-in I/O connector available, with Java samples? If so, could you please guide me on how to integrate it?

I am using the Beam SDK for Java version 2.4.0 and the Spark runner in a clustered deployment.

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.4.0</version>
</dependency>

Cheers,
S. Sahayaraj

Re: Java based AWS IO connector

Posted by Alexey Romanenko <ar...@gmail.com>.
Hi Sahayaraj,

Yes, there is a module “beam-sdks-java-io-amazon-web-services” which can help with this. 
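
For reference, a minimal sketch of how that module could be added to the pom.xml, next to the core SDK dependency from the question. It assumes the module is published under the same org.apache.beam groupId and at the same 2.4.0 version as beam-sdks-java-core; please check both against the Beam release you actually use.

```xml
<!-- Assumption: same groupId and version as beam-sdks-java-core in the question -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-amazon-web-services</artifactId>
  <version>2.4.0</version>
</dependency>
```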

Also, I’d suggest taking a look at this example, which reads data from an S3 bucket:
https://github.com/jbonofre/beam-samples/blob/master/amazon-web-services/src/main/java/org/apache/beam/samples/ingest/amazon/IngestToS3.java
WBR,
Alexey



Re: Java based AWS IO connector

Posted by Eugene Kirpichov <ki...@google.com>.
Beam works with S3 out of the box: you can provide s3://... paths to
anything that works with files. I don't remember whether this is already
available in 2.4 or only starting with 2.5. Does it currently not work for you?
