Posted to user@beam.apache.org by Jyotirmoy Sundi <su...@gmail.com> on 2017/06/22 05:25:10 UTC

reading from s3 file in aws

Hi Folks,

   Is there any way to read from S3 buckets in Beam?

Trace:
Exception in thread "main" java.lang.IllegalStateException: Failed to
validate s3://my_bucket/path/to/input-*.csv
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:254)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:165)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:157)
at com.intuit.pml.sessions.Eyeball.main(Eyeball.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.io.IOException: Unable to find handler for
s3://my_bucket/path/to/input-*.csv
at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:249)
... 11 more
-- 
Best Regards,

Re: reading from s3 file in aws

Posted by Lukasz Cwik <lc...@google.com>.
Filed BEAM-2500 as a feature request.

On Thu, Jun 22, 2017 at 9:00 AM, tarush grover <ta...@gmail.com>
wrote:

> Hi All,
>
> Can we add a module s3-file-system in beam to directly support and have
> integration with s3?
>
> Regards,
> Tarush
>
> On Thu, 22 Jun 2017 at 9:21 PM, Lukasz Cwik <lc...@google.com.invalid>
> wrote:
>
> > You want to depend on the Hadoop File System module[1] and configure
> > HadoopFileSystemOptions[2] with an S3 configuration[3].
> >
> > 1:
> > https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> > 2:
> > https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> > 3: https://wiki.apache.org/hadoop/AmazonS3
> >
> > On Wed, Jun 21, 2017 at 10:25 PM, Jyotirmoy Sundi <su...@gmail.com>
> > wrote:
> >
> > >
> > > Hi Folks,
> > >
> > >    Is there any way to read from S3 buckets in Beam?
> > >
> > > Trace:
> > > Exception in thread "main" java.lang.IllegalStateException: Failed to
> > > validate s3://my_bucket/path/to/input-*.csv
> > > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:254)
> > > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:165)
> > > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
> > > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
> > > at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
> > > at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:157)
> > > at com.intuit.pml.sessions.Eyeball.main(Eyeball.java:77)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:498)
> > > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> > > Caused by: java.io.IOException: Unable to find handler for
> > > s3://my_bucket/path/to/input-*.csv
> > > at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
> > > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:249)
> > > ... 11 more
> > > --
> > > Best Regards,
> > >
> >
>

Re: reading from s3 file in aws

Posted by tarush grover <ta...@gmail.com>.
Hi All,

Can we add a module s3-file-system in beam to directly support and have
integration with s3?

Regards,
Tarush

On Thu, 22 Jun 2017 at 9:21 PM, Lukasz Cwik <lc...@google.com.invalid>
wrote:

> You want to depend on the Hadoop File System module[1] and configure
> HadoopFileSystemOptions[2] with an S3 configuration[3].
>
> 1:
> https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
> 2:
>
> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
> 3: https://wiki.apache.org/hadoop/AmazonS3
>
> On Wed, Jun 21, 2017 at 10:25 PM, Jyotirmoy Sundi <su...@gmail.com>
> wrote:
>
> >
> > Hi Folks,
> >
> >    Is there any way to read from S3 buckets in Beam?
> >
> > Trace:
> > Exception in thread "main" java.lang.IllegalStateException: Failed to
> > validate s3://my_bucket/path/to/input-*.csv
> > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:254)
> > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:165)
> > at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
> > at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
> > at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
> > at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:157)
> > at com.intuit.pml.sessions.Eyeball.main(Eyeball.java:77)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> > Caused by: java.io.IOException: Unable to find handler for
> > s3://my_bucket/path/to/input-*.csv
> > at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
> > at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:249)
> > ... 11 more
> > --
> > Best Regards,
> >
>

Re: reading from s3 file in aws

Posted by Lukasz Cwik <lc...@google.com>.
You want to depend on the Hadoop File System module[1] and configure
HadoopFileSystemOptions[2] with an S3 configuration[3].

1:
https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system
2:
https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L53
3: https://wiki.apache.org/hadoop/AmazonS3
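
As a rough sketch of the configuration step (not taken from this thread):
HadoopFileSystemOptions can be filled in from the command line through
--hdfsConfiguration, which expects a JSON list of maps, one map per Hadoop
file system. The helper below only uses the JDK to build that argument. The
class name, the bucket, and the placeholder credentials are hypothetical, and
the fs.s3a.* property names assume that Hadoop's S3A connector (hadoop-aws)
is on the classpath. Setting fs.defaultFS to the bucket is also an
assumption about how the Hadoop file system picks its scheme.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the --hdfsConfiguration argument for a Beam pipeline.
final class S3HdfsConfigurationArg {

  // Quote a string for JSON, escaping backslashes and double quotes.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Render one Hadoop configuration as a single JSON map inside a JSON list.
  static String toHdfsConfigurationArg(Map<String, String> conf) {
    StringJoiner entries = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> e : conf.entrySet()) {
      entries.add(quote(e.getKey()) + ":" + quote(e.getValue()));
    }
    return "--hdfsConfiguration=[" + entries + "]";
  }

  // Assumed S3A settings; adjust the keys to whatever S3 connector you use.
  static Map<String, String> s3aConf(String bucket, String accessKey, String secretKey) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("fs.defaultFS", "s3a://" + bucket);
    conf.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    conf.put("fs.s3a.access.key", accessKey);
    conf.put("fs.s3a.secret.key", secretKey);
    return conf;
  }

  public static void main(String[] args) {
    // Placeholder bucket and credentials, for illustration only.
    System.out.println(
        toHdfsConfigurationArg(s3aConf("my_bucket", "ACCESS_KEY", "SECRET_KEY")));
  }
}
```

The printed string would be passed along with the other pipeline arguments,
for example to PipelineOptionsFactory.fromArgs(...) with the options viewed
as HadoopFileSystemOptions. With S3A the input path would presumably use the
s3a:// scheme rather than s3://.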

On Wed, Jun 21, 2017 at 10:25 PM, Jyotirmoy Sundi <su...@gmail.com>
wrote:

>
> Hi Folks,
>
>    Is there any way to read from S3 buckets in Beam?
>
> Trace:
> Exception in thread "main" java.lang.IllegalStateException: Failed to
> validate s3://my_bucket/path/to/input-*.csv
> at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:254)
> at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:165)
> at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:420)
> at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:334)
> at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:47)
> at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:157)
> at com.intuit.pml.sessions.Eyeball.main(Eyeball.java:77)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: java.io.IOException: Unable to find handler for
> s3://my_bucket/path/to/input-*.csv
> at org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:307)
> at org.apache.beam.sdk.io.TextIO$Read$Bound.expand(TextIO.java:249)
> ... 11 more
> --
> Best Regards,
>
