Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2021/04/06 11:41:00 UTC

[jira] [Commented] (BEAM-11788) TextIO.read() will ignore custom AwsServiceEndpoint

    [ https://issues.apache.org/jira/browse/BEAM-11788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315461#comment-17315461 ] 

Ismaël Mejía commented on BEAM-11788:
-------------------------------------

This should work; the current AWS tests in Beam use s3mock, so this may be a configuration issue. Can you please check https://github.com/apache/beam/blob/master/sdks/java/io/amazon-web-services/src/test/java/org/apache/beam/sdk/io/aws/s3/S3FileSystemTest.java and see if adapting your test accordingly helps?
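As a minimal sketch of what "adapting your test" could look like: the snippet below configures `S3Options` the way the Beam S3 tests do, setting region and credentials in addition to the service endpoint. This is an illustrative helper, not code from S3FileSystemTest itself; the `s3MockOptions` name is made up here, but the setters (`setAwsRegion`, `setAwsServiceEndpoint`, `setAwsCredentialsProvider`) are the real ones inherited from Beam's `AwsOptions`/`S3Options`.

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.AnonymousAWSCredentials;
import org.apache.beam.sdk.io.aws.options.S3Options;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class S3MockOptionsSketch {
  // Hypothetical helper: builds S3Options pointed at a local S3Mock endpoint.
  static S3Options s3MockOptions(String endpoint) {
    S3Options options = PipelineOptionsFactory.as(S3Options.class);
    // Setting only the endpoint is often not enough: if region and
    // credentials are left unset, the S3 client built from these options
    // falls back to the default AWS chain, which can surface as the kind
    // of configuration issue described above.
    options.setAwsRegion("us-west-2");
    options.setAwsServiceEndpoint(endpoint);
    options.setAwsCredentialsProvider(
        new AWSStaticCredentialsProvider(new AnonymousAWSCredentials()));
    return options;
  }
}
```

The resulting options object would then be passed to `TestPipeline.create(...)`, so that the `S3FileSystem` registered for the pipeline uses the mock endpoint.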

> TextIO.read() will ignore custom AwsServiceEndpoint
> ---------------------------------------------------
>
>                 Key: BEAM-11788
>                 URL: https://issues.apache.org/jira/browse/BEAM-11788
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-aws
>    Affects Versions: 2.27.0
>            Reporter: Reza Safi
>            Priority: P2
>
> For testing purposes we make use of S3Mock to mimic S3 when working with Beam. We can do something similar to the S3FileSystem unit tests in the Apache Beam code base: first do the setup for S3Mock, then create a test:
> {code:java}
> import akka.http.scaladsl.Http;
> import com.amazonaws.auth.AWSStaticCredentialsProvider;
> import com.amazonaws.auth.AnonymousAWSCredentials;
> import com.amazonaws.client.builder.AwsClientBuilder;
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.google.common.io.Files;
> import io.findify.s3mock.S3Mock;
> import java.io.BufferedWriter;
> import java.io.File;
> import java.io.FileWriter;
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.util.List;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.io.aws.options.S3Options;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.testing.PAssert;
> import org.apache.beam.sdk.testing.TestPipeline;
> import org.apache.beam.sdk.values.PCollection;
> import org.junit.AfterClass;
> import org.junit.BeforeClass;
> import org.junit.Test;
>
> public class S3ReadTransformSuite {
>     private static S3Mock api;
>     private static AmazonS3 client;
>     private static S3Options opt;
>     @BeforeClass
>     public static void beforeClass() {
>         api = new S3Mock.Builder().withInMemoryBackend().build();
>         Http.ServerBinding binding = api.start();
>         AwsClientBuilder.EndpointConfiguration endpoint =
>                 new AwsClientBuilder.EndpointConfiguration(
>                         "http://localhost:" + binding.localAddress().getPort(), "us-west-2");
>         client =
>                 AmazonS3ClientBuilder.standard()
>                         .withPathStyleAccessEnabled(true)
>                         .withEndpointConfiguration(endpoint)
>                         .withCredentials(new AWSStaticCredentialsProvider(new AnonymousAWSCredentials()))
>                         .build();
>         opt = PipelineOptionsFactory.as(S3Options.class);
>         // ********** set the custom service endpoint on the options
>         opt.setAwsServiceEndpoint(endpoint.getServiceEndpoint());
>     }
>     @AfterClass
>     public static void afterClass() {
>        api.stop();
>     }
>     @Test
>     public void testS3ReadTransform() throws IOException {
>         // create a text file locally
>         byte[] writtenArray = new byte[] {0};
>         ByteBuffer bb = ByteBuffer.allocate(writtenArray.length);
>         bb.put(writtenArray);
>         File tempDir = Files.createTempDir();
>         tempDir.deleteOnExit();
>         File output = new File(tempDir, "output.txt");
>         String outputDir = output.getAbsolutePath();
>         FileWriter fw = new FileWriter(output);
>         BufferedWriter bw = new BufferedWriter(fw);
>         bw.write("something");
>         bw.close();
>         // create a bucket and put the local file there
>         client.createBucket("mytest");
>         client.putObject("mytest", "output.txt", output); // pass the File so its contents are uploaded, not the path string
>         List<com.amazonaws.services.s3.model.Bucket> l =client.listBuckets();
>         // check whether the bucket and file in it exist
>         assert(l.get(0).getName().equals("mytest"));
>         assert(client.getObject("mytest","output.txt").getKey().equals("output.txt"));
>          // create the pipeline using the options with the custom endpoint set earlier
>          Pipeline p = TestPipeline.create(opt);
>          // the following line will fail with a FileNotFoundException
>          PCollection<String> sr = p.apply(
>                 TextIO.read().from("s3://mytest/output.txt")
>          );
>         PAssert.that(sr).containsInAnyOrder("something");
>         p.run();
>     }
> }
> {code}
> The above test will always fail with an error like:
> {code:java}
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: s3://mytest/output.txt
> 	at test.mycomp.S3ReadTransformSuite.testS3ReadTransform(S3ReadTransformSuite.java:125)
> Caused by: java.io.FileNotFoundException: No files matched spec: s3://mytest/output.txt
> {code}
> (It should be mentioned that for this test we added "127.0.0.1 mytest.localhost" to "/etc/hosts" to work around another known issue.)
> It seems that TextIO.read().from() ignores the endpoint that is set in the pipeline options: it tries to read from the real S3 service and as a result it fails.
> This issue restricts our ability to write unit tests for custom components such as new PTransforms, since we need to mock S3 for unit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)