Posted to issues@beam.apache.org by "Reza Safi (Jira)" <ji...@apache.org> on 2021/02/10 05:29:00 UTC

[jira] [Created] (BEAM-11788) TextIO.read() will ignore custom AwsServiceEndpoint

Reza Safi created BEAM-11788:
--------------------------------

             Summary: TextIO.read() will ignore custom AwsServiceEndpoint
                 Key: BEAM-11788
                 URL: https://issues.apache.org/jira/browse/BEAM-11788
             Project: Beam
          Issue Type: Bug
          Components: io-java-aws
    Affects Versions: 2.27.0
            Reporter: Reza Safi


For testing purposes we make use of S3Mock to mimic S3 when working with Beam. We can do something similar to the S3FileSystem unit tests in the Apache Beam code base: first set up S3Mock, then create a test:
{code:java}
public class S3ReadTransformSuite {
    private static S3Mock api;
    private static AmazonS3 client;
    private static S3Options opt;

    @BeforeClass
    public static void beforeClass() {
        api = new S3Mock.Builder().withInMemoryBackend().build();
        Http.ServerBinding binding = api.start();

        AwsClientBuilder.EndpointConfiguration endpoint =
                new AwsClientBuilder.EndpointConfiguration(
                        "http://localhost:" + binding.localAddress().getPort(), "us-west-2");
        client =
                AmazonS3ClientBuilder.standard()
                        .withPathStyleAccessEnabled(true)
                        .withEndpointConfiguration(endpoint)
                        .withCredentials(new AWSStaticCredentialsProvider(new AnonymousAWSCredentials()))
                        .build();
        opt = PipelineOptionsFactory.as(S3Options.class);
        // set the custom service endpoint on the options
        opt.setAwsServiceEndpoint(endpoint.getServiceEndpoint());
    }

    @AfterClass
    public static void afterClass() {
       api.stop();
    }

    @Test
    public void testS3ReadTransform() throws IOException {
        // create a text file locally
        File tempDir = Files.createTempDir();
        tempDir.deleteOnExit();
        File output = new File(tempDir, "output.txt");
        FileWriter fw = new FileWriter(output);
        BufferedWriter bw = new BufferedWriter(fw);
        bw.write("something");
        bw.close();
        // create a bucket and upload the local file to it
        client.createBucket("mytest");
        client.putObject("mytest", "output.txt", output);
        List<com.amazonaws.services.s3.model.Bucket> buckets = client.listBuckets();
        // check that the bucket and the object in it exist
        assert(buckets.get(0).getName().equals("mytest"));
        assert(client.getObject("mytest", "output.txt").getKey().equals("output.txt"));
        // create the pipeline using the options with the custom endpoint set earlier
        Pipeline p = Pipeline.create(opt);
        // the following line will fail with a FileNotFoundException
        PCollection<String> sr = p.apply(
                TextIO.read().from("s3://mytest/output.txt")
        );
        PAssert.that(sr).containsInAnyOrder("something");
        p.run();
    }
}
{code}
The above test will always fail with an error like:
{code:java}
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: s3://mytest/output.txt
	at test.mycomp.S3ReadTransformSuite.testS3ReadTransform(S3ReadTransformSuite.java:125)
Caused by: java.io.FileNotFoundException: No files matched spec: s3://mytest/output.txt
{code}
(It should be mentioned that for this test we added "127.0.0.1 mytest.localhost" to "/etc/hosts" to work around another known issue.)

It seems that TextIO.read().from() ignores the service endpoint that is set in the pipeline options; it tries to read from the real S3 endpoint instead and therefore fails.
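
As a sketch of a possible workaround (not verified, and assuming the S3 file system only picks up S3Options when FileSystems is initialized with them), the options carrying the custom endpoint could be pushed explicitly via FileSystems.setDefaultPipelineOptions before applying the read:
{code:java}
// Unverified workaround sketch: initialize FileSystems with the options that
// carry the custom service endpoint before TextIO resolves the spec.
// FileSystems.setDefaultPipelineOptions is an existing Beam API; whether it is
// enough to make TextIO.read() honor the endpoint here is an assumption.
FileSystems.setDefaultPipelineOptions(opt);

Pipeline p = Pipeline.create(opt);
PCollection<String> sr = p.apply(TextIO.read().from("s3://mytest/output.txt"));
PAssert.that(sr).containsInAnyOrder("something");
p.run();
{code}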

This issue restricts our ability to write unit tests for custom components such as new PTransforms, since we need to mock S3 in those tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)