Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 20:13:35 UTC

[GitHub] [beam] kennknowles opened a new issue, #18696: TextIO.Write not handling URI properly

kennknowles opened a new issue, #18696:
URL: https://github.com/apache/beam/issues/18696

   I think I have found a bug in the way TextIO handles URIs. I am told that the TextIO.Write.to(String) method should accept a URI string, but passing one does not work for me: the test below fails when given a file: URI and passes when given the plain file name.
   
   Test case inline:
   
   
   ```
   import static org.junit.Assert.assertFalse;
   import static org.junit.Assert.assertTrue;

   import java.io.File;
   import java.io.IOException;
   import java.net.URI;
   import java.nio.file.*;
   import java.util.*;
   import org.apache.beam.sdk.io.TextIO;
   import org.apache.beam.sdk.testing.TestPipeline;
   import org.apache.beam.sdk.transforms.Create;
   import org.apache.beam.sdk.values.PCollection;
   import org.junit.Rule;
   import org.junit.Test;

   public class Beam3429Test {

       @Rule
       public final transient TestPipeline pipeline = TestPipeline.create();

       @Test
       public void testBeamTextIO() throws IOException {
           List<String> words = Arrays.asList("tom", "huck", "polly");
           Create.Values<String> source = Create.of(words);
           PCollection<String> coll = this.pipeline.apply(source);
           String fileName = "test" + System.currentTimeMillis() + ".txt";
           String fileURI = new File(fileName).toPath().toUri().toString();
           // i.e. file:///full/unix/path/to/test.file

           coll.apply(TextIO.write().to(fileURI));
           // test passes if we use the line below instead of the above
           // coll.apply(TextIO.write().to(fileName));
           pipeline.run();

           // read all sharded files:
           Path dir = Paths.get(URI.create(fileURI)).getParent();

           List<Path> files = new ArrayList<>();
           Files.newDirectoryStream(dir, new DirectoryStream.Filter<Path>() {
               @Override public boolean accept(Path entry) throws IOException {
                   return entry.toString().contains(fileName);
               }
           }).forEach(path -> files.add(path));

           assertFalse("no files produced!", files.isEmpty());
           List<String> fileContents = new ArrayList<>();
           for (Path f : files) {
               fileContents.addAll(Files.readAllLines(f));
           }
           assertTrue(fileContents.contains("tom"));
           assertTrue(fileContents.contains("polly"));
       }
   }
   ```
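
   To make the difference between the two arguments concrete, the following JDK-only sketch (no Beam involved) builds the file: URI string the same way the test does and converts it back to a plain local path. The class name UriVersusPath and its two helper methods are hypothetical names introduced for illustration. Whether TextIO behaves correctly when given the converted path has not been verified here; the report only states that the plain file name works.

   ```java
   import java.io.File;
   import java.net.URI;
   import java.nio.file.Paths;

   // Hypothetical helper class, for illustration only; not part of Beam.
   public class UriVersusPath {

       /** Builds a file: URI string the same way the test case does. */
       public static String toFileUri(String fileName) {
           return new File(fileName).toPath().toUri().toString();
       }

       /** Converts a file: URI string back into an absolute local path string. */
       public static String toLocalPath(String fileUri) {
           return Paths.get(URI.create(fileUri)).toString();
       }

       public static void main(String[] args) {
           String fileName = "test.txt";
           String fileUri = toFileUri(fileName);
           String localPath = toLocalPath(fileUri);

           // The three forms name the same file but are different strings.
           System.out.println("relative name: " + fileName);
           System.out.println("file URI:      " + fileUri);
           System.out.println("local path:    " + localPath);
       }
   }
   ```

   Printing the three strings side by side shows that the URI form carries a file: scheme prefix and is always absolute, whereas the test's passing case uses a bare relative name. That may help narrow down which part of the string TextIO mishandles.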
   
   Imported from Jira [BEAM-3429](https://issues.apache.org/jira/browse/BEAM-3429). Original Jira may contain additional context.
   Reported by: mdarwin.

