You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/06/22 13:46:59 UTC

[GitHub] [beam] davidak09 opened a new pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

davidak09 opened a new pull request #12050:
URL: https://github.com/apache/beam/pull/12050


   `DefaultFilenamePolicy.ParamsCoder` used `StringUtf8Coder` for encoding/decoding `baseFilename` and therefore information whether `baseFilename resourceID` is file or directory was lost. Using  `ResourceIdCoder` instead fixes this issue.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Dataflow | Flink | Samza | Spark
   --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
   Portable | --- | [![Build Status](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://ci-beam.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-677784438


   > > We should update LocalResourceId and drop the isDirectory field
   > 
   > I removed the `isDirectory` field, nevertheless I left `isDirectory` parameter in the constructor and the factory method untouched. WDYT?
   
   Sg


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r474136756



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       Your right, it seems as though Linux doesn't allow `/`, Mac OS X doesn't allow `:`, Windows doesn't allow `\`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r464331768



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       OK, I agree with the first solution - taking the breaking change. I will ask on the mailing list.
   
   BTW I did workaround in my project where I use only files (not directories) as `Params` base filenames and then everything works fine. That means I no longer needs this fix but it still seems to me like a bug which is worth fixing.
   
   (P.S.: The suffix wouldn't have to be the path separator (`/`) but e.g. `sometotallyrandomstring123`  - but it's very ugly...)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-677785762


   @dmvk The backwards compat issues have been addressed. Will merge, feel free to add additional comments if there was something that was not addressed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r464849600



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       https://lists.apache.org/thread.html/r1d844830648b88948d38d0c28d106dd7fe2ed92af4a170e2679f8c2f%40%3Cdev.beam.apache.org%3E




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473191445



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       Do you want to add the file separator if it doesn't exist if `isDirectory == true`. This will help make it less error prone for users. Similar to [GcsFileSystem](https://github.com/apache/beam/blob/f3ac4822191ca63d74a8fd71f81da976b3d2fbd1/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L154).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r468538993



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       Hi @lukecwik, finally after the discussion in the mailing list I chose the second proposed option and fixed the underlying filesystem. Turns out that only `LocalFileSystem` is broken, on the other hand e.g. HDFS and S3 look fine and already have this kind of check.
   
   I still think that the proper solution should be to use the `ResourceIdCoder` instead of `StringUtf8Coder` but I understand the consequences.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-647928655


   Thanks for the review @dmvk !
   
   You're right, I forgot to ensure backward compatibility. I tried to implement the requested changes, will you take a look at it please?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] dmvk commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
dmvk commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r444762111



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       Would it be possible to add test case for bw compatible code path?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] dmvk commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
dmvk commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-648698799


   Run Java PreCommit


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r445062331



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       convertToFileResourceIfPossible attempts to match a file and if that fails attempts to match a directory.
   
   Is the issue that the underlying filesystem erroneously says something is a file when really it is a directory?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik edited a comment on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik edited a comment on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-677784438


   > > We should update LocalResourceId and drop the isDirectory field
   > 
   > I removed the `isDirectory` field, nevertheless I left `isDirectory` parameter in the constructor and the factory method untouched. WDYT?
   
   Makes sense since LocalResourceId constructor adds `File.separator` onto the path.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473199520



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       What if the character is escaped and being used in the name component of the path string?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
davidak09 commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-677319416


   > We should update LocalResourceId and drop the isDirectory field
   
   I removed the `isDirectory` field, nevertheless I left `isDirectory` parameter in the constructor and the factory method untouched. WDYT?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik merged pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik merged pull request #12050:
URL: https://github.com/apache/beam/pull/12050


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473650443



##########
File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java
##########
@@ -48,6 +51,26 @@ private static String constructName(
     return constructed.toString();
   }
 
+  private static DefaultFilenamePolicy.Params encodeDecodeParams(

Review comment:
       thanks for the advice :+1: 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473191445



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       Do you want to add the file separator if it doesn't exist if `isDirectory == true`. This will help make it less error prone for users.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r445696251



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       I think that there are a couple of options here:
   1) Take the breaking change to the coder because it is a fix, make sure it is documented in the release notes
   2) Try to fix the underlying filesystem to do a better job of file/dir matching
   3) Deprecate this filename policy, create a new one (DefaultFilenamePolicy2) and tell people to use it in new code.
   
   I'm for #1 but would ask for consensus on the mailing list.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r445485378



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       Exactly, that's the issue, for example `LocalFileSystem` matches path `/tmp/dirA/` as file..
   
   How about something like this?
   ```java
   public static class ParamsCoder extends AtomicCoder<Params> {
       private static final ParamsCoder INSTANCE = new ParamsCoder();
       private static final Coder<String> STRING_CODER = StringUtf8Coder.of();
   
       public static ParamsCoder of() {
         return INSTANCE;
       }
   
       @Override
       public void encode(Params value, OutputStream outStream) throws IOException {
         if (value == null) {
           throw new CoderException("cannot encode a null value");
         }
         ResourceId baseFilename = value.getBaseFilename().get();
         String baseFilenameString =
             (baseFilename.isDirectory() && !baseFilename.toString().endsWith("/"))
                 ? baseFilename.toString() + "/"
                 : baseFilename.toString();
         STRING_CODER.encode(baseFilenameString, outStream);
         STRING_CODER.encode(value.getShardTemplate(), outStream);
         STRING_CODER.encode(value.getSuffix(), outStream);
       }
   
       @Override
       public Params decode(InputStream inStream) throws IOException {
         String baseFilenameString = STRING_CODER.decode(inStream);
         ResourceId prefix =
             FileSystems.matchNewResource(baseFilenameString, baseFilenameString.endsWith("/"));
         String shardTemplate = STRING_CODER.decode(inStream);
         String suffix = STRING_CODER.decode(inStream);
         return new Params()
             .withBaseFilename(prefix)
             .withShardTemplate(shardTemplate)
             .withSuffix(suffix);
       }
     }
   ```
   But I'm not sure how to handle the suffix (`/`) to ensure compatibility among (file) systems




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473650689



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       sounds good, I added it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r445696251



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       I think that there are a couple of options here:
   1) Take the breaking change to the coder because it is a fix, make sure it is documented in the release notes
   2) Try to fix the underlying filesystem to do a better job of file/dir matching
   3) Deprecate this filename policy, create a new one (DefaultFilenamePolicy2) and tell people to use it in new code.
   
   I'm for 1 but would ask for consensus on the mailing list.
   
   Also, any `/` hacking will make things worse since different file systems use different path separator characters (e.g linux vs windows)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on pull request #12050:
URL: https://github.com/apache/beam/pull/12050#issuecomment-647530984


   R: @dmvk 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder uses ResourceIdCoder

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r444838467



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/DefaultFilenamePolicy.java
##########
@@ -182,19 +184,26 @@ public void encode(Params value, OutputStream outStream) throws IOException {
       if (value == null) {
         throw new CoderException("cannot encode a null value");
       }
-      stringCoder.encode(value.baseFilename.get().toString(), outStream);
-      stringCoder.encode(value.shardTemplate, outStream);
-      stringCoder.encode(value.suffix, outStream);
+      STRING_CODER.encode(value.baseFilename.get().toString(), outStream);
+      STRING_CODER.encode(value.shardTemplate, outStream);
+      STRING_CODER.encode(value.suffix, outStream);
+      BOOLEAN_CODER.encode(value.baseFilename.get().isDirectory(), outStream);
     }
 
     @Override
     public Params decode(InputStream inStream) throws IOException {
-      ResourceId prefix =
-          FileBasedSink.convertToFileResourceIfPossible(stringCoder.decode(inStream));
-      String shardTemplate = stringCoder.decode(inStream);
-      String suffix = stringCoder.decode(inStream);
+      String prefix = STRING_CODER.decode(inStream);
+      String shardTemplate = STRING_CODER.decode(inStream);
+      String suffix = STRING_CODER.decode(inStream);
+      ResourceId baseFilename;
+      if (inStream.available() > 0) {
+        baseFilename = FileSystems.matchNewResource(prefix, BOOLEAN_CODER.decode(inStream));
+      } else {
+        // fallback for ensure backward compatibility
+        baseFilename = FileBasedSink.convertToFileResourceIfPossible(prefix);

Review comment:
       ok, I added the requested test for bw compatibility
   
   btw you can see the buggy behavior there (lost information whether baseFilename is file/directory) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] lukecwik commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
lukecwik commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473186855



##########
File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/DefaultFilenamePolicyTest.java
##########
@@ -48,6 +51,26 @@ private static String constructName(
     return constructed.toString();
   }
 
+  private static DefaultFilenamePolicy.Params encodeDecodeParams(

Review comment:
       Use [CoderUtils.clone](https://github.com/apache/beam/blob/f3ac4822191ca63d74a8fd71f81da976b3d2fbd1/sdks/java/core/src/main/java/org/apache/beam/sdk/util/CoderUtils.java#L140)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] davidak09 commented on a change in pull request #12050: [BEAM-10292] DefaultFilenamePolicy.ParamsCoder is not able to decode directory on the local file system

Posted by GitBox <gi...@apache.org>.
davidak09 commented on a change in pull request #12050:
URL: https://github.com/apache/beam/pull/12050#discussion_r473652666



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/LocalFileSystem.java
##########
@@ -202,6 +202,10 @@ protected void delete(Collection<LocalResourceId> resourceIds) throws IOExceptio
 
   @Override
   protected LocalResourceId matchNewResource(String singleResourceSpec, boolean isDirectory) {
+    if (singleResourceSpec.endsWith(File.separator) && !isDirectory) {

Review comment:
       Is it even possible? For example according to https://stackoverflow.com/questions/9847288/is-it-possible-to-use-in-a-filename it's not. Or do you mean something different?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org