Posted to issues@nifi.apache.org by GitBox <gi...@apache.org> on 2021/10/05 16:31:52 UTC

[GitHub] [nifi] simonbence opened a new pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

simonbence opened a new pull request #5437:
URL: https://github.com/apache/nifi/pull/5437


   [NIFI-9265](https://issues.apache.org/jira/browse/NIFI-9265)
   
   A small change in order to prevent failure in case of paths containing multiplied separators.
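   A stdlib-only sketch of the behaviour the fix relies on (hypothetical illustration, not the actual NiFi change; `org.apache.hadoop.fs.Path` performs similar normalization internally): collapsing runs of `/` into a single separator.

   ```java
   // Hypothetical sketch: collapse multiplied separators in a raw path string.
   // This is NOT the NiFi implementation, only an illustration of the intent.
   public class SeparatorSketch {
       static String collapseSeparators(final String rawPath) {
           // "src/test////resources//testdata" -> "src/test/resources/testdata"
           return rawPath.replaceAll("/+", "/");
       }

       public static void main(String[] args) {
           System.out.println(collapseSeparators("src/test////resources//testdata/randombytes-1"));
       }
   }
   ```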
   
   <!--
     Licensed to the Apache Software Foundation (ASF) under one or more
     contributor license agreements.  See the NOTICE file distributed with
     this work for additional information regarding copyright ownership.
     The ASF licenses this file to You under the Apache License, Version 2.0
     (the "License"); you may not use this file except in compliance with
     the License.  You may obtain a copy of the License at
         http://www.apache.org/licenses/LICENSE-2.0
     Unless required by applicable law or agreed to in writing, software
     distributed under the License is distributed on an "AS IS" BASIS,
     WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     See the License for the specific language governing permissions and
     limitations under the License.
   -->
   Thank you for submitting a contribution to Apache NiFi.
   
   Please provide a short description of the PR here:
   
   #### Description of PR
   
   _Enables X functionality; fixes bug NIFI-YYYY._
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
        in the commit message?
   
   - [ ] Does your PR title start with **NIFI-XXXX** where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
   
   - [ ] Has your PR been rebased against the latest commit within the target branch (typically `main`)?
   
   - [ ] Is your initial contribution a single, squashed commit? _Additional commits in response to PR reviewer feedback should be made on this branch and pushed to allow change tracking. Do not `squash` or use `--force` when pushing to allow for clean monitoring of changes._
   
   ### For code changes:
   - [ ] Have you ensured that the full suite of tests is executed via `mvn -Pcontrib-check clean install` at the root `nifi` folder?
   - [ ] Have you written or updated unit tests to verify your changes?
   - [ ] Have you verified that the full build is successful on JDK 8?
   - [ ] Have you verified that the full build is successful on JDK 11?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE` file, including the main `LICENSE` file under `nifi-assembly`?
   - [ ] If applicable, have you updated the `NOTICE` file, including the main `NOTICE` file found under `nifi-assembly`?
   - [ ] If adding new Properties, have you added `.displayName` in addition to .name (programmatic access) for each of the new properties?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@nifi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726059622



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
       Minor: Would you please remove the `throws IOException`?







[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r725847502



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
       I understand your point, but in general I find designing for `null` as an input unfortunate on multiple levels. Within the project, methods usually assume that the input is not `null`, except at the boundaries of the system. Introducing this would set a dangerous example and might continue to spread. (Note: I do not claim that you cannot find a pattern like this in the codebase, but in general it is something to avoid.) `Optional` shows intent and forces the client to make a conscious decision. As leaving the methods duplicated would come with too much duplication, this left us with `Optional` as the optimal-looking solution. Please also note that this is something that will not appear in the "API" of the class, thus it will not propagate.
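   A minimal sketch of the pattern being defended (simplified, hypothetical names, not the actual `AbstractHadoopProcessor` code): the private overload takes an `Optional` property name so the warning can mention it when a caller supplies one, without designing for `null`.

   ```java
   import java.util.Optional;

   // Hypothetical sketch of the Optional-parameter pattern under discussion.
   public class WarningSketch {
       // Optional makes the "no property name available" case an explicit,
       // conscious decision for the caller, rather than a nullable argument.
       static String warningSubject(final Optional<String> propertyName) {
           return propertyName
                   .map(name -> "The URI configured in the '" + name + "' property")
                   .orElse("The URI configured");
       }

       public static void main(String[] args) {
           System.out.println(warningSubject(Optional.of("Filename")));
           System.out.println(warningSubject(Optional.empty()));
       }
   }
   ```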







[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r724969241



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {
+        final URI uri = new Path(rawPath).toUri();
         final URI fileSystemUri = getFileSystem().getUri();
+        final String path;
 
         if (uri.getScheme() != null) {
             if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured in the '{}' property ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", property.getDisplayName(), uri, fileSystemUri);
+                if (propertyName.isPresent()) {

Review comment:
       Good catch, and a poor choice of test data on my part. I will add a case which does not produce a false positive. However, I will keep the current test in order to prevent regression.







[GitHub] [nifi] asfgit closed pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #5437:
URL: https://github.com/apache/nifi/pull/5437


   





[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r724967483



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {

Review comment:
       I disagree with this approach. `getNormalizedPath` with the `String` argument is indeed called only from `FetchHDFS` currently, but a signature that also contains the `Optional` would be strange from the perspective of the caller: `FetchHDFS` only needs to know that it must provide a path, nothing more. Other users might call the overload taking a `ProcessContext` and so on, but that is mostly for the caller's convenience.
   
   In this sense, consider `AbstractHadoopProcessor` an API for its child classes. I consider the "additional" `Optional` parameter an implementation detail, which should not be exposed to possible callers. It helps the method provide as much information in the `warn` message as possible; removing it would cost us useful information to share when a possible issue arises. To be honest, this is not even new behaviour: I merely merged the two `getNormalizedPath` implementations in order to reduce duplicated code (which, with the new `replace`, would be even more).







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r724562953



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {
+        final URI uri = new Path(rawPath).toUri();
         final URI fileSystemUri = getFileSystem().getUri();
+        final String path;
 
         if (uri.getScheme() != null) {
             if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured in the '{}' property ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", property.getDisplayName(), uri, fileSystemUri);
+                if (propertyName.isPresent()) {

Review comment:
       Could you please add a test case which covers this part? It seems like `Path` normalizes the paths automatically, so no difference can be seen between the if and else branches in that respect.
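   The branch in question can be exercised standalone with `java.net.URI` (a simplified stand-in for the Hadoop `Path`-based code, shown here only to illustrate when the warning would fire):

   ```java
   import java.net.URI;

   // Hypothetical sketch mirroring the condition in getNormalizedPath: a
   // scheme-qualified path whose scheme or authority differs from the
   // filesystem URI triggers the warning, and only uri.getPath() is kept.
   public class SchemeCheckSketch {
       static boolean mismatches(final URI uri, final URI fileSystemUri) {
           return uri.getScheme() != null
                   && (!uri.getScheme().equals(fileSystemUri.getScheme())
                       || !uri.getAuthority().equals(fileSystemUri.getAuthority()));
       }

       public static void main(String[] args) {
           URI fsUri = URI.create("hdfs://cluster-a:8020");
           // Different authority: the warning would fire.
           System.out.println(mismatches(URI.create("hdfs://cluster-b:8020/data/file"), fsUri));
           // No scheme at all: the else branch is taken, no warning.
           System.out.println(mismatches(URI.create("/data/file"), fsUri));
       }
   }
   ```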

##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {

Review comment:
       This is only called from FetchHDFS line 128, where the getPath method uses the _FILENAME_ property:
   
   ```java
   protected String getPath(final ProcessContext context, final FlowFile flowFile) {
       return context.getProperty(FILENAME).evaluateAttributeExpressions(flowFile).getValue();
   }
   ```
   
   I think if we call
   
   `path = getNormalizedPath(context, FILENAME, flowFile);`
   
   there, we can eliminate this method and also the optional propertyName attribute from AbstractHadoopProcessor 692.

##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
       If you can successfully eliminate the _getNormalizedPath_ method on 683 (see my previous comment), I think this can be merged with the method right above. This would reduce the number of overloaded methods from 4 to 2.

##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {

Review comment:
       And there would be only one log message type with property.







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726058729



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
       I'd recommend using hamcrest matchers for asserting the string ends with the filename. Currently when it fails, we can't see the difference.
   
   ```suggestion
       ...
       assertThat(fetchEvent.getTransitUri(), StringEndsWith.endsWith(file));
   ```
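   The point of the matcher is the failure message. A stdlib-only illustration (hypothetical helper, not Hamcrest itself) of what a bare `assertTrue` hides and a matcher-style check reports:

   ```java
   // Hypothetical stand-in for Hamcrest's endsWith matcher: unlike
   // assertTrue(actual.endsWith(suffix)), a failure reports the actual value.
   public class EndsWithSketch {
       static void assertEndsWith(final String actual, final String expectedSuffix) {
           if (!actual.endsWith(expectedSuffix)) {
               throw new AssertionError("Expected a string ending with \"" + expectedSuffix
                       + "\" but was: \"" + actual + "\"");
           }
       }

       public static void main(String[] args) {
           // Passes silently, like the assertion in the test under review.
           assertEndsWith("hdfs://host/src/test/resources/testdata/randombytes-1",
                   "src/test/resources/testdata/randombytes-1");
       }
   }
   ```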








[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r724969837



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
       Please see my comment regarding `683`







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726491261



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
       I'm not sure how this is expected to work on Windows as org.apache.hadoop.fs.Path uses unix style `"/"` separators and the tests are using the Java path which is platform independent. One possible solution might be using `FilenameUtils::separatorsToSystem` on `fetchEvent.getTransitUri()`, however this needs to be checked.
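   A stdlib-only sketch of roughly what `FilenameUtils::separatorsToSystem` does (this stand-in is hypothetical and slightly simplified: it converts both separator styles to the platform's own, so a unix-style transit URI can be compared with a platform-dependent local path):

   ```java
   import java.io.File;

   // Hypothetical stdlib approximation of commons-io's
   // FilenameUtils.separatorsToSystem, for illustration only.
   public class SeparatorsSketch {
       static String separatorsToSystem(final String path) {
           if (path == null) {
               return null;
           }
           // Normalize both '/' and '\' to the current platform's separator.
           return path.replace('/', File.separatorChar)
                      .replace('\\', File.separatorChar);
       }

       public static void main(String[] args) {
           System.out.println(separatorsToSystem("src/test\\resources/testdata"));
       }
   }
   ```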










[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r732935309



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {

Review comment:
       The processor is generally intended to work with HDFS (and services such as S3 that might sit behind the HDFS API), and these follow the Unix format. Accordingly, the other tests (and the production code) are prepared to work with "/". Unfortunately, for testing purposes we need to work with the local file system, which, when NiFi runs in a Windows environment (the cause of the given check's failure), uses "\" as the separator.
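
       As a rough analogy for the behaviour under test (using the JDK's java.nio.file API here rather than Hadoop's org.apache.hadoop.fs.Path, which is the class actually involved), parsing a path collapses the multiplied separators:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class MultipliedSeparators {
    public static void main(String[] args) {
        // The default filesystem provider drops redundant separators while
        // parsing, much like Hadoop's Path does for HDFS-style path strings.
        Path p = Paths.get("src/test////resources//testdata/randombytes-1");
        System.out.println(p);  // on unix: src/test/resources/testdata/randombytes-1
    }
}
```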







[GitHub] [nifi] simonbence commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
simonbence commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r725847502



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
       I understand your point, but in general I find designing for null as an input unfortunate on multiple levels. Within the project, methods usually assume that the input is not null, except at the boundaries of the system. Introducing this would set a dangerous example that might continue to spread. (Note: I do not claim you cannot find a concept like this in the codebase, but in general it is something to avoid.) Optional shows intent and forces the client to make a conscious decision. As keeping the methods separate at this point would come with too much duplication, this left us with `Optional` as the optimal-looking solution. Please also note that this will not appear in the "API" of the class, so it will not propagate.
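
       The overload pattern under discussion can be sketched as follows (the method names follow the diff above, but the bodies are illustrative, not the actual NiFi implementation):

```java
import java.util.Optional;

public class NormalizedPathSketch {
    // Public entry point without a property name: delegates with an empty Optional.
    static String getNormalizedPath(String rawPath) {
        return getNormalizedPath(rawPath, Optional.empty());
    }

    // Private worker: the Optional forces callers to state explicitly whether
    // a property display name is available, instead of passing null.
    static String getNormalizedPath(String rawPath, Optional<String> propertyName) {
        String label = propertyName.orElse("path");
        // Real code would normalize rawPath here; we only echo it for illustration.
        return label + "=" + rawPath;
    }

    public static void main(String[] args) {
        System.out.println(getNormalizedPath("/data//in"));
        System.out.println(getNormalizedPath("/data//in", Optional.of("Directory")));
    }
}
```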







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r725132050



##########
File path: nifi-nar-bundles/nifi-extension-utils/nifi-hadoop-utils/src/main/java/org/apache/nifi/processors/hadoop/AbstractHadoopProcessor.java
##########
@@ -674,39 +681,33 @@ protected Path getNormalizedPath(ProcessContext context, PropertyDescriptor prop
     }
 
     protected Path getNormalizedPath(final String rawPath) {
-        final Path path = new Path(rawPath);
-        final URI uri = path.toUri();
-
-        final URI fileSystemUri = getFileSystem().getUri();
-
-        if (uri.getScheme() != null) {
-            if (!uri.getScheme().equals(fileSystemUri.getScheme()) || !uri.getAuthority().equals(fileSystemUri.getAuthority())) {
-                getLogger().warn("The filesystem component of the URI configured ({}) does not match the filesystem URI from the Hadoop configuration file ({}) " +
-                        "and will be ignored.", uri, fileSystemUri);
-            }
-
-            return new Path(uri.getPath());
-        } else {
-            return path;
-        }
+       return getNormalizedPath(rawPath, Optional.empty());
     }
 
     protected Path getNormalizedPath(final ProcessContext context, final PropertyDescriptor property, final FlowFile flowFile) {
         final String propertyValue = context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
-        final Path path = new Path(propertyValue);
-        final URI uri = path.toUri();
+        return getNormalizedPath(propertyValue, Optional.of(property.getDisplayName()));
+    }
 
+    private Path getNormalizedPath(final String rawPath, final Optional<String> propertyName) {

Review comment:
       What do you think of removing the Optional from the method parameter? I think it could be a simple String with a null check. If I remember correctly, Optionals are not meant to be used as parameters because they introduce additional states (null, empty, present) instead of just null and non-null. I think passing null to this method would be fine in this case, and I could also find several examples in the code where null was passed as the flowfile.







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726060005



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";

Review comment:
       Minor: Would you please remove the `throws IOException`?







[GitHub] [nifi] Lehel44 commented on a change in pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
Lehel44 commented on a change in pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#discussion_r726488053



##########
File path: nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/test/java/org/apache/nifi/processors/hadoop/TestFetchHDFS.java
##########
@@ -59,7 +59,25 @@ public void setup() {
     @Test
     public void testFetchStaticFileThatExists() throws IOException {
         final String file = "src/test/resources/testdata/randombytes-1";
-        runner.setProperty(FetchHDFS.FILENAME, file);
+        final String fileWithMultipliedSeparators = "src/test////resources//testdata/randombytes-1";
+        runner.setProperty(FetchHDFS.FILENAME, fileWithMultipliedSeparators);
+        runner.enqueue(new String("trigger flow file"));
+        runner.run();
+        runner.assertAllFlowFilesTransferred(FetchHDFS.REL_SUCCESS, 1);
+        final List<ProvenanceEventRecord> provenanceEvents = runner.getProvenanceEvents();
+        assertEquals(1, provenanceEvents.size());
+        final ProvenanceEventRecord fetchEvent = provenanceEvents.get(0);
+        assertEquals(ProvenanceEventType.FETCH, fetchEvent.getEventType());
+        // If it runs with a real HDFS, the protocol will be "hdfs://", but with a local filesystem, just assert the filename.
+        assertTrue(fetchEvent.getTransitUri().endsWith(file));
+    }
+
+    @Test
+    public void testFetchStaticFileThatExistsWithAbsolutePath() throws IOException {
+        final File destination = new File("src/test/resources/testdata/randombytes-1");
+        final String file = destination.getAbsolutePath();
+        final String fileWithMultipliedSeparators = "/" + destination.getAbsolutePath();

Review comment:
       I think you can use `file` instead of calling `getAbsolutePath()`.







[GitHub] [nifi] pvillard31 commented on pull request #5437: NIFI-9265 Fixing path handling for HDFS processors when there are multiplied separators in the path

Posted by GitBox <gi...@apache.org>.
pvillard31 commented on pull request #5437:
URL: https://github.com/apache/nifi/pull/5437#issuecomment-953640744


   Merged, thanks @simonbence @Lehel44 

