You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by zenfenan <gi...@git.apache.org> on 2018/06/01 11:47:52 UTC

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

GitHub user zenfenan opened a pull request:

    https://github.com/apache/nifi/pull/2751

    NIFI-5221: Added 'Object Tagging' functionalities to S3 Processors

    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [x] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [x] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [x] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [x] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [x] Have you written or updated unit tests to verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? 
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [x] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zenfenan/nifi NIFI-5221

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2751.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2751
    
----
commit 91ddbcd85e1de7239ea6d151150e0fa83504a98e
Author: zenfenan <si...@...>
Date:   2018-06-01T11:35:09Z

    NIFI-5221: Added 'Object Tagging' functionalities to S3 Processors

----


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194751872
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java ---
    @@ -307,6 +328,20 @@ private boolean commit(final ProcessContext context, final ProcessSession sessio
             return willCommit;
         }
     
    +    private Map<String, String> writeObjectTags(AmazonS3 client, S3VersionSummary versionSummary) {
    +        final GetObjectTaggingResult taggingResult = client.getObjectTagging(new GetObjectTaggingRequest(versionSummary.getBucketName(), versionSummary.getKey()));
    --- End diff --
    
    I agree. I'll change the default value to `false`.


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194034751
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
    @@ -205,11 +210,21 @@
                 .defaultValue(NO_SERVER_SIDE_ENCRYPTION)
                 .build();
     
    +    public static final PropertyDescriptor OBJECT_TAGS = new PropertyDescriptor.Builder()
    --- End diff --
    
    Yep, wouldn't it be more flexible?
    With the current approach, it assumes that you always have the same set of tags (even if the values can change based on ffs attributes, the keys would always be the same). With the regular expression, you could easily manage the case where you have one ff with attributes tagS3_country=FR, tagS3_security=topsecret and one flow file with only tagS3_country=US. You would set the tag regular expression to tagS3.* and you would have the tags created using the attributes matching the regex. Does it make sense or am I missing something?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r193976840
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java ---
    @@ -307,6 +328,20 @@ private boolean commit(final ProcessContext context, final ProcessSession sessio
             return willCommit;
         }
     
    +    private Map<String, String> writeObjectTags(AmazonS3 client, S3VersionSummary versionSummary) {
    +        final GetObjectTaggingResult taggingResult = client.getObjectTagging(new GetObjectTaggingRequest(versionSummary.getBucketName(), versionSummary.getKey()));
    --- End diff --
    
    Can't have a deeper look right now but is the retrieval of tags requiring an additional call to the API or is the info already retrieved anyway? If this line causes an additional call to the API, I'd recommend to change the default value of the property to "false" so that existing/running workflows are not impacted by this change. Thoughts?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r193977404
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
    @@ -205,11 +210,21 @@
                 .defaultValue(NO_SERVER_SIDE_ENCRYPTION)
                 .build();
     
    +    public static final PropertyDescriptor OBJECT_TAGS = new PropertyDescriptor.Builder()
    --- End diff --
    
    Another option for this property: allow the user to give a regular expression and all the flow file attributes matching the regular expression would be used for tagging. Wouldn't it be more flexible?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194009838
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
    @@ -205,11 +210,21 @@
                 .defaultValue(NO_SERVER_SIDE_ENCRYPTION)
                 .build();
     
    +    public static final PropertyDescriptor OBJECT_TAGS = new PropertyDescriptor.Builder()
    --- End diff --
    
    I don't fully understand this one. Are you suggesting to replace the JSON approach with a different one?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2751


---

[GitHub] nifi issue #2751: NIFI-5221: Added 'Object Tagging' functionalities to S3 Pr...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on the issue:

    https://github.com/apache/nifi/pull/2751
  
    Thanks for the changes @zenfenan - it all looks good to me, and IT changes are OK. Merging to master.


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194009531
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java ---
    @@ -307,6 +328,20 @@ private boolean commit(final ProcessContext context, final ProcessSession sessio
             return willCommit;
         }
     
    +    private Map<String, String> writeObjectTags(AmazonS3 client, S3VersionSummary versionSummary) {
    +        final GetObjectTaggingResult taggingResult = client.getObjectTagging(new GetObjectTaggingRequest(versionSummary.getBucketName(), versionSummary.getKey()));
    --- End diff --
    
    Yep. It'll make additional API call but I don't think it will have any negative impact on the existing flows? Do you sense any cases where this might break? This simply calls S3 service to see if this key has any tags associated, if it has, it will add to the flowfile attributes.


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194756360
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
    @@ -205,11 +210,21 @@
                 .defaultValue(NO_SERVER_SIDE_ENCRYPTION)
                 .build();
     
    +    public static final PropertyDescriptor OBJECT_TAGS = new PropertyDescriptor.Builder()
    --- End diff --
    
    The keys not necessarily be the same, right? It can change since the property supports expression language so the key-values can be dynamically generated and substituted using expression language. 
    
    Having said that, I do understand your point. My only thought is that when the developer is assigning tags to the FlowFiles, he/she would be using either `UpdateAttribute` or more preferably `ExecuteScript` to do some basic processing and assign tags according to some conditions. Correct? In that case, I honestly don't see the regular expression approach adding more value than the JSON approach. They seem more or less the same. I mean, if the developer is using `ExecuteScript` to generate the tags and add it to FlowFile attributes, he/she might as well generate the entire JSON and assign it to a FlowFile attribute and that FlowFile attribute can be given as the property value here and the substitution can be done. Makes sense?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194033825
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java ---
    @@ -307,6 +328,20 @@ private boolean commit(final ProcessContext context, final ProcessSession sessio
             return willCommit;
         }
     
    +    private Map<String, String> writeObjectTags(AmazonS3 client, S3VersionSummary versionSummary) {
    +        final GetObjectTaggingResult taggingResult = client.getObjectTagging(new GetObjectTaggingRequest(versionSummary.getBucketName(), versionSummary.getKey()));
    --- End diff --
    
    I just imagine the case where a user is listing a lot of objects in S3 and this could cause a large amount of additional calls. I agree that the overhead should be fairly limited. Just wondering if we want this new behavior to be enabled by default. I honestly don't have any strong opinion on that, just wanted to mention it. Do you have an opinion on that @jvwing ?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by jvwing <gi...@git.apache.org>.
Github user jvwing commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r194140856
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/ListS3.java ---
    @@ -307,6 +328,20 @@ private boolean commit(final ProcessContext context, final ProcessSession sessio
             return willCommit;
         }
     
    +    private Map<String, String> writeObjectTags(AmazonS3 client, S3VersionSummary versionSummary) {
    +        final GetObjectTaggingResult taggingResult = client.getObjectTagging(new GetObjectTaggingRequest(versionSummary.getBucketName(), versionSummary.getKey()));
    --- End diff --
    
    I agree with @pvillard31 that it should be off by default.  From comments on the users/developer email lists, I understand ListS3 is used to process very large lists of objects, easily 10,000+ on a regular basis.  Even if the additional API calls are quick, it will add up to be a lot of API calls.
    
    Unfortunately, it does not look like the [S3ObjectSummary](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3ObjectSummary.html) returned by the listing contains any hints on the number of tags present, if any.


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r192371979
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/pom.xml ---
    @@ -69,6 +69,11 @@
                 <version>2.6.6</version>
                 <scope>test</scope>
             </dependency>
    +        <dependency>
    +            <groupId>com.google.code.gson</groupId>
    +            <artifactId>gson</artifactId>
    --- End diff --
    
    Does it need to be added in nifi-aws-nar's [NOTICE](https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-nar/src/main/resources/META-INF/NOTICE) ?


---

[GitHub] nifi pull request #2751: NIFI-5221: Added 'Object Tagging' functionalities t...

Posted by pvillard31 <gi...@git.apache.org>.
Github user pvillard31 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2751#discussion_r195169135
  
    --- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3Object.java ---
    @@ -205,11 +210,21 @@
                 .defaultValue(NO_SERVER_SIDE_ENCRYPTION)
                 .build();
     
    +    public static final PropertyDescriptor OBJECT_TAGS = new PropertyDescriptor.Builder()
    --- End diff --
    
    I completely agree that the same result can be achieved with both options. It's just that ExecuteScript will require someone to write some lines of code. Also, I'm not a big fan of the idea to have a big JSON document as a flow file attribute but I don't expect someone to define hundreds of tags. Anyway, I'm ok to merge as-is, I can do it once the other comment is addressed. Thanks!


---

[GitHub] nifi issue #2751: NIFI-5221: Added 'Object Tagging' functionalities to S3 Pr...

Posted by zenfenan <gi...@git.apache.org>.
Github user zenfenan commented on the issue:

    https://github.com/apache/nifi/pull/2751
  
    Actually after giving it some serious thought, the attribute approach does
    make more sense. Easy from the flow development perspective, and even when
    using ExecuteScript or a custom processor to parse and process content to
    create appropriate tags, the developer can still add them to FlowFile
    attributes.
    
    So I’ll update this one to scrap the JSON approach and instead take a comma
    separated list of tag prefixes which will be taken and scanned against the
    incoming FlowFiles’ attributes to generate the tags. Makes sense?



---