Posted to issues@nifi.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/19 00:29:00 UTC

[jira] [Commented] (NIFI-4165) Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing

    [ https://issues.apache.org/jira/browse/NIFI-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481380#comment-16481380 ] 

ASF GitHub Bot commented on NIFI-4165:
--------------------------------------

Github user alopresto commented on the issue:

    https://github.com/apache/nifi/pull/2502
  
    @markap14 Sorry, I got distracted from this review. I have revisited it and have some points I'd like to discuss:
    
    * I rebased against `master`, as there have been some changes there since this PR was opened. These fall into a couple of places:
    ** the version bump to `1.7.0-SNAPSHOT` in the `pom.xml` for both this artifact and a dependency
    ** there have been changes to `FlowFileQueue` which `DummyFlowFileQueue` must implement
    * I added some logic to `RemoveFlowFilesWithMissingContent` which loads the *master key* from the expected `bootstrap.conf` file in order to handle a `nifi.properties` file with encrypted configuration values. 
    * The other NiFi Toolkit components each have a `*.bat`/`*.sh` script that runs them. This provides a couple of features:
    ** named command-line arguments rather than positional arguments
    ** setting up `$JAVA_HOME` and the classpath, rather than requiring the user to call `java` directly on the command line
    * The `jar-with-dependencies` descriptor in the `maven-assembly-plugin` only seems to be built when you run `mvn clean compile assembly:single`, rather than being bound to the `install` phase via a profile (see [Stack Overflow](https://stackoverflow.com/a/574650/70465)). Please let me know if I'm missing something here.
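    
    For the `bootstrap.conf` point above, here is a minimal sketch of the master key lookup. It assumes that `bootstrap.conf` sits next to `nifi.properties` in `conf/` and that the key is stored under `nifi.bootstrap.sensitive.key`, as in the NiFi 1.x default `bootstrap.conf`. `BootstrapKeyLoader` and its methods are hypothetical names, not the code in my branch.
    
    ```java
    import java.io.IOException;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Optional;
    import java.util.Properties;
    
    public class BootstrapKeyLoader {
        // Assumed property name, as it appears in the NiFi 1.x default bootstrap.conf
        static final String KEY_PROPERTY = "nifi.bootstrap.sensitive.key";
    
        // Assumption: bootstrap.conf is a sibling of nifi.properties in the conf/ directory
        static Path bootstrapConfFor(Path nifiProperties) {
            return nifiProperties.toAbsolutePath().getParent().resolve("bootstrap.conf");
        }
    
        // Returns the master key, or empty if bootstrap.conf is unreadable or the key is blank
        static Optional<String> loadMasterKey(Path bootstrapConf) throws IOException {
            if (!Files.isReadable(bootstrapConf)) {
                return Optional.empty();
            }
            Properties props = new Properties();
            try (Reader reader = Files.newBufferedReader(bootstrapConf, StandardCharsets.UTF_8)) {
                props.load(reader);
            }
            String key = props.getProperty(KEY_PROPERTY, "").trim();
            return key.isEmpty() ? Optional.empty() : Optional.of(key);
        }
    
        public static void main(String[] args) throws IOException {
            Path conf = Files.createTempDirectory("conf");
            Path bootstrap = bootstrapConfFor(conf.resolve("nifi.properties"));
            // Dummy value for the demo only; a real key should never be printed
            Files.write(bootstrap,
                "nifi.bootstrap.sensitive.key=0123456789ABCDEF\n".getBytes(StandardCharsets.UTF_8));
            System.out.println("master key present: " + loadMasterKey(bootstrap).isPresent());
        }
    }
    ```
    
    Returning `Optional.empty()` for a missing or blank key lets the caller fall back to loading `nifi.properties` without any decryption.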
    
    I ran the scenario you suggested by generating some FlowFiles into a queue and then deleting the contents of the `content_repository` directory. When I then ran the tool, I got this message:
    
    ```
    hw12203:/Users/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo (pr2502) alopresto
    🔓 149s @ 17:17:25 $ cd target/
    hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto
    🔓 0s @ 17:17:31 $ java -cp nifi-toolkit-flowfile-repo-1.7.0-SNAPSHOT-jar-with-dependencies.jar:../../nifi-toolkit-assembly/target/nifi-toolkit-1.7.0-SNAPSHOT-bin/nifi-toolkit-1.7.0-SNAPSHOT/lib/slf4j-api-1.7.25.jar org.apache.nifi.toolkit.repos.flowfile.RemoveFlowFilesWithMissingContent ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties ~/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/flowfile_repository/
    17:17:35.865 [main] INFO org.apache.nifi.properties.NiFiPropertiesLoader - Loaded 148 properties from /Users/alopresto/Workspace/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT/conf/nifi.properties
    17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - Loaded 148 properties (including 0 protection schemes) into ProtectedNiFiProperties
    17:17:35.872 [main] DEBUG org.apache.nifi.properties.ProtectedNiFiProperties - No protected properties
    Cannot find or cannot read ./content_repository or it is not a directory
    hw12203:...ers/alopresto/Workspace/nifi/nifi-toolkit/nifi-toolkit-flowfile-repo/target (pr2502) alopresto
    🔓 0s @ 17:17:36 $
    ```
    
    The directory definitely exists:
    
    ```
    hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto
    🔓 0s @ 17:17:48 $ ll
    total 416
    drwxr-xr-x   17 alopresto  staff   578B May 10 16:40 ./
    drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 ../
    -rw-r--r--    1 alopresto  staff   119K Mar 13 17:25 LICENSE
    -rw-r--r--    1 alopresto  staff    80K May 10 09:23 NOTICE
    -rw-r--r--    1 alopresto  staff   4.4K Dec 13 15:56 README
    drwxr-xr-x    8 alopresto  staff   272B May 10 10:20 bin/
    drwxr-xr-x   12 alopresto  staff   408B May 18 16:51 conf/
    drwxr-xr-x    2 alopresto  staff    68B May 18 16:51 content_repository/
    drwxr-xr-x    6 alopresto  staff   204B May 18 16:50 database_repository/
    drwxr-xr-x    3 alopresto  staff   102B May 10 10:20 docs/
    drwxr-xr-x    5 alopresto  staff   170B May 18 16:52 flowfile_repository/
    drwxr-xr-x  113 alopresto  staff   3.8K May 10 10:20 lib/
    drwxr-xr-x   10 alopresto  staff   340B May 18 17:00 logs/
    drwxr-xr-x    9 alopresto  staff   306B May 18 16:51 provenance_repository/
    drwxr-xr-x    4 alopresto  staff   136B May 18 16:49 run/
    drwxr-xr-x    3 alopresto  staff   102B May 10 16:40 state/
    drwxr-xr-x    5 alopresto  staff   170B May 18 16:50 work/
    hw12203:...space/nifi/nifi-assembly/target/nifi-1.7.0-SNAPSHOT-bin/nifi-1.7.0-SNAPSHOT (pr2502) alopresto
    🔓 0s @ 17:19:35 $ ll content_repository/
    total 0
    drwxr-xr-x   2 alopresto  staff    68B May 18 16:51 ./
    drwxr-xr-x  17 alopresto  staff   578B May 10 16:40 ../
    ```
    
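    I believe this is because the default `nifi.properties` file defines the content repository as the relative path `./content_repository`. The tool resolves that path against its current working directory (here `target/`) rather than against the NiFi installation directory. I think the tool should resolve this path relative to the NiFi directory when it is not absolute.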
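    
    A minimal sketch of that resolution follows. It assumes that `nifi.properties` lives in `<NIFI_HOME>/conf/` and that relative repository paths are meant relative to `<NIFI_HOME>`. `RepoPathResolver` and its methods are hypothetical, and the error text only mirrors the message shown above.
    
    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    
    public class RepoPathResolver {
        // Assumption: nifi.properties lives in <NIFI_HOME>/conf/
        static Path nifiHomeFor(Path nifiProperties) {
            Path confDir = nifiProperties.toAbsolutePath().normalize().getParent();
            Path home = (confDir == null) ? null : confDir.getParent();
            if (home == null) {
                throw new IllegalArgumentException("Cannot derive NiFi home from " + nifiProperties);
            }
            return home;
        }
    
        // Absolute paths are used as-is; relative ones are resolved against NiFi home,
        // not against the tool's current working directory
        static Path resolveRepoPath(Path nifiProperties, String configuredPath) {
            Path path = Paths.get(configuredPath);
            if (path.isAbsolute()) {
                return path.normalize();
            }
            return nifiHomeFor(nifiProperties).resolve(path).normalize();
        }
    
        // Fails with a message in the same style as the tool's current error
        static Path requireReadableDirectory(Path dir) {
            if (!Files.isDirectory(dir) || !Files.isReadable(dir)) {
                throw new IllegalArgumentException(
                    "Cannot find or cannot read " + dir + " or it is not a directory");
            }
            return dir;
        }
    
        public static void main(String[] args) throws IOException {
            // Throwaway NiFi-like layout: <home>/conf and <home>/content_repository
            Path home = Files.createTempDirectory("nifi-home");
            Files.createDirectories(home.resolve("conf"));
            Files.createDirectories(home.resolve("content_repository"));
            Path nifiProperties = home.resolve("conf").resolve("nifi.properties");
    
            Path repo = requireReadableDirectory(
                resolveRepoPath(nifiProperties, "./content_repository"));
            System.out.println(repo);
        }
    }
    ```
    
    With this approach, `./content_repository` resolves relative to the `nifi-1.7.0-SNAPSHOT` directory shown in the listing above, regardless of where the tool is launched from.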
    
    Let me know what you think about those comments. I pushed my changes [to a branch NIFI-4165](https://github.com/alopresto/nifi/commit/579da9117d03ad9cc24499bbad6d27fae7c92037). 


> Update NiFi FlowFile Repository Toolkit to provide ability to remove FlowFiles whose content is missing
> -------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4165
>                 URL: https://issues.apache.org/jira/browse/NIFI-4165
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Tools and Build
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> The FlowFile Repo toolkit has the ability to address issues with FlowFile Repo corruption due to sudden power loss. Another known problem occurs when content goes missing from the content repository for whatever reason (say, some process deletes some of the files): the FlowFile Repo can then contain a lot of FlowFiles whose content is missing. This causes a lot of problems, with stack traces being dumped to logs and the flow taking a very long time to get back to normal. We should update the toolkit to provide a mechanism for pointing to a FlowFile Repo and Content Repo, then writing out a new FlowFile Repo that removes any FlowFile whose content is missing.
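
The core filtering step of that mechanism can be sketched as follows. The sketch uses a hypothetical, simplified record that maps each FlowFile to a content file path under the content repository. NiFi's real repository records and content claims carry more information than this, and actually writing out the new FlowFile Repo is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MissingContentFilter {
    // Hypothetical, simplified FlowFile record: an id plus the relative path of its content file.
    // A null path models a FlowFile with no content claim, which is always kept.
    static final class FlowFileRecord {
        final long id;
        final String contentPath;

        FlowFileRecord(long id, String contentPath) {
            this.id = id;
            this.contentPath = contentPath;
        }
    }

    // Keeps only the records whose content is not needed or still exists on disk;
    // these are the records a rewritten FlowFile Repo would contain
    static List<FlowFileRecord> retainWithContent(List<FlowFileRecord> records, Path contentRepo) {
        List<FlowFileRecord> kept = new ArrayList<>();
        for (FlowFileRecord r : records) {
            if (r.contentPath == null || Files.exists(contentRepo.resolve(r.contentPath))) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path repo = Files.createTempDirectory("content_repository");
        Files.createDirectories(repo.resolve("1"));
        Files.write(repo.resolve("1").resolve("claim-a"), new byte[] {1, 2, 3});

        List<FlowFileRecord> records = List.of(
            new FlowFileRecord(1, "1/claim-a"),  // content present
            new FlowFileRecord(2, "1/claim-b"),  // content deleted
            new FlowFileRecord(3, null));        // no content claim
        for (FlowFileRecord r : retainWithContent(records, repo)) {
            System.out.println("keep " + r.id);
        }
    }
}
```

Running `main` keeps FlowFiles 1 and 3 and drops FlowFile 2, whose content file no longer exists.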



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)